1. Nobody can tell if it's improving
Without an evaluation set — real cases, expected answers, a score — 'the AI works' is an opinion. Every prompt iteration becomes a gamble, and the project dies the day someone important hits a bad answer. The eval isn't a luxury: it's the first thing to build, before the pipeline itself.