Every week now somebody sends me a video. An AI agent that books a flight. An agent that builds a whole app from one sentence. An agent army that runs a software company while the founder sleeps. The demos are impressive. The music in the videos is always very confident.

I have been writing software for a long time, and I ran my own business for eleven years of it, so I have survived a few hype cycles already. My rule is simple. I do not judge a technology by its demo. I judge it by what it does on a random Tuesday, with real data, with a confused user, when nobody is filming.

So let me try to separate, honestly, what I see working in mid-2024 from what is still mostly theater.

What actually works in production today

First, the narrow assistants. Copilot-style code completion, drafting emails, summarizing documents, answering questions about a defined set of content. These work because the scope is small, a human reviews every output, and a wrong answer costs seconds, not dollars. The human is the safety net, and the tool only needs to be right often enough to save time on average. That bar is reachable, and many teams are clearing it.

Second, RAG, or retrieval-augmented generation. You take your own documents, your knowledge base, your policies, you index them, and the model answers questions grounded in that content instead of inventing from memory. Customer support deflection, internal helpdesks, search over documentation. This is the least sexy corner of the AI world and it is where I see real, measurable returns. Fewer support tickets is a number a CFO understands. Faster onboarding is a number too.
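To make the "index, retrieve, ground" shape concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the three sample documents, the function names, and the word-overlap scoring are all made up, and a real system would typically use embeddings and a proper search index instead. It stops at building the prompt and does not call any model.

```python
import re
from collections import Counter

# Hypothetical knowledge base; in practice this is your own content.
DOCUMENTS = {
    "refunds": "Refunds are issued within 14 days of purchase when the customer has a receipt.",
    "shipping": "Standard shipping takes 5 business days. Express shipping takes 2 business days.",
    "accounts": "To reset a password, use the reset link on the login page.",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Index step: count the words in each document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}


def retrieve(question, index, top_k=1):
    """Retrieval step: rank documents by naive word overlap with the question."""
    question_words = set(tokenize(question))
    scores = {
        doc_id: sum(counts[word] for word in question_words)
        for doc_id, counts in index.items()
    }
    ranked = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    # Drop documents that share no words with the question.
    return [doc_id for doc_id in ranked[:top_k] if scores[doc_id] > 0]


def build_prompt(question, doc_ids, documents):
    """Grounding step: put the retrieved text in front of the question."""
    sources = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


index = build_index(DOCUMENTS)
question = "How long does express shipping take?"
hits = retrieve(question, index)
prompt = build_prompt(question, hits, DOCUMENTS)
print(prompt)
```

The scoring here is deliberately crude (it has no stop-word handling, for example), but the design point survives: the model is handed the relevant text and asked to answer from it, with the source label attached, so a human can check where the answer came from.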

Notice the shape of these wins. Narrow task. Human nearby. Cheap mistakes. Grounded in real data. Nothing autonomous about it.

What is still mostly demo

Then there are the autonomous agents. Frameworks that chain LLM calls in loops, give the model tools, and let it plan and execute multi-step tasks by itself. The demos are everywhere this year, and some of the engineering behind them is genuinely clever.

[Figure: a clean demo arrow above a chain of steps where one link fails]

But here is the problem nobody puts in the demo video: error compounding. Suppose each step the agent takes is right 90 percent of the time, which is generous, and that steps fail independently. Chain ten steps and the chance the whole run is clean drops to about 35 percent, which is 0.9 multiplied by itself ten times. Real workflows have more than ten steps, and real steps are messier than demo steps. The agent meets a date in a weird format, a page that changed its layout, a record that contradicts another record, an instruction that was obvious to a human and invisible to a model. It does not fail loudly. It continues, confidently, in the wrong direction. Confident and wrong is the most expensive combination in software.
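The arithmetic is easy to check yourself. Below is a small Python sketch; the function name is my own, and it rests on one simplifying assumption: every step has the same accuracy and steps fail independently of each other. Real agents are messier than that, so treat the numbers as a rough guide, not a measurement.

```python
def clean_run_probability(step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumes each step has the same accuracy and that steps fail
    independently, so the probabilities simply multiply.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return step_accuracy ** steps


# The 90 percent figure is the one used in the text; the step counts
# are arbitrary examples.
for steps in (1, 5, 10, 20, 50):
    probability = clean_run_probability(0.90, steps)
    print(f"{steps:>3} steps at 90% per step: {probability:.1%} chance of a clean run")
```

Ten steps comes out near 35 percent and twenty steps near 12 percent. Raising per-step accuracy helps, but the chain length keeps multiplying against you.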

The demos avoid this by living in a cleaned world. Tidy inputs, happy paths, tasks chosen because they demo well. That is not exactly cheating; every demo in history has done this. But it means the distance between demo and production is much bigger than the video suggests, and that distance is where budgets go to die.

How to spend money on this without regret

I am not telling anyone to ignore agents. I am telling them to buy what exists, not what is promised. A few rules I push for when these conversations happen:

  • Start from a cost you already have. Support volume, manual data entry, time lost searching for information. If the AI project does not attack a measurable existing cost, it is a science project. Science projects are fine, but label the budget honestly.
  • Prefer boring and grounded over autonomous and magical. A RAG assistant that answers 60 percent of support questions correctly, with sources, beats an agent that does an entire workflow correctly 60 percent of the time. The first failure mode wastes a click. The second one corrupts your data.
  • Pilot with real data in week one. Not after the contract. The gap between demo and reality shows up in the first afternoon with your actual messy inputs, and it is much cheaper to discover it before you sign anything.
  • Count the babysitting. If a human must check every output anyway, you have an assistant, not an agent. That can still be a great deal. Just price it as what it is.

Here in Calgary I sometimes compare it to the weather. The morning can look like full summer and you still bring a jacket, because you have been surprised before. The same energy applies to AI demos.

There is also a team-health angle that gets ignored. When leadership buys hype and the tool underdelivers, engineers are the ones squeezed between the promise and the reality. They burn months gluing together something that was sold as ready, morale drops, and the next genuinely useful AI proposal gets met with eye rolls. Hype does not just waste the money you spent. It taxes the next three good ideas.

The pattern of overpromising automation is old. Martin Fowler has been documenting for two decades how every wave of tools that promised to remove the need for careful engineering ended up needing careful engineering to work at all. I see no evidence this wave is different.

To be clear, I am not making predictions. The models improve fast, and what is demo today may be product later. I am only assessing what I can verify in June 2024. And in June 2024, the honest summary is this: assistants are real, RAG is real, autonomous agents are a research frontier with a marketing department.

Buy the real things. Watch the frontier with interest and a closed wallet.

Pax et bonum.