The Electric Motor
When factories in the late nineteenth century replaced the steam engine with the electric motor, very little changed at first.
Before, there had been one large steam engine at the edge of the building. From there, shafts, belts and mechanical linkages ran through the entire hall. Every machine depended on the same central power source. After the upgrade, there was one large electric motor in the same place. The same shafts. The same belts. The same rhythm.
The factory looked more modern. It was not fundamentally organized in a different way.
The actual productivity jump came later. Not from a better motor in the same position, but from a different architecture. Small motors attached directly to individual machines. Suddenly each machine could run independently. Be placed elsewhere. Be switched off when not needed. The process was not optimized. It was rethought around the new source of power.
That is where most companies are right now with AI.
They are replacing the large steam engine with a large electric motor and wondering why the results still feel small.
The First Confusion
Most AI projects do not fail because the model is too weak. They fail because of a very old confusion.
People mistake the existing process for the work itself.
Meetings, reviews, sprint planning, handoffs, specifications, ticket chains, approval loops, decks, transfers between departments. In large organizations all of that feels so normal that it starts to look like nature. It is not. Much of it is coordination infrastructure. Built so that many human brains with limited context, limited working memory and limited availability can produce something together.
That is not meant as an insult. This infrastructure had and still has a purpose.
But the distinction matters:
Which part is the actual value creation?
And which part is just the gearbox we built to compensate for human limitations?
As long as humans do the execution, that question is uncomfortable but not existential. Once agent systems enter the picture, it becomes central. A meaningful share of that infrastructure exists only because knowledge has to be moved between people.
When the source of power changes, that layer becomes questionable.
What the Transmission Belts Look Like Today
The transmission belts of knowledge work are no longer made of leather and steel. They are calendars, ticket systems, review processes and PowerPoint.
Meetings often exist because people need to get each other onto the same page.
PRDs and specifications exist because the person who wants something and the person who builds it are not the same person and do not share the same context.
Sprint planning exists because work has to be sliced into packets that fit limited human capacity.
Code reviews are not only quality control. They are also trust mechanics between people.
Handoff documents exist because context has to be pushed from one head into the next.
Decks exist because complex situations have to fit into a twenty-minute executive slot.
Look at it coldly, and the picture becomes uncomfortable. A large share of modern knowledge work is not the work itself. It is coordination around the work.
That is why the numbers on coordination overhead remain so irritating. Microsoft’s Work Trend Index and similar studies keep pointing in the same direction: a large part of knowledge work consists of communication about work, not work on the underlying thing. Asana calls it Work about Work. Nate B. Jones puts it more sharply: in many larger organizations, a meaningful share of roles exists mainly to manage handoffs between other roles.
You do not need to worship the exact percentage to see the point.
The transmission system has become enormous.
Three Phases That Keep Getting Blended Together
The electric motor analogy is useful because it distinguishes between three very different states that AI discussions constantly mix up.
Phase 1: Same process, different tool
This is the large electric motor.
You take the existing process and bolt AI onto selected steps. Summaries get written automatically. Specs come faster. Code arrives with Copilot or Claude. Support gets an assistant. Finance gets a prompt workflow.
That often helps. Speed. Cost. Sometimes simply less tedious work.
But structurally, nothing changes. The same meetings. The same approvals. The same handoffs. The same team boundaries. The same process logic.
That is why Phase 1 is so attractive. It is easy to sell. Easy to budget. Easy to frame as an efficiency program.
And that is exactly why it gets overrated.
Phase 2: Same floor plan, better routes
This is already more interesting.
Some steps disappear. Roles shift. The process gets recut, but the underlying architecture still looks familiar. Not just cheaper, but better. Fewer handoffs. Less waiting time. Better customer experience. Shorter loops.
Many good companies will spend years here, and there is nothing second-rate about that. In systems that have grown over time, regulated environments and complex brownfield landscapes, Phase 2 may be the most sensible state for quite a while.
Phase 3: A small motor at every machine
This is the real reinvention.
The question is no longer: how do we make the existing process more efficient?
It becomes: which parts of this process existed only because humans had to hand context to each other?
If agents can hold context, execute work and check outcomes against clear criteria, then some of those layers do not need to be digitized. They do not need to exist at all.
Not all of them. But more than most people are comfortable admitting.
What Phase 3 Actually Changes
The strongest examples from recent months are not interesting because they show that AI can write code. That is almost banal now. What matters is what they do to the process.
StrongDM’s so-called Dark Factory is a public example of a model where a small team steers software through specifications and evaluation logic rather than classic sprint and review infrastructure. The real novelty is not the number of lines of code produced. The novelty is that the coordination layer gets dramatically thinner. Humans define, constrain and judge. The machine executes.
Karpathy’s Autoresearch primitive shows the same logic in a smaller, rougher form. A tightly bounded search space. A machine-readable task description. A verifiable metric. A loop of attempt, assessment and adjustment. No orchestra of meetings in the middle.
Both examples point to the same shift:
When execution becomes cheap and fast, the scarce resource moves.
No longer: who can build it?
But: who can state precisely what should be built, and who can judge reliably whether the result is good?
In plain terms, Spec and Evaluation move to the center. Coordination loses its monopoly status.
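The loop of attempt, assessment and adjustment described above can be made concrete with a toy sketch. This is not Karpathy's code and not StrongDM's setup. The `Task` fields, the `metric` function and the shrinking step size are all assumptions chosen for illustration. The only point is the shape: a bounded search space, a machine-readable task, a verifiable metric, explicit stop conditions, and no human in the middle of the loop.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Machine-readable task description (hypothetical structure)."""
    low: float            # lower bound of the search space
    high: float           # upper bound of the search space
    target_score: float   # stop condition: result is good enough
    max_attempts: int     # stop condition: budget is exhausted


def metric(x: float) -> float:
    """Verifiable metric, higher is better. Toy stand-in with its peak at x = 3."""
    return -(x - 3.0) ** 2


def run_loop(task: Task, seed: int = 0) -> tuple[float, float, int]:
    """Return (best candidate, its score, number of attempts used)."""
    rng = random.Random(seed)
    best_x = rng.uniform(task.low, task.high)
    best_score = metric(best_x)
    step = (task.high - task.low) / 4
    attempts = 1
    while attempts < task.max_attempts and best_score < task.target_score:
        # Attempt: propose a candidate near the current best, kept inside the bounds.
        candidate = min(task.high, max(task.low, best_x + rng.uniform(-step, step)))
        # Assessment: score the candidate against the metric.
        score = metric(candidate)
        attempts += 1
        # Adjustment: keep improvements, otherwise search more narrowly.
        if score > best_score:
            best_x, best_score = candidate, score
        else:
            step *= 0.95
    return best_x, best_score, attempts
```

Calling `run_loop(Task(low=0.0, high=10.0, target_score=-0.01, max_attempts=500))` returns the best candidate found, its score and the attempts used. Everything a human contributes sits in `Task` and `metric`. That is the Spec and the Evaluation.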
Why This Feels So Uncomfortable Inside Companies
Because a large part of their structure grew up around exactly that coordination function.
Project managers coordinate.
Product managers translate.
Engineering managers synchronize.
PMO structures stabilize chains of handoff.
Decks mediate between decision layers.
Steering committees exist because information gets repackaged at every level of hierarchy.
None of that disappears automatically. Certainly not overnight. But it does lose its automatic legitimacy.
Once machines take over larger parts of execution, it becomes visible how much organization was simply a response to slow, context-poor, coordination-heavy human execution.
That is the moment people become nervous.
Not just because jobs come under pressure. Because roles can lose their quiet reason for existing. A process that looked professional yesterday can start to look like an expensive detour tomorrow.
That is why many companies will first do exactly the wrong thing. They will digitize the detour. They will add more workflow around a process that is already questionable as a whole.
That is Phase 1 in its purest form.
Why Spec Suddenly Becomes the Center
In the old world, bad specifications could be hidden for a long time.
A developer asked follow-up questions. A project lead noticed something was missing. A review surfaced that the real need was slightly different. An experienced person closed the gap with context, judgment and conversation.
The machine does that only to a limited extent. It does not reliably fill gaps with sound judgment. It fills them with pattern continuation.
That is why the bottleneck shifts.
As soon as execution gets faster and cheaper, bad specification becomes brutally visible. Not theoretically. Operationally. You suddenly see how much ambiguity the human organization had been silently absorbing all along.
That is also why moving from Phase 1 to Phase 3 is not primarily a tooling issue. It is an articulation issue.
Anyone who wants Phase 3 has to learn to express requirements, constraints, quality criteria and stop conditions more cleanly than most organizations are used to today.
And right behind that comes Evaluation.
A good spec without good checking merely produces disappointment faster.
Why Evaluation Is Not Just the Old Control Layer Rebuilt
Two mistakes keep happening at once here.
The first is: if agents build, we simply need more reviews.
The second is: if agents build, we can drop reviews because the machine already tested things.
Both are too shallow.
Phase 3 does not mean human judgment disappears. It means human judgment gets used differently. Less as continuous supervision of every step, more as targeted definition of success and targeted checking against that success.
That is why StrongDM’s behavioral scenarios matter. Not because scenarios are magical, but because they show that verification has to be designed as a first-class system. Not as a side note inside the old process. If you evaluate agents against the tests they effectively wrote for themselves, you are just building a prettier form of self-deception.
Evaluation becomes infrastructure, not a QA afterthought.
And that is where the link to the five-layer model becomes obvious: Terrain, Intent, Taste, Spec, Evaluation. When execution gets cheap, those layers do not become less important. The upper layers get harder, not softer.
The Hardest Limit Is Not Technology but Context Readiness
This is the uncomfortable part.
Many companies cannot get anywhere near Phase 3 because they are missing the prerequisite.
Machine-readable context.
In The Wrong Black Box Problem, the argument was that organizations often cannot read their own terrain or their own intent. That is exactly why this electric motor piece is not an isolated excursion into process design. It depends on whether the relevant knowledge exists in a form a machine can actually work with.
StrongDM does not work simply because the people there are bolder. It works because specification and evaluation structure are taken seriously. Karpathy’s primitive works only where success can be measured clearly enough.
If your critical knowledge lives in heads, PDFs and implicit shortcuts, you can admire Phase 3. You cannot run it.
That is why one of the most common errors in AI transformation right now is this:
People treat Phase 1 as a preview of Phase 3.
In reality, those are often two very different worlds.
Phase 1 can live with unclear context because humans close the gaps.
Phase 3 cannot.
The DACH Factor
In the DACH region, this shift will move more slowly than in some US contexts. Works councils, co-determination, stricter process discipline, deep quality cultures, brownfield landscapes. All of that slows things down.
I would not read that too quickly as a disadvantage.
Because the same forces that slow the transition can also force exactly the questions others skip for too long.
What is the actual human contribution in this process?
Which quality cannot remain implicit?
Where would automation only become a cheaper version of a bad decision?
The problem is not that DACH companies are more cautious. The problem is that many direct their caution at the technology, not the process. They assess the model. They assess data protection. They assess the works council conflict. All of that is legitimate. But too few assess whether the underlying process still has the right shape in the first place.
Then AI becomes a new tool inside an operating system that already has an architectural flaw.
Three Questions Before You Replace the Big Motor
Not a roadmap. A diagnostic attempt.
First: which parts of your process exist only to transfer context between people?
Not which parts are annoying. Which parts exist specifically because knowledge would otherwise not move from A to B.
That is where the largest Phase 3 potential sits.
Second: where would your organization be blind without those transmission belts?
If you remove a meeting, a handoff or a review cycle, what disappears with it? Pure synchronization? Or judgment, trust, taste, power balance?
That distinction tells you what to eliminate, what to redesign and what to preserve on purpose.
Third: can you reduce one meaningful workflow to Spec plus Evaluation?
Not the whole company. One workflow.
If that reduction immediately feels absurd, the reason is usually one of three things: missing context, unclear spec, or evaluation that has not been operationalized. That is where the real work begins.
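One way to try the third question on a single workflow is to write it down as data. The sketch below is a thought aid under stated assumptions, not an established framework. The `Spec` and `Check` structures, their field names and the crude gap rules in `readiness_gaps` are all hypothetical. They mirror the three failure reasons named above: missing context, unclear spec, and evaluation that has not been operationalized.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Check:
    """One operationalized quality criterion (hypothetical structure)."""
    name: str
    passes: Callable[[str], bool]


@dataclass
class Spec:
    """A single workflow reduced to Spec plus Evaluation (hypothetical structure)."""
    requirement: str = ""
    stop_condition: str = ""
    context_sources: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    checks: list[Check] = field(default_factory=list)


def readiness_gaps(spec: Spec) -> list[str]:
    """Name which of the three prerequisites is missing. The rules are deliberately crude."""
    gaps = []
    if not spec.context_sources:
        gaps.append("missing context")
    if not spec.requirement.strip() or not spec.stop_condition.strip():
        gaps.append("unclear spec")
    if not spec.checks:
        gaps.append("evaluation not operationalized")
    return gaps


def evaluate(spec: Spec, output: str) -> dict[str, bool]:
    """Run every check against a produced output and report pass or fail per check."""
    return {check.name: check.passes(output) for check in spec.checks}


# Illustrative workflow: answering a refund request. All values are invented.
refund_reply = Spec(
    requirement="Reply to a refund request with a decision and the next step.",
    stop_condition="Stop after one reply; escalate if the order cannot be found.",
    context_sources=["refund policy document", "order database export"],
    constraints=["no legal advice", "at most 600 characters"],
    checks=[
        Check("mentions the order", lambda out: "order" in out.lower()),
        Check("within length limit", lambda out: len(out) <= 600),
    ],
)
```

Here `readiness_gaps(refund_reply)` returns an empty list, while `readiness_gaps(Spec())` names all three gaps. If filling in those fields for a real workflow feels impossible, the empty fields show where the real work begins.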
Where I Might Be Wrong
Three points against my own argument. They are not cosmetic.
First: not every process is mere coordination exhaust. Retrospectives can create psychological safety. Pairing can be training. Reviews can calibrate taste. Some friction is not a flaw. It is quality control. If you separate value creation from overhead too aggressively, you do not remove bureaucracy. You remove learning and judgment systems.
Second: for many companies, Phase 3 is not a realistic near-term target state. Brownfield reality, regulation, safety requirements and institutional memory are not footnotes. In those contexts, trying to bolt a small motor onto every machine can destroy more than it improves. Then Phase 2 is not cowardice. It is professionalism.
Third: I may be underestimating the time horizon. It is entirely possible that many companies will live with hybrid process forms much longer than the analogy suggests. The large motor may not disappear quickly. It may remain in parallel for years. If that is true, then the actual bottleneck is not radical reinvention but the ability to run old and new coordination forms side by side for a long transition.
Even so, the core point does not change.
Many companies currently treat processes as indispensable when they are in fact workarounds for an old source of power.
They will not get rid of those processes by making them more efficient.
They will only get rid of them once they understand what they were built for.
The productivity jump of the electric motor did not come from the socket.
It came the moment somebody realized that shafts and belts were not part of the essence of a factory. They were part of the essence of the old power source.
That is where companies now stand with AI.
Meetings, handoffs, review loops, alignment rituals, transfer documents, half of management existing to keep people synchronized. Much of that feels like the work. It often is not. It is the infrastructure we built so human brains could execute together.
When the power source changes, that infrastructure does not automatically get better.
Part of it simply becomes unnecessary.
Most companies are currently replacing the big motor.
The more interesting question is who is willing to dismantle the transmission belts.