Executive Briefing: More Output Is Not Productivity

AI can create in minutes what used to take teams hours. But someone still has to review it, make the call, and maintain the whole thing later. That is where apparent productivity turns into new work.

The most important question about AI productivity is not: how fast can the system create something?

The better question is: where does the work go afterwards?

That is the trap. AI makes the first half of many tasks faster: drafts, summaries, variants, code, research, ticket preparation. It feels productive because suddenly there is more on the table.

But the table is not the goal.

Someone has to check what is true. Someone has to decide what is actually useful. Someone has to find the polished mistakes. Someone has to maintain the result later. If that work stays the same or grows, productivity has not been gained. It has just been redistributed.

This is not an anti-AI argument. Quite the opposite. Precisely because AI is useful, the accounting has to get cleaner.

The bottleneck moves

Every process has a bottleneck, whether it is classic production or knowledge work. If you speed up one step, the bottleneck does not automatically disappear. It moves. Eliyahu M. Goldratt built his Theory of Constraints around exactly this idea.

Before AI, the bottleneck was often creation. A first draft took time. An analysis took time. A variant took time. A pull request took time. A summary took time.

With AI, that step gets cheaper.

But the bottleneck does not vanish. It moves to review, judgement, and evaluation.

In daily work, that looks roughly like this:

The text is ready faster, but Legal takes just as long.
The code is written faster, but review and maintenance grow.
The analysis is produced faster, but nobody decides faster.
The agent handles subtasks, but someone still has to inspect logs, exceptions, and errors.
Campaign variants explode, but the brand does not become clearer by itself.

That is how the productivity trap emerges: the visible work gets faster. The decisive work stays stuck.

And because that decisive work is often less visible, it does not show up cleanly in the dashboard.

This is Goldratt again: any optimisation away from the bottleneck is wasted. Optimise before the bottleneck, and more work piles up in front of it. That is mostly what we are seeing now. Optimise after the bottleneck, and the downstream step sits idle because the bottleneck cannot feed it fast enough.
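A toy model makes both effects visible. This is a minimal sketch, not Goldratt's own method. It assumes a two-step flow where creation feeds a review queue, fixed daily capacities, and that only reviewed items count as finished. The function name simulate and all numbers are hypothetical.

```python
def simulate(create_per_day: int, review_per_day: int, days: int) -> tuple[int, int]:
    """Hypothetical two-step flow: creation feeds a review queue.

    Only reviewed items count as finished. Returns
    (items_finished, items_still_waiting_for_review) after `days` days.
    """
    waiting = 0
    finished = 0
    for _ in range(days):
        waiting += create_per_day                # new output joins the review queue
        reviewed = min(waiting, review_per_day)  # review clears at most its own capacity
        waiting -= reviewed
        finished += reviewed
    return finished, waiting


# Illustrative capacities only (items per day), not measured data.
print("balanced:", simulate(create_per_day=5, review_per_day=5, days=20))
# balanced: (100, 0)
print("creation sped up:", simulate(create_per_day=20, review_per_day=5, days=20))
# creation sped up: (100, 300)
print("review sped up:", simulate(create_per_day=5, review_per_day=20, days=20))
# review sped up: (100, 0)
```

In both unbalanced runs, finished work stays at 100 items. Faster creation only grows the queue in front of review, and faster review only adds idle capacity while creation is the constraint. Throughput rises only when the constrained step itself gets more capacity.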

Why the studies seem to contradict each other

The research is not as clear-cut as many sales and product decks make it sound.

In a large NBER study, customer support agents using generative AI became more productive on average, with the biggest gains among less experienced workers. That makes sense. Support work has recurring patterns, good examples, and relatively clear feedback.

In a METR experiment with experienced open-source developers, the opposite happened. Developers using early-2025 AI tools took longer to complete tasks in mature projects they already knew well. That also makes sense. Mature codebases are full of local decisions, implicit standards, and maintenance consequences.

Both results tell the same story.

AI helps when the task is observable enough and quality can be recognised quickly. AI slows work down or shifts it elsewhere when context, judgement, and maintainability are harder than the initial act of creation.

That is why the question “How much time does AI save?” is too crude.

The better question is:

Which work actually gets smaller, and which work just gets pushed somewhere else?

More output is a poor management metric

Many AI programmes measure what is easy to count: generated texts, prepared tickets, lines of code, agent runs, supposedly saved hours.

Understandable. Dangerous too.

Because those numbers usually measure the beginning of the work, not the end.

A company can produce more code and still slow down because review, testing, and maintenance cannot keep up. It can produce more content and communicate worse because more variants do not create more clarity. It can produce more analysis and make fewer decisions because every analysis creates new follow-up questions.

More movement is not direction.

Good productivity measurement has to look at the whole loop:

What was created?
What was actually used?
What had to be corrected?
What became more expensive later?
What did people learn or unlearn along the way?

Only then can you see whether AI reduces work or just makes the balance sheet look nicer.
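To see how a raw output number and a whole-loop number can move in opposite directions, here is a minimal sketch. The Period record, its fields, and both functions are hypothetical, and the figures are invented for illustration, not drawn from any study. It covers only creation, use, review, and correction; later costs and learning effects are harder to count and are left out.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hypothetical counts for one reporting period; field names are illustrative."""
    created: int             # items produced: drafts, tickets, pull requests, analyses
    used: int                # items that actually went into use
    creation_hours: float    # time spent producing the items
    review_hours: float      # time spent checking them
    correction_hours: float  # time spent fixing errors found later


def raw_output_per_hour(p: Period) -> float:
    # What an output dashboard typically reports: creation speed only.
    return p.created / p.creation_hours


def used_output_per_hour(p: Period) -> float:
    # Whole-loop view: count only used items, and count every hour spent on the loop.
    total_hours = p.creation_hours + p.review_hours + p.correction_hours
    return p.used / total_hours


# Invented numbers for illustration only.
before = Period(created=10, used=8, creation_hours=20, review_hours=5, correction_hours=5)
after = Period(created=40, used=10, creation_hours=5, review_hours=25, correction_hours=15)

for name, p in (("before", before), ("after", after)):
    print(f"{name}: raw {raw_output_per_hour(p):.2f}/h, used {used_output_per_hour(p):.2f}/h")
# before: raw 0.50/h, used 0.27/h
# after: raw 8.00/h, used 0.22/h
```

In this made-up case the dashboard number rises sixteenfold while used output per hour falls. What counts as "used", and how correction time is attributed, are judgement calls that each organisation has to make explicitly.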

The three hidden cost centres

The productivity trap usually appears in three places.

1. Review

AI creates plausible results. That is exactly the problem. Nonsense no longer looks like nonsense. It looks like a clean draft. Review gets harder. You are not just checking spelling. You are checking domain logic, context, omissions, and false confidence.

2. Maintenance

James Shore puts it bluntly for code: an AI coding agent has to reduce maintenance costs, not just writing costs. If you produce twice as much code without lowering the maintenance cost per unit, you are building a future burden.
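The arithmetic behind this is simple. As a rough model, assume the total maintenance load M is the number of units n (modules, automations, documents) times an average maintenance cost per unit c. Real costs are rarely this linear, so treat it as a simplification:

```latex
M = n \cdot c, \qquad M' = (2n)\,c' \le M \iff c' \le \tfrac{c}{2}
```

Doubling output without at least halving the per-unit maintenance cost therefore raises the total burden.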

The same applies to processes, texts, automations, and analysis. Everything that gets created has to be understood, maintained, or deliberately thrown away later.

3. Learning

Many tasks that now look like automatable busywork used to be learning ramps: research, first drafts, documentation, variants, ticket analysis, small customer cases. That was not always efficient. But it was where people learned what good work looks like.

If AI replaces those tasks without creating new learning spaces, you save time in the short term and damage judgement in the long term. Bad trade.

The pilot proves too little

Many AI pilots look good because the test is friendly.

The use case is clean. The data is limited. The users are motivated. The task is clear. The risk is small. Afterwards, the slide says: works.

Maybe.

But often a pilot only proves that AI can swim in the aquarium. It does not prove that it survives harbour traffic.

Before rollout, a few unfriendly questions belong on the table:

What happens with messy data?
Who notices subtle errors?
Which exceptions slow the system down?
What review load appears after three months?
Which task used to be a learning ramp for people?

Without these questions, the organisation optimises for demonstration. Not for operations.

In German companies especially, this is not a detail. Many processes already depend on legacy systems, rigid boundaries between functions, and approval loops. If AI only speeds up creation, the extra load almost automatically lands where capacity was already scarce: expert review, IT, Legal, Compliance, leadership.

What leadership has to clarify

Leadership does not need to slow AI productivity down. It needs to prevent a faster generator from being mistaken for a productivity strategy.

That requires a simple operating calculation.

Output: What is created faster or more often?
Quality: What is actually usable?
Review: Who checks it, against which criteria, in what time?
Maintenance: What has to be maintained, repaired, or deleted later?
Learning: Who still builds judgement?

This is less elegant than a grand AI roadmap. It is closer to the work.

The most important management question is:

What additional review, maintenance, and learning capacity are we building in parallel to the new output capacity?

If there is no clear answer, the productivity slide is not a productivity strategy.

It is just a speedometer without a brake test.

The real test

Before the next AI productivity report, four sentences should be completed in writing:

This output number can go up without creating real value: …
This new review or maintenance load is created by AI: …
This learning ramp disappears if we automate the process: …
This person or role has the final say on “good enough”, “not good enough”, and “stop”: …

If those answers are missing, AI is not the problem.

The company is missing the operating system for AI work.

AI productivity does not emerge where more gets created. It emerges where a system can create more without damaging review, maintainability, and judgement.

