Executive Briefing: The Missing Audit

[Figure: hand-drawn cross-section of a control dashboard with an amber, miscalibrated sensor hidden beneath it.]

Many companies measure AI usage, saved hours, and adoption. What is often missing is the layer that checks whether those numbers represent value at all.


An AI dashboard can look very reassuring.

Adoption is up. Token usage is up. Reported saved hours are up. The number of automated workflows is up. The number of active users is up. On the slide, that looks like progress.

But none of those numbers automatically answers the most important question:

Is this actually true?

Not technically, in the sense of whether the number was counted correctly. Organisationally: does this number represent value, or only activity? Does it show productivity, or only movement? Does it show better work, or just more work moving faster through a different system?

This is where many AI programmes are missing a layer.

Not another dashboard. Not another steering committee. An audit of their own steering logic.

Metrics quickly become targets

The problem is old. Charles Goodhart formulated it for monetary policy. Marilyn Strathern later made the broader version famous:

When a measure becomes a target, it ceases to be a good measure.

You can see this mechanism everywhere in AI programmes.

If adoption is measured, teams optimise for visible usage. If saved hours are measured, generous time-saving estimates appear. If agent runs are measured, more agents run. If produced artefacts are measured, more texts, tickets, analyses, slides, and lines of code appear.

That does not mean every number is false.

It means that once a number matters, it changes the behaviour of the system it is supposed to measure.

A team rewarded for AI usage will use AI. Whether the work gets better is a different question. A function asked to report saved hours will find saved hours. Whether that time turns into better decisions, better customer work, or better quality is another question again.

AI makes this trap sharper because so much output becomes visible immediately. The machine produces. The dashboard moves. Management sees progress.

But progress is not the same as movement.

The missing System 3 Star

Stafford Beer made a useful distinction in his Viable System Model. A viable organisation needs more than operational work and control. It also needs an audit function.

In simple terms:

  • System 1 does the work.
  • System 2 coordinates the work.
  • System 3 controls resources and performance.
  • System 3 Star checks whether System 3 is seeing reality correctly.

That System 3 Star is exactly what many AI initiatives are missing.

The operational layer is there: people use tools, agents perform tasks, automations run. The control layer is there too: KPIs, budgets, dashboards, roadmaps, status reports.

But who audits the controllers?

Who checks whether “saved hours” actually free up capacity, or simply evaporate into more meetings? Who checks whether high adoption means the right tasks are being supported? Who investigates whether more generated code creates more maintenance later? Who asks whether an AI assistant improves quality or merely makes the first draft cheaper?

If nobody owns that role, the organisation is steering by instruments whose calibration nobody checks.

It is like an aircraft with many displays and no sensor maintenance. Technically, everything looks controlled. Organisationally, you are flying on faith.
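
One way to make sensor maintenance concrete is to compare what the dashboard reports with an independent spot check of the same thing. The following is a minimal sketch in Python, not a recommended method: the names `MetricSample`, `calibration_ratio`, and `needs_audit` are hypothetical, the 25 percent tolerance is an arbitrary assumption, and the sample figures are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSample:
    """One team's figures for the same period (hypothetical structure)."""
    team: str
    reported_hours_saved: float   # what the dashboard shows (self-reported)
    verified_hours_saved: float   # what an independent spot check could confirm


def calibration_ratio(samples: list[MetricSample]) -> float:
    """Share of reported hours that the independent check confirmed."""
    reported = sum(s.reported_hours_saved for s in samples)
    verified = sum(s.verified_hours_saved for s in samples)
    if reported <= 0:
        raise ValueError("no reported hours to calibrate against")
    return verified / reported


def needs_audit(samples: list[MetricSample], tolerance: float = 0.25) -> bool:
    """Flag the metric when reported and verified values drift apart
    by more than the tolerance (an assumed, arbitrary threshold)."""
    return abs(1.0 - calibration_ratio(samples)) > tolerance


# Invented example figures, for illustration only.
samples = [
    MetricSample("support", reported_hours_saved=120.0, verified_hours_saved=60.0),
    MetricSample("engineering", reported_hours_saved=80.0, verified_hours_saved=70.0),
]
ratio = calibration_ratio(samples)
flagged = needs_audit(samples)
```

The arithmetic is not the point. The verified figure has to come from someone other than the team that reported the original number, otherwise the check inherits the same incentive.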

The false safety of the control layer

Many companies confuse control with audit.

Control asks: are the rules being followed? Is the budget being used? Are the tools being adopted? Are the risks documented? Is the programme on plan?

Audit asks something more uncomfortable:

Are the rules capable of producing truth in the first place?

That is a different job.

An AI programme can tick every governance box and still move in the wrong direction. It can use safe tools and still accelerate bad decisions. It can be GDPR-compliant and still harden false productivity assumptions. It can show high usage and still erode judgement.

That is why AI governance cannot be understood only as policy and approval. The harder governance question sits one level higher: who checks whether the steering logic itself still works?

This sounds abstract until an executive asks: what has AI actually delivered?

If the answer consists only of users, tokens, hours, and use cases, the value proof is missing. Activity is being measured, not effect.

Why this matters in German companies

German companies often have longer feedback cycles than American tech firms. Decisions move through more committees, more functions, and more legal and organisational layers.

That is slow. Sometimes painfully slow.

But it also means that once a false AI metric enters the steering system, it can stay in effect for a long time. It appears in quarterly reports, becomes part of target systems, moves into functional logic, and shapes investment decisions.

The metric is no longer just a measurement error. It becomes organisational reality.

At the same time, companies in the DACH region (Germany, Austria, Switzerland) have an advantage if they use it well: they already have institutionalised counter-questions, in the form of data protection, works councils, compliance, internal audit, finance, and quality management. All of these functions can be annoying. But they can also form the audit layer AI programmes urgently need.

Not as a general brake. As a precise questioning function.

  • Which AI metric can rise without creating real value?
  • Which costs do not appear in the dashboard?
  • Which quality effects only become visible later?
  • Which role is allowed to say: this number is green, but reality is not?

If these questions are asked early, this is not an obstacle to innovation. It is sensor maintenance.

The leadership test

The leadership question is not: do we have an AI dashboard?

The question is: do we have an independent layer that is allowed to attack that dashboard?

Not sabotage it. Attack it in the useful sense: test assumptions, look for counterexamples, name blind spots, compare metrics with real work.

Four sentences are enough to start:

  1. This AI metric can rise without creating value: …
  2. This side effect is not visible in our dashboard: …
  3. This person or function is allowed to challenge our AI success measurement: …
  4. This decision would be reversed if the audit layer disagreed: …

The fourth sentence is the hardest. Because an audit that cannot stop anything is decoration.
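
The four sentences can also be kept as a small record per metric, so that blanks stay visible instead of disappearing into a slide. The sketch below is one possible shape in Python, under stated assumptions: `AuditStatement` and its field names are hypothetical, and the example values are illustrative, not findings.

```python
from dataclasses import dataclass, fields


@dataclass
class AuditStatement:
    """The four starter sentences for one AI metric (hypothetical structure)."""
    metric: str
    can_rise_without_value: str = ""   # sentence 1
    invisible_side_effect: str = ""    # sentence 2
    challenger: str = ""               # sentence 3
    reversible_decision: str = ""      # sentence 4

    def missing(self) -> list[str]:
        """Names of the sentences that are still blank."""
        return [
            f.name
            for f in fields(self)
            if f.name != "metric" and not getattr(self, f.name).strip()
        ]

    def has_stopping_power(self) -> bool:
        """True only if a concrete decision is named that the audit could reverse."""
        return bool(self.reversible_decision.strip())


# Illustrative values only.
stmt = AuditStatement(
    metric="saved hours",
    can_rise_without_value="teams estimate time savings generously",
    invisible_side_effect="review time for generated drafts",
    challenger="internal audit",
)
missing = stmt.missing()
stopping_power = stmt.has_stopping_power()
```

In this example the first three sentences are filled in and the fourth is not, which is exactly the case the record is meant to expose.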

AI programmes do not need less measurement. They need better scepticism toward their own measurement.

Otherwise, the most dangerous form of control emerges: a system that measures very precisely how it deceives itself.
