The Wrong Black Box Problem
Last week I sat in a workshop with eight executives. The brief was simple: what institutional knowledge does an agent need in order to work sensibly inside your company?
Four hours later, there were 47 knowledge assets on the wall.
31 of them existed nowhere as a document.
12 of them could not be fully explained by anyone in the room.
And with 4 of them, the room went quiet because everybody knew the official version was not the real one.
I have not been able to stop thinking about that distribution since. Not because of the number. Because of the structure.
The first 31 are labor. The knowledge exists; it just is not written down properly anywhere.
The next 12 are harder. That is where you run into knowledge that works perfectly well in practice but does not compress neatly into words.
And the last 4 are the real issue. Not because they are especially complex. Because there are reasons not to write them down openly.
That is exactly where the black box discussion starts to flip.
Many companies still talk as if the central problem sits inside the machine. Can we trust the model? Can we reconstruct its reasoning? Can we explain why the agent reached a given conclusion?
Those are legitimate questions. But the problem is shifting.
The machine is getting more explainable. The organization is not.
Anthropic can now show much more clearly which internal patterns activate inside models. Model Specs make normative boundaries more explicit. Over the next few years, auditors will know far more about the inside of high-risk systems than they do today. At exactly the moment AI becomes more technically legible, many companies are discovering that they cannot read their own decision logic.
So the uncomfortable question is no longer just: is AI a black box?
It is: why is your organization still one?
The Asymmetry
When people talk about explainability today, the discussion almost always points at the model. Reasoning chains. Traces. Safety layers. Evaluations. All of that matters.
Now ask the same questions of the organization.
Why was customer X prioritized?
Why did product idea Y never get budget?
Why does approval A take three weeks formally and two phone calls informally?
Why does business unit B keep getting resources even though the official priorities point somewhere else?
Ask those questions in most companies and you do not get an explanation. You get a plausible story.
That is not the same thing.
Chris Argyris drew the distinction decades ago: there is the espoused theory, what an organization says it does, and there is the theory-in-use, the pattern it actually follows. That gap becomes expensive in AI projects because it is no longer just culturally annoying. It becomes operational.
As long as humans hold the system together, you can live with that gap. People read between the lines. They know the shortcuts. They know which spreadsheet matters more than the official dashboard and who to call when the ticket system gets in the way.
Once an agent is meant to participate, that is no longer enough.
An agent works on what is explicit. Rules, documents, interfaces, criteria. If the official description and the real behavior diverge, the machine optimizes for the fiction.
Then the agent suddenly looks stupid, even though it is simply being obedient.
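To make the mechanism concrete, here is a deliberately toy sketch in Python. Everything in it is hypothetical: the names, the numbers, the protected accounts. It shows only one thing: an agent handed the documented rule optimizes the documented rule, and nothing else.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    profitability: float  # the documented criterion: margin contribution

# Espoused theory: the written rule. Prioritize strictly by profitability.
def official_priority(customers: list[Customer]) -> list[Customer]:
    return sorted(customers, key=lambda c: c.profitability, reverse=True)

# Theory-in-use: two accounts get special treatment for reasons that
# exist nowhere as a document. All names here are invented.
PROTECTED_ACCOUNTS = {"Alpha GmbH", "Beta AG"}

def actual_priority(customers: list[Customer]) -> list[Customer]:
    ranked = official_priority(customers)
    protected = [c for c in ranked if c.name in PROTECTED_ACCOUNTS]
    rest = [c for c in ranked if c.name not in PROTECTED_ACCOUNTS]
    return protected + rest

customers = [
    Customer("Alpha GmbH", 0.04),
    Customer("Gamma KG", 0.22),
    Customer("Beta AG", 0.07),
]

# An agent given only the written rule optimizes the fiction:
print([c.name for c in official_priority(customers)])  # ['Gamma KG', 'Beta AG', 'Alpha GmbH']

# The organization's real behavior:
print([c.name for c in actual_priority(customers)])    # ['Beta AG', 'Alpha GmbH', 'Gamma KG']
```

Nothing in official_priority is wrong. The exception that governs real behavior was simply never written down, so the agent cannot see it.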
The First 31: What Was Simply Never Written Down
Start with the mildest part.
31 of the 47 knowledge assets from that workshop were undocumented. Not secret. Not mysterious. Just never written down.
That sounds banal. It is not, because this is exactly where many companies lie to themselves. They say, "We have all of that in Confluence." Or, "There must be a playbook for that somewhere." Usually that just means there are documents. Not that the relevant context lives in them.
In the essay on machine-readable context, I described what I call the Mueller problem. The special terms, the real pricing logic, the reason offer A works and offer B does not, all of it lives in the heads of the people who have been around long enough. As long as those people are reachable, the system appears robust. It is not.
Moving knowledge from heads into documents is unpleasant, but relatively clear. Interview people. Reconstruct decisions. Consolidate artifacts. Resolve contradictions. The work is tedious. Politically, it is often still manageable.
That is the first 31.
If that stage already feels irritating, your company is not ready for serious agent systems yet. Not because the technology is missing. Because the organization still treats its own knowledge as an accidental by-product.
The Next 12: What the Organization Knows but Cannot Say Cleanly
Then it gets more interesting.
12 of the knowledge assets in that room could not be fully explained by anyone there. Not because the people were clueless. Quite the opposite. You could feel that they knew what they were talking about. Just not in a form that would fit into three precise sentences.
Michael Polanyi gave us the line everybody quotes: we know more than we can tell.
It gets abused as a soft management slogan. It means something harder than that. A meaningful part of practical knowledge does not exist as an explicit rule. It shows up in judgment, in fine distinctions, in a developed ability to read situations correctly. The experienced buyer hears from two side comments that a supplier is bluffing. The service lead can tell from one phrase that a customer is not really reporting a problem but preparing an escalation. The production planner sees follow-on trouble next month in what looks like a normal request today.
That knowledge is real. It has economic value. And it often resists being translated cleanly into rules.
That is why many AI initiatives do not fail at prompting first. They fail at an epistemic underestimation. People act as if the task were simply to extract the knowledge already in the system and write it down cleanly. In reality, part of that knowledge does not live as language in the first place. It lives in comparisons, routines, relationship patterns and subtle signals.
That does not mean you give up. It means you name the problem correctly.
The task is not: write all knowledge down.
The task is: figure out which knowledge can be formalized, which has to be trained, and which should remain human judgment for now.
That is a different standard. More mature. More honest. And especially relevant for DACH companies, because many of their real advantages live in exactly these compressed bundles of capability. Not in the strategy deck, but in the quiet precision with which an organization handles complex situations.
Ignore that layer, and you will end up building agents that look competent on paper and fail on the fine distinctions that matter in practice.
The Last 4: What Must Not Be Said Out Loud
And then you get to the four quiet items.
These are the moments when people laugh briefly, look away, or begin the sentence with "well".
Not because they do not know. Because saying it plainly would have social costs.
This is no longer terrain, knowledge that is merely hard to map. This is intent.
The official strategy may say all customers are prioritized by profitability. In practice, two large accounts get special treatment because senior leadership has personal relationships there.
Officially, projects are judged on business cases. In practice, one project survives because it belongs to a person who is too powerful internally to let the topic die.
Officially, the new AI initiative is meant to standardize processes. In practice, three business unit heads defend their local exception logic because that logic protects power, budget, or irreplaceability.
That is not the dark side of an otherwise rational organization. That is normal organization.
Rory Sutherland has made the point that the reasons people give for decisions are often not the real reasons, but the stated reasons still serve a social function. That is exactly what happens here. The official explanation saves face, preserves relationships and keeps power from becoming fully visible.
At this point, AI transformation does not collide with missing technology. It collides with organizational honesty. If you want to teach an agent how decisions really get made, you have to expose how decisions really get made. And that is exactly what many companies do not want to do. Not even in front of themselves.
That is why context engineering is not just knowledge work. It is power work.
Why AI Makes This Expensive
You could always dismiss these contradictions as a cultural quirk. A bit of informal organization here, a bit of political drag there. Annoying, but normal.
AI changes the cost of that.
An agent does not distinguish between the official version and the real one if you only give it the official version. It will faithfully execute what has been made explicit.
That creates three predictable outcomes.
First: the agent makes formally correct, practically wrong decisions. It follows the written prioritization logic and still lands beside reality.
Second: the company diagnoses the failure in the wrong place. Instead of saying, "our description of the organization was wrong," it says, "the agent does not understand our business."
Third: the informal organization does not disappear. It hardens. People start building workarounds around the agent. The promised standard becomes one more layer of shadow work.
That is one reason so many early AI implementations sound impressive and then degrade so quickly in day-to-day use. The company tried to automate its official fiction. Reality had to claw its way back in afterwards.
So the mistake is not that AI itself is necessarily the black box.
The mistake is that many companies confuse their own black box with a document repository.
The Therapy Metaphor Helps, Up to a Point
There is a useful systems-therapy analogy here, as long as you do not push it too far.
In dysfunctional families, the identified patient is often not the actual problem. The visible symptom carries a conflict the whole system both produces and stabilizes.
Organizations work in a similar way. The "knowledge-hoarding manager" is rarely the whole issue. The deeper problem is a system that rewards information asymmetry. The team with the private Excel hell is not just messy. It is often protecting an advantage. The diffuse process landscape is not merely historical drift. It often keeps accountability intentionally blurry.
That is why naive transparency slogans fail so often.
"We just need to document everything" sounds reasonable. But once documentation starts touching influence, negotiation power or protected spaces, resistance appears. Not always openly. Often in the form of perfectly reasonable concerns:
"This case is too complex."
"You cannot standardize that."
"We would lose important nuance."
Sometimes that is even true. That is exactly why the problem is uncomfortable. Distinguishing real complexity from tactical opacity is the actual leadership work.
Why This Matters Especially in DACH
In German-speaking business contexts, this tension usually gets flattened in two equally bad ways.
One is Silicon Valley romanticism. Everything informal is treated as dysfunction. The answer is total transparency, total measurement, total standardization.
The other is culture romanticism. Everything informal is treated as valuable craft knowledge. The answer is to leave the organization alone and bolt on a few useful tools.
Both are too easy.
In the DACH region, there are many companies where compressed, informal experience really is a competitive advantage. Mittelstand, industry, B2B sales, regulated environments. A lot of value lives there in nuance, in trained judgment, in informal coordination patterns invisible on the org chart.
But the same environment also produces robust forms of self-deception. People confuse historically grown practice with necessity. They confuse power compromises with sensible process. They confuse lack of clarity with professional complexity.
Anyone serious about AI transformation in that setting needs a harder distinction:
What here is condensed intelligence?
What here is accumulated convenience?
What here is protection against real risk?
And what here is just protection against visibility?
Without that distinction, you either build agents on sand or quietly destroy the few informal structures that genuinely make you better.
Both are expensive.
Three Questions Before Any Serious Agent Project
Not a checklist. More of a sobriety test.
First: where does the relevant knowledge actually live?
In heads? In PDFs? In tickets? In quiet relationship patterns? Or already in a form a machine can work with?
If the answer is mostly heads, meetings and gut feel, that is not an argument against AI. It is an argument against self-deception.
Second: what is genuinely hard to articulate, and what is simply not meant to be articulated?
That is the core distinction in this essay. Terrain and intent often feel similar from the outside. In both cases, something remains blurry. But the reasons are different. One calls for patience, observation and better forms of formalization. The other calls for courage, conflict capacity and sometimes power decisions.
Confuse the two, and you pick the wrong intervention.
Third: what would happen if the agent worked exactly according to the official version of your organization?
Not the real one. The official one.
If that thought makes you uneasy, you have your diagnosis.
Where I Might Be Wrong
Three objections to my own argument. I think they are real.
First: not every company needs this level of depth. If you are introducing a simple service chatbot, you do not need a full organizational excavation. There are plenty of AI projects where clean processes, usable data and a narrow scope are enough. If I apply the black box diagnosis equally to every automation effort, I am stretching the argument too far.
Second: not everything informal is a problem. Some relationships, some quiet support lines, some forms of situational judgment lose quality when you force them too early into rigid rules. There are informal structures that do not hide opacity but make complexity manageable. If you try to expose all of it, you can damage exactly the intelligence you were trying to preserve.
Third: I may be overemphasizing the political layer. Not every ambiguity is a power game. Some of it is simply historical drift, poor documentation or knowledge that is genuinely hard to formalize. If you infer intent too quickly, diagnosis turns into suspicion. Then you start looking for hidden interests everywhere and miss the more banal truth: nobody did the hard work of describing the terrain properly.
Even so, the core point remains.
The relevant question is not: formal or informal?
The relevant question is: which form of informality carries value, and which one merely keeps contradictions out of sight?
The hardest part of context engineering is not making knowledge machine-readable.
The hardest part is admitting what that knowledge actually is.
47 knowledge assets.
31 undocumented.
12 not cleanly articulable.
4 unspeakable.
The first 31 are work.
The next 12 require humility.
The last 4 require courage.
The machine gets more explainable every quarter.
Your company does not.
And the reason is not technical complexity first.
The reason is that you still have not decided how much truth your organization can tolerate.