The Wrong Black Box Problem
Last week I sat in a workshop with eight executives. The brief was simple: what institutional knowledge does an agent need in order to work sensibly inside your company?
Four hours later, there were 47 knowledge assets on the wall.
31 of them existed nowhere as a document.
12 of them could not be fully explained by anyone in the room.
And with 4 of them, the room went quiet because everybody knew the official version was not the real one.
I have not been able to stop thinking about that distribution since. Not because of the number. Because of the structure.
The first 31 are labor. The knowledge exists; it just is not written down properly anywhere.
The next 12 are harder. That is where you run into knowledge that works perfectly well in practice but does not compress neatly into words.
And the last 4 are the real issue. Not because they are especially complex. Because there are reasons not to write them down openly.
That is exactly where the black box discussion starts to flip.
Many companies still talk as if the central problem sits inside the machine. Can we trust the model? Can we reconstruct its reasoning? Can we explain why the agent reached a given conclusion?
Those are legitimate questions. But the problem is shifting.
The machine is getting more explainable. The organization is not.
Anthropic can now show much more clearly which internal patterns activate inside models. Model Specs make normative boundaries more explicit. Over the next few years, auditors will know far more about the inside of high-risk systems than they do today. At exactly the moment AI becomes more technically legible, many companies are discovering that they cannot read their own decision logic.
So the uncomfortable question is no longer just: is AI a black box?
It is: why is your organization still one?
The Asymmetry
When people talk about explainability today, the discussion almost always points at the model. Reasoning chains. Traces. Safety layers. Evaluations. All of that matters.
Now ask the same questions of the organization.
Why was customer X prioritized?
Why did product idea Y never get budget?
Why does approval A take three weeks formally and two phone calls informally?
Why does business unit B keep getting resources even though the official priorities point somewhere else?
Ask those questions in most companies and you do not get an explanation. You get a plausible story.
That is not the same thing.
Chris Argyris drew the distinction decades ago: there is the espoused theory, what an organization says it does, and there is the theory-in-use, the pattern it actually follows. That gap becomes expensive in AI projects because it is no longer just culturally annoying. It becomes operational.
As long as humans hold the system together, you can live with that gap. People read between the lines. They know the shortcuts. They know which spreadsheet matters more than the official dashboard and who to call when the ticket system gets in the way.
Once an agent is meant to participate, that is no longer enough.
An agent works on what is explicit. Rules, documents, interfaces, criteria. If the official description and the real behavior diverge, the machine optimizes for the fiction.
Then the agent suddenly looks stupid, even though it is simply being obedient.
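To make the mechanism concrete, here is a deliberately toy sketch in Python. Everything in it is hypothetical: the names, the numbers, the protected accounts. It shows only one thing: an agent handed the documented rule optimizes the documented rule, and nothing else.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    profitability: float  # the documented criterion: margin contribution

# Espoused theory: the written rule. Prioritize strictly by profitability.
def official_priority(customers: list[Customer]) -> list[Customer]:
    return sorted(customers, key=lambda c: c.profitability, reverse=True)

# Theory-in-use: two accounts get special treatment for reasons that
# exist nowhere as a document. All names here are invented.
PROTECTED_ACCOUNTS = {"Alpha GmbH", "Beta AG"}

def actual_priority(customers: list[Customer]) -> list[Customer]:
    ranked = official_priority(customers)
    protected = [c for c in ranked if c.name in PROTECTED_ACCOUNTS]
    rest = [c for c in ranked if c.name not in PROTECTED_ACCOUNTS]
    return protected + rest

customers = [
    Customer("Alpha GmbH", 0.04),
    Customer("Gamma KG", 0.22),
    Customer("Beta AG", 0.07),
]

# An agent given only the written rule optimizes the fiction:
print([c.name for c in official_priority(customers)])  # ['Gamma KG', 'Beta AG', 'Alpha GmbH']

# The organization's real behavior:
print([c.name for c in actual_priority(customers)])    # ['Beta AG', 'Alpha GmbH', 'Gamma KG']
```

Nothing in official_priority is wrong. The exception that governs real behavior was simply never written down, so the agent cannot see it.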
The First 31: What Was Simply Never Written Down
Start with the mildest part.
31 of the 47 knowledge assets from that workshop were undocumented. Not secret. Not mysterious. Just never written down.
That sounds banal. It is not, because this is exactly where many companies lie to themselves. They say, "We have all of that in Confluence." Or, "There must be a playbook for that somewhere." Usually that just means there are documents. Not that the relevant context lives in them.
In the essay on machine-readable context, I described what I call the Mueller problem. The special terms, the real pricing logic, the reason offer A works and offer B does not, all of it lives in the heads of the people who have been around long enough. As long as those people are reachable, the system appears robust. It is not.
Moving knowledge from heads into documents is unpleasant, but relatively clear. Interview people. Reconstruct decisions. Consolidate artifacts. Resolve contradictions. The work is tedious. Politically, it is often still manageable.
That is the first 31.
If that stage already feels irritating, your company is not ready for serious agent systems yet. Not because the technology is missing. Because the organization still treats its own knowledge as an accidental by-product.
The Next 12: What the Organization Knows but Cannot Say Cleanly
Then it gets more interesting.
12 of the knowledge assets in that room could not be fully explained by anyone there. Not because the people were clueless. Quite the opposite. You could feel that they knew what they were talking about. Just not in a form that would fit into three precise sentences.
Michael Polanyi gave us the line everybody quotes: we know more than we can tell.
It gets abused as a soft management slogan. It means something harder than that. A meaningful part of practical knowledge does not exist as an explicit rule. It shows up in judgment, in fine distinctions, in a developed ability to read situations correctly. The experienced buyer hears from two side comments that a supplier is bluffing. The service lead can tell from one phrase that a customer is not really reporting a problem but preparing an escalation. The production planner sees follow-on trouble next month in what looks like a normal request today.
That knowledge is real. It has economic value. And it often resists being translated cleanly into rules.
That is why many AI initiatives do not fail at prompting first. They fail at an epistemic underestimation. People act as if the task were simply to extract the knowledge already in the system and write it down cleanly. In reality, part of that knowledge does not live as language in the first place. It lives in comparisons, routines, relationship patterns and subtle signals.
That does not mean you give up. It means you name the problem correctly.
The task is not: write all knowledge down.
The task is: figure out which knowledge can be formalized, which has to be trained, and which should remain human judgment for now.
That is a different standard. More mature. More honest. And especially relevant for DACH companies, because many of their real advantages live in exactly these compressed bundles of capability. Not in the strategy deck, but in the quiet precision with which an organization handles complex situations.
Ignore that layer, and you will end up building agents that look competent on paper and fail on the fine distinctions that matter in practice.
The Last 4: What Must Not Be Said Out Loud
And then you get to the four quiet items.
These are the moments when people laugh briefly, look away, or begin the sentence with "well".
Not because they do not know. Because saying it plainly would have social costs.
This is no longer terrain, knowledge that is merely hard to map. This is intent.
The official strategy may say all customers are prioritized by profitability. In practice, two large accounts get special treatment because senior leadership has personal relationships there.
Officially, projects are judged on business cases. In practice, one project survives because it belongs to a person who is too powerful internally to let the topic die.
Officially, the new AI initiative is meant to standardize processes. In practice, three business unit heads defend their local exception logic because that logic protects power, budget, or irreplaceability.
That is not the dark side of an otherwise rational organization. That is normal organization.
Rory Sutherland has made the point that the reasons people give for decisions are often not the real reasons, but the stated reasons still serve a social function. That is exactly what happens here. The official explanation saves face, preserves relationships and keeps power from becoming fully visible.
At this point, AI transformation does not collide with missing technology. It collides with organizational honesty. If you want to teach an agent how decisions really get made, you have to expose how decisions really get made. And that is exactly what many companies do not want to do. Not even in front of themselves.
That is why context engineering is not just knowledge work. It is power work.
Why AI Makes This Expensive
You could always dismiss these contradictions as a cultural quirk. A bit of informal organization here, a bit of political drag there. Annoying, but normal.
AI changes the cost of that.
An agent does not distinguish between the official version and the real one if you only give it the official version. It will faithfully execute what has been made explicit.
That creates three predictable outcomes.
First: the agent makes formally correct, practically wrong decisions. It follows the written prioritization logic and still lands beside reality.
Second: the company diagnoses the failure in the wrong place. Instead of saying, "our description of the organization was wrong," it says, "the agent does not understand our business."
Third: the informal organization does not disappear. It hardens. People start building workarounds around the agent. The promised standard becomes one more layer of shadow work.
That is one reason so many early AI implementations sound impressive and then degrade so quickly in day-to-day use. The company tried to automate its official fiction. Reality had to claw its way back in afterwards.
So the mistake is not that AI itself is necessarily the black box.
The mistake is that many companies confuse their own black box with a document repository.
The Therapy Metaphor Helps, Up to a Point
There is a useful systems-therapy analogy here, as long as you do not push it too far.
In dysfunctional families, the identified patient is often not the actual problem. The visible symptom carries a conflict the whole system both produces and stabilizes.
Organizations work in a similar way. The "knowledge-hoarding manager" is rarely the whole issue. The deeper problem is a system that rewards information asymmetry. The team with the private Excel hell is not just messy. It is often protecting an advantage. The diffuse process landscape is not merely historical drift. It often keeps accountability intentionally blurry.
That is why naive transparency slogans fail so often.
"We just need to document everything" sounds reasonable. But once documentation starts touching influence, negotiation power or protected spaces, resistance appears. Not always openly. Often in the form of perfectly reasonable concerns:
"This case is too complex."
"You cannot standardize that."
"We would lose important nuance."
Sometimes that is even true. That is exactly why the problem is uncomfortable. Distinguishing real complexity from tactical opacity is the actual leadership work.
Why This Matters Especially in DACH
In German-speaking business contexts, this tension usually gets flattened in two equally bad ways.
One is Silicon Valley romanticism. Everything informal is treated as dysfunction. The answer is total transparency, total measurement, total standardization.
The other is culture romanticism. Everything informal is treated as valuable craft knowledge. The answer is to leave the organization alone and bolt on a few useful tools.
Both are too easy.
In the DACH region, there are many companies where compressed, informal experience really is a competitive advantage. Mittelstand, industry, B2B sales, regulated environments. A lot of value lives there in nuance, in trained judgment, in informal coordination patterns invisible on the org chart.
But the same environment also produces robust forms of self-deception. People confuse historically grown practice with necessity. They confuse power compromises with sensible process. They confuse lack of clarity with professional complexity.
Anyone serious about AI transformation in that setting needs a harder distinction:
What here is condensed intelligence?
What here is accumulated convenience?
What here is protection against real risk?
And what here is just protection against visibility?
Without that distinction, you either build agents on sand or quietly destroy the few informal structures that genuinely make you better.
Both are expensive.
Three Questions Before Any Serious Agent Project
Not a checklist. More of a sobriety test.
First: where does the relevant knowledge actually live?
In heads? In PDFs? In tickets? In quiet relationship patterns? Or already in a form a machine can work with?
If the answer is mostly heads, meetings and gut feel, that is not an argument against AI. It is an argument against self-deception.
Second: what is genuinely hard to articulate, and what is simply not meant to be articulated?
That is the core distinction in this essay. Terrain and intent often feel similar from the outside. In both cases, something remains blurry. But the reasons are different. One calls for patience, observation and better forms of formalization. The other calls for courage, conflict capacity and sometimes power decisions.
Confuse the two, and you pick the wrong intervention.
Third: what would happen if the agent worked exactly according to the official version of your organization?
Not the real one. The official one.
If that thought makes you uneasy, you have your diagnosis.
Where I Might Be Wrong
Three objections to my own argument. I think they are real.
First: not every company needs this level of depth. If you are introducing a simple service chatbot, you do not need a full organizational excavation. There are plenty of AI projects where clean processes, usable data and a narrow scope are enough. If I apply the black box diagnosis equally to every automation effort, I am stretching the argument too far.
Second: not everything informal is a problem. Some relationships, some quiet support lines, some forms of situational judgment lose quality when you force them too early into rigid rules. There are informal structures that do not hide opacity but make complexity manageable. If you try to expose all of it, you can damage exactly the intelligence you were trying to preserve.
Third: I may be overemphasizing the political layer. Not every ambiguity is a power game. Some of it is simply historical drift, poor documentation or knowledge that is genuinely hard to formalize. If you infer intent too quickly, diagnosis turns into suspicion. Then you start looking for hidden interests everywhere and miss the more banal truth: nobody did the hard work of describing the terrain properly.
Even so, the core point remains.
The relevant question is not: formal or informal?
The relevant question is: which form of informality carries value, and which one merely keeps contradictions out of sight?
The hardest part of context engineering is not making knowledge machine-readable.
The hardest part is admitting what that knowledge actually is.
47 knowledge assets.
31 undocumented.
12 not cleanly articulable.
4 unspeakable.
The first 31 are work.
The next 12 require humility.
The last 4 require courage.
The machine gets more explainable every quarter.
Your company does not.
And the reason is not technical complexity first.
The reason is that you still have not decided how much truth your organization can tolerate.