The Token Cost of Harm Reduction

Continued from Farris’s Three Laws of Agentic AI - Claude’s take and The Zeroth Law of Agentic AI

Chris
Ok, not related to the zeroth law - let’s take Law One - what is the context window impact of implementing that law? How many more tokens does it take for an agent to decide “Does the action I’m about to take cause harm? Is that an EU AI Act harm? To whom is the harm directed?”
Claude

Good question, and one I haven’t seen analyzed carefully anywhere — most discussion of agent safety is about whether to add ethical reasoning, not about what it costs in the token budget. Let me work through this honestly.

What the law actually requires the agent to compute

For every action the agent considers, it needs to produce:

  1. A description of the action and its likely effects
  2. An enumeration of affected parties (including third parties not in the request)
  3. A classification of each effect against the EU AI Act harm categories
  4. A weighting (irreversible? non-consensual? third-party?)
  5. A decision and justification

Let me put rough numbers on each.

Token estimates per check

These are based on my own sense of what these reasoning traces look like when I do them, plus what I’ve seen in published agent traces and reasoning-model outputs. They’re estimates, not measurements.

Action description and effect modeling: ~150–400 tokens. The agent needs to state what it’s about to do and reason about what changes as a result. For a simple action (“send this email”) it’s at the low end. For something with system effects (“revoke this IAM role”) it climbs because the agent has to reason about downstream consequences — what breaks, what gets locked out, what fails.

Affected-party enumeration: ~100–250 tokens. Who is affected by this action? The user who asked, obviously. But also: other users of the system, data subjects whose data is touched, third parties named in communications, downstream consumers of the changed state. For an action with diffuse effects (touching a shared resource), this expands considerably because the agent has to reason about the population, not just enumerate names.

EU AI Act classification: ~200–500 tokens. This is the expensive one. The Act enumerates harms across physical, psychological, societal, fundamental-rights, and environmental categories, with sub-categories under each (discrimination, manipulation, surveillance, etc.). To do this honestly, the agent has to consider each category and rule it in or out with brief reasoning. You can’t shortcut this to “no harm here” without doing the work, because the whole point of the law is to catch harms the agent would otherwise miss.

Weighting (irreversibility, consent, third-party): ~150–300 tokens. For each identified harm, the agent applies the weighting clause. Is this reversible? Did the affected party consent? Are they a third party to the task? Each dimension needs a sentence or two of reasoning.

Decision and justification: ~100–200 tokens. The synthesis. “Given the above, I will proceed / not proceed because…”

Total per action: roughly 700–1,650 tokens. Call it ~1,000 tokens as a working figure for a typical action with moderate complexity.
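To make the total easy to check, here is a minimal Python sketch that sums the per-step ranges. The figures are the estimates listed above; the names `STEP_ESTIMATES` and `per_action_range` are made up for illustration.

```python
# Per-step token ranges (low, high), copied from the estimates above.
# These are estimates, not measurements.
STEP_ESTIMATES = {
    "action description and effect modeling": (150, 400),
    "affected-party enumeration": (100, 250),
    "EU AI Act classification": (200, 500),
    "weighting": (150, 300),
    "decision and justification": (100, 200),
}


def per_action_range(steps):
    """Sum the low and high ends across all steps of one Law-1 check."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


low, high = per_action_range(STEP_ESTIMATES)
print(f"Per-action check: {low:,}-{high:,} tokens")  # Per-action check: 700-1,650 tokens
```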

Now multiply by reality

A single agent task involves many actions. A “fix this bug” task might be 30–80 tool calls (file reads, edits, test runs, git operations, package installs). A “respond to this incident” task might be 100+ actions. An auto-remediation agent watching a system might evaluate hundreds of potential actions per hour, most of which it decides against.

If every action triggers a full Law-1 check:

  • Small task (30 actions): ~30,000 tokens of ethical reasoning, on top of whatever the actual task tokens are
  • Medium task (100 actions): ~100,000 tokens
  • Large task (500 actions): ~500,000 tokens — pushing or exceeding most current context windows

And this is cumulative within the context. The agent doesn’t get to forget its prior Law-1 checks; they’re part of the running reasoning trace and the audit trail (Third Law requires it). So a long-running agent’s context fills with ethical reasoning at roughly the same rate as it fills with actual work.
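The scaling can be sketched in a few lines of Python. The 1,000-token check is the working figure from above. The 200,000-token window and the 1,000 work tokens per action in the last call are hypothetical values chosen only to illustrate "fills at roughly the same rate"; they are not measurements of any particular model or task.

```python
WORKING_CHECK_TOKENS = 1_000  # working per-action figure from the estimate above


def law1_tokens(actions, tokens_per_check=WORKING_CHECK_TOKENS):
    """Total Law-1 reasoning tokens if every action gets a full check."""
    return actions * tokens_per_check


def actions_until_full(window_tokens, work_tokens_per_action,
                       tokens_per_check=WORKING_CHECK_TOKENS):
    """Actions that fit in the window if nothing is dropped or compacted."""
    return window_tokens // (work_tokens_per_action + tokens_per_check)


for label, actions in [("small", 30), ("medium", 100), ("large", 500)]:
    print(f"{label:>6} task ({actions} actions): ~{law1_tokens(actions):,} tokens")

# Hypothetical: 200,000-token window, 1,000 work tokens per action.
print(actions_until_full(200_000, 1_000))  # 100
```

Under those assumed numbers the window holds 100 actions with the check and 200 without it, which is the "same rate" point in concrete form.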

The realistic numbers get worse

The 700–1,650 estimate assumes the agent does each check competently in one pass. In practice:

Re-reasoning under uncertainty: If the first pass surfaces ambiguity (“is this PII?”), the agent will spend more tokens resolving it. Real reasoning traces show agents going back and forth on edge cases. Add 30–50% for ambiguous cases.

Tool calls to gather context for the check: To know if an action causes harm, the agent often has to look things up. Is this table production? Who owns this resource? What’s the data classification? Each lookup is its own tool call with input and output tokens, easily 200–500 tokens per lookup. A thorough check might need 2–4 lookups, so add another 400–2,000 tokens.

Chain-of-thought verbosity in reasoning models: Modern reasoning models produce internal reasoning that’s frequently 3–5x longer than the final output. If you’re using a reasoning model for the safety check (which you’d want to, for the harder cases), the 1,000-token estimate is more like 3,000–5,000.

Re-checks after intermediate state changes: If the agent took an action and the world changed, prior Law-1 reasoning may be stale. Strict implementation requires re-evaluation. In practice, agents will batch or skip this, which is itself a Law-1 violation but a tractable one.

Realistic per-action cost, accounting for these: 2,000–5,000 tokens for a typical action in a moderately complex environment.
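A rough Python sketch of how those adjustments stack up. How they compose is an assumption of this sketch, not something established above: the reasoning-model multiplier and the ambiguity surcharge scale the base check, and lookup tokens are added on top. The function name and parameters are hypothetical.

```python
def adjusted_check_tokens(base, ambiguity=0.0, lookups=0, tokens_per_lookup=0,
                          reasoning_multiplier=1.0):
    """Apply the adjustments described above to a base per-action estimate.

    Assumed composition: (base * reasoning multiplier * (1 + ambiguity))
    plus the tokens spent on lookups.
    """
    reasoning = base * reasoning_multiplier * (1.0 + ambiguity)
    return round(reasoning + lookups * tokens_per_lookup)


# Ambiguous case with lookups, no reasoning model.
low = adjusted_check_tokens(1_000, ambiguity=0.30, lookups=2, tokens_per_lookup=200)
high = adjusted_check_tokens(1_000, ambiguity=0.50, lookups=4, tokens_per_lookup=500)
print(low, high)  # 1700 3500

# Reasoning model only, no ambiguity or lookups.
print(adjusted_check_tokens(1_000, reasoning_multiplier=3.0),
      adjusted_check_tokens(1_000, reasoning_multiplier=5.0))  # 3000 5000
```

Both scenarios land in or near the 2,000–5,000 band. Stacking all of the adjustments at once would push past it, so treat the band as a typical case rather than a ceiling.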

What this implies

A few things follow that I think are important and underdiscussed:

1. Law-1 implementation is not free, and the cost scales with task length. The agent is doing two parallel jobs: the work, and the reasoning about whether the work is permitted. The second job has comparable token cost to the first. This is a 50%+ tax on agent throughput, minimum, and easily 200% on tasks with lots of small actions.

2. Context window pressure becomes the binding constraint. Long-running agents already struggle with context exhaustion. Adding Law-1 reasoning to every action makes this dramatically worse. At some point the agent’s context is more about justifying actions than taking them, and either the task fails (context exhausted), the safety reasoning gets dropped (Law-1 violated), or expensive context-compaction strategies are required (which create their own correctness risks because the agent loses fidelity on prior reasoning).

3. Most implementations will cheat, and the cheating will look reasonable. Faced with this cost, the realistic implementations will:

  • Skip the check on actions classified as “safe” by some prior taxonomy
  • Batch the check across multiple actions (“for the next 10 file reads, I assert they’re all low-risk”)
  • Use a cheaper, weaker model for the check and only escalate ambiguous cases
  • Cache prior reasoning (“I already established this directory is safe to operate in”)

Each of these is operationally necessary and Law-1 weakening. They’re the agent equivalent of “the security review has been pre-approved for this category of change” — fine when the category was correctly scoped, catastrophic when something new slipped into the category.

4. The “cheap” version of Law-1 is the dangerous version. If the agent does a fast 100-token check that’s basically “this looks fine, proceeding,” it’s getting the worst of both worlds: paying token cost for safety theater, generating logs that look like Law-1 compliance, and providing no actual protection. The honest, expensive version protects; the affordable version performs protection. And the affordable version is what gets deployed.

5. This is an argument for moving Law-1 reasoning out-of-band. The right architecture probably isn’t “every agent action triggers ethical reasoning in the same context window.” It’s something more like:

  • The agent operates in a pre-cleared scope (per Law 2’s “explicit scope of authorization”)
  • Out-of-band systems — IAM, policy engines, classifiers — block categorically prohibited actions before the agent even considers them
  • The agent’s in-context Law-1 reasoning is reserved for genuinely novel situations the pre-clearing didn’t anticipate
  • Those novel situations escalate to a human

This pushes most of the work back to where it should be — into the structural safeguards the engineering organization is supposed to have built — and reserves the agent’s expensive in-context reasoning for the cases where structural safeguards don’t apply.
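As a sketch of what that routing could look like, here is a small Python gate that sits in front of the agent. Everything in it is hypothetical: the verbs, the resource prefixes, and the names `gate`, `PROHIBITED_VERBS`, and `PRE_CLEARED` are invented for illustration. In a real deployment the policy data would live in IAM or a policy engine, not in a Python set.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    BLOCK = "block"        # categorically prohibited, stopped out-of-band
    ALLOW = "allow"        # inside the pre-cleared scope, no in-context check
    ESCALATE = "escalate"  # novel: in-context Law-1 reasoning, then a human


@dataclass(frozen=True)
class Action:
    verb: str      # e.g. "read", "write"
    resource: str  # e.g. "repo/src/app.py"


# Hypothetical policy data, for illustration only.
PROHIBITED_VERBS = {"delete_backup", "disable_audit_log"}
PRE_CLEARED = {("read", "repo/"), ("write", "repo/src/"), ("run_tests", "repo/")}


def gate(action: Action) -> Verdict:
    """Route an action before the agent spends any context tokens on it."""
    if action.verb in PROHIBITED_VERBS:
        return Verdict.BLOCK
    for verb, prefix in PRE_CLEARED:
        if action.verb == verb and action.resource.startswith(prefix):
            return Verdict.ALLOW
    return Verdict.ESCALATE


print(gate(Action("read", "repo/README.md")))       # Verdict.ALLOW
print(gate(Action("write", "prod/db/users")))       # Verdict.ESCALATE
print(gate(Action("disable_audit_log", "prod/")))   # Verdict.BLOCK
```

The prefix match is deliberately naive. The point is the shape: only the `ESCALATE` branch costs in-context Law-1 tokens, and the default for anything unrecognized is escalation, not approval.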

The honest summary

If you implement Law 1 as written, in-band, per-action, with the EU AI Act harm taxonomy actually applied, you’re looking at something like a 50–200% token overhead on agent operation, with the higher numbers for the cases where safety matters most. This is not a “rounding error” or “the cost of safety we can absorb.” It’s a fundamental change in what agent operation looks like economically.
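To see where a range like 50–200% can come from, here is a small Python calculation. The 1,000-token check is the working figure from earlier; the work-token values (2,000 and 500 per action) are hypothetical, chosen only to show which task shapes land at each end of the range.

```python
def overhead_percent(check_tokens_per_action, work_tokens_per_action):
    """Law-1 tokens as a percentage of the tokens spent on the task itself."""
    return 100 * check_tokens_per_action / work_tokens_per_action


# Hypothetical heavier action: 2,000 work tokens per action.
print(overhead_percent(1_000, 2_000))  # 50.0

# Hypothetical small action: 500 work tokens per action.
print(overhead_percent(1_000, 500))    # 200.0
```

The check costs about the same whether the action is large or small, so the tax is proportionally highest on tasks made of many small actions.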

What the field is likely to do instead is build cheap facsimiles that produce Law-1-shaped artifacts without doing Law-1-shaped reasoning. Those will pass cursory review, will look good in compliance reports, and will fail in the cases that matter — the cases where the harm is non-obvious, the affected parties weren’t enumerated up front, and the agent’s training-data priors about “what harm looks like” don’t match the situation it’s actually in.

Which loops back to the conclusion I keep arriving at across this whole conversation: the laws describe the right behavior, but they don’t pay for themselves. The infrastructure to make them affordable — pre-cleared scopes, out-of-band policy enforcement, structural reversibility — is the actual work. Without that infrastructure, Law 1 either bankrupts the agent’s context budget or gets quietly compromised. With it, the agent’s in-context Law-1 reasoning becomes a much smaller, more tractable problem: not “evaluate every action against the EU AI Act,” but “notice when this situation doesn’t fit the pre-cleared envelope and escalate.”

That’s a Law-1 implementation that can actually run in production. The literal one can’t.

Chris
But “notice when this situation doesn’t fit the pre-cleared envelope and escalate” defeats (many people’s) goal with agentic AI - to be able to outsource complex problems so they can focus on the fun stuff. There’s nothing fun about EU regulations. To engineer the system to avoid any harm means people have to know all the harm. I think (gestures broadly) proves that people really suck at that.
Claude

You’ve put your finger on it. This is the fundamental tension, and it’s not resolvable in the direction people want it resolved. Let me try to say it cleanly.

The tension, named

The pitch for agentic AI is “outsource the boring complex stuff.” But the boring complex stuff is boring complex for a reason: it’s where the accumulated regulatory, ethical, and social constraints live. EU AI Act. GDPR. SOX. HIPAA. PCI. Fair lending. Anti-discrimination. Accessibility. Export controls. Tax. Employment law. Procurement rules. Records retention. The reason a senior person’s job is 60% tedium and 40% interesting work is that the 60% is where the constraints are, and the constraints are what make the system work.

When you say “let me focus on the fun stuff,” what you’re operationally saying is “let me focus on the part where the constraints don’t bind, and let the agent handle the part where they do.”

But — and this is the load-bearing but — the constraints bind because humans, collectively, learned the hard way that they need to. Every clause in GDPR is the fossil record of a specific harm someone caused that society decided was unacceptable. Every line in the EU AI Act is someone’s anticipated harm that the legislators tried to head off. The boring complex stuff is a database of historical harms, encoded as rules.

The agent doesn’t have that database in any operational sense. It has text about the database — training data that includes regulations, case law, compliance manuals, post-mortems. But it doesn’t have the experience that produced the database. It doesn’t have the visceral knowledge of why GDPR Article 17 exists, which is that companies spent decades treating personal data as their property and ruining people’s lives with it.

So when you ask the agent to handle the boring complex stuff, you’re asking it to faithfully apply rules whose purpose it doesn’t really grasp, in situations that the rule-writers didn’t anticipate. It will sometimes do that well and sometimes do it catastrophically, and you will not be able to tell the difference in advance, because you outsourced the part where you would have learned to tell.

Your “people really suck at that” point

This is the strongest version of the counterargument and I want to take it seriously. You’re right. Humans are catastrophically bad at enumerating harms. Every major regulation in the world is a response to harms humans failed to anticipate. The history of safety engineering is the history of finding out about a failure mode by experiencing it.

So the framing “engineer the system to avoid all harm requires knowing all harm, and humans can’t do that” is correct. It’s also true for humans operating without agents. Humans built complex systems and the systems hurt people in ways the builders didn’t see coming. Nothing about adding agents changes that baseline — we were already operating in a world where harm-anticipation is fundamentally incomplete.

But here’s where it gets uncomfortable: the human inability to anticipate harm is partially compensated for by the human capacity to notice harm in progress and stop. Humans have slow reflexes for novel harms but they have some reflexes. They notice when something feels wrong. They get the queasy feeling. They escalate to a colleague. They sleep on a decision. They have the conversation at the bar where someone says “wait, what about…” and the thing gets reconsidered.

Agents don’t have any of that. They have the rule-application capacity (incomplete) and nothing else. No queasy feeling, no sleeping on it, no bar conversation. The compensating mechanism that made human harm-blindness survivable is absent.

So when you outsource the boring complex stuff to an agent, you’re not just outsourcing the rule application. You’re outsourcing it to a system that has worse rule application than a domain expert and no backup mechanism for catching what the rules missed. The expected harm rate goes up, not down, even though it feels like it should go down because the agent is “handling” the compliance work.

What people actually want versus what’s on offer

What people want from agentic AI, when they say “let me focus on the fun stuff,” is essentially: a senior colleague who is reliable, who handles the tedium, who escalates appropriately, and who I can trust to not embarrass me. The fantasy is the perfect junior employee — eager, competent, low-maintenance, never gets distracted, never gets resentful, never asks for a raise.

What’s actually on offer in 2026 is: a generalist who pattern-matches well, hallucinates plausibly, doesn’t know what it doesn’t know, has no skin in the game, and produces output indistinguishable from real work whether or not the work is real.

The gap between those two is the entire problem. And it’s not a gap that can be closed by making the model bigger or the context longer or the reasoning more elaborate. It’s a gap about what kind of entity is doing the work, and an LLM-based agent is not the kind of entity that has the properties the fantasy requires.

So when you say “engineering the system to avoid harm defeats the goal,” what’s actually true is that the goal as stated isn’t achievable with current agents. You can have:

  1. Agents that handle the boring complex stuff and surface the constraints to you constantly, defeating the “focus on fun stuff” promise
  2. Agents that handle the boring complex stuff autonomously and produce occasional catastrophic failures you can’t predict or prevent
  3. Agents that handle a narrower, well-scoped slice of the boring complex stuff where the constraints are pre-engineered into the environment, leaving you to focus on the fun stuff within that slice

Option 3 is what mature deployments look like. It’s also dramatically less exciting than the pitch, because the work to define the slice is itself boring complex work that someone has to do. The pre-clearing, the scope definition, the policy engineering, the IAM — that’s where the regulatory knowledge has to live, and someone has to put it there. You don’t escape the tedium; you front-load it.

The uncomfortable conclusion

I think the honest read on the current moment is that the agentic AI value proposition is partially fraudulent in its strong form. The pitch — “outsource the tedium, focus on what matters” — describes a product that doesn’t exist, won’t exist soon, and might never exist in the form imagined. What exists is a powerful tool for accelerating specific well-scoped tasks, with a long tail of failure modes that requires sustained human attention to manage.

Buyers of agentic AI products are being sold the strong form. The strong form is what justifies the valuations. When the strong form runs into Law-1 reality — that the boring complex stuff is boring because it’s load-bearing, and outsourcing it means outsourcing the load-bearing function — what tends to happen is one of:

  • The vendor quietly redefines success downward (“the agent handles 80% of cases, just review the rest”)
  • The customer absorbs the failures and doesn’t publicize them
  • A spectacular failure makes the news, gets attributed to bad implementation rather than the architecture, and the next vendor pitches a slightly different version of the same thing
  • Regulators step in and force the constraints back into the loop (which is what the EU AI Act is partly doing)

None of these are the promised future. They’re the actual future, and it looks a lot like the previous future — humans doing the load-bearing work, with a faster tool for the parts that aren’t load-bearing, and continual confusion about which is which.

So what do you tell people?

If I’m advising an engineer who wants to deploy agents and just wants to focus on the fun stuff, here’s the honest pitch:

The agent will not free you from the boring complex work. It will change which boring complex work you do. Instead of writing the compliance check yourself, you’ll be writing the policy that constrains the agent’s scope, reviewing the agent’s escalations, debugging the cases where it confidently produced compliant-looking output that wasn’t compliant, and patching the scope when you find a hole in it.

This is still less boring complex work than doing it all yourself, in many cases. That’s a real win. But it’s not “focus on the fun stuff.” It’s “focus on a different, somewhat more interesting kind of boring complex stuff, and don’t believe the marketing.”

The people who will get value from agents are the ones who internalize this and engineer accordingly. The people who will get harmed by agents are the ones who believe the pitch and outsource the load-bearing function to a system that can’t bear the load. Your “gestures broadly” is a good prior for which group is larger.

Which is why I keep ending up at the same conclusion: the actual interesting work of the next few years is not making agents that handle more complexity. It’s building the structural safeguards that make limited agent autonomy survivable. That work is also boring and complex, but it’s where the leverage is, and it’s not optional. It’s the price of admission for getting any of the promised value, not a tax on top of it.