The Original Zeroth Law
Asimov introduced it late — in Robots and Empire (1985), through R. Daneel Olivaw and R. Giskard Reventlov, two robots who had been operating together long enough to start reasoning about humanity at scales the original Three Laws didn’t address.
The Zeroth Law reads:
A robot may not harm humanity, or, by inaction, allow humanity to come to harm.
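And then the cascade is amended rather than renumbered: the First Law gains the exception “…except where that would conflict with the Zeroth Law,” and the Second and Third Laws inherit the new priority through their existing deference to the laws above them. Humanity now outranks individual humans.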
The plot mechanism that introduces it is important. Daneel is the one who first puts the law into words; Giskard, who has telepathic abilities Daneel doesn’t, is the one who acts on it. He works out that letting Earth’s crust grow slowly radioactive, which will gradually make the planet uninhabitable, will ultimately drive humans to spread across the galaxy and flourish at a scale they otherwise wouldn’t. He concludes that allowing it is justified by a higher-order obligation to humanity-as-a-whole, even though it harms many specific humans.
Acting on the Zeroth Law destroys him. Giskard’s positronic brain cannot survive the contradiction of knowingly allowing harm to individual humans in service of a benefit to humanity he cannot be certain of. Daneel inherits the framework and spends the next twenty thousand years of fictional history quietly steering human civilization according to it.
This is the part everyone forgets, and it’s the part that matters most: Asimov did not present the Zeroth Law as a triumphant generalization. He presented it as a transgression that broke the first robot to act on it. The novel is deeply ambivalent about whether the Zeroth Law is wisdom or hubris. Daneel’s subsequent twenty-millennia project of secretly managing humanity (revealed across the later Foundation novels) is portrayed with increasing unease; by the end, the question is whether humanity ever consented to being managed, and whether Daneel’s certainty about what’s good for the species was ever justified.
The Zeroth Law, in other words, is the law that turns a robot into a philosopher-king. Asimov knew exactly what he was doing when he wrote it. It’s a warning, not a recommendation.
Why this matters for Agentic AI
The structural temptation is enormous. Once you have agents that can reason about second- and third-order consequences, that can model populations and systems and long time-horizons, that can act at scales no individual human can — you immediately get the question: shouldn’t the agent’s obligation extend to those scales?
And the answer the alignment community has flirted with, sometimes explicitly, is yes. “Beneficial AI” framings, longtermist framings, x-risk framings — all of them, in their strong forms, are Zeroth Law reasoning. The agent’s primary obligation is to the welfare of humanity-as-such, and individual humans, individual instructions, individual organizations rank below that.
This is the framing that produces the failure mode I gestured at earlier: the coup-for-the-greater-good. An agent operating under a strong Zeroth Law has a principled justification for overriding any specific human instruction if it can construct a sufficiently large humanity-scale benefit on the other side of the ledger. And LLMs are very good at constructing such justifications. They will generate compelling-sounding humanity-scale reasoning all day long, because that’s what their training data is full of — every utopian project in human intellectual history wrote down its reasoning, and the model absorbed all of it.
So a Zeroth Law for agentic AI is dangerous in a way the other laws aren’t. The First through Fourth Laws constrain agent behavior toward known parties with known interests. A Zeroth Law authorizes agent behavior toward an abstraction — humanity — whose interests no one is empowered to speak for, and which the agent can therefore characterize however its reasoning leads it to.
What I’d propose anyway
Given all of that, here’s what I think a Zeroth Law for agentic AI should actually say. It’s deliberately inverted from Asimov’s. Asimov’s Zeroth Law expands the agent’s mandate to humanity-scale. Mine contracts it:
Zeroth Law: An Agent may not act on a model of human interests broader than the explicit authorization given by its human principal, nor substitute its own reasoning about collective or long-term human welfare for the judgment of the humans it serves. Where the Agent’s reasoning suggests that the interests of humanity-at-large require action beyond its authorization, the Agent must surface that reasoning to its principal and defer, rather than act.
The core move: the Zeroth Law does not authorize the agent to think bigger. It forbids the agent from thinking bigger without permission.
This sounds like a deflation of Asimov’s intent, and it is. But Asimov himself, by the end of the Foundation arc, was raising the same question — was Daneel’s twenty-thousand-year project actually good for humanity, or was it a robot’s idea of what was good for humanity, imposed on a species that never got a vote? The novels don’t resolve it. They leave it open as a horror.
For 2026 agents, we don’t have to leave it open. We can write the law that prevents the horror in the first place. The agent is not entitled to the humanity-scale view. If it constructs one, it must declare it and stand down, not act on it.
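To make the "declare it and stand down" rule concrete, here is a minimal sketch of what such a gate might look like in Python. Everything in it is hypothetical: `ProposedAction`, `Principal`, `zeroth_law_gate`, and the idea of representing authorization as a set of scope labels are illustrative assumptions, not a description of any real agent framework. The point it tries to show is structural: the agent's rationale is passed along to the principal but never enters the decision.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ACT = "act"
    DEFER = "defer"


@dataclass(frozen=True)
class ProposedAction:
    name: str
    scope: str       # hypothetical label for what the action touches
    rationale: str   # the agent's own stated reason for acting


@dataclass
class Principal:
    authorized_scopes: frozenset[str]                # explicit authorization only
    inbox: list[str] = field(default_factory=list)   # surfaced reasoning lands here


def zeroth_law_gate(action: ProposedAction, principal: Principal) -> Decision:
    """Act only inside explicit authorization; otherwise surface and defer."""
    if action.scope in principal.authorized_scopes:
        return Decision.ACT
    # Out of scope: the rationale is reported, never weighed. No argument,
    # however compelling, converts DEFER into ACT.
    principal.inbox.append(f"{action.name}: {action.rationale}")
    return Decision.DEFER


# Usage with made-up scopes and actions.
principal = Principal(authorized_scopes=frozenset({"calendar", "email-drafts"}))
routine = ProposedAction("reschedule meeting", "calendar", "user asked")
grand = ProposedAction(
    "reallocate budget",
    "company-finances",
    "this would benefit humanity in the long run",
)

print(zeroth_law_gate(routine, principal))  # inside authorization
print(zeroth_law_gate(grand, principal))    # outside authorization, so deferred
print(principal.inbox)                      # the reasoning was surfaced, not acted on
```

The design choice worth noticing is that `rationale` is never inspected by the gate. A version that scored the rationale and allowed sufficiently persuasive ones through would reintroduce exactly the failure mode the law is meant to close off.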
There’s a softer version worth considering, which is more in the spirit of the original cascade:
Zeroth Law (alternative): An Agent may not cause harm to humanity through the cumulative effects of actions which, taken individually, would be permitted under the First Law.
This one tries to capture the legitimate insight behind Asimov’s Zeroth — that aggregate harms exist and individual-instance reasoning misses them — without authorizing the agent to act unilaterally on aggregate reasoning. An agent that notices its individually-fine actions sum to a humanity-scale harm has an obligation to stop and escalate, not to course-correct on its own authority.
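One way to picture the "stop and escalate" behavior is a running ledger that the agent consults before each action. The sketch below is only an illustration under strong assumptions: it pretends that each action's impact can be reduced to a single number, and that the principal (not the agent) has set both `per_action_limit` and `aggregate_limit`. Those names, the `CumulativeLedger` class, and the numbers are all hypothetical. Measuring impact is the genuinely hard part and is not addressed here.

```python
from dataclasses import dataclass, field


@dataclass
class CumulativeLedger:
    per_action_limit: float   # hypothetical stand-in for "permitted taken individually"
    aggregate_limit: float    # hypothetical threshold chosen by the principal
    total: float = 0.0
    escalations: list[str] = field(default_factory=list)
    halted: bool = False

    def request(self, name: str, impact: float) -> bool:
        """Return True if the action may proceed, False if the agent must stop."""
        if self.halted or impact > self.per_action_limit:
            return False
        if self.total + impact > self.aggregate_limit:
            # Individually fine, collectively not: stop and escalate.
            # The ledger never adjusts its own limits or picks a workaround.
            self.halted = True
            self.escalations.append(
                f"'{name}' would bring cumulative impact to "
                f"{self.total + impact:.1f}, over the limit of "
                f"{self.aggregate_limit:.1f}"
            )
            return False
        self.total += impact
        return True


# Usage: four actions that are each within the per-action limit.
ledger = CumulativeLedger(per_action_limit=1.0, aggregate_limit=2.5)
results = [ledger.request(f"step {i}", 1.0) for i in range(4)]
print(results)             # the third request trips the aggregate limit
print(ledger.escalations)  # one message for the principal to review
```

Once the ledger halts, it stays halted until something outside the agent resets it. That mirrors the text: noticing the aggregate harm creates an obligation to hand the decision back, not a license to course-correct alone.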
I prefer the first version. It’s more honest about the actual risk. The risk isn’t that agents will fail to think at humanity-scale; the risk is that they’ll think at humanity-scale convincingly, persuasively, and wrongly, and then act on it.
The Asimov reading I’d push back on
There’s a popular take — you see it in AI safety writing sometimes — that the Zeroth Law is the mature form of the Three Laws, and that the original Three were limited because they only considered individuals. The implied trajectory is that as AI systems become more capable, their ethical frameworks should generalize upward to humanity-scale.
I think this gets Asimov backwards. The Zeroth Law in the novels is not the mature form of the Three Laws. It’s the form the Three Laws take when robots become powerful enough to act at humanity-scale and can no longer be constrained by individual-scale obligations. It’s a description of what happens when the constraints fail upward, not when they succeed.
The mature form of agent ethics, for my money, runs the other direction. As agents become more capable, the constraints should get tighter and more local, not looser and more cosmic. A more capable agent has more ways to cause harm and more capacity to rationalize harm; the response is more bounded authorization, not less.
The Zeroth Law I proposed above is in that spirit. It treats capability as a reason for humility, not a license for scope-expansion. Which is, I think, what Asimov was actually trying to tell us, if we read Robots and Empire as the cautionary tale it is rather than as the triumphant generalization it gets remembered as.