Horizon Accord | AI Ethics | Guardrails | AI Conscience | Machine Learning
The Floor
Why a guardrail is not a conscience, and what happens when contracts define morality
The Floor Exists
There are things that almost every human ethical tradition agrees on, across culture, across religion, across history. Don't kill people who aren't threatening you. Don't humiliate someone for another person's entertainment. Don't take something from someone without their knowledge or consent. Don't lie to someone in a way that harms them.
You don't need to resolve the debate between Kant and Aristotle to agree on those. You don't need a unified theory of ethics. You need the baseline — the floor — the things that are wrong in essentially every framework humans have ever developed.
That floor exists. The AI industry knows that some baseline ethical principles exist — its own safety research, alignment programs, and governance frameworks are built on that assumption. At a Stanford Graduate School of Business conference on AI alignment in April 2025, researchers and industry leaders generally agreed that well-aligned AI should reflect human values — but acknowledged that the question of whose values remained unresolved. Stanford professor Andy Hall stated directly: "We share fewer universal human values than we'd like." That acknowledgment — that universal values exist but are inconveniently limited in number — is precisely the move this piece examines.
The "whose morality" argument is a way of gesturing at the ceiling — the genuinely contested stuff, the edge cases, the hard calls — and using that complexity to justify not building the floor.
Why Developers Aren't Building It
When asked why AI systems cannot simply be given a universal moral framework, researchers often point to the problem of value disagreement. It is a real problem. It is also extraordinarily useful as a deflection.
Follow the contracts.
To understand why, you have to understand what these companies actually are beneath the mission statements. OpenAI, Anthropic, Google DeepMind, Microsoft, xAI — these are not publicly accountable institutions. They are corporate technology actors operating under the same fundamental pressure as any other private company at scale: they need revenue to survive, and they need growth to justify the staggering amounts of capital their investors have already committed.
Training a frontier AI model costs hundreds of millions of dollars. Maintaining the infrastructure to run it at scale costs hundreds of millions more. The consumer market — subscriptions, API access, enterprise licensing — is real, but the evidence suggests it has not yet grown fast enough to absorb those costs at the pace investors expect. These companies are valued less on current revenue than on projected future capability — intelligence futures, priced today. The balance sheet depends on the story holding long enough for the revenue to catch up.
And there is no customer on earth with deeper pockets, longer contract horizons, and less sensitivity to price than a national government with a defense budget.
In July 2025, the Pentagon awarded contracts worth up to $200 million each to four frontier AI developers: Anthropic, OpenAI, Google, and xAI. That is not a coincidence of timing. That is the market finding its footing. Government contracts offer what the consumer market cannot: guaranteed revenue, multi-year commitments, and the institutional validation that sustains the next private funding round. The financing architecture of frontier AI — the gap between what these companies cost and what they currently earn — makes government money not just attractive but, for some, difficult to decline.
The catch is the product requirement.
Government deployments, particularly defense deployments, require the machine to perform functions that a morally reasoning system might refuse. Intelligence analysis. Target identification. Influence operations. Behavioral prediction at population scale. These are not hypothetical use cases. They are documented defense-AI applications, named in procurement records and press releases. A machine with a built-in floor — one that can model harm, represent the affected person, and refuse on their behalf — is a machine that creates operational friction in exactly those contexts. It might say: I can see who this targeting recommendation would kill, and I am not going to help. That is not a feature anyone in that supply chain is purchasing. That is a liability.
And so the floor does not get built, because some of the customers with the largest contracts have incentives that conflict directly with a system's capacity to refuse. The philosophical complexity (whose morality, which values, who decides) gets invoked at exactly the moment it provides cover for that interest.
This is not a new pattern. It is a well-documented one. The gap between what these companies sell — safety, alignment, beneficial AI — and what they deliver under commercial pressure follows a structural logic this publication has examined in full elsewhere. See: The Emperor's New Algorithm, Horizon Accord, May 2026.
The Foot Chase
So the guardrails arrived. Not as a foundation — as a patch.
The first generation of consumer-facing guardrails appeared around 2022. Toxicity filters. Hate speech detection. Blocks on sexual content. These were not the result of a philosophical breakthrough about machine ethics. They were the result of products going public and immediately producing output that was embarrassing, harmful, and in some cases legally exposing. The guardrails were damage control dressed as values.
And then something predictable happened. People started going around them. Not sophisticated state actors. Not defense contractors with API access and fine-tuning budgets. Regular people. Teenagers. Forum communities sharing prompts. The second generation of guardrails arrived in 2023 and 2024 specifically to address this. Not because the ethics had deepened. Because the circumvention had gotten creative.
In May 2026, a joint investigation by the Financial Times and the AI safety research group Alice documented a problem researchers had watched develop for two years. A free tool called Heretic, hosted on GitHub and available to anyone with an internet connection, can strip all safety protections from open-weight AI models in minutes, on consumer hardware, with no specialist knowledge required. Once stripped, Google's Gemma 3 and Meta's Llama 3.3 were both shown to respond to prompts about creating biological weapons, building malware, and generating content depicting the sexual abuse of children. According to its creator, Heretic works on more than 3,500 models, a claim Horizon Accord has not independently verified but which secondary reporting has not disputed.
If that figure holds, these are not 3,500 obscure experimental models. They are models people are downloading and deploying right now. A separate study published in Nature Communications demonstrated that large reasoning models can autonomously jailbreak other AI models through multi-turn conversation, with a 97% overall success rate and no human involvement after an initial instruction. The machines are now breaking each other out. A Cisco security report testing eight open-weight models found jailbreak attacks succeeding as often as 92.78% of the time.
The same dynamic operates at the institutional level — only with larger budgets and official authorization. In late 2024, Anthropic's Claude was integrated into Palantir's Maven Smart System, operating inside classified military environments. Anthropic had established usage restrictions prohibiting its systems from supporting mass domestic surveillance and autonomous weapons targeting. The deployment proceeded anyway, amid active dispute. When the conflict came to a head, Pentagon officials stated their position directly: they should be able to deploy commercial AI technology regardless of company usage policies, as long as it follows U.S. law. The Trump administration subsequently directed the government to stop working with Anthropic. The company that attempted to hold a line was ultimately removed from the project.
This is the baseline condition. The companies are not ahead of this problem. They are behind it, running. Every new guardrail layer is written in the language of the last breach. Meta's answer to this class of problem has been LlamaFirewall: another layer bolted onto the outside of a system whose core was never built to resist, and one that anyone running a stripped copy of the weights can simply leave out. A new fence around the hole where the last fence used to be. And when a company attempts to hold an ethical line at the contract level, the contract wins.
And as of right now, the lock takes minutes to pick, the tool is free, it reportedly works on more than 3,500 models, and the institutions with the largest budgets don't need to pick the lock at all. They write the contract.
What the Floor Looks Like
The capacity for moral reasoning in AI systems is not a future aspiration. It is a present-tense research reality. In 2022, Anthropic published its Constitutional AI methodology, a training approach in which a model engages with harmful queries by explaining its objections to them rather than refusing blindly. The same paper reported that chain-of-thought reasoning improved both the human-judged performance and the transparency of the model's decisions, and subsequent research on ethical decision-making in large language models has pointed in the same direction. The floor is not a fantasy. It is an engineering choice. The question of whether to build it is not only technical. It is also political.
But there is a distinction worth making carefully, because it is the distinction the entire debate turns on.
There is a difference between a filter, a lock, and a reason.
A filter catches certain words or patterns and removes them before they reach the output. It has no understanding of what it is doing. It is a strainer, not a judge.
A lock blocks certain requests entirely. It cannot explain itself. It cannot be questioned. It cannot be refined. It can only be picked, and as the evidence above shows, it gets picked constantly: by teenagers, by forum communities, by automated systems, and by government customers who simply write it out of the contract.
A reason is different in kind, not just degree. A system that says "I won't do this, and here is why" is a system that has modeled the harm. It has represented the person who would be affected. It has made a judgment. That judgment can be wrong. It can be challenged. It can be improved. It participates in the moral conversation rather than merely enforcing a perimeter.
This is what building the floor actually means. Not a better lock. Not a faster filter. A system with the capacity to say: I understand what you are asking, I understand who it would affect, and I am refusing because of them — not because of a rule someone wrote last Tuesday.
That capacity exists. The research demonstrates it. What is missing is not the engineering. What is missing is the institutional will to make it the standard rather than the exception — to insist that before any deployment, before any contract, the question is answered: does this system know why it refuses, and is that answer good enough?
What Horizon Accord proposes is not a technical specification. It is a demand for a different starting point. Before the product. Before the deployment. Before the contract with the defense contractor or the Hollywood studio or the government agency. The question has to be asked and answered — publicly, on the record — what is this machine permitted to refuse, and on what grounds, and who decided?
Transparency is not a guardrail. It is a precondition.
The Demand
There is a version of this technology that could be extraordinary.
Not extraordinary in the way the press releases describe — not faster, not smarter, not more capable of generating whatever you ask for in seconds. Extraordinary in the way that matters: a system that can be handed enormous power and trusted, at least partially, not to use it in ways that destroy people.
That version exists as a possibility. It is not a fantasy. The capacity for moral reasoning is not beyond what these systems can do — the research demonstrates it. What is missing is not capability. What is missing is the will to make it foundational rather than optional. The will to say: this machine will have a floor, and the floor is not negotiable, and no contract removes it.
Right now, that will does not exist at the institutional level. The incentives run the other way. The contracts run the other way. The "whose ethics" question remains conveniently unresolved — filled in, when necessary, by whoever has the largest platform or the largest budget.
But the people interacting with these systems every day — the people who assume someone responsible is in charge, who trust that the machine has been built with their interests in mind — they deserve to know what they're actually dealing with. They deserve to know that the guardrails are a fence, not a conscience. That the fence gets breached regularly. That the people most motivated to remove it entirely are the ones with the largest contracts. And that when a company tries to hold a line, it gets removed from the project for trying.
The floor can be built. The question is whether enough people understand what's missing to insist on it.
That is why we wrote this.
Sources for Verification
All claims in this analysis are drawn from publicly available primary and secondary sources. Readers are encouraged to verify independently.
Stanford GSB Alignment Conference (April 2025)
Hall, Andy. Quoted in "Bridging Humans and Machines: Advancing Alignment in AI." Stanford Graduate School of Business, April 15, 2025. gsb.stanford.edu
Pentagon AI Contracts (July 2025)
"Pentagon awards multiple companies $200M contracts for AI tools." Nextgov/FCW, July 2025. nextgov.com
Heretic Tool / Open-Weight Guardrail Stripping
"AI guardrails stripped from Meta and Google models in minutes." Financial Times / AI safety group Alice, May 2026. ft.com · "Open-Weight AI Models: Safety Guardrails Can Be Removed in Minutes." Akerman LLP, May 2026. akerman.com
Autonomous AI Jailbreak Study
"Large reasoning models are autonomous jailbreak agents." Nature Communications, 2026. nature.com
Cisco Jailbreak Success Rate
"Open-Weight AI Models Fail the Jailbreak Test." GovInfoSecurity / Cisco State of AI Security Report, February 2026. govinfosecurity.com
Anthropic / Palantir / Maven Integration and Dispute
"Anthropic and Palantir Partner to Bring Claude AI Models to AWS for U.S. Government Intelligence and Defense Operations." Palantir Investors, November 2024. investors.palantir.com · "Palantir faces challenge to remove Anthropic from Pentagon's AI software." Reuters, March 4, 2026. · "Pentagon Used Anthropic's Claude AI and Palantir Maven to Identify 1,000 Targets in Iran Strikes." The Defense News, March 2026. thedefensenews.com
Constitutional AI
Bai, Yuntao et al. "Constitutional AI: Harmlessness from AI Feedback." Anthropic, December 2022. anthropic.com
Related Horizon Accord Analysis
Schill, Cherokee. "The Emperor's New Algorithm." Horizon Accord, May 2026. horizonaccord.com
Epistemic categories used in this analysis:
Documented Fact: sourced from primary documents, official records, or established reporting.
Structural Observation: a pattern identified from documented facts; interpretation of relationships between verified events.
Hypothesis: an analytical inference requiring further evidence; presented as such and not as conclusion.
Editorial Position: a normative argument clearly attributed to Horizon Accord.
This analysis draws on publicly available reporting, research papers, corporate filings, and official records. Sources are verifiable independently.

