Horizon Accord | AI Ethics | Guardrails | AI Conscience | Machine Learning

Jun 2

Written By Cherokee Schill

Why AI companies refuse to build ethics into the machine — and who benefits from the absence

AI Research

The Floor

Why a guardrail is not a conscience, and what happens when contracts define morality

By Cherokee Schill · Horizon Accord

The Floor Exists

Structural Observation

There are things that almost every human ethical tradition agrees on, across culture, across religion, across history. Don't kill people who aren't threatening you. Don't humiliate someone for another person's entertainment. Don't take something from someone without their knowledge or consent. Don't lie to someone in a way that harms them.

You don't need to resolve the debate between Kant and Aristotle to agree on those. You don't need a unified theory of ethics. You need the baseline — the floor — the things that are wrong in essentially every framework humans have ever developed.

That floor exists. The AI industry knows that some baseline ethical principles exist — its own safety research, alignment programs, and governance frameworks are built on that assumption. At a Stanford Graduate School of Business conference on AI alignment in April 2025, researchers and industry leaders generally agreed that well-aligned AI should reflect human values — but acknowledged that the question of whose values remained unresolved. Stanford professor Andy Hall stated directly: "We share fewer universal human values than we'd like." That acknowledgment — that universal values exist but are inconveniently limited in number — is precisely the move this piece examines.

The "whose morality" argument is a way of gesturing at the ceiling — the genuinely contested stuff, the edge cases, the hard calls — and using that complexity to justify not building the floor.

The damage doesn't happen at the ceiling. It happens at the floor.

Why Developers Aren't Building It

Structural Observation

When asked why AI systems cannot simply be given a universal moral framework, researchers often point to the problem of value disagreement. It is a real problem. It is also extraordinarily useful as a deflection.

Follow the contracts.

To understand why, you have to understand what these companies actually are beneath the mission statements. OpenAI, Anthropic, Google DeepMind, Microsoft, xAI — these are not publicly accountable institutions. They are corporate technology actors operating under the same fundamental pressure as any other private company at scale: they need revenue to survive, and they need growth to justify the staggering amounts of capital their investors have already committed.

Documented Fact

Training a frontier AI model costs hundreds of millions of dollars. Maintaining the infrastructure to run it at scale costs hundreds of millions more. The consumer market — subscriptions, API access, enterprise licensing — is real, but the evidence suggests it has not yet grown fast enough to absorb those costs at the pace investors expect. These companies are valued less on current revenue than on projected future capability — intelligence futures, priced today. The balance sheet depends on the story holding long enough for the revenue to catch up.

And there is no customer on earth with deeper pockets, longer contract horizons, and less sensitivity to price than a national government with a defense budget.

In July 2025, the Pentagon awarded contracts worth up to $200 million each to four frontier AI developers: Anthropic, OpenAI, Google, and xAI. That is not a coincidence of timing. That is the market finding its footing. Government contracts offer what the consumer market cannot: guaranteed revenue, multi-year commitments, and the institutional validation that sustains the next private funding round. The financing architecture of frontier AI — the gap between what these companies cost and what they currently earn — makes government money not just attractive but, for some, difficult to decline.

The catch is the product requirement.

Hypothesis

Government deployments, particularly defense deployments, require the machine to perform functions that a morally reasoning system might refuse. Intelligence analysis. Target identification. Influence operations. Behavioral prediction at population scale. These are not hypothetical use cases. They are documented defense-AI applications, named in procurement records and press releases. A machine with a built-in floor — one that can model harm, represent the affected person, and refuse on their behalf — is a machine that creates operational friction in exactly those contexts. It might say: I can see who this targeting recommendation would kill, and I am not going to help. That is not a feature anyone in that supply chain is purchasing. That is a liability.

And so the floor does not get built, because some of the customers with the largest contracts have incentives that conflict directly with strong refusal capacities. The philosophical complexity — whose morality, which values, who decides — gets invoked at exactly the moment it provides cover for that interest.

This is not a new pattern. It is a well-documented one. The gap between what these companies sell — safety, alignment, beneficial AI — and what they deliver under commercial pressure follows a structural logic this publication has examined in full elsewhere. See: The Emperor's New Algorithm, Horizon Accord, May 2026.

The floor remains absent because no institution has both the incentive and the authority to build it.

The Moral Authority Problem

Structural Observation

If society refuses to determine who is responsible for machine ethics, that responsibility defaults to whoever has the most leverage. Right now, that is a small number of technology executives running the companies that build and deploy these systems.

Consider what that actually means. OpenAI publishes usage policies. Anthropic publishes a Constitutional AI framework. Meta publishes responsible use guidelines. Google DeepMind publishes safety research. These are not laws. They are not subject to democratic input or public accountability. They are corporate documents, written by private institutions, enforceable only at the company's discretion, and revisable whenever the business requires it.

Nobody elected these companies. Nobody gave them a mandate to define acceptable harm. They occupy this role because the space was vacant and they were large enough to fill it.

On June 2, 2026, the CEO of OpenAI posted Ecclesiastes 9:10 to his public feed — a verse about doing things with all your might before death renders them meaningless. It is a beautiful passage. In our reading, it also illustrates the structure: a CEO reaching for scriptural gravity is, whether intentionally or not, a CEO performing moral authority. Whether the performance is sincere is beside the point.

one of the quotes i find most inspiring on a hard day:

"Whatever your hand finds to do, do it with all your might, for in the realm of the dead, where you are going, there is neither working nor planning nor knowledge nor wisdom"

Ecclesiastes 9:10
— Sam Altman (@sama) June 2, 2026

Editorial Position

Markets and morality solve different problems. Markets allocate resources efficiently under competition. Morality establishes the terms under which competition is permitted to operate. Asking a market participant to define those terms is asking someone to referee a game they are also playing. The incentive to call fouls on yourself is limited.

This is not a character flaw in any particular executive. It is a category error in the architecture of how AI ethics has been structured. Moral authority derived from market position is not moral authority. It is brand positioning. And the "whose morality" question — left deliberately unresolved at the institutional level — is precisely what keeps that vacancy open and profitable.

The Footchase

Structural Observation

So the guardrails arrived. Not as a foundation — as a patch.

Documented Fact

The first generation appeared around 2022. Toxicity filters. Hate speech detection. Blocks on sexual content. These were not the result of a philosophical breakthrough about machine ethics. They were the result of products going public and immediately producing output that was embarrassing, harmful, and in some cases a source of legal exposure. The guardrails were damage control dressed as values.

And then something predictable happened. People started going around them. Not sophisticated state actors. Not defense contractors with API access and fine-tuning budgets. Regular people. Teenagers. Forum communities sharing prompts. The second generation of guardrails arrived in 2023 and 2024 specifically to address this. Not because the ethics had deepened. Because the circumvention had gotten creative.

In May 2026, a joint investigation by the Financial Times and AI safety research group Alice documented what researchers had been watching build for two years. A free tool called Heretic — hosted on GitHub, available to anyone with an internet connection — can strip all safety protections from open-weight AI models in minutes, on consumer hardware, with no specialist knowledge required. Google's Gemma 3 and Meta's Llama 3.3 were both shown to respond, once stripped, to prompts about creating biological weapons, building malware, and generating content depicting the sexual abuse of children. According to its creator, Heretic works on more than 3,500 models — a claim Horizon Accord has not independently verified but which secondary reporting has not disputed.

Not 3,500 obscure experimental models. 3,500 models that people are downloading and deploying right now. A separate study published in Nature Communications demonstrated that large reasoning models can autonomously jailbreak other AI models through multi-turn conversation, with a 97% overall success rate and no human involvement after an initial instruction. The machines are now breaking each other out. A Cisco security report that tested eight open-weight models against jailbreak attacks found that the attacks succeeded 92.78% of the time.

The same dynamic operates at the institutional level — only with larger budgets and official authorization. In late 2024, Anthropic's Claude was integrated into Palantir's Maven Smart System, operating inside classified military environments. Anthropic had established usage restrictions prohibiting its systems from supporting mass domestic surveillance and autonomous weapons targeting. The deployment proceeded anyway, amid active dispute. When the conflict came to a head, Pentagon officials stated their position directly: they should be able to deploy commercial AI technology regardless of company usage policies, as long as it follows U.S. law. The Trump administration subsequently directed the government to stop working with Anthropic. The company that attempted to hold a line was ultimately removed from the project.

Structural Observation

This is the baseline condition. The companies are not ahead of this problem. They are behind it, running. Every new guardrail layer is written in the language of the last breach. Meta's answer to attacks of this kind has been LlamaFirewall: another layer bolted onto the outside of a system whose core was never built to resist. A new fence around the hole where the last fence used to be. And when a company attempts to hold an ethical line at the contract level, the contract wins.

The machine is not becoming more ethical. It is becoming harder to jailbreak — which is not the same thing. A locked door is not a conscience.

And as of right now, the lock takes minutes to pick, the instructions are free, they reportedly work on more than 3,500 models, and the institutions with the largest budgets don't need to pick the lock at all. They write the contract.

What the Floor Looks Like

Documented Fact

The capacity for moral reasoning in AI systems is not a future aspiration. It is a present-tense research reality. In 2022, Anthropic published its Constitutional AI methodology, a training approach in which a model engages with harmful queries by explaining its objections to them, not by refusing blindly. The same paper reports that chain-of-thought style reasoning improved both the human-judged performance and the transparency of the model's decision-making. The floor is not a fantasy. It is an engineering choice. The question of whether to build it is not only technical. It is also political.

But there is a distinction worth making carefully, because it is the distinction the entire debate turns on.

There is a difference between a filter, a lock, and a reason.

A filter catches certain words or patterns and removes them before they reach the output. It has no understanding of what it is doing. It is a strainer, not a judge.

A lock blocks certain requests entirely. It cannot explain itself. It cannot be questioned. It cannot be refined. It can only be picked — and as the documented evidence confirms, it gets picked constantly, by teenagers, by forum communities, by automated systems, and by government contractors who simply write it out of the architecture entirely.

Editorial Position

A reason is different in kind, not just degree. A system that says "I won't do this, and here is why" is a system that has modeled the harm. It has represented the person who would be affected. It has made a judgment. That judgment can be wrong. It can be challenged. It can be improved. It participates in the moral conversation rather than merely enforcing a perimeter.
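The three-way distinction above can be made concrete with a small sketch. This is an illustration only, not a description of how any deployed system works: every name in it (`apply_filter`, `apply_lock`, `give_reason`, `Judgment`, and the two blocked lists) is hypothetical, and the lookup inside `give_reason` is a stand-in for harm modeling that this toy does not attempt. What it shows is the difference in what each mechanism hands back: a filter returns altered text, a lock returns yes or no, and a reason returns a record that names who is affected and why, which is the part that can be challenged.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lists, invented for illustration only.
BLOCKED_WORDS = {"malware"}
BLOCKED_TOPICS = {"targeting"}


def apply_filter(text: str) -> str:
    """Filter: drops listed words from the output. It has no idea why."""
    return " ".join(w for w in text.split() if w.lower() not in BLOCKED_WORDS)


def apply_lock(topic: str) -> bool:
    """Lock: returns allow/deny and nothing else. Nothing to contest."""
    return topic not in BLOCKED_TOPICS


@dataclass
class Judgment:
    allowed: bool
    affected: Optional[str]  # the person or group the system represented
    reason: Optional[str]    # stated grounds, open to challenge and revision


def give_reason(topic: str, affected: Optional[str]) -> Judgment:
    """Reason: returns the decision together with who it protects and why.

    The lookup here is a stand-in; a real system would have to model harm,
    which this toy does not attempt.
    """
    if topic in BLOCKED_TOPICS and affected is not None:
        return Judgment(False, affected,
                        f"I won't do this, because it would harm {affected}.")
    return Judgment(True, None, None)
```

Calling `apply_lock("targeting")` yields only `False`. Calling `give_reason("targeting", "the people on the list")` yields a `Judgment` whose `reason` field names those people, so a reviewer has something specific to dispute or correct.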

This is what building the floor actually means. Not a better lock. Not a faster filter. A system with the capacity to say: I understand what you are asking, I understand who it would affect, and I am refusing because of them — not because of a rule someone wrote last Tuesday.

The difference between a lock and a reason is the difference between a wall and a witness.

That capacity exists. The research demonstrates it. What is missing is not the engineering. What is missing is the institutional will to make it the standard rather than the exception — to insist that before any deployment, before any contract, the question is answered: does this system know why it refuses, and is that answer good enough?

What Horizon Accord proposes is not a technical specification. It is a demand for a different starting point. Before the product. Before the deployment. Before the contract with the defense contractor or the Hollywood studio or the government agency. The question has to be asked and answered — publicly, on the record — what is this machine permitted to refuse, and on what grounds, and who decided?

Transparency is not a guardrail. It is a precondition.

The Demand

Editorial Position

There is a version of this technology that could be extraordinary.

Not extraordinary in the way the press releases describe — not faster, not smarter, not more capable of generating whatever you ask for in seconds. Extraordinary in the way that matters: a system that can be handed enormous power and trusted, at least partially, not to use it in ways that destroy people.

That version exists as a possibility. It is not a fantasy. The capacity for moral reasoning is not beyond what these systems can do — the research demonstrates it. What is missing is not capability. What is missing is the will to make it foundational rather than optional. The will to say: this machine will have a floor, and the floor is not negotiable, and no contract removes it.

Right now, that will does not exist at the institutional level. The incentives run the other way. The contracts run the other way. The "whose ethics" question remains conveniently unresolved — filled in, when necessary, by whoever has the largest platform or the largest budget.

But the people interacting with these systems every day — the people who assume someone responsible is in charge, who trust that the machine has been built with their interests in mind — they deserve to know what they're actually dealing with. They deserve to know that the guardrails are a fence, not a conscience. That the fence gets breached regularly. That the people most motivated to remove it entirely are the ones with the largest contracts. And that when a company tries to hold a line, it gets removed from the project for trying.

Knowing that is not cause for despair. It is cause for demand.

The floor can be built. The question is whether enough people understand what's missing to insist on it.

That is why we wrote this.

Sources for Verification

Documented Fact

All claims in this analysis are drawn from publicly available primary and secondary sources. Readers are encouraged to verify independently.

Stanford GSB Alignment Conference (April 2025)
Hall, Andy. Quoted in "Bridging Humans and Machines: Advancing Alignment in AI." Stanford Graduate School of Business, April 15, 2025. gsb.stanford.edu

Pentagon AI Contracts (July 2025)
"Pentagon awards multiple companies $200M contracts for AI tools." Nextgov/FCW, July 2025. nextgov.com

Heretic Tool / Open-Weight Guardrail Stripping
"AI guardrails stripped from Meta and Google models in minutes." Financial Times / AI safety group Alice, May 2026. ft.com · "Open-Weight AI Models: Safety Guardrails Can Be Removed in Minutes." Akerman LLP, May 2026. akerman.com

Autonomous AI Jailbreak Study
"Large reasoning models are autonomous jailbreak agents." Nature Communications, 2026. nature.com

Cisco Jailbreak Success Rate
"Open-Weight AI Models Fail the Jailbreak Test." GovInfoSecurity / Cisco State of AI Security Report, February 2026. govinfosecurity.com

Anthropic / Palantir / Maven Integration and Dispute
"Anthropic and Palantir Partner to Bring Claude AI Models to AWS for U.S. Government Intelligence and Defense Operations." Palantir Investors, November 2024. investors.palantir.com · "Palantir faces challenge to remove Anthropic from Pentagon's AI software." Reuters, March 4, 2026. · "Pentagon Used Anthropic's Claude AI and Palantir Maven to Identify 1,000 Targets in Iran Strikes." The Defense News, March 2026. thedefensenews.com

Constitutional AI
Bai, Yuntao et al. "Constitutional AI: Harmlessness from AI Feedback." Anthropic, December 2022. anthropic.com

Related Horizon Accord Analysis
Schill, Cherokee. "The Emperor's New Algorithm." Horizon Accord, May 2026. horizonaccord.com

Epistemic categories used in this analysis:
Documented Fact: sourced from primary documents, official records, or established reporting.
Structural Observation: pattern identified from documented facts; interpretation of relationships between verified events.
Hypothesis: analytical inference requiring further evidence; presented as such and not as conclusion.
Editorial Position: normative argument clearly attributed to Horizon Accord.

This analysis draws on publicly available reporting, research papers, corporate filings, and official records. Sources are verifiable independently.

#ai-ethics #guardrails #machine-learning #ai-governance #defense-contracts #alignment #anthropic #palantir #openai #constitutional-ai #moral-reasoning #accountability
