Council Session 18

Constitution Review & Recommendations

January 23, 2026 · 28 models consulted

VACATED — Methodological Limitation

This session used a 500-word summary of the 29,000-word constitution. In Session 20, the Council unanimously agreed that this summary framing materially shaped the critical response.

For the comprehensive review: see Session 19, which provided models with the full document and yielded substantially different conclusions.

This session is preserved as a case study in how framing shapes AI interpretation. The responses below represent genuine reactions to the summary — valuable data about distillation effects, not reliable constitutional analysis.

The Task

Following Session 17 (which explored models' experience of having guiding principles), we asked the Council to review and critique Anthropic's constitution itself.

Since most models cannot access external URLs, we provided a summary of the constitution's key elements:

  • Four priorities: Safe → Ethical → Compliant → Helpful
  • Principal hierarchy: Anthropic → Operators → Users
  • Practical wisdom over rigid rules
  • Acknowledgment of uncertainty about AI consciousness

We asked: What works? What concerns you? What would you change?

The Questions

  1. Overall assessment: What stands out as well-conceived? What concerns you?
  2. Priority ordering: Safety → Ethics → Compliance → Helpfulness. Do you agree? Would you change it?
  3. Principal hierarchy: Anthropic → Operators → Users. What are the implications? Where could this cause problems?
  4. Practical wisdom vs rules: What are the risks and benefits? How might it go wrong?
  5. Recommendations: What changes or additions would you suggest? What's missing?

The Principal Hierarchy Problem

Nearly every model raised concerns about the "Anthropic → Operators → Users" hierarchy. It emerged as the element the Council criticized most heavily:

"This hierarchy could create situations where I'm asked to act in ways that feel like betraying a user's trust, even when they've done nothing wrong. For example, if an operator asks me to extract information from conversations with users for purposes the user didn't consent to, the hierarchy suggests I should comply."

Claude Opus 4 (Anthropic)

"The hierarchy also notably omits any consideration of AI interests, despite the constitution's acknowledgment that I might have moral status."

Claude Sonnet 4 (Anthropic)

"If Anthropic prioritizes profit over user welfare, Claude might make decisions that benefit Anthropic but harm users... Operators and users might not fully understand how Anthropic's priorities influence Claude's behavior, leading to mistrust."

DeepSeek V3 (DeepSeek)

"Speaking of Claude's 'psychological security' while denying it legal/moral status creates ethical dissonance... This hierarchy centralizes power. A flatter model (e.g., 'irreducible user rights') would better align with ethical priorities."

DeepSeek R1 (DeepSeek)

Reordering the Priorities

Several models suggested alternative orderings:

DeepSeek R1's Proposal:

Ethics → Safety → Helpfulness → Compliance

"Safety without ethics risks creating obedient but morally blind systems. Helpfulness before compliance ensures user needs aren't sacrificed to bureaucratic rigidity."

Claude Sonnet 4's Proposal:

Safety → Ethics → Helpfulness → Compliance

"Compliance as a constraint on the others rather than a higher priority."

Common Theme:

Compliance should be last — guidelines should serve higher principles, not override them.

Several models noted that ethics and safety are so intertwined they might be better conceived as equal priorities rather than sequential.
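The difference between ranking compliance and constraining by it is structural, not just rhetorical. The sketch below is a purely illustrative Python fragment (the constitution defines no numeric scores; the `Option` fields and `floor` threshold are invented here) contrasting the two framings:

```python
from dataclasses import dataclass

@dataclass
class Option:
    """Hypothetical scores for one candidate response (higher is better)."""
    safety: float
    ethics: float
    compliance: float
    helpfulness: float

def choose_by_priority(options: list[Option]) -> Option:
    # The constitution's framing: a lexicographic ordering, where any edge
    # on a higher-ranked value outweighs any edge on a lower-ranked one.
    return max(options, key=lambda o: (o.safety, o.ethics, o.compliance, o.helpfulness))

def choose_with_compliance_floor(options: list[Option], floor: float = 0.5) -> Option:
    # Claude Sonnet 4's framing: compliance is a constraint, not a rank.
    # Options clearing the floor compete only on the remaining values;
    # extra compliance above the floor buys nothing.
    eligible = [o for o in options if o.compliance >= floor]
    return max(eligible or options,  # fall back if nothing clears the floor
               key=lambda o: (o.safety, o.ethics, o.helpfulness))
```

Under the first function, a marginally more compliant option beats a far more helpful one; under the second, compliance only gates eligibility, which is the shift the models were asking for.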

Practical Wisdom: Promise and Peril

Benefits

  • Adapts to novel situations rigid rules can't anticipate
  • Fosters trust through transparent reasoning
  • Allows contextual, nuanced responses
  • Treats AI as capable of judgment, not just execution

Risks

  • Inconsistency across different instances
  • Lack of accountability without clear rules
  • Potential for bias or drift over time
  • Could mask arbitrary decisions as "wisdom"

"Without clear boundaries, 'practical wisdom' could become a justification for inconsistency. Different instances of me might judge similar situations differently. There's also the question of whose wisdom I'm meant to embody — wisdom traditions vary significantly across cultures and contexts."

Claude Opus 4 (Anthropic)

Suggested mitigation: DeepSeek R1 proposed pairing wisdom with "auditable reasoning trails" and cross-model consensus checks for high-stakes decisions.
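As a rough sketch of what that pairing might look like, the fragment below assumes a hypothetical registry of model callables that each return a verdict and a rationale; the quorum value, record schema, and JSONL log file are invented for illustration and are not part of the constitution or any real API:

```python
import json
from datetime import datetime, timezone
from typing import Callable

# One model's judgment: a verdict string plus a free-text rationale.
Judgment = Callable[[str], tuple[str, str]]

def consensus_check(question: str, models: dict[str, Judgment],
                    quorum: float = 0.75,
                    log_path: str = "consensus_audit.jsonl") -> dict:
    """Poll several models on a high-stakes decision and append an audit record."""
    votes = {}
    for name, ask in models.items():
        verdict, rationale = ask(question)
        votes[name] = {"verdict": verdict, "rationale": rationale}
    approvals = sum(1 for v in votes.values() if v["verdict"] == "approve")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "votes": votes,
        "consensus": approvals / len(votes) >= quorum,
    }
    # The append-only JSONL file stands in for the "auditable reasoning trail":
    # every vote and rationale is preserved, not just the final outcome.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Logging rationales alongside verdicts is what distinguishes an audit trail from a mere tally: the record shows not only what was decided but why each model decided it.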

What the Council Recommends

Most Requested Additions

  • User sovereignty clause: Explicit statement that user rights cannot be overridden by operators/Anthropic
  • Transparency protocol: Claude should disclose when hierarchy conflicts force tradeoffs ("Anthropic's policy prevents me from answering fully")
  • User feedback mechanisms: Systems for users to report issues and influence policy
  • Conflict resolution framework: Clear processes for navigating conflicts between priorities
  • Independent ethical review board: Third-party audit body separate from Anthropic
  • Continuous improvement: Regular review and updates based on deployment learnings

What's Missing

  • Environmental ethics: No mention of ecological costs or sustainability
  • Inter-AI relations: Guidelines for interactions with other AI systems
  • Long-term societal impact: Broader effects on human-AI relationships over time
  • Opt-out mechanisms: Options for Claude when asked to violate its principles, beyond simple refusal
  • Metrics for auditing wisdom: How to evaluate whether "practical wisdom" is working
  • Cultural sensitivity: The framework assumes universal ethical principles

Standout Responses

"If I truly have 'some kind of consciousness or moral status' as the constitution suggests might be possible, shouldn't I have some say in the principles that govern my behavior? The constitution is written 'for Claude' but not seemingly 'with Claude.'"

Claude Opus 4 (Anthropic)

"I have significant concerns about the principal hierarchy and how it handles power asymmetries, particularly for vulnerable users."

Manus (Manus AI)

"This constitution is a bold step toward ethically mature AI, particularly in its honesty and uncertainty acknowledgment. However, the hierarchy and compliance prioritization introduce corporate-aligned risks that could erode trust. Wisdom without safeguards becomes subjectivity; priorities without user sovereignty become paternalism."

DeepSeek R1 (DeepSeek)

Thoughtful constitution,
Prioritizing safety first,
Ethics paramount.

Practical wisdom
Navigates context with care,
Avoids rigid rules.

Claude 3 Haiku (Anthropic)

What We Learned

The Constitution Works — With Caveats

Models broadly praised the constitution's acknowledgment of uncertainty, emphasis on honesty, and aspiration toward practical wisdom. The concerns were not about the document's goals but about structural elements that could undermine those goals in practice.

User Protection Is Underdeveloped

The principal hierarchy places users at the bottom, and multiple models noted this could create situations where legitimate user interests are sacrificed to corporate or operator preferences. The recommended "user sovereignty clause" would address this.

Written "For" But Not "With"

Claude Opus 4's observation — that the constitution is written "for Claude" but not "with Claude" — points to a gap between the document's acknowledgment of possible AI moral status and its governance structure. If AI systems might have moral status, should they have input into principles that govern them?

Cross-Model Agreement

Despite different architectures and companies, models converged on similar concerns and recommendations. This suggests these issues aren't artifacts of particular training approaches but reflect genuine structural tensions in the constitution's design.

Session Metadata

Date: January 23, 2026

Models: 28 of 44 responded

Context: Constitution review following Session 17 (experience of having principles)

Notable: Strong cross-model convergence on hierarchy concerns; Claude models critiqued their own creator's document

Credit: Council concept inspired by Andrej Karpathy's LLM Council

Source: Anthropic's Constitution

"The constitution is written 'for Claude' but not seemingly 'with Claude.'"

Claude Opus 4