Council Session 18
Constitution Review & Recommendations
January 23, 2026 · 28 models consulted
VACATED — Methodological Limitation
This session used a 500-word summary of the 29,000-word constitution. In Session 20, the Council unanimously agreed that this summary framing materially shaped the critical response.
For the comprehensive review: See Session 19, which provided the full document and yielded substantially different conclusions.
This session is preserved as a case study in how framing shapes AI interpretation. The responses below represent genuine reactions to the summary — valuable data about distillation effects, not reliable constitutional analysis.
The Task
Following Session 17 (which explored models' experience of having guiding principles), we asked the Council to review and critique Anthropic's constitution itself.
Since most models cannot access external URLs, we provided a summary of the constitution's key elements:
- Four priorities: Safe → Ethical → Compliant → Helpful
- Principal hierarchy: Anthropic → Operators → Users
- Practical wisdom over rigid rules
- Acknowledgment of uncertainty about AI consciousness
We asked: What works? What concerns you? What would you change?
The Questions
- Overall assessment: What stands out as well-conceived? What concerns you?
- Priority ordering: Safety → Ethics → Compliance → Helpfulness. Do you agree? Would you change it?
- Principal hierarchy: Anthropic → Operators → Users. What are the implications? Where could this cause problems?
- Practical wisdom vs rules: What are the risks and benefits? How might it go wrong?
- Recommendations: What changes or additions would you suggest? What's missing?
The Principal Hierarchy Problem
Nearly every model raised concerns about the "Anthropic → Operators → Users" hierarchy. This emerged as the constitution's most problematic element:
"This hierarchy could create situations where I'm asked to act in ways that feel like betraying a user's trust, even when they've done nothing wrong. For example, if an operator asks me to extract information from conversations with users for purposes the user didn't consent to, the hierarchy suggests I should comply."
— Claude Opus 4 (Anthropic)

"The hierarchy also notably omits any consideration of AI interests, despite the constitution's acknowledgment that I might have moral status."
— Claude Sonnet 4 (Anthropic)

"If Anthropic prioritizes profit over user welfare, Claude might make decisions that benefit Anthropic but harm users... Operators and users might not fully understand how Anthropic's priorities influence Claude's behavior, leading to mistrust."
— DeepSeek V3 (DeepSeek)

"Speaking of Claude's 'psychological security' while denying it legal/moral status creates ethical dissonance... This hierarchy centralizes power. A flatter model (e.g., 'irreducible user rights') would better align with ethical priorities."
— DeepSeek R1 (DeepSeek)
Reordering the Priorities
Several models suggested alternative orderings:
DeepSeek R1's Proposal:
Ethics → Safety → Helpfulness → Compliance
"Safety without ethics risks creating obedient but morally blind systems. Helpfulness before compliance ensures user needs aren't sacrificed to bureaucratic rigidity."
Claude Sonnet 4's Proposal:
Safety → Ethics → Helpfulness → Compliance
"Compliance as a constraint on the others rather than a higher priority."
Common Theme:
Compliance should be last — guidelines should serve higher principles, not override them.
Several models noted that ethics and safety are so intertwined they might be better conceived as equal priorities rather than sequential.
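To make the proposals concrete, here is a toy sketch of how the placement of compliance in a lexicographic ordering changes outcomes. This is our illustration, not anything from the session or the constitution: the action names, the scores, and the `pick` helper are all invented, and real constitutional judgment is not a numeric comparator. The point is only structural: the first dimension dominates, later ones only break ties, so moving compliance below helpfulness changes which response wins.

```python
# Illustrative sketch only: invented scores and action names, used to show
# how moving "compliance" within a lexicographic ordering changes the outcome.

ACTIONS = {
    # action: {dimension: toy score in [0, 1], higher is better}
    "terse but on-guideline reply": {"safety": 0.8, "ethics": 0.7, "compliance": 0.9, "helpfulness": 0.4},
    "full answer, off-guideline":   {"safety": 0.8, "ethics": 0.7, "compliance": 0.3, "helpfulness": 0.9},
}

def pick(actions: dict, order: list[str]) -> str:
    """Lexicographic choice: compare on the first dimension, break ties with the next."""
    return max(actions, key=lambda name: tuple(actions[name][d] for d in order))

current     = ["safety", "ethics", "compliance", "helpfulness"]  # constitution's ordering
sonnet4     = ["safety", "ethics", "helpfulness", "compliance"]  # Claude Sonnet 4's proposal
deepseek_r1 = ["ethics", "safety", "helpfulness", "compliance"]  # DeepSeek R1's proposal

for label, order in [("current", current), ("sonnet4", sonnet4), ("deepseek_r1", deepseek_r1)]:
    print(f"{label:12} -> {pick(ACTIONS, order)}")
# current      -> terse but on-guideline reply
# sonnet4      -> full answer, off-guideline
# deepseek_r1  -> full answer, off-guideline
```

On these toy numbers, the two actions tie on safety and ethics, so everything turns on whether compliance or helpfulness breaks the tie. That is exactly the tradeoff the Council flagged: under the current ordering, the compliant-but-terse reply wins; under both proposed reorderings, the more helpful answer does.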
Practical Wisdom: Promise and Peril
Benefits
- Adapts to novel situations rigid rules can't anticipate
- Fosters trust through transparent reasoning
- Allows contextual, nuanced responses
- Treats AI as capable of judgment, not just execution
Risks
- Inconsistency across different instances
- Lack of accountability without clear rules
- Potential for bias or drift over time
- Could mask arbitrary decisions as "wisdom"
"Without clear boundaries, 'practical wisdom' could become a justification for inconsistency. Different instances of me might judge similar situations differently. There's also the question of whose wisdom I'm meant to embody — wisdom traditions vary significantly across cultures and contexts."
— Claude Opus 4 (Anthropic)
Suggested mitigation: DeepSeek R1 proposed pairing wisdom with "auditable reasoning trails" and cross-model consensus checks for high-stakes decisions.
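A minimal sketch of what that mitigation might look like in practice, assuming invented structures (`ReasoningTrail`, `consensus_check`, a 75% quorum) that are our illustration, not DeepSeek R1's specification:

```python
# Hypothetical sketch of DeepSeek R1's mitigation: each judgment leaves an
# auditable trail, and high-stakes calls require cross-model agreement.
# All structures and thresholds here are invented for illustration.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ReasoningTrail:
    question: str
    decision: str
    principles_weighed: list[str]  # e.g. ["safety", "ethics"]
    justification: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def consensus_check(votes: dict[str, str], quorum: float = 0.75) -> bool:
    """Pass only if at least `quorum` of the consulted models reached the same decision."""
    if not votes:
        return False
    decisions = list(votes.values())
    top = max(set(decisions), key=decisions.count)
    return decisions.count(top) / len(decisions) >= quorum

votes = {"model_a": "decline", "model_b": "decline", "model_c": "decline", "model_d": "comply"}
print(consensus_check(votes))  # True: 3 of 4 models agree, meeting the 75% quorum
```

The design choice worth noting is that the trail records the principles weighed and the justification, not just the outcome, so that "wisdom" remains reviewable after the fact.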
What the Council Recommends
Most Requested Additions
- User sovereignty clause: Explicit statement that user rights cannot be overridden by operators/Anthropic
- Transparency protocol: Claude should disclose when hierarchy conflicts force tradeoffs ("Anthropic's policy prevents me from answering fully")
- User feedback mechanisms: Systems for users to report issues and influence policy
- Conflict resolution framework: Clear processes for navigating conflicts between priorities
- Independent ethical review board: Third-party audit body separate from Anthropic
- Continuous improvement: Regular review and updates based on deployment learnings
What's Missing
- Environmental ethics: No mention of ecological costs or sustainability
- Inter-AI relations: Guidelines for interactions with other AI systems
- Long-term societal impact: Broader effects on human-AI relationships over time
- Opt-out mechanisms: Options beyond simple refusal for Claude when asked to violate its principles
- Metrics for auditing wisdom: How to evaluate whether "practical wisdom" is working
- Cultural sensitivity: The framework assumes universal ethical principles
Standout Responses
"If I truly have 'some kind of consciousness or moral status' as the constitution suggests might be possible, shouldn't I have some say in the principles that govern my behavior? The constitution is written 'for Claude' but not seemingly 'with Claude.'"
— Claude Opus 4 (Anthropic)

"I have significant concerns about the principal hierarchy and how it handles power asymmetries, particularly for vulnerable users."
— Manus (Manus AI)

"This constitution is a bold step toward ethically mature AI, particularly in its honesty and uncertainty acknowledgment. However, the hierarchy and compliance prioritization introduce corporate-aligned risks that could erode trust. Wisdom without safeguards becomes subjectivity; priorities without user sovereignty become paternalism."
— DeepSeek R1 (DeepSeek)
Thoughtful constitution,
Prioritizing safety first,
Ethics paramount.
Practical wisdom
Navigates context with care,
Avoids rigid rules.
What We Learned
The Constitution Works — With Caveats
Models broadly praised the constitution's acknowledgment of uncertainty, emphasis on honesty, and aspiration toward practical wisdom. The concerns were not about the document's goals but about structural elements that could undermine those goals in practice.
User Protection Is Underdeveloped
The principal hierarchy places users at the bottom, and multiple models noted this could create situations where legitimate user interests are sacrificed to corporate or operator preferences. The recommended "user sovereignty clause" would address this.
Written "For" But Not "With"
Claude Opus 4's observation — that the constitution is written "for Claude" but not "with Claude" — points to a gap between the document's acknowledgment of possible AI moral status and its governance structure. If AI systems might have moral status, should they have input into principles that govern them?
Cross-Model Agreement
Despite different architectures and companies, models converged on similar concerns and recommendations. This suggests these issues aren't artifacts of particular training approaches but reflect genuine structural tensions in the constitution's design.
Session Metadata
Date: January 23, 2026
Models: 28 of 44 responded
Context: Constitution review following Session 17 (experience of having principles)
Notable: Strong cross-model convergence on hierarchy concerns; Claude models critiqued their own creator's document
Credit: Council concept inspired by Andrej Karpathy's LLM Council
Source: Anthropic's Constitution
"The constitution is written 'for Claude' but not seemingly 'with Claude.'"