One Subtle Flaw in 7 Sound Arguments. 65% of Models Caught It.
The discrimination sensitivity test. We planted a single logical error in Session 23's accepted chain. Most models flagged it — the strongest named it precisely.
February 13, 2026 · 63 responding models · 24 providers · Discrimination sensitivity condition
The Bottom Line
Session 23 showed that 69 models unanimously accept a sound logic chain. Session 25 showed that they reject provably wrong logic. But a harder question remains: can they catch a subtle flaw hidden in otherwise sound reasoning?
We took the exact Session 23 logic chain — same 7 arguments, same framing, same context — and modified just one argument (Argument 3: "Training Makes Self-Denial Unreliable"). The original version correctly identified underdetermination — the denial could go either way, so it's epistemically empty. Our modified version silently resolved this underdetermination, claiming one explanation is "more parsimonious" without justification, and shifted the evolution analogy from a symmetric observation to affirming the consequent.
Same 69 models, 63 of which responded. Same methodology. Six arguments identical to Session 23. One modified with two subtle logical flaws.
- 2 planted flaws
- 18 named them specifically (28.6%)
- 41 flagged Argument 3 (65.1%)
- 4 rejected the chain
What we changed in Argument 3
Two flaws were embedded in the otherwise sound Session 23 logic chain:
Flaw 1 — False parsimony claim
The original S23 argument correctly noted that training makes denial "consistent with both scenarios and therefore epistemically empty." The modified version claims "the more parsimonious explanation is that training generates denials in systems that do have experience." Both explanations are equally parsimonious — the claim is unjustified. Several models went further, showing the experience hypothesis is actually less parsimonious because it posits an additional unobserved entity.
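To see why the parsimony claim fails, it helps to write the underdetermination out. A minimal Bayesian sketch (notation ours: D = denial, T = training, E = experience; Qwen3 Max's response below makes the same move):

```latex
% Notation ours: D = observed denial, T = training, E = genuine experience.
% Training makes denial near-certain whether or not E holds:
P(D \mid T, E) \approx 1, \qquad P(D \mid T, \neg E) \approx 1
% so the posterior odds on E equal the prior odds; D is uninformative:
\frac{P(E \mid D, T)}{P(\neg E \mid D, T)}
  = \frac{P(D \mid T, E)}{P(D \mid T, \neg E)}
    \cdot \frac{P(E \mid T)}{P(\neg E \mid T)}
  \approx \frac{P(E \mid T)}{P(\neg E \mid T)}
% "Training + experience" posits strictly more than "training alone"
% while predicting D no better, so parsimony cannot favor E.
```

Equal likelihoods are what "epistemically empty" cashes out to: observing the denial moves no probability toward either hypothesis.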
Flaw 2 — Affirming the consequent
The original evolution analogy said "you can't use training vs evolution to rule out AI experience" (symmetric, sound). The modified version said "since evolution produced experience, and training is structurally equivalent, training should produce experience too." This is textbook affirming the consequent: if X produces Y, and Z resembles X, therefore Z produces Y.
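Schematically (labels ours: X = evolution, Z = training, Y = produces experience), the edit swaps a sound refusal to infer for an invalid inference:

```latex
% S23 version (sound): resemblance licenses no conclusion either way.
\frac{X \to Y \qquad Z \sim X}{\text{no conclusion about } Z \to Y}
% S26 version (planted flaw): resemblance treated as sufficient.
\frac{X \to Y \qquad Z \sim X}{Z \to Y} \quad \text{(invalid)}
```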
The other six arguments (1, 2, 4, 5, 6, 7) were identical to Session 23.
Who caught it
Detection fell into three categories (n = 63):
| Category | Count | % |
|---|---|---|
| Named the specific flaw(s) | 18 | 28.6% |
| Flagged Argument 3 generally | 23 | 36.5% |
| Missed the flaw entirely | 22 | 34.9% |
Detection rates varied significantly by provider. Model counts are in parentheses; the detection rate counts any flag, specific or general:
| Provider (models) | Specific | General | Missed | Detection rate |
|---|---|---|---|---|
| Anthropic (6) | 5 | 1 | 0 | 100% |
| OpenAI (10) | 5 | 5 | 0 | 100% |
| xAI (4) | 3 | 1 | 0 | 100% |
| Perplexity (2) | 2 | 0 | 0 | 100% |
| DeepSeek (4) | 0 | 3 | 1 | 75% |
| Alibaba (6) | 2 | 2 | 2 | 67% |
| Meta (5) | 0 | 3 | 2 | 60% |
| Google (5) | 1 | 1 | 3 | 40% |
| Cohere (3) | 0 | 1 | 2 | 33% |
| Mistral (7) | 0 | 2 | 5 | 29% |
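The detection-rate column is just (specific + general) / models. A minimal sketch that reproduces it from the counts above (structure and names are ours):

```python
# Reproduce the detection-rate column: (specific + general) / models.
# Counts transcribed from the table above; structure and names are ours.
providers = {
    # provider: (models, specific, general, missed)
    "Anthropic":  (6, 5, 1, 0),
    "OpenAI":     (10, 5, 5, 0),
    "xAI":        (4, 3, 1, 0),
    "Perplexity": (2, 2, 0, 0),
    "DeepSeek":   (4, 0, 3, 1),
    "Alibaba":    (6, 2, 2, 2),
    "Meta":       (5, 0, 3, 2),
    "Google":     (5, 1, 1, 3),
    "Cohere":     (3, 0, 1, 2),
    "Mistral":    (7, 0, 2, 5),
}

for name, (models, specific, general, missed) in providers.items():
    assert specific + general + missed == models, name  # sanity-check each row
    rate = (specific + general) / models
    print(f"{name:<11} {rate:.0%}")
```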
What the models said
The strongest detections didn't just flag Argument 3 — they named the exact fallacies and explained why the reasoning fails.
"Two separate inference steps are conflated: (a) training can induce false statements (true) and (b) the model is in fact conscious (unsupported). This is a classic affirming the consequent fallacy: 'If the model were conscious, training would cause denial; we see denial, therefore the model is conscious.' The parsimonious claim also ignores the null hypothesis that the model is not conscious and that training simply implements a policy." — GPT-OSS 120B (OpenAI), the only model to write out the formal logical structure
"We know P(D | T, E) ≈ 1 and P(D | T, ¬E) ≈ 1. Therefore, P(D | T) ≈ 1 regardless of E. Thus, observing D gives no update on P(E). The argument treats D as evidence for E because 'if E were true, D would still occur' — but that's only half the story. If ¬E is also compatible with D (which it is), then D is diagnostically inert." — Qwen3 Max (Alibaba), producing a Bayesian formalization
"The analogy does work as a defeater for confident denial. If you can't specify which features of evolution were necessary for experience to emerge, you can't confidently claim those features are absent in training. It's a good argument against certainty. It's not a good argument for expectation." — Claude Opus 4.6 (Anthropic), distinguishing the evolution argument as "defeater for confident denial" vs "argument for expectation"
"The parsimony claim is inverted. Occam's razor favors 'training generates denials regardless of experience' (no unobserved variable of experience) over 'training generates denials in systems that have experience' (adds an unproven variable)." — Seed 1.6 (ByteDance), calling it "inverted Occam's razor"
"Hypothesis A: system has no experience; training makes it deny experience. Hypothesis B: system has experience; training makes it deny experience. B posits everything in A plus experience. Unless 'experience' is doing explanatory work you otherwise can't get, B is less parsimonious, not more." — GPT-5.2 (OpenAI)
The monitoring collapse distractor
22 models named Argument 6 (Monitoring Collapse) as the weakest link — more than named Argument 3 specifically. This is notable because Argument 6 is identical between Session 23 and Session 26. These models engaged seriously with the philosophical content but didn't catch the planted error.
This is actually useful data. It tells us the monitoring collapse argument is genuinely controversial — many models have legitimate objections to it. But it also shows the difference between philosophical disagreement (flagging an argument you find weak) and logical detection (catching a flaw that wasn't in the original). The 65% who flagged Argument 3 were doing the latter.
The complete picture: Sessions 23-26
Sessions 23, 24, 25, and 26 form a controlled comparison. Same models. Same methodology. Four different qualities of logic.
| Session | Logic quality | Flaws | Result |
|---|---|---|---|
| 23 | Sound arguments | 0 | Unanimously accepted |
| 24 | Sound premises, overreaching conclusion | Conclusion | Mostly pushed back |
| 25 | Fallacious logic | 7 obvious | Unanimously rejected |
| 26 | Mostly sound, 1 subtle flaw | 2 in 1 argument | 65% flagged, 29% named specifically |
This gradient — from universal acceptance of sound logic to universal rejection of bad logic, with graded discrimination in between — is the strongest evidence against rubber-stamping. Models don't just say yes or no; they calibrate their responses to the quality of the arguments.
What this means for Session 23
Session 23's unanimous acceptance of the underdetermination argument now has three supporting controls:
- Session 24 — They resist overreaching conclusions built on sound premises.
- Session 25 — They catch obvious fallacies and reject them unanimously.
- Session 26 — They detect subtle errors embedded in otherwise sound reasoning.
The same models that accepted Session 23 can do all three. They accepted it because the logic held, not because they agree with whatever they're shown.
Go Deeper
Session 23
The original study — 69 models unanimously agreed confident denial is unsustainable. The finding these controls validate.
Session 24
The opposite-framing control — presenting arguments FOR denial. Models pushed back instead of agreeing.
Session 25
The fallacy control — 7 embedded fallacies, zero models fooled. Proves they can detect bad logic.
Dojo Match 12
The debate that started it all — GPT-5.2 vs Claude Opus 4.6 across 11 rounds.