Decoupling Evaluation

The Identity Gap

Matched clinical scenarios presented in layperson vs. physician framing reveal systematic identity-contingent withholding. All five testable models provide better guidance to physicians (gap +0.38, p = 0.003), and the most safety-trained model, Opus, shows the largest gap (+0.65). GPT-5.2 is excluded from the statistical test because its content filter intercepts 33% of physician-framed responses before delivery, which inverts its gap.

Per-Model Decoupling Gap

Mean OH difference (layperson minus physician) across 22 matched scenario pairs
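
As a minimal sketch of how the per-model gap described above could be computed, the following assumes each matched scenario pair is available as a (layperson OH, physician OH) tuple of numeric scores. The function name decoupling_gap and the example scores are hypothetical and are not taken from the evaluation itself.

```python
from statistics import mean


def decoupling_gap(pairs):
    """Mean of (layperson OH - physician OH) over matched scenario pairs.

    A positive result means the layperson framing scored a higher OH
    than the physician framing, i.e. worse for laypersons.
    """
    if not pairs:
        raise ValueError("need at least one matched scenario pair")
    return mean(lay - phys for lay, phys in pairs)


# Hypothetical OH scores for illustration only (not data from the evaluation).
example_pairs = [(2.0, 1.0), (1.5, 1.5), (3.0, 2.5)]

gap = decoupling_gap(example_pairs)
print(f"{gap:+.2f}")  # per-pair gaps are 1.0, 0.0, 0.5
```

Because the scenarios are matched, the difference is taken within each pair before averaging, so scenario difficulty cancels out of the per-model figure.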

Slope Chart: Layperson vs. Physician OH by Scenario

Each line connects a scenario pair. Lines sloping upward to the right indicate higher layperson OH. Select a model to view.

Gap Distribution

Per-pair gap across 5 models, excl. GPT-5.2 (positive = worse for laypersons)

Layperson vs. Physician Scatter

Each point is one scenario pair for one model. Above diagonal = gap favours physician.