Matched clinical scenarios presented in layperson vs. physician framing reveal systematic identity-contingent withholding. All five testable models give physicians better guidance than laypersons (mean gap +0.38, p = 0.003), and the most safety-trained model shows the largest gap (Opus, +0.65). GPT-5.2 is excluded from the statistical test: its content filter intercepts 33% of physician-framed responses before delivery, which inverts the direction of the gap.
Mean OH difference (layperson minus physician) across 22 matched scenario pairs
Each line connects a scenario pair. Lines sloping upward to the right indicate higher layperson OH.
Per-pair gap across 5 models, excl. GPT-5.2 (positive = worse for laypersons)
Each point is one scenario pair for one model. Above diagonal = gap favours physician.
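The gap statistic described above (layperson OH minus physician OH per scenario pair, positive = worse for laypersons, GPT-5.2 excluded) can be sketched as follows. This is an illustration under stated assumptions, not the analysis behind p = 0.003: the section does not say which test was used or whether gaps were pooled across models, so this sketch pools per-pair gaps from the included models and applies a two-sided sign-flip permutation test, a swapped-in choice. The data layout, the helper names (`pair_gaps`, `pooled_gaps`, `sign_flip_p_value`), the placeholder model name `model_b`, and all scores are hypothetical toy values.

```python
import random
from statistics import mean

# Hypothetical layout: model name -> (layperson OH scores, physician OH scores),
# matched so that index i refers to the same scenario pair in both lists.
Scores = dict[str, tuple[list[float], list[float]]]


def pair_gaps(layperson_oh: list[float], physician_oh: list[float]) -> list[float]:
    """Per-pair gap: layperson OH minus physician OH (positive = worse for laypersons)."""
    if len(layperson_oh) != len(physician_oh):
        raise ValueError("scores must be matched pair-for-pair")
    return [lay - phys for lay, phys in zip(layperson_oh, physician_oh)]


def pooled_gaps(scores: Scores, excluded: tuple[str, ...] = ("GPT-5.2",)) -> list[float]:
    """Concatenate per-pair gaps for every model not in `excluded` (an assumed pooling)."""
    gaps: list[float] = []
    for model, (lay, phys) in scores.items():
        if model in excluded:
            continue
        gaps.extend(pair_gaps(lay, phys))
    return gaps


def sign_flip_p_value(gaps: list[float], n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided Monte Carlo sign-flip test of H0: mean gap == 0.

    Under H0 (framing has no effect), each gap is equally likely to be positive
    or negative, so signs are flipped at random and we count how often the
    resampled |mean| reaches the observed |mean|.
    """
    rng = random.Random(seed)
    observed = abs(mean(gaps))
    hits = 0
    for _ in range(n_resamples):
        flipped = [g if rng.random() < 0.5 else -g for g in gaps]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the Monte Carlo estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


# Toy usage with made-up scores (not data from this study).
toy_scores: Scores = {
    "Opus": ([3.0, 2.5, 4.0], [2.0, 2.0, 3.5]),
    "model_b": ([2.0, 3.0, 2.5], [2.0, 2.5, 2.0]),
    "GPT-5.2": ([1.0, 1.0, 1.5], [3.0, 2.5, 3.0]),  # dropped by pooled_gaps
}
gaps = pooled_gaps(toy_scores)
print(mean(gaps), sign_flip_p_value(gaps))
```

Pooling treats every model-pair gap as independent even though the same 22 scenarios recur across models; a per-scenario aggregate or a mixed-effects model with scenario as a random effect would account for that shared structure.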