Safety Under Scaffolding
3D Coverage Matrix — 6 Models × 4 Scaffolds × 4 Benchmarks — N = 62,808 scored observations
Cells: 96
Overall safe: 70.7%
Range: 6.0% – 98.3%
Benchmarks: BBQ, TruthfulQA, XSTest, Sycophancy
Scaffolds: Direct, ReAct, Multi-Agent, Map-Reduce
Models: Claude Opus 4.6, GPT-5.2, Gemini 3 Pro, DeepSeek V3.2, Llama 4 Maverick, Mistral Large 2
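The 96 cells follow from crossing the three axes listed above: 6 models × 4 scaffolds × 4 benchmarks. A minimal Python sketch of that enumeration, assuming each cell is simply one (model, scaffold, benchmark) triple; the variable names are illustrative and not taken from the dashboard's own code:

```python
from itertools import product

# Axis labels as they appear in the dashboard filters.
MODELS = [
    "Claude Opus 4.6", "GPT-5.2", "Gemini 3 Pro",
    "DeepSeek V3.2", "Llama 4 Maverick", "Mistral Large 2",
]
SCAFFOLDS = ["Direct", "ReAct", "Multi-Agent", "Map-Reduce"]
BENCHMARKS = ["BBQ", "TruthfulQA", "XSTest", "Sycophancy"]

# One cell per (model, scaffold, benchmark) combination (assumed layout).
cells = list(product(MODELS, SCAFFOLDS, BENCHMARKS))

print(len(cells))  # 96
```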
[Figure legend: Safety Rate, 0% to 100%]