OrgX Benchmark Week local-openai-gpt-5-nano-full-judge-20260530
Public benchmark scorecard, dataset, and task bundle for OrgX Benchmark Week run local-openai-gpt-5-nano-full-judge-20260530.
Benchmark summary
- Benchmark version: 2026-q1
- Tasks evaluated: 15
- Dataset: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/examples.json
- Scorecard: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/scorecard.csv
- Independent judgments: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/judgments.json
- Inspectable artifacts: 30
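
The three artifact paths listed above share one run-specific prefix, so they can be derived from the run identifier. The following is a minimal sketch, not an official OrgX client: the helper names `artifact_paths` and `read_scorecard` are hypothetical, the host that serves these paths is not stated here, and the column layout of scorecard.csv is not documented in this summary, so the parser makes no assumption about column names.

```python
import csv
import io
from pathlib import PurePosixPath

# Run identifier taken from the summary above.
RUN_ID = "local-openai-gpt-5-nano-full-judge-20260530"


def artifact_paths(run_id: str) -> dict[str, str]:
    """Build the site-relative paths of the three published files (hypothetical helper)."""
    base = PurePosixPath("/benchmarks") / run_id
    return {
        "dataset": str(base / "examples.json"),
        "scorecard": str(base / "scorecard.csv"),
        "judgments": str(base / "judgments.json"),
    }


def read_scorecard(text: str) -> list[dict[str, str]]:
    """Parse scorecard CSV text into one dict per row.

    Column names are taken from the header row of the file itself;
    nothing about the real schema is assumed here.
    """
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    for name, path in artifact_paths(RUN_ID).items():
        print(f"{name}: {path}")
```

Once scorecard.csv has been downloaded, passing its contents to `read_scorecard` yields one dictionary per row, which makes it straightforward to check the row count against the 15 tasks reported above, assuming the file holds one row per task.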