OrgX Benchmark Week local-openai-gpt-5-nano-full-judge-20260530

Public benchmark scorecard, dataset, and task bundle for OrgX Benchmark Week local-openai-gpt-5-nano-full-judge-20260530.

Benchmark summary

Benchmark version: 2026-q1
Tasks evaluated: 15
Dataset: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/examples.json
Scorecard: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/scorecard.csv
Independent judgments: /benchmarks/local-openai-gpt-5-nano-full-judge-20260530/judgments.json
Inspectable artifacts: 30
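
The three file paths listed above share the prefix /benchmarks/<run id>/. The following is a minimal Python sketch of how those paths could be derived from the run identifier, assuming that layout holds. The function name `artifact_paths` and the role labels are hypothetical and not part of any OrgX tooling; the paths are site-relative, and the host is not stated in this summary.

```python
from posixpath import join

# Run identifier copied from the summary above.
RUN_ID = "local-openai-gpt-5-nano-full-judge-20260530"

# File names copied from the summary; the role keys are illustrative labels only.
ARTIFACT_FILES = {
    "dataset": "examples.json",
    "scorecard": "scorecard.csv",
    "judgments": "judgments.json",
}


def artifact_paths(run_id: str) -> dict[str, str]:
    """Return the site-relative path of each published file for a run.

    Assumes the /benchmarks/<run id>/<file name> layout seen in the summary.
    """
    return {
        role: join("/benchmarks", run_id, name)
        for role, name in ARTIFACT_FILES.items()
    }


if __name__ == "__main__":
    for role, path in artifact_paths(RUN_ID).items():
        print(f"{role}: {path}")
```

Keeping the run identifier in one place means the dataset, scorecard, and judgments paths cannot drift apart when the sketch is pointed at a different run.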