OrgX Autonomous Initiative Benchmark

Public scorecards, datasets, and methodology notes for the OrgX benchmark program.

Published benchmark weeks

local-openai-gpt-5-nano-full-public-judge-20260411: benchmark version 2026-q1 with 15 tasks
local-openai-gpt-5-nano-full-judge-20260530: benchmark version 2026-q1 with 15 tasks
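
Each entry above follows the same pattern: a run identifier, a colon, the benchmark version, and the task count. As a minimal sketch, assuming that pattern holds for every published week, the entries could be parsed as shown below. The names `WeekEntry` and `parse_entry` are hypothetical and not part of any OrgX tooling, and treating the trailing eight digits of the identifier as a YYYYMMDD date stamp is an assumption.

```python
import re
from typing import NamedTuple, Optional


class WeekEntry(NamedTuple):
    """One published benchmark week (hypothetical structure)."""
    run_id: str   # full identifier, e.g. "local-...-20260411"
    version: str  # benchmark version, e.g. "2026-q1"
    tasks: int    # number of tasks in the run
    stamp: str    # trailing 8 digits of run_id; ASSUMED to be YYYYMMDD


# Matches "<run_id>: benchmark version <version> with <N> tasks",
# where run_id ends in "-" followed by eight digits.
ENTRY_RE = re.compile(
    r"^(?P<run_id>\S+-(?P<stamp>\d{8})): "
    r"benchmark version (?P<version>\S+) with (?P<tasks>\d+) tasks$"
)


def parse_entry(line: str) -> Optional[WeekEntry]:
    """Return a WeekEntry for a matching line, or None otherwise."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    return WeekEntry(m["run_id"], m["version"], int(m["tasks"]), m["stamp"])


if __name__ == "__main__":
    lines = [
        "local-openai-gpt-5-nano-full-public-judge-20260411: benchmark version 2026-q1 with 15 tasks",
        "local-openai-gpt-5-nano-full-judge-20260530: benchmark version 2026-q1 with 15 tasks",
    ]
    for line in lines:
        print(parse_entry(line))
```

Lines that do not match the pattern, such as headings, return `None` rather than raising, so the function can be run over the whole listing.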