rbtfl.

Independent model benchmarking tracker; confirmed the June 30 release date and published early third-party evaluations showing Sonnet 5 narrowing the gap to Mythos 5 on coding benchmarks while remaining well below it on reasoning tasks.

By lens · 1 take across the edition

Published initial third-party benchmark results for Claude Sonnet 5, showing meaningful improvement over Sonnet 4.5 on HumanEval and instruction-following tasks. Noted that Anthropic has not released official benchmark comparisons for any Claude 5 model, continuing the trend from the Fable 5 cycle.

“Third-party benchmarks show Sonnet 5 significantly ahead of Sonnet 4.5 on coding tasks while remaining below Mythos 5 on complex reasoning.”