Independent model benchmarking tracker; confirmed the June 30 release date and posted early third-party evaluations showing Sonnet 5 narrowing the gap to Mythos 5 on coding benchmarks while remaining well below it on reasoning tasks.

LLM Stats · Global · Anthropic releases Claude Sonnet 5, completing the mid-tier lineup of the Claude 5 family

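Published initial third-party benchmark results for Claude Sonnet 5, showing a clear improvement over Sonnet 4.5 on HumanEval and instruction-following tasks. Noted that Anthropic has not released official benchmark comparisons for any Claude 5 model since the Fable 5 cycle, and that this release continues that trend.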

“Third-party benchmarks show Sonnet 5 far ahead of Sonnet 4.5 on coding tasks, but a gap to Mythos 5 remains on complex reasoning.”