MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost
A 1M-token sparse-attention model lands above GPT-5.5 on vendor-run SWE-Bench Pro results, below Claude Opus 4.8, with weights still withheld
Summary
Chinese lab MiniMax released M3, an open-weight model that pairs a 1M-token context window with native multimodal input and agentic computer use. On the lab's own runs it scored 59.0% on the SWE-Bench Pro coding benchmark, above OpenAI's GPT-5.5 (58.6%) and Google's Gemini 3.1 Pro (54.2%), but it trails Anthropic's Claude Opus 4.8, shipped a week earlier, at a reported 69.2%. The headline engineering claim is MiniMax Sparse Attention (MSA), which attends only to the key-value blocks it selects as relevant and, per MiniMax, cuts per-token compute at full context to one-twentieth of the prior generation's; the architecture was independently verified around June 18. The catch: the promised open weights had not been published at release, and training code and inference operators stayed closed.
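To make the block-selection idea concrete, here is a minimal sketch of generic block-sparse attention for a single query, in plain Python. It is not MiniMax's implementation, which has not been published. The function name `block_sparse_attention`, the mean-key scoring rule, and the `block_size` and `top_k` parameters are all illustrative assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend over only the top_k key-value blocks (hypothetical selector)."""
    # Split token positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]

    # Assumed relevance proxy: query dotted with the block's mean key.
    # A real system would cache these summaries instead of recomputing them.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(q))]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    positions = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled dot-product softmax, restricted to the chosen positions.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, positions)) / total
           for d in range(len(values[0]))]
    return out, chosen, len(positions) / len(keys)


# Toy input: 8 tokens in 4 blocks; only block 2 has keys aligned with the query.
q = [1.0, 0.0]
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
values = [[float(i)] for i in range(8)]
out, chosen, fraction = block_sparse_attention(q, keys, values,
                                               block_size=2, top_k=1)
print(out, chosen, fraction)  # [4.5] [2] 0.25
```

In the toy run the softmax covers a quarter of the keys. A selector that kept roughly one block in twenty would cover about 5% of them, though the coverage does not say how MSA arrives at its one-twentieth figure.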
The split
US ML press split between the capability story and the caveats. MarkTechPost foregrounded MSA's efficiency; Tech Times stressed that the benchmarks are vendor-run and that M3 sits below Opus 4.8. Outside the US, the framing shifted. Italy's developer coverage and India's Open Source For You centred two points that US writeups soft-pedalled: that "open-weight" with code withheld is not open-source, and that China's 2017 National Intelligence Law obliges MiniMax to assist state intelligence work, which could extend to any prompt routed through its API. That governance angle, not the SWE-Bench number, is what the launch hype omits.
By the numbers
- 59.0%, M3's vendor-run SWE-Bench Pro score (GPT-5.5: 58.6%, Gemini 3.1 Pro: 54.2%).
- 69.2%, Claude Opus 4.8's reported SWE-Bench Pro score, 10.2 points ahead of M3.
- 1M tokens, M3 context window.
- 1/20, per-token compute at full context under MSA versus the prior generation.
- 9x / 15x, claimed speedups for prefill and decoding, respectively, under MSA.
Why it matters
Cheap long-context coding from an open-weight Chinese model pressures Western labs on price and pushes more inference toward Chinese infrastructure. But "open-weight" with withheld code, vendor benchmarks, and a legal duty to assist Beijing reframes the adoption question from capability to trust, especially for any team routing source code through the API.
What to watch
- Whether MiniMax actually publishes the M3 weights and a technical report.
- Independent benchmark reruns versus the vendor numbers.
- Enterprise and government bans or carve-outs over the API's data exposure.
- Whether DeepSeek, Qwen or others match the sparse-attention efficiency.