ML engineering press


MarkTechPost · United States · MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost

A technical writeup of MiniMax Sparse Attention (MSA) in M3. MSA selects only the relevant key-value blocks for each token, which cuts per-token compute to one-twentieth at 1M-token context. The model also accepts native multimodal input and supports computer use for agentic coding.

“MSA cuts per-token compute to one-twentieth at 1M-token context, with over 9x faster prefill and 15x faster decoding than the prior generation.”
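
To make the block-selection idea concrete, here is a minimal sketch of block-sparse attention for a single query, in plain Python. It is not MiniMax's implementation: the writeup summarized here does not say how MSA scores or picks blocks, so the mean-key scoring rule, the function name `block_sparse_attention`, and the parameters `block_size` and `top_k` are all assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key-value blocks.

    Illustrative only: the block-scoring rule below is an assumption,
    not the rule MSA uses.
    """
    n, d = len(keys), len(q)
    # Split key positions into contiguous blocks of at most block_size.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Hypothetical selection rule: score a block by q . mean(keys in block).
    def block_score(block):
        mean_key = [sum(keys[i][j] for i in block) / len(block) for j in range(d)]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled dot-product softmax, restricted to the selected positions.
    logits = [dot(q, keys[i]) / math.sqrt(d) for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][j] for w, i in zip(weights, idx)) / total
           for j in range(len(values[0]))]
    return out, chosen, len(idx) / n


# Toy data: 8 keys in 4 blocks of 2; only block 1 points the same way as q.
keys = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0],
        [-1.0, 0.0], [-1.0, 0.0], [0.0, -1.0], [0.0, -1.0]]
values = [[float(i)] for i in range(8)]
q = [1.0, 0.0]

out, chosen, fraction = block_sparse_attention(q, keys, values,
                                               block_size=2, top_k=1)
print(chosen, out, fraction)
```

In this toy run the query visits one block out of four, so the softmax touches a quarter of the keys. The saving in a real model depends on the selection overhead and on kernel details that this sketch ignores, so the fraction of keys visited should not be read as a direct stand-in for the one-twentieth figure quoted above.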
