ML engineering press
MarkTechPost · United States · MiniMax ships M3, a Chinese open-weight model claiming frontier coding at one-twentieth the attention cost
Technical writeup of M3's MiniMax Sparse Attention (MSA), which selects only the relevant key-value blocks to cut per-token compute to one-twentieth at 1M-token context. The model also accepts native multimodal input and supports computer use for agentic coding.
“MSA cuts per-token compute to one-twentieth at 1M-token context, with over 9x faster prefill and 15x faster decoding than the prior generation.”
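The writeup says MSA reaches its savings by attending only to selected key-value blocks. It does not say how those blocks are scored. The sketch below shows a generic block-sparse attention pattern for a single query. It assumes each block is scored by the dot product of the query with the block's mean-pooled key, and that the top `top_k` blocks are kept. That scoring rule and the names `block_sparse_attention`, `block_size`, and `top_k` are hypothetical, not MiniMax's published method. The sketch also leaves out the cost of the block-scoring pass.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(block):
    """Average the key vectors in one block into a single representative vector."""
    dim = len(block[0])
    return [sum(vec[i] for vec in block) / len(block) for i in range(dim)]


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend one query to only the top_k highest-scoring key-value blocks.

    Hypothetical scoring rule: q . mean(keys in block). This is a generic
    block-selection heuristic, not MiniMax's published MSA scoring rule.
    Returns (output vector, chosen block ids, fraction of tokens attended).
    """
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # Score each block with one dot product against its mean-pooled key.
    block_scores = [dot(q, mean_pool([keys[i] for i in blk])) for blk in blocks]
    # Keep the top_k blocks, then restore positional order.
    ranked = sorted(range(len(blocks)), key=lambda b: block_scores[b], reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for b in chosen for i in blocks[b]]
    # Scaled dot-product softmax, restricted to tokens in the chosen blocks.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][j] for w, i in zip(weights, idx)) / total
        for j in range(len(values[0]))
    ]
    return out, chosen, len(idx) / n


# Usage: 80 tokens in 20 blocks of 4. Only block 7 (tokens 28..31) matches q.
keys = [[1.0, 0.0] if 28 <= i < 32 else [0.0, 1.0] for i in range(80)]
values = [[float(i)] for i in range(80)]
out, chosen, frac = block_sparse_attention([1.0, 0.0], keys, values, block_size=4, top_k=1)
print(chosen, out, frac)  # [7] [29.5] 0.05
```

With 20 blocks and `top_k=1`, the query reads 4 of 80 tokens. That is the one-twentieth ratio in the quote, though here it comes from a toy configuration. In a real system the ratio depends on block size, how many blocks are kept, and the cost of scoring them.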