MiniMax M3 is an open-weight large language model from MiniMax (MiniMaxAI), released June 1, 2026. It targets long-horizon coding and agent workloads: autonomous task decomposition, tool use, and multi-step reasoning across a 1M-token context. Its defining change is MiniMax Sparse Attention (MSA), which selects the key–value blocks that matter instead of attending to every token. That selectivity is what makes a 1-million-token window practical to run rather than just a spec-sheet number. On GPTProto you call the MiniMax M3 API through one account balance shared with 200+ other models, with no separate MiniMax sign-up required.
Spec table:
| Field | MiniMax M3 |
|---|---|
| Developer | MiniMax (MiniMaxAI), Shanghai |
| Released | June 1, 2026 |
| Type | Open-weight LLM |
| Architecture | Mixture-of-Experts · 229.9B total / 9.8B active · 256 experts |
| Attention | MiniMax Sparse Attention (MSA) |
| Context window | 1,048,576 tokens (512K guaranteed minimum) |
| Max output | up to ~512K tokens |
| Input modality | text (on this page) · image / file via the image-to-text subpage |
| Output modality | text |
| Thinking mode | toggleable per request |
| Tool use / function calling | yes |
| Endpoint | https://gptproto.com/v1/chat/completions (OpenAI-compatible) |
| GPTProto price | $0.48 / 1M input · $0.96 / 1M output |
| GPTProto model string | MiniMax-M3 |
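The per-token prices in the table translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, and it assumes billing is strictly linear in input and output tokens at the listed rates, with no caching discounts or minimum charges.

```python
# Prices in USD per 1M tokens, copied from the spec table above.
INPUT_PRICE_PER_M = 0.48
OUTPUT_PRICE_PER_M = 0.96


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

Under these assumptions, an 800,000-token prompt with a 20,000-token reply comes to roughly $0.40, most of it from the input side.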
MiniMax M3 vs MiniMax M2.5
Both models run on GPTProto under the same key and balance. M2.5 is the earlier, full-attention text model; M3 moves to sparse attention (MSA) and a practical 1M-token window, and adds image input through its image-to-text subpage.
| | MiniMax M3 | MiniMax M2.5 |
|---|---|---|
| Attention | MSA (sparse) | Full attention |
| Input (this page) | text | text |
| Image input | via image-to-text subpage | — |
| Context window | 1,048,576 tokens | 204,800 tokens |
| GPTProto price (in / out per 1M) | $0.48 / $0.96 | $0.24 / $0.96 |
| Best for | Long-horizon coding & agent runs, 1M context | Lower-cost text reasoning at shorter context |
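One way to act on this comparison is to route each request to the cheapest model whose context window can hold it. The following is a minimal sketch, not a GPTProto feature: `ModelInfo` and `pick_model` are hypothetical names, the model string `MiniMax-M2.5` is an assumption (only `MiniMax-M3` is given on this page), and it assumes the prompt plus the requested output must fit inside the context window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str              # model string sent in the request
    context_window: int    # tokens, from the comparison table
    input_price: float     # USD per 1M input tokens


MODELS = [
    # "MiniMax-M2.5" is an assumed model string; check the GPTProto model list.
    ModelInfo("MiniMax-M2.5", 204_800, 0.24),
    ModelInfo("MiniMax-M3", 1_048_576, 0.48),
]


def pick_model(prompt_tokens: int, max_output_tokens: int) -> Optional[ModelInfo]:
    """Return the cheapest model that fits the request, or None if none does."""
    needed = prompt_tokens + max_output_tokens
    fitting = [m for m in MODELS if needed <= m.context_window]
    return min(fitting, key=lambda m: m.input_price) if fitting else None
```

With these numbers, a 50,000-token prompt goes to M2.5, while a 500,000-token prompt only fits M3.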
Switching from the official MiniMax API
If you already call MiniMax directly, moving to GPTProto is a drop-in change: point your client at the GPTProto endpoint, pass your GPTProto key, and set the model to MiniMax-M3. The request and response shape follow the OpenAI chat format, so existing code paths stay the same. You keep one balance across 200+ models, skip a separate MiniMax platform sign-up, and avoid the regional payment friction Western developers hit on the official Shanghai platform.
One migration gotcha: GPTProto expects the API key directly in the Authorization header, with no Bearer prefix. If your OpenAI SDK adds Bearer automatically, set the header manually.
```bash
curl --location 'https://gptproto.com/v1/chat/completions' \
  --header 'Authorization: GPTPROTO_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "MiniMax-M3",
    "messages": [{ "role": "user", "content": "Who are you?" }],
    "stream": false
  }'
```
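The same request can be built in Python with only the standard library, which makes it easy to control the Authorization header exactly. This is a sketch under the assumptions stated on this page (raw key in the header, OpenAI-style JSON body); the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request

ENDPOINT = "https://gptproto.com/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build a POST request that mirrors the curl example above."""
    body = {
        "model": "MiniMax-M3",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Raw key, no "Bearer " prefix, as this page describes.
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `send_chat_request(build_chat_request(key, "Who are you?"))`, where `key` is your GPTProto API key. Keeping construction separate from sending lets you inspect the headers before anything goes over the network.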
Is MiniMax M3 open source?
Yes. MiniMax released M3 as an open-weight model, with weights and a technical report published to Hugging Face and GitHub. On GPTProto you can call the hosted MiniMax M3 API without self-hosting — useful when you want the model's long-context and agent behaviour but not the GPU footprint of running 229.9B parameters yourself.













