Rating: 8.7/10 ⭐⭐⭐⭐⭐⭐⭐⭐⭐
What Is MiniMax M2.7?
MiniMax M2.7 is a proprietary large language model released on March 17, 2026 by MiniMax, a Shanghai-based AI lab that’s been quietly shipping competitive frontier models since 2021. M2.7 is a direct successor to MiniMax M2.5 (released just weeks earlier in February 2026) and represents a significant architectural and capability leap — particularly in software engineering and professional document tasks.
The one-line differentiator: it’s the highest-ranked model on the Artificial Analysis Intelligence Index (#1 out of 136) at a price most budget-tier models don’t match. API pricing starts at $0.30 per 1M input tokens and $1.20 per 1M output tokens, with a 205K token context window and native reasoning capabilities. You can access it via the MiniMax API platform.
The Story: A Self-Evolving Model That Ranked #1 on a Budget
Here’s what makes M2.7 genuinely interesting — and what most coverage is glossing over.
MiniMax claims M2.7 participated in its own development, undergoing 100+ autonomous optimization rounds and achieving a 30% improvement on internal evaluation sets through self-directed reinforcement learning. Whether you call that “self-evolving” or a very aggressive RL training loop, the result is a model that jumped dramatically from M2.5’s performance baseline in a matter of weeks.
The raw numbers back it up. On the Artificial Analysis Intelligence Index (v4.0, incorporating 10 evaluations including GPQA Diamond, Humanity’s Last Exam, SciCode, and agentic benchmarks), M2.7 scores 50 out of 100 — ranking #1 out of 136 evaluated models. The field average sits at 19. That gap is not subtle.
On coding specifically: M2.7 hit 56.22% on SWE-Pro (on par with GPT-5.3-Codex), 86.2% on PinchBench (5th globally, within 1.2 points of Claude Opus 4.6), and a 47% pass rate across Kilo Bench’s 89 autonomous coding tasks (2nd overall). The model also achieved a 66.6% medal rate on MLE-Bench Lite, tying Google’s Gemini 3.1 on machine learning research tasks.
For hallucination: M2.7 scores a 34% hallucination rate on the AA-Omniscience benchmark, beating Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro Preview (50%). That’s a meaningful differentiator for production applications where factual accuracy matters.
The cost math is stark: at $0.30/M input tokens, M2.7 is reportedly up to 50x cheaper than Claude Opus 4.6 on comparable tasks. For developers running high-volume workloads, that’s the kind of number that changes architecture decisions.
Benchmark Performance
| Benchmark | MiniMax M2.7 | GPT-5.4 Mini | Mistral Small 4 | GLM-5-Turbo |
|---|---|---|---|---|
| AA Intelligence Index | 50 (#1/136) | ~36 | ~22 | ~28 |
| SWE-Pro (Coding) | 56.22% | 54.4% | ~38% | ~48% |
| PinchBench (Agentic Coding) | 86.2% | ~84% | ~70% | ~80% |
| Terminal-Bench 2 (System Ops) | 57.0% | 60.0% | ~42% | ~50% |
| Hallucination Rate (lower = better) | 34% | ~38% | ~42% | ~40% |
| MLE-Bench Lite (ML Research) | 66.6% medal rate | ~55% | ~40% | ~50% |
| Output Speed (tokens/sec) | 44 (slow) | ~120 | ~110 | ~85 |
Source: Artificial Analysis, Kilo.ai, MiniMax official benchmarks, March 2026. Competitor figures marked with ~ are estimates where independently verified data was unavailable.
Pricing
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Free Tier |
|---|---|---|---|---|
| MiniMax M2.7 | $0.30 | $1.20 | 205K | Limited |
| GPT-5.4 Mini | $0.75 | $4.50 | 400K | Via ChatGPT |
| Mistral Small 4 | $0.15 | $0.60 | 128K | Limited |
| GLM-5-Turbo | $0.96–$1.20 | $3.20–$4.00 | 128K | Limited |
| Gemini 3.1 Flash-Lite | ~$0.10–$0.25 | ~$0.40 | 1M | Yes (Google AI Studio) |
Source: Official pricing pages, OpenRouter listings, March 2026. MiniMax pricing via MiniMax API platform.
Cost perspective: Running 10 million input tokens through M2.7 costs $3. The same workload on GPT-5.4 Mini costs $7.50 — 2.5x more. On GLM-5-Turbo, you’re looking at $9.60–$12. For developers running production-scale pipelines, M2.7’s pricing math is genuinely attractive given its benchmark position.
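The arithmetic above is easy to sanity-check. A minimal sketch, using only the input prices from the pricing table (the dictionary keys are just labels, not official model identifiers):

```python
# Input-token cost comparison using the per-1M-token prices quoted
# in the pricing table above.
PRICES_PER_M_INPUT = {
    "minimax-m2.7": 0.30,
    "gpt-5.4-mini": 0.75,
    "glm-5-turbo": 0.96,  # low end of the quoted $0.96-$1.20 range
}

def input_cost(model: str, tokens: int) -> float:
    """Dollar cost of `tokens` input tokens for `model`."""
    return PRICES_PER_M_INPUT[model] * tokens / 1_000_000

workload = 10_000_000  # the 10M-input-token workload from the text
print(round(input_cost("minimax-m2.7", workload), 2))  # → 3.0
print(round(input_cost("gpt-5.4-mini", workload), 2))  # → 7.5
print(round(input_cost("glm-5-turbo", workload), 2))   # → 9.6
```

The same function scales to your own token volumes; note this covers input tokens only, and M2.7’s verbose output (discussed below) shifts the total math.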
Key Features
1. Native Reasoning with Minimal Overhead
M2.7 includes native reasoning capabilities without requiring a separate “thinking” model variant. It engages chain-of-thought automatically on complex tasks, and the reasoning layer is reflected in its benchmark performance (it generated 87M tokens total running the Intelligence Index — 4x the average — which explains both its depth and its verbosity). The limitation: you can’t easily toggle reasoning depth, so shorter tasks get the full treatment whether they need it or not.
2. 205K Token Context Window
At 205K tokens, M2.7 handles documents up to roughly 150,000 words — entire codebases, lengthy research papers, or multi-document legal sets in a single call. The AA-LCR (Long Context Reasoning) benchmark is one of the 10 evaluations included in the Intelligence Index, and M2.7’s strong overall score suggests legitimate long-context performance, not just window size on paper. The caveat: long-context performance is harder to independently verify, and verbosity may inflate token counts on output.
3. Agentic Coding: Deep Context Gathering
Independent benchmarking from Kilo.ai highlighted a distinctive M2.7 behavior in coding tasks: it “reads extensively before writing” — pulling in adjacent files, tracing call chains, analyzing dependency graphs before touching a single line of code. This approach led it to solve tasks other models missed on Kilo Bench (47% pass rate, 2nd overall). The downside is documented: this deep-read behavior can cause timeouts on time-sensitive agentic workflows where quick responses are required.
4. Low Hallucination Rate
A 34% hallucination rate on AA-Omniscience puts M2.7 ahead of several premium models including Claude Sonnet 4.6 (46%) and Gemini 3.1 Pro Preview (50%). For RAG applications, research summarization, or any workflow where factual accuracy is load-bearing, this is a meaningful differentiator. That said, “34% hallucination rate” is still 34% — treat this as relative improvement, not a green light for unsupervised fact-critical outputs.
5. Self-Optimized Architecture
The model reportedly underwent 100+ autonomous optimization rounds using self-generated training signal — a form of self-play RL that has become an increasingly common technique among frontier labs. The result in the M2 series is a 30% performance improvement from M2.5 to M2.7 on internal evaluation sets. This rapid iteration cadence (weeks, not months) is MiniMax’s current competitive advantage. The limitation: this process is entirely opaque. There’s no published technical report explaining the optimization methodology, which matters if you’re evaluating it for enterprise deployment.
6. Strong Professional Office Document Performance
M2.7 achieved an Elo of 1495 on GDPval-AA (real-world professional work tasks), which MiniMax claims is highest among open-source-accessible models. Its document processing capability spans complex Excel formulas, multi-round PowerPoint revisions, and layered Word editing. The model’s verbosity actually works in its favor here — thoroughness in document tasks is a feature, not a bug.
Who Is It For / Who Should Look Elsewhere
Use MiniMax M2.7 if you:
- Run cost-sensitive production API workloads — at $0.30/M input, you get frontier-tier intelligence at budget pricing. The math is hard to argue with for teams burning millions of tokens monthly.
- Need deep agentic coding capability — M2.7’s context-first approach makes it unusually strong at complex refactoring, codebase-wide changes, and tasks that require understanding system-level dependencies before acting.
- Work with long documents, research, or legal sets — 205K context + strong long-context reasoning benchmarks + low hallucination rates = a credible document intelligence option.
- Build ML research pipelines or autonomous workflows — 66.6% medal rate on MLE-Bench Lite, on par with Gemini 3.1. If your agents are doing data science work, this model belongs in the comparison set.
- Want state-of-the-art intelligence without paying for flagship pricing — #1 on the Intelligence Index at sub-$0.50/M is the value proposition. For experimental projects or startups watching burn rate, this is compelling.
Look elsewhere if you:
- Need speed — 44 tokens/second is slow. If your application involves real-time user interaction or time-bounded agentic loops, M2.7 will frustrate you. GPT-5.4 Mini (~120 tok/s) is a better choice.
- Run tight agentic loops with timeouts — M2.7’s “read everything first” behavior burns time. Workflows with <5 second response windows will hit issues.
- Need multimodal (image/video) inputs — M2.7 is text-in, text-out. No image processing. GPT-5.4 Mini, Gemini 3.1 Flash-Lite handle multimodal natively.
- Require an open-source or open-weights model — M2.7 is fully proprietary with no weights release. If you need on-premise deployment or model inspection, look at Mistral Small 4 or DeepSeek.
MiniMax M2.7 vs. Competitors
| Feature | MiniMax M2.7 | GPT-5.4 Mini | Mistral Small 4 | GLM-5-Turbo |
|---|---|---|---|---|
| Input Price (1M tokens) | $0.30 | $0.75 | $0.15 | $0.96–$1.20 |
| Output Price (1M tokens) | $1.20 | $4.50 | $0.60 | $3.20–$4.00 |
| Context Window | 205K | 400K | 128K | 128K |
| AA Intelligence Index | 50 (#1) | ~36 | ~22 | ~28 |
| Output Speed | 44 tok/s | ~120 tok/s | ~110 tok/s | ~85 tok/s |
| Multimodal Input | Text only | Text + Images | Text + Images | Text + Images |
| Native Reasoning | Yes | Yes | Limited | Yes |
| Open Weights | No (Proprietary) | No (Proprietary) | Yes (Apache 2.0) | No (Proprietary) |
| Best For | Cost-efficient frontier intelligence, agentic coding | Speed + multimodal at mid-tier pricing | Cheapest capable option, self-hosting | Agent frameworks, tool use |
| Developer Ecosystem | Growing | Mature | Strong | Moderate |
Controversy / What They Don’t Advertise
1. “Self-Evolving” Is Marketing Language for RL Fine-Tuning
MiniMax’s claim that M2.7 “participated in its own development” and underwent “autonomous optimization” is technically accurate but strategically framed. What they’re describing is reinforcement learning from AI-generated feedback — a well-established technique used by most major labs. Calling it “self-evolving” implies something more dramatic than iterative RL training. The results are real; the framing is selling.
2. Verbosity Is a Real Problem in Production
M2.7 generated 87M tokens running the Intelligence Index benchmarks — 4.35x the field average of 20M. This verbosity is responsible for much of its benchmark depth, but it translates directly to higher costs and slower responses in production. At $1.20/M output tokens, a verbose model can quickly erode the input-price advantage. Developers need to implement output length controls or accept unpredictable token costs.
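A back-of-envelope sketch makes the erosion concrete. It combines the article’s $1.20/M output price and the 4.35x verbosity multiplier; the 1,000-token baseline per task is an illustrative assumption, not a measured value:

```python
# How the 4.35x verbosity multiplier erodes M2.7's headline output
# price. Baseline tokens per task are an illustrative assumption.
OUTPUT_PRICE_PER_M = {"minimax-m2.7": 1.20, "gpt-5.4-mini": 4.50}
VERBOSITY = {"minimax-m2.7": 4.35, "gpt-5.4-mini": 1.0}  # vs. field average

def output_cost(model: str, baseline_tokens: int) -> float:
    """Cost if the model emits baseline_tokens * its verbosity factor."""
    tokens = baseline_tokens * VERBOSITY[model]
    return OUTPUT_PRICE_PER_M[model] * tokens / 1_000_000

# A task an average model answers in 1,000 output tokens:
print(round(output_cost("minimax-m2.7", 1_000), 4))  # → 0.0052
print(round(output_cost("gpt-5.4-mini", 1_000), 4))  # → 0.0045
```

Under these assumptions, uncapped M2.7 output actually costs slightly more per task than GPT-5.4 Mini despite a 3.75x lower per-token price — which is exactly why output caps matter.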
3. No Technical Report, No Architecture Transparency
There is no published technical report, model card, or architecture paper for M2.7 at time of writing. MiniMax has shared benchmark results but hasn’t disclosed parameter count, training data, or the specifics of their self-optimization methodology. For enterprise deployment and security review processes, this is a meaningful gap. You’re flying somewhat blind on safety alignment and training provenance.
4. Speed Ranking Is Near the Bottom
At 44 tokens/second, M2.7 ranks #104 out of 136 models on speed (Artificial Analysis). For any interactive application — chatbots, real-time document editors, coding assistants where a developer is waiting — this is a significant user experience problem. The model’s intelligence doesn’t help if users are staring at a spinner for 30 seconds.
5. Chinese Lab Geopolitical Risk
MiniMax is a Shanghai-based company. For organizations with data residency requirements, US government contracts, or specific GDPR/CCPA compliance constraints around data processing geography, using a Chinese API provider carries regulatory risk. MiniMax offers API access but doesn’t have the data center distribution or compliance certification footprint of AWS, Google, or Azure. This isn’t a reason to avoid M2.7 for most use cases, but it’s a box that needs checking for enterprise procurement.
Pros and Cons
Pros
- ✅ #1 on the Artificial Analysis Intelligence Index out of 136 models — not cherry-picked benchmark selection, but a composite of 10 independent evaluations
- ✅ $0.30/M input tokens — frontier-tier intelligence at a price competitive with mid-tier models
- ✅ 34% hallucination rate — lower than Claude Sonnet 4.6 and Gemini 3.1 Pro Preview; meaningful for production accuracy requirements
- ✅ 205K context window with demonstrated long-context reasoning performance (AA-LCR included in index score)
- ✅ Exceptional agentic coding — 2nd on Kilo Bench, 5th on PinchBench; deep dependency analysis approach pays off on complex codebases
- ✅ Rapid iteration cadence — M2.5 to M2.7 in weeks with 30% performance jump; MiniMax is shipping fast
Cons
- ❌ 44 tokens/second output speed — ranked #104/136; unusable for real-time interactive applications
- ❌ Extreme verbosity — 4.35x the average token output; can spike costs unpredictably and cause timeout failures in agentic loops
- ❌ Text-only input — no image, audio, or video processing; competitors at similar price points offer multimodal
- ❌ No technical report or model card — complete transparency gap on architecture, training data, parameter count, and safety alignment methodology
- ❌ Limited Western developer ecosystem — documentation, integrations, and community support are less mature than OpenAI, Anthropic, or Google equivalents
Getting Started with MiniMax M2.7
- Sign up at minimaxi.com — Create an account at the MiniMax platform. API access is available directly. Some regional restrictions may apply; VPN may be required for initial registration depending on your location.
- Grab your API key — Navigate to the API settings panel after account creation. Store the key securely in environment variables — do not hardcode into source files.
- Use OpenRouter as a fallback — MiniMax M2.7 is available via OpenRouter, which normalizes the API interface with OpenAI-compatible endpoints. If you’re already using GPT or Claude in your stack, OpenRouter lets you swap in M2.7 with minimal integration changes.
- Set output length limits immediately — Before running any production workloads, implement `max_tokens` caps in your API calls. M2.7’s verbosity will run costs up fast without them. Start with 2,000–4,000 token caps and adjust based on your task.
- Test on your slowest-tolerance workflows first — Deploy M2.7 to batch processing, overnight research pipelines, or async document analysis tasks before trying it in anything user-facing. Validate the speed characteristics against your latency requirements before committing to a migration.
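The key-handling and output-cap steps above can be sketched against OpenRouter’s OpenAI-compatible chat completions endpoint, using only the standard library. The model slug `minimax/minimax-m2.7` is an assumption — check OpenRouter’s model listing for the exact identifier before using it:

```python
# Sketch: a capped chat request to OpenRouter's OpenAI-compatible
# endpoint. The model slug below is an assumed identifier -- verify
# it against OpenRouter's model listing.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 2000) -> dict:
    """Request body with a hard output cap (M2.7 is verbose by default)."""
    return {
        "model": "minimax/minimax-m2.7",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call_m2_7(prompt: str) -> str:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            # Key read from the environment -- never hardcoded.
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the body format is OpenAI-compatible, swapping M2.7 in for GPT or Claude in an existing stack mostly means changing the base URL, key, and model slug.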
Frequently Asked Questions
What is MiniMax M2.7?
MiniMax M2.7 is a proprietary large language model released on March 17, 2026 by Shanghai-based AI lab MiniMax. It currently ranks #1 out of 136 models on the Artificial Analysis Intelligence Index, with a score of 50 (field average: 19). It features native reasoning capabilities, a 205K token context window, and is priced at $0.30 per 1M input tokens.
How much does MiniMax M2.7 cost?
MiniMax M2.7 is priced at $0.30 per 1M input tokens and $1.20 per 1M output tokens via the MiniMax API. It’s also available through OpenRouter. The input price is lower than GPT-5.4 Mini ($0.75/M) and GLM-5-Turbo ($0.96–$1.20/M), though Mistral Small 4 is slightly cheaper at $0.15/M input.
Is MiniMax M2.7 better than GPT-5.4 Mini?
On intelligence benchmarks, yes — M2.7 ranks #1 on the AA Intelligence Index (score: 50) vs GPT-5.4 Mini (approximately 36). M2.7 also has a lower hallucination rate and stronger agentic coding benchmarks. However, GPT-5.4 Mini is significantly faster (~120 tokens/sec vs 44), supports multimodal inputs (images), has a larger context window (400K vs 205K), and a more mature developer ecosystem. Which is “better” depends entirely on your use case.
What is MiniMax M2.7’s context window?
MiniMax M2.7 has a 205K token context window (approximately 150,000 words). This is larger than Mistral Small 4 and GLM-5-Turbo (both 128K), but smaller than GPT-5.4 Mini (400K) and Google’s Gemini models (which offer up to 1M tokens).
Is MiniMax M2.7 open source?
No. MiniMax M2.7 is a fully proprietary model with no open weights release. If you need an open-source or self-hostable model, Mistral Small 4 (Apache 2.0 licensed) or DeepSeek models are better options.
How fast is MiniMax M2.7?
MiniMax M2.7 outputs approximately 44 tokens per second, ranking #104 out of 136 models on speed (Artificial Analysis). This is notably slow compared to GPT-5.4 Mini (~120 tok/s) and Mistral Small 4 (~110 tok/s). It is not suitable for real-time interactive applications.
Is MiniMax M2.7 a reasoning model?
Yes. MiniMax M2.7 includes native reasoning capabilities and is classified as a reasoning model on Artificial Analysis. It engages chain-of-thought processing automatically, which contributes to its high intelligence scores but also to its verbosity (4x average token output) and slower response times.
Does MiniMax M2.7 support image inputs?
No. MiniMax M2.7 is text-in, text-out only. It does not support image, audio, or video inputs. If you need multimodal capabilities at a comparable price point, GPT-5.4 Mini, Mistral Small 4, or Gemini 3.1 Flash-Lite are better options.
What happened between MiniMax M2.5 and M2.7?
MiniMax M2.5 was released in February 2026, focused on polyglot coding and multilingual performance. M2.7 followed just weeks later (March 17, 2026) with a 30% internal performance improvement, driven primarily by 100+ rounds of autonomous reinforcement learning optimization. M2.7 shows dramatically stronger software engineering, professional document handling, and machine learning research task performance compared to M2.5. See our MiniMax M2.5 review for the previous generation’s breakdown.
Is MiniMax M2.7 worth it for production use?
For batch processing, async pipelines, research summarization, and agentic coding tasks — yes, highly worth considering. The intelligence-per-dollar ratio is unmatched at its price point as of March 2026. For anything requiring real-time responses, multimodal inputs, or operating within strict latency budgets, M2.7’s speed and verbosity limitations make it a poor fit. The lack of a technical report is also a concern for enterprise compliance processes.
Final Verdict
MiniMax M2.7 is the most interesting price-performance story in AI right now — and most people outside the developer community are sleeping on it.
Rating: 8.7/10. That score reflects genuine frontier intelligence at a budget price, but docked meaningfully for a speed ranking that disqualifies it from half the most common deployment patterns. If you’re building batch pipelines, async agents, research tools, or any workflow that isn’t waiting on real-time output, M2.7 deserves a serious spot in your evaluation. #1 intelligence ranking at $0.30/M input is a number you don’t just ignore.
The caveats are real: 44 tokens/second is slow, the verbosity will bite you if you’re not careful with token limits, and there’s no technical report — which matters for enterprise buyers doing proper diligence. But for startups, indie developers, and anyone running high-volume AI workloads on a tighter budget, M2.7 is the model that changes what “affordable” means at the frontier tier.
If you’re evaluating the MiniMax model family, also check out our MiniMax M2.5 review to understand how rapidly this lab is iterating. The jump from M2.5 to M2.7 in a matter of weeks should tell you something about where MiniMax is headed.
Bottom line: Use it for intelligence-heavy batch work and agentic coding. Don’t use it for anything a user is watching in real-time.



