On March 19, 2026, Cursor dropped something the AI coding world didn’t see coming: not another wrapper around Claude or GPT, but their own model. Composer 2 is Anysphere’s first in-house frontier AI, trained end-to-end for the kind of autonomous, project-scale coding tasks that take a human engineer a full day. It benchmarks ahead of Claude Opus 4.6 on Terminal-Bench 2.0 – the same day Bloomberg broke the story of Cursor chasing a $50 billion valuation. The timing isn’t accidental.
Rating: 4.3/5
What Is Cursor Composer 2?
Cursor Composer 2 is the first proprietary AI model from Anysphere, the company behind the Cursor AI code editor. It was announced and made available on March 19, 2026 – the same day Bloomberg reported Cursor’s talks to raise capital at approximately $50 billion.
Unlike Cursor’s previous Agent mode (which routed through Claude, GPT-4o, or Gemini), Composer 2 is Anysphere’s own model: trained on long-horizon coding tasks through reinforcement learning, with a 200,000-token context window, and deeply integrated into Cursor’s native tool stack. It can autonomously browse the web, edit files, run shell commands, and use semantic code search – carrying out hundreds of sequential actions without losing the thread of a complex goal.
One-line differentiator: it’s the only frontier coding model trained specifically to operate inside a developer’s actual IDE workflow rather than as a standalone CLI or API. Try Cursor at cursor.com.
The Story: A $50B Startup Just Shipped Its Own Frontier Model
Cursor has 1M+ daily active users and counts Stripe, Figma, Salesforce, and NVIDIA among its 50,000+ enterprise customers. NVIDIA CEO Jensen Huang called it his “favorite enterprise AI service.” Y Combinator’s Diana Hu said adoption at portfolio companies went “from single digits to over 80%.” This isn’t a niche tool – it’s become load-bearing infrastructure for how software gets built.
The strategic move with Composer 2 is significant. Until now, Cursor was a delivery vehicle for other companies’ models. Great UI, exceptional context management, smart workflow integration – but the intelligence came from Anthropic or OpenAI. That dependency carried real risk: margin compression (model providers take a cut), competitive exposure (OpenAI has its own Codex), and strategic leverage that sat with someone else.
Composer 2 changes the equation. Cursor now controls its own model, its own training, its own pricing power. And based on the benchmark numbers, they didn’t ship a me-too model to check a strategic box – they shipped something that genuinely competes.
The Numbers That Matter
| Model | CursorBench | Terminal-Bench 2.0 | SWE-bench Multilingual |
|---|---|---|---|
| Composer 2 | 61.3 | 61.7 | 73.7 |
| Composer 1.5 | 44.2 | 47.9 | 65.9 |
| Composer 1 | 38.0 | 40.0 | 56.9 |
Source: Cursor official blog, March 19, 2026. CursorBench is Cursor’s internal evaluation framework.
The generational leap between Composer 1.5 and Composer 2 is notable – roughly a 29% improvement on Terminal-Bench 2.0 (47.9 to 61.7) and a 12% jump on SWE-bench Multilingual, with CursorBench rising from 44.2 to 61.3. Cursor attributes this to completing their first continued pretraining run, which gave the model a stronger base before reinforcement learning was applied. In plain terms: they trained a better base model, then trained it harder on coding tasks.
Benchmark Performance: How It Stacks Up Against Rivals
| Model | Terminal-Bench 2.0 | SWE-bench Verified | SWE-bench Multilingual |
|---|---|---|---|
| Cursor Composer 2 | 61.7 | N/A (not reported) | 73.7 |
| Claude Opus 4.6 | 58.0 | 62.7% (SWE-bench Lite) | ~72% (est.) |
| Claude Opus 4.5 | ~55 (est.) | 80.9% | ~70% (est.) |
| GPT-5.4 (OpenAI) | 75.1 | 77.2% | ~75% (est.) |
| GPT-5.2 Codex | ~70 (est.) | ~76% | N/A |
Sources: Cursor blog, VentureBeat, official leaderboard at tbench.ai. Estimates noted where official scores weren’t published. Benchmark methodology varies by provider – direct comparisons should be treated as directional, not definitive.
The honest take: Composer 2 beats Claude Opus 4.6 on Terminal-Bench 2.0 – a meaningful result on an agent-focused, terminal-use benchmark. But GPT-5.4 still leads on Terminal-Bench 2.0 (75.1 vs 61.7), and Claude Opus 4.5 dominates SWE-bench Verified (80.9%). Composer 2 is competitive at frontier level. It’s not the outright leader on every measure – yet.
Pricing: Where Composer 2 Gets Interesting
| Plan | Price | What You Get |
|---|---|---|
| Hobby | Free | Limited agent requests, limited Tab completions |
| Pro | $20/mo | Extended agent limits, frontier models, MCPs, cloud agents, Composer 2 usage pool |
| Pro+ | $60/mo | Everything in Pro + 3x usage on all OpenAI, Claude, Gemini models |
| Ultra | $200/mo | Everything in Pro + 20x usage, priority feature access |
| Teams | $40/user/mo | Shared commands, centralized billing, analytics, SAML/SSO, privacy mode |
| Enterprise | Custom | Pooled usage, SCIM, audit logs, granular model controls |
Composer 2 Token Pricing (API-style, within Cursor)
| Variant | Input (per M tokens) | Output (per M tokens) | Default? |
|---|---|---|---|
| Composer 2 Standard | $0.50 | $2.50 | No |
| Composer 2 Fast | $1.50 | $7.50 | Yes (default) |
Quick Competitor Price Comparison
| Tool | Entry Price | Professional |
|---|---|---|
| Cursor (Composer 2) | Free | $20/mo (Pro) |
| GitHub Copilot | Free (limited) | $10/mo (Individual) |
| Claude Code (Anthropic) | API-based | ~$15-$30+/mo typical usage |
| Windsurf | Free | $15/mo (Pro) |
Pricing current as of March 19, 2026. Token overages apply beyond usage pool limits.
Key Features
1. Long-Horizon Agentic Execution
Composer 2’s headline capability is its ability to execute hundreds of sequential actions to complete a single complex task – refactoring an entire module, debugging a multi-file issue, or shipping a feature end-to-end. This comes from training on long-horizon coding trajectories with self-summarization built into the training process, meaning the model learned to maintain context across tasks far longer than its context window through its own compression mechanism. The limitation: for extremely open-ended tasks with no clear success criteria, the model can still go off the rails and confidently do the wrong thing at scale. Human checkpoint review remains essential.
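To make the self-summarization idea concrete, here’s a minimal sketch of the general technique – when the running transcript outgrows a budget, older steps get compressed into a summary so the goal survives far more actions than raw context could hold. This is an illustration of the pattern, not Cursor’s actual implementation; the `summarize` function and the item-count budget are stand-ins.

```python
BUDGET = 8  # max transcript items kept verbatim (stand-in for a token budget)

def summarize(steps):
    # Stand-in for a model-generated summary of completed steps.
    return f"[summary of {len(steps)} earlier actions]"

def run_agent(goal, actions):
    transcript = [f"GOAL: {goal}"]
    for action in actions:
        transcript.append(action)
        if len(transcript) > BUDGET:
            # Keep the goal, compress the middle, keep the most recent steps.
            head, recent = transcript[0], transcript[-3:]
            transcript = [head, summarize(transcript[1:-3])] + recent
    return transcript

log = run_agent("migrate auth module", [f"step {i}" for i in range(100)])
print(log[0])   # the goal is still the first entry after 100 actions
print(log[-1])  # the latest action is still verbatim
```

The key property is that the goal line never gets compressed away, which is precisely the failure mode (“losing the thread”) that most coding agents exhibit on long tasks.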
2. Deep Cursor Tool Integration
Unlike using Claude or GPT through the Cursor interface, Composer 2 is natively integrated with Cursor’s full tool stack: semantic code search, file and folder navigation, file reads and edits, shell command execution, browser control, and web access. It understands Cursor’s conventions, knows how to use the workspace’s rules files, and doesn’t burn tokens on tool-use scaffolding the way third-party models do. The limitation: it only runs inside Cursor – there’s no API or CLI access to Composer 2 outside the product.
3. Two Speed Tiers
Cursor ships two versions: the standard Composer 2 at $0.50/M input tokens (slower, cheaper) and Composer 2 Fast at $1.50/M input tokens, which Cursor makes the default. They claim the fast variant is cost-competitive versus other fast frontier models – and based on their internal traffic data snapshot from March 18, that appears accurate. For individual Pro plan users, Composer 2 usage draws from a dedicated standalone usage pool, so you’re not competing with your Claude or GPT quotas. The limitation: the fast variant’s $7.50/M output token rate adds up quickly on verbose generation tasks.
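The per-token rates above are easy to translate into session costs. A rough comparison of the two tiers, using the quoted per-million-token prices – the token counts for a “typical” long agentic session here are illustrative assumptions, not Cursor figures:

```python
def task_cost(input_tokens, output_tokens, in_rate, out_rate):
    """Cost in dollars; rates are quoted per million tokens."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical long agentic session: 400K tokens in, 80K tokens out.
standard = task_cost(400_000, 80_000, 0.50, 2.50)
fast = task_cost(400_000, 80_000, 1.50, 7.50)

print(f"Standard: ${standard:.2f}, Fast: ${fast:.2f}")  # Standard: $0.40, Fast: $1.20
```

Since the fast tier is exactly 3x the standard rate on both input and output, any session costs three times as much on the default variant – worth remembering before leaving Fast selected for bulk work.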
4. Continued Pretraining Foundation
The quality jump from Composer 1.5 to Composer 2 didn’t come just from more RL training – it came from Cursor completing its first continued pretraining run. They started from a third-party base model and pretrained it further before applying reinforcement learning. This gives the RL phase a higher-quality foundation to work from, which is why the benchmark improvements are so large compared to incremental updates. The limitation: Cursor hasn’t disclosed the base model, so it’s difficult to independently verify the pretraining claims.
5. Self-Driving Codebase Research Mode
Announced in preview on February 5, 2026 (just weeks before Composer 2), Cursor’s multi-agent research harness allows Composer 2 to be deployed in a “self-driving” configuration where agents run autonomously on a codebase – not just responding to prompts but exploring, planning, and modifying code in loops. Composer 2 is the model that powers this research preview. The limitation: it’s still an early alpha, and autonomous multi-agent loops without guardrails create real risk of runaway edits in production codebases.
6. Automations and Trigger-Based Agents
Cursor’s March 5, 2026 product update added Automations – agents that run based on triggers and instructions you define. Composer 2 integrates with this system, meaning you can configure it to run automatically on events (e.g., a new PR, a failing test suite) rather than only on explicit prompts. Combined with the 30+ new marketplace plugins launched March 11, this turns Composer 2 into an infrastructure component, not just a chat interface. The limitation: trigger-based autonomous agents require careful permission scoping – one misconfigured automation can cause cascading file changes.
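The permission-scoping point is worth making concrete. Here’s a toy sketch of trigger-scoped automation – the trigger names and the `Automation` class are hypothetical, not Cursor’s actual Automations API – showing how an allow-list of editable paths contains the blast radius of a misfiring agent:

```python
from fnmatch import fnmatch

class Automation:
    def __init__(self, trigger, allowed_paths):
        self.trigger = trigger
        self.allowed_paths = allowed_paths  # glob allow-list for file edits

    def can_edit(self, path):
        return any(fnmatch(path, pattern) for pattern in self.allowed_paths)

# Scoped: on a failing test suite, the agent may only touch test files.
fix_tests = Automation("test_suite_failed", ["tests/*.py"])

# Misconfigured: a bare "*" scope lets one trigger rewrite anything.
too_broad = Automation("new_pr", ["*"])

print(fix_tests.can_edit("tests/test_auth.py"))  # True
print(fix_tests.can_edit("src/billing.py"))      # False
```

The design point: the scope belongs on the automation, not on the model. A frontier agent given an unconstrained trigger will happily make “cascading file changes” that are each locally reasonable.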
Who Is It For – And Who Should Look Elsewhere
Use Cursor Composer 2 if you:
- Are already a Cursor user – the integration advantage is real, and upgrading to Composer 2 is a no-brainer on Pro
- Work on large, complex codebases where multi-file, multi-step tasks are the norm (refactors, feature ships, debugging sessions)
- Want frontier model performance without paying Claude API rates for every agentic task
- Are part of a team that needs centralized billing, usage analytics, and privacy controls (Teams plan covers this cleanly)
- Are building at a company that takes security seriously – Cursor is SOC 2 certified and supports privacy mode
Look elsewhere if you:
- Need raw SWE-bench accuracy as your primary metric – Claude Opus 4.5 still leads at 80.9% verified, GPT-5.4 leads on Terminal-Bench at 75.1
- Work primarily in JetBrains or VS Code and aren’t open to switching editors (JetBrains support exists via Agent Client Protocol, but it’s not native)
- Need a standalone model API to integrate into your own tooling or CI/CD pipeline – Composer 2 has no external API at launch
- Are a solo developer who barely uses agentic features – GitHub Copilot at $10/mo is hard to beat for basic autocomplete needs
Comparison: Cursor Composer 2 vs Claude Code vs OpenAI Codex vs GitHub Copilot
Full breakdown of the four main AI coding tools competing for your workflow in 2026. (See our full Cursor vs Windsurf vs Copilot comparison for a deeper multi-tool breakdown.)
| Feature | Cursor Composer 2 | Claude Code | OpenAI Codex (GPT-5.4) | GitHub Copilot |
|---|---|---|---|---|
| Entry Price | Free / $20/mo Pro | API-based (~$15-30+/mo) | API-based (usage-based) | Free / $10/mo Individual |
| Model Type | Proprietary (Anysphere) | Claude Opus 4.5/4.6 (Anthropic) | GPT-5.4 (OpenAI) | GPT-4o / Claude (varies) |
| Terminal-Bench 2.0 | 61.7 | 58.0 (Opus 4.6) | 75.1 | N/A |
| SWE-bench Verified | N/R | 80.9% (Opus 4.5) | 77.2% (GPT-5.4) | ~55% (est.) |
| Agentic / Autonomous | Yes – native (100s of actions) | Yes (CLI-native) | Yes (Codex CLI) | Limited (Copilot Workspace) |
| IDE Integration | Cursor (deep native) | Terminal / any editor | Terminal / VS Code | VS Code, JetBrains, etc. |
| Context Window | 200K tokens | 200K tokens | 128K tokens | 64K tokens |
| Multi-model Support | Yes – OpenAI, Claude, Gemini + Composer | No – Claude only | No – OpenAI only | Mixed (limited choice) |
| Standalone API | No (IDE only) | Yes | Yes | No (IDE only) |
| Enterprise Ready | Yes – SOC 2, SAML, SCIM | Yes – Anthropic Enterprise | Yes – OpenAI Enterprise | Yes – GitHub Enterprise |
| Best For | Cursor-native agentic dev | SWE-bench accuracy, CLI | Speed + raw benchmark | Autocomplete, GitHub integration |
Controversy: The Uncomfortable Parts Cursor Won’t Put in the Blog Post
OpenAI Invested in Cursor – Now Cursor Competes With OpenAI
The OpenAI Startup Fund led Cursor’s $8 million seed round in 2023. OpenAI employees use Cursor internally. But Composer 2 is explicitly designed to compete with OpenAI Codex. To make it stranger: Bloomberg reported in 2026 that OpenAI previously tried to acquire Anysphere, and was turned down. The same investors (Thrive Capital) have funded both OpenAI and Anysphere. This is the Silicon Valley ouroboros at full speed – yesterday’s patron is today’s competitor, and the money is circular. For users, this conflict of interest is largely irrelevant to day-to-day use. But it’s worth knowing when you’re betting your workflow on a platform.
The Vibe Coding Problem – And Composer 2 Is Part of It
Andrej Karpathy coined “vibe coding” in early 2025 – building software by prompting AI without deeply reviewing what it generates. By February 2026, 92% of US developers were using AI coding tools daily and 46% of all new code was AI-generated. The criticism is real and has teeth: studies show 45% of AI-generated code samples contain security vulnerabilities, a 72% security failure rate for AI-generated Java specifically, and a measurable decline in junior developer skill acquisition when AI is used without intentional oversight. Composer 2, with its ability to execute hundreds of actions autonomously, accelerates this dynamic. The model will work hard and confidently in the wrong direction if you give it a bad prompt. Cursor’s “Best practices for coding with agents” blog post (Jan 9, 2026) actually does a good job addressing this – but most users won’t read it. The tool is powerful enough to cause real technical debt at scale.
The $50B Valuation Is a Bet You’re Making Too
If Cursor closes its next round at $50 billion, it will be one of the most highly valued private software companies in history – and it’s a four-year-old startup. The underlying business case is strong (1M DAUs, 50,000 businesses, growing ARR). But at $50B, there’s real risk baked in. If OpenAI ships Codex as a first-class IDE product, or Anthropic deepens its native Claude Code IDE integrations, or Microsoft leans into Copilot for VS Code aggressively, Cursor’s competitive moat looks thinner. The product is excellent today. Basing critical engineering infrastructure on a $50B-valued startup warrants some contingency planning.
Benchmark Wars Have Known Limits
Cursor uses CursorBench, an internal evaluation framework, alongside Terminal-Bench 2.0 and SWE-bench Multilingual. These are legitimate benchmarks – but every company puts its best foot forward when self-reporting. Cursor’s benchmark methodology notes (in the footnotes of their official launch post) that they compute Terminal-Bench 2.0 scores using the Harbor framework with 5 iterations per model-agent pair. Anthropic and OpenAI use different scaffolding for the same benchmark. This means the comparison numbers in the Cursor blog post aren’t apples-to-apples – a point Cursor acknowledges but understandably doesn’t highlight in the headline.
Pros and Cons
Pros
- Genuine frontier performance – 73.7 on SWE-bench Multilingual and 61.7 on Terminal-Bench 2.0 puts it in real competition with Claude and OpenAI, not just in marketing materials
- Competitive pricing – $0.50/M input tokens standard, and Cursor claims the fast variant undercuts other fast frontier models on per-token cost
- Deep IDE integration – Tool-use efficiency gains over third-party models are real; Composer 2 doesn’t waste tokens on scaffolding overhead
- Long-horizon capability – The model was trained to not lose its goal across hundreds of actions, which is the specific capability most coding agents fail at
- Multi-model flexibility – You still get Claude, GPT-4o, Gemini alongside Composer 2 on the same plan; you don’t have to choose
- Active product velocity – JetBrains support, automations, marketplace plugins, Glass alpha, Bugbot, self-driving codebase research – Cursor shipped a lot in Q1 2026
- Enterprise compliance – SOC 2, SAML/OIDC, SCIM, audit logs, privacy mode – the enterprise checklist is complete
Cons
- No standalone API – Composer 2 is locked to the Cursor IDE; you can’t use it in your own tooling, CI pipeline, or outside Cursor
- Trails GPT-5.4 on Terminal-Bench 2.0 – 61.7 vs 75.1 is a meaningful gap on the agentic benchmark that matters most for autonomous task execution
- SWE-bench Verified not published – Claude Opus 4.5 leads at 80.9%; Cursor chose not to report this metric directly, which is a conspicuous omission
- IDE switch cost – VS Code muscle memory and extensions don’t transfer instantly; onboarding to Cursor takes real time
- Autonomous agent risk – Hundreds of actions in a loop without guardrails can do significant damage; requires discipline and checkpointing to use safely
Getting Started With Cursor Composer 2
You’re already a Cursor user and want to activate Composer 2, or you’re starting from scratch. Here’s the fastest path to actually using it well:
- Download and install Cursor at cursor.com/download. It’s built on VS Code, so your extensions and settings import. If you’re already on Cursor, just update to the latest version.
- Subscribe to Pro or higher. Go to cursor.com/pricing and upgrade. Pro at $20/mo includes Composer 2 access within the standalone usage pool. If you’re heavy on agentic tasks, Pro+ ($60) gives you 3x the model usage across all providers.
- Open the Agent panel and select Composer 2. In the sidebar, open the Chat/Agent panel. Click the model dropdown and select Cursor Composer 2 (Fast is the default). For heavy bulk tasks, switch to Standard to save on per-token cost.
- Set your project rules first. Before running Composer 2 on anything real, create a `.cursor/rules` file with your project conventions, stack preferences, and any security requirements. Composer 2 reads these natively – without them, it’ll make reasonable guesses that may not match your actual standards.
- Start with a scoped, verifiable task. Don’t ask Composer 2 to “refactor the entire codebase” in one shot. Start with: “Refactor the user authentication module to use the repository pattern” – a bounded, testable task. Review the action log step-by-step on your first session. Once you understand how it operates and where it needs supervision, you can open the scope.
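For reference, a project rules file is just short, declarative guidance the agent reads before acting. An illustrative example of the kind of content that belongs there – the exact file format and location may vary by Cursor version, so treat this as a sketch to adapt, not a canonical template:

```
# .cursor/rules (illustrative – adapt to your project)
- Use TypeScript strict mode; no `any` without a justifying comment.
- All database access goes through the repository layer in src/repos/.
- Never commit secrets; read credentials from environment variables.
- Every new endpoint requires input validation and a matching test file.
```

Rules like these are what turn “reasonable guesses” into changes that match your actual standards.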
If you’re evaluating the full AI coding landscape, also read our GPT-5.3 Codex Spark review – OpenAI’s ultra-fast coding model – and our roundup of the best AI tools in 2026 to see where Composer 2 fits the bigger picture.
Final Verdict
Cursor Composer 2 is the real deal – not a PR play, not a benchmark-chasing stunt. For the first time, Cursor doesn’t need to apologize for using other companies’ models. They built one. And while it doesn’t lead every benchmark (GPT-5.4 still wins on Terminal-Bench 2.0, Claude Opus 4.5 leads SWE-bench Verified), it performs at genuine frontier level while integrating more deeply into the Cursor workflow than any third-party model ever could.
Buy it today if: You’re already on Cursor Pro, or you’ve been considering it. The model upgrade alone justifies the $20/mo. Enterprise teams on GitHub Copilot should run a Cursor Teams trial – the agentic capability gap is significant.
Wait if: You need maximum raw benchmark accuracy above all else (stick with Claude Opus 4.5 via Claude Code), or if you’re deeply invested in a non-Cursor IDE and don’t have the bandwidth to switch right now. JetBrains integration exists but it’s not native Cursor – the full Composer 2 experience is on Cursor’s own editor.
Cursor just became more than an IDE company. That changes the calculus for everything that comes next. Rating: 4.3/5.
Frequently Asked Questions
What is Cursor Composer 2?
Cursor Composer 2 is Anysphere’s first in-house frontier AI model, purpose-built for agentic software development inside the Cursor IDE. Announced March 19, 2026, it can autonomously carry out complex, multi-step coding tasks requiring hundreds of actions, trained specifically on long-horizon coding trajectories through reinforcement learning.
How does Cursor Composer 2 benchmark against Claude Code and OpenAI Codex?
On Terminal-Bench 2.0, Composer 2 scores 61.7 – beating Claude Opus 4.6 (58.0) but trailing GPT-5.4 (75.1). On SWE-bench Multilingual, it scores 73.7. Claude Opus 4.5 leads SWE-bench Verified at 80.9%, while OpenAI’s GPT-5.4 scores 77.2% on SWE-bench general. Each model leads on different benchmarks.
How much does Cursor Composer 2 cost?
Composer 2 is priced at $0.50 per million input tokens and $2.50 per million output tokens (standard). The faster default variant costs $1.50/M input and $7.50/M output tokens. For individual users, Cursor Pro ($20/mo) includes generous Composer usage within its usage pool.
Is Cursor Composer 2 available to free users?
Composer 2 access is part of paid plans. Free Hobby users have very limited agent requests. A Cursor Pro plan at $20/mo gives you access to Composer 2 within its standalone usage pool – that’s the recommended entry point.
How is Cursor Composer 2 different from previous Cursor models?
Composer 2 is Cursor’s first in-house model – a significant shift from routing through third-party APIs. Performance improved dramatically: CursorBench jumped from 44.2 (Composer 1.5) to 61.3, Terminal-Bench 2.0 from 47.9 to 61.7, and SWE-bench Multilingual from 65.9 to 73.7. The improvement comes from Cursor’s first continued pretraining run providing a stronger base before reinforcement learning.
Does Cursor Composer 2 only work inside Cursor?
Yes – Composer 2 is deeply integrated with Cursor’s tool stack and there is no standalone API at launch. It’s also available in Cursor Glass (early alpha). JetBrains users can access Cursor agents via Agent Client Protocol, but the native experience is Cursor’s own editor.
Who should use Cursor Composer 2 vs Claude Code?
Use Composer 2 if you’re in the Cursor ecosystem and want deep IDE integration with cost-efficient frontier performance. Use Claude Code if you need the highest raw SWE-bench accuracy (Opus 4.5 at 80.9% verified), prefer CLI-native workflows, or need a standalone API outside any specific IDE.
Is there a conflict of interest with OpenAI investing in Cursor?
Yes – the OpenAI Startup Fund led Cursor’s 2023 seed round, yet Composer 2 now directly competes with OpenAI Codex. Reports indicate OpenAI previously tried to acquire Anysphere and was turned down. For users, this creates no practical problem today, but it’s worth understanding when evaluating Cursor as long-term infrastructure.
What is Cursor’s current valuation?
As of March 2026, Cursor (Anysphere) is reportedly in early discussions for a new funding round targeting a ~$50 billion valuation, up significantly from its November 2025 $29.3 billion valuation. The company serves 1M+ daily users and 50,000+ businesses including Stripe, Figma, Salesforce (90%+ of 20,000 developers), and NVIDIA.
How do I get started with Cursor Composer 2?
Download Cursor at cursor.com, subscribe to Pro ($20/mo), open the Agent panel, and select Composer 2 from the model dropdown. Set up a .cursor/rules file with your project conventions before running any agent tasks. Start with a bounded, verifiable task – not “refactor everything” – and review the action log step by step on your first session.



