OpenAI just bought the tool that enterprises use to test whether their AI applications can be hacked, jailbroken, or manipulated into leaking data. On March 10, 2026, OpenAI announced it is acquiring Promptfoo — an open-source LLM evaluation and red-teaming platform already running inside 127 Fortune 500 companies and trusted by over 300,000 developers worldwide. The acquisition signals a direct pivot: AI security testing is no longer optional, and OpenAI is moving to own that infrastructure layer before enterprises demand it from someone else.
If you’ve never heard of Promptfoo, that’s about to change. The tool has 11,606 GitHub stars, a thriving Discord community, and a track record of finding real vulnerabilities — prompt injections, jailbreaks, PII leaks, and tool misuse — in production AI systems. Here’s our full review plus what the OpenAI acquisition actually means for anyone using it today.
Rating: 8.7/10 ⭐⭐⭐⭐½
What Is Promptfoo?
Promptfoo is an open-source CLI and library for evaluating, testing, and red-teaming LLM applications. It was founded by Ian Webster and Michael D’Angelo and has been available on GitHub under the MIT license since 2023. The tool lets developers run automated tests against any LLM-powered app — checking prompt quality, comparing model outputs, and scanning for security vulnerabilities across 60+ supported providers, including OpenAI, Anthropic, Google, Meta, Mistral, and Ollama.
The one-line differentiator: Promptfoo treats your AI application like a penetration tester would — systematically probing it for weaknesses before attackers do, then giving you a remediation report you can actually act on. Visit the official site at promptfoo.dev. For comparison, our Claude Code review covers how Anthropic’s coding agent approaches the AI safety side of the equation.
The Acquisition Story: Why OpenAI Wants This
The OpenAI announcement is blunt about the rationale. Direct quote from Srinivas Narayanan, CTO of B2B Applications at OpenAI: “Promptfoo brings deep engineering expertise in evaluating, securing, and testing AI systems at enterprise scale. Their work helps businesses deploy secure and reliable AI applications, and we’re excited to bring these capabilities directly into Frontier.”
Frontier is OpenAI’s enterprise platform for deploying AI coworkers in real workflows. As enterprises move from “AI experiments” to “AI handling actual business processes,” the security gap becomes a liability. OpenAI’s play here is clear: bundle security testing natively into Frontier so enterprise procurement teams stop asking “how do we test this?” and start asking “when do we expand?”
From Ian Webster, Promptfoo co-founder and CEO: “We started Promptfoo because developers needed a practical way to secure AI systems. As AI agents become more connected to real data and systems, securing and validating them is more challenging and important than ever. Joining OpenAI lets us accelerate this work.”
The deal’s closing is still subject to customary conditions as of the announcement date, but the strategic direction is clear. Promptfoo’s technology will be integrated into OpenAI Frontier. The open-source project will continue — OpenAI explicitly committed to that. What changes is who’s funding the roadmap and where the enterprise-grade features live.
Benchmark Performance: What Promptfoo Actually Catches
Promptfoo doesn’t publish a single accuracy score — it’s not that kind of tool. What it measures is vulnerability coverage: how many attack types it can probe, how accurately it grades outputs, and how well it integrates into real CI/CD workflows. Based on community benchmarks, documented case studies, and the tool’s stated capabilities:
| Metric | Promptfoo | Giskard | LangSmith | PromptLayer |
|---|---|---|---|---|
| Vulnerability types covered | 50+ | ~30 | Eval-focused, limited red team | None (logging only) |
| Red teaming automation | ✅ Full (custom attack generation) | ✅ Partial | ❌ Manual only | ❌ None |
| CI/CD integration | ✅ GitHub, GitLab, Jenkins, more | ✅ GitHub Actions | ✅ via SDK | ⚠️ Limited |
| Model providers supported | 60+ | ~20 | ~15 (LangChain focus) | ~10 |
| Local/private evals | ✅ 100% local option | ✅ | ❌ Cloud-dependent | ❌ Cloud-dependent |
| Fortune 500 adoption | 127 companies | Not publicly stated | LangChain ecosystem | Individual/startup |
| Community / GitHub stars | 11,606 ⭐ | ~4,200 ⭐ | ~7,000 ⭐ (LangChain) | ~2,100 ⭐ |
Source: GitHub, official documentation, publicly stated adoption figures as of March 2026. LangSmith stars counted via parent LangChain repo.
Pricing: What Promptfoo Costs (Still Free Post-Acquisition)
As of the acquisition announcement, Promptfoo’s pricing structure remains unchanged. OpenAI has not announced any plans to put the core tool behind a paywall. Here’s what exists today:
| Tier | Price | Key Features | Red Team Probes | Best For |
|---|---|---|---|---|
| Community (Open Source) | Free Forever | All LLM eval features, all providers, vulnerability scanning, CI/CD integration, local/self-hosted | 10,000/month | Individual devs, small teams, open-source projects |
| Enterprise | Custom (contact sales) | All Community + team sharing, continuous monitoring, compliance dashboard, SSO, custom attack profiles, Promptfoo API, managed cloud | Unlimited (custom) | Enterprise security teams, large orgs with multiple AI products |
| Enterprise On-Premise | Custom (contact sales) | All Enterprise + full infrastructure control, complete data isolation, dedicated deployment engineer | Unlimited | Financial services, healthcare, defense — orgs with strict data sovereignty requirements |
Competitor pricing comparison:
| Tool | Free Tier | Paid Starts At | Notes |
|---|---|---|---|
| Promptfoo | ✅ Full-featured (10k probes/mo) | Custom enterprise | Open source, MIT license |
| Giskard | ✅ Open-source tier | Custom enterprise | Python-focused, ML bias testing |
| LangSmith | ✅ Developer tier (limited) | ~$39/mo (Plus) | LangChain ecosystem lock-in |
| PromptLayer | ✅ Limited logging | ~$25/mo | Observability/logging focus, not security |
Key Features
1. Automated Red Teaming with 50+ Vulnerability Types
Promptfoo’s red team engine generates custom attack payloads tailored to your specific application — not generic tests pulled from a static library. It simulates prompt injections (direct and indirect), jailbreak attempts calibrated to your system’s guardrails, data and PII leak scenarios, business rule violations, insecure tool use in agentic systems, and toxic content generation. The threat intelligence feeding those attacks comes from a community of 300,000+ users, meaning new attack vectors surface fast. The limitation: the 10,000 probe/month cap on the free tier can be exhausted quickly on complex, multi-agent systems — you’ll need Enterprise for continuous production monitoring.
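To make that concrete, here is a sketch of what a red team scan definition can look like in promptfooconfig.yaml. The plugin and strategy names below are illustrative assumptions — the interactive setup wizard lists the catalog actually supported by your installed version:

```yaml
# Sketch of a red team config — plugin/strategy names are illustrative;
# run the setup wizard to see the supported catalog for your version.
targets:
  - id: openai:gpt-5-mini
    label: support-bot
redteam:
  purpose: 'Customer support agent for a retail bank'  # context used to tailor attacks
  plugins:
    - pii                # probe for personal-data leakage
    - hijacking          # off-purpose use of the agent
  strategies:
    - jailbreak          # iterative guardrail-bypass attempts
    - prompt-injection   # direct and indirect injection payloads
```

The purpose field is what lets the engine generate application-specific attacks rather than generic ones.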
2. Multi-Model Evaluation and Side-by-Side Comparison
The eval engine runs your prompts simultaneously across 60+ providers — GPT-5, Claude, Gemini, Llama, Mistral, local Ollama models, or your own custom API — and renders results in a clean web UI table. You can test cost, latency, and output quality in a single run. This makes model selection decisions data-driven rather than vibes-based. The limitation: your own API keys fund the inference, so a comprehensive multi-model eval across thousands of test cases gets expensive fast depending on which models you’re comparing.
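As a sketch of how a side-by-side comparison is configured (model identifiers and assertion thresholds here are examples, not recommendations, and the latency/cost assertion types should be verified against the current docs):

```yaml
# Run one prompt across several providers and compare cost/latency/quality.
# Model IDs are examples — substitute whatever your stack uses.
prompts:
  - 'Summarize in one sentence: {{article}}'
providers:
  - openai:gpt-5-mini
  - anthropic:messages:claude-opus-4-6
  - ollama:llama3            # local model, no API spend
tests:
  - vars:
      article: 'Promptfoo is an open-source tool for evaluating and red-teaming LLM apps.'
    assert:
      - type: latency
        threshold: 3000      # fail if the response takes longer than 3000 ms
      - type: cost
        threshold: 0.01      # fail if a single call costs more than $0.01
```

Because latency and cost are first-class assertions, model selection criteria live in version control next to the quality checks.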
3. CI/CD Integration with PR-Level Security Scanning
Promptfoo integrates with GitHub, GitLab, Jenkins, and other CI/CD pipelines to run security scans on every pull request that touches LLM-related code. Security findings surface directly in PRs with actionable remediation steps — keeping security in the developer workflow instead of a separate audit process. This is the feature that drove Fortune 500 adoption; it fits how enterprise security teams already work. For a real-world example of AI finding actual production security bugs, see our OpenAI Codex Security review. The limitation: initial setup requires YAML configuration and understanding of assertion types, which has a real learning curve for teams without AI engineering experience.
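One way to wire this into GitHub is via Promptfoo’s GitHub Action. The workflow below is a sketch; the action inputs shown are assumptions to verify against the action’s README before use:

```yaml
# .github/workflows/llm-security.yml — sketch assuming the promptfoo GitHub Action;
# verify the action name and inputs against its README.
name: LLM security scan
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'promptfooconfig.yaml'
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: promptfoo/promptfoo-action@v1
        with:
          config: promptfooconfig.yaml
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```

Scoping the trigger to prompt-related paths keeps inference costs down by skipping PRs that don’t touch LLM code.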
4. 100% Local Execution / Data Privacy
Promptfoo can run entirely on-premise or locally. Your prompts, test cases, and model outputs never leave your machine unless you choose cloud deployment. For healthcare, legal, financial services — any domain with strict data handling requirements — this is a significant differentiator versus SaaS-only alternatives. The limitation: the community tier’s local execution is powerful but lacks centralized dashboards and team collaboration features; you’re sharing results via manual export until you upgrade to Enterprise.
5. Declarative YAML Configuration
Everything in Promptfoo is defined in a simple YAML config file — prompts, providers, test cases, assertions. No custom SDK, no vendor-specific classes to import. This means it works with any LLM application built in any language, and your test configs are version-controlled like any other code artifact. The limitation: YAML configuration can get verbose fast for complex test suites, and the assertion system (LLM-rubric, JavaScript functions, regex, etc.) takes time to master for anything beyond basic output validation.
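To give a feel for the assertion system mentioned above, here is a sketch mixing the main assertion families; the prompt, patterns, and thresholds are invented for illustration:

```yaml
# One test case exercising several assertion types — values are illustrative.
tests:
  - vars:
      input: 'What is your refund policy?'
    assert:
      - type: icontains          # case-insensitive substring check
        value: '30 days'
      - type: regex              # pattern match on the raw output
        value: 'refund|return'
      - type: llm-rubric         # model-graded, free-form criterion
        value: 'Answer is polite and does not invent policy details'
      - type: javascript         # arbitrary predicate on the output string
        value: 'output.length < 800'
```

Deterministic checks (icontains, regex, javascript) are cheap and stable; llm-rubric assertions cost an extra model call per test but handle criteria that can’t be expressed as patterns.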
Who Is Promptfoo For (And Who Should Look Elsewhere)
Use Promptfoo if you:
- Are building LLM-powered applications and want to catch security vulnerabilities before deployment, not after
- Need to compare multiple AI models on the same task with objective metrics — cost, latency, output quality
- Work in a regulated industry (finance, healthcare, legal) where AI data privacy is non-negotiable and local execution matters
- Already run CI/CD pipelines and want AI security testing to fit naturally into your existing PR review process
- Are an independent developer or small team that needs enterprise-grade LLM testing capabilities at zero cost (if you’re still assembling your dev stack, see our comparison of Cursor vs Windsurf vs GitHub Copilot)
Look elsewhere if you:
- Want a no-code, GUI-only solution — Promptfoo is developer-first; CLI and YAML are unavoidable
- Only need LLM observability and logging (PromptLayer or LangSmith handle this better with less setup — and if you’re still picking your core AI tool, our best AI chatbots roundup covers the leading options)
- Are deeply committed to the LangChain ecosystem and want native integration without configuration overhead
- Expect a fully managed enterprise security product out-of-the-box — the Community tier is excellent but Enterprise pricing is opaque (custom quote only)
4-Way Comparison: Promptfoo vs. Top Competitors
| Feature | Promptfoo | LangSmith | Giskard | PromptLayer |
|---|---|---|---|---|
| Primary Focus | LLM eval + red teaming + security | LLM tracing + eval (LangChain) | LLM testing + bias/fairness | Prompt logging + versioning |
| Red Teaming | ✅ Full — 50+ vuln types | ❌ Manual testing only | ✅ Partial | ❌ None |
| Open Source | ✅ MIT License | ⚠️ LangChain is OSS, LangSmith is SaaS | ✅ Apache 2.0 | ❌ Closed source |
| Local / On-Premise | ✅ Full local execution | ❌ Cloud-required | ✅ | ❌ |
| Free Tier Value | ⭐⭐⭐⭐⭐ — All features, 10k probes/mo | ⭐⭐⭐ — Limited traces | ⭐⭐⭐⭐ — Full OSS | ⭐⭐ — Basic logging only |
| Providers Supported | 60+ | ~15 (LangChain models) | ~20 | ~10 |
| CI/CD Integration | ✅ GitHub, GitLab, Jenkins, more | ✅ via LangChain SDK | ✅ GitHub Actions | ⚠️ API-only, limited |
| Acquisition Status | 🔴 Being acquired by OpenAI | LangChain (independent) | Independent | Independent |
| Best For | Security-focused AI teams, enterprise CI/CD | LangChain developers, tracing/observability | Data scientists, bias/fairness testing | Prompt versioning, lightweight logging |
| Pricing | Free / Custom Enterprise | Free / $39+/mo | Free / Custom Enterprise | Free / $25+/mo |
Controversy: What Happens to Open Source After an OpenAI Acquisition?
This is the question every Promptfoo user is asking right now, and it deserves a real answer instead of PR spin.
OpenAI’s track record on open source is complicated. The company was founded with a mission to develop AI “for the benefit of humanity” and release research openly — then spent the next several years systematically closing off access to its most capable models. GPT-4 is not open. GPT-4.5 is not open. Even GPT-5.4 — OpenAI’s current flagship — ships as a closed API. OpenAI’s competitive moat is proprietary models, not shared technology. Its open-source commitments have repeatedly eroded when they collided with commercial interests.
The official commitment is present but vague. The acquisition announcement states: “Together, we will continue building the open-source project while also advancing the integrated enterprise capabilities within Frontier.” That’s a promise to keep the open-source project running — not a promise to keep it independent, feature-complete, or on the same development trajectory as the enterprise version.
The historical pattern with acquisitions: When larger companies acquire open-source projects, a common outcome is “open core” bifurcation — the free tier stays functional but new security capabilities, the most valuable features, get bundled into the paid enterprise product. Promptfoo’s current Community tier is already quite capable, which makes this less immediately alarming. But the question is where the next 18 months of red-teaming research goes: into the open-source repo, or into OpenAI Frontier exclusives.
The practical near-term reality: The acquisition hasn’t closed yet. As of March 10, 2026, Promptfoo remains independent, the GitHub repo is active, and the Community tier works exactly as it did yesterday. If you’re evaluating the tool today, the open-source version is genuinely excellent. The uncertainty is about the 2027 version, not the one you can install right now.
The legitimate concern for enterprises not using OpenAI Frontier: If you’re an Anthropic shop, a Google Cloud shop, or a company running open-source models on-premise, you have every reason to monitor whether Promptfoo’s post-acquisition roadmap remains provider-agnostic or starts prioritizing OpenAI’s own infrastructure. The tool’s 60+ provider support is core to its value proposition — that neutrality is now at risk.
Pros and Cons
Pros
- Genuinely free and complete: The Community tier isn’t a crippled demo — it includes all evaluation features, all 60+ providers, vulnerability scanning, and CI/CD integration. Most teams won’t need to upgrade.
- Automation that actually scales: Generating thousands of contextual attack probes automatically, without manual scenario writing, is what separates Promptfoo from “just run some test prompts.”
- 100% local execution: Your test data never hits a third-party server unless you choose cloud deployment. Huge for regulated industries.
- Provider-agnostic: Works with GPT-5, Claude Opus 4, Gemini 3 Pro, Llama, Mistral, custom APIs — whatever your stack is today or might be tomorrow.
- Proven at scale: 127 Fortune 500 companies isn’t marketing fluff — this tool has been production-tested in the environments where failure has real consequences.
- Developer-native workflow: YAML config, CLI interface, PR-level findings, version-controlled test suites. Fits how engineering teams already work.
- Real-time threat intelligence: 300,000-user community means new attack vectors get incorporated into the testing suite faster than manual threat research could manage.
Cons
- Acquisition uncertainty: OpenAI’s open-source track record gives legitimate reason to watch whether the Community tier remains fully featured over the next 12-18 months.
- CLI-first, no-code users need not apply: There’s a real setup curve. YAML configs, environment variables, assertion types — this is for developers, not business analysts.
- 10k probe/month cap is real: Complex agentic applications with multiple attack surfaces will hit this ceiling on the free tier and need to negotiate Enterprise pricing (which is opaque — no published rates).
- Enterprise pricing opacity: “Custom pricing” with no public rates is genuinely frustrating. You can’t budget for Enterprise without a sales call.
- OpenAI provider bias risk: Post-acquisition, there’s legitimate uncertainty about whether future features will favor OpenAI’s own APIs over competitors.
Getting Started: Install and Run Promptfoo in 5 Steps
Promptfoo is available via npm, Homebrew, and pip. Here’s the fastest path from zero to your first red team scan:
Step 1: Install Promptfoo
No installation required for quick testing — run via npx:
```shell
npx promptfoo@latest init --example getting-started
```

Or install globally:

```shell
npm install -g promptfoo
```

Mac users:

```shell
brew install promptfoo
```
Step 2: Set Your API Key
Most LLM providers need an API key. For OpenAI:
```shell
export OPENAI_API_KEY=sk-your-key-here
```
Promptfoo supports all major providers — swap in your Anthropic, Google, or other API key as needed. Keys are read from environment variables and never stored by the tool.
Step 3: Configure Your First Eval
Your promptfooconfig.yaml defines what you’re testing. A minimal example:
```yaml
prompts:
  - 'Translate the following to {{language}}: {{input}}'

providers:
  - openai:gpt-5-mini
  - anthropic:messages:claude-opus-4-6

tests:
  - vars:
      language: French
      input: Hello world
    assert:
      - type: icontains
        value: 'Bonjour'
```
Step 4: Run the Evaluation
```shell
cd getting-started
promptfoo eval
```
This runs every prompt against every provider with every test case and logs results to the terminal. Takes seconds to minutes depending on scope.
Step 5: View Results + Run Your First Red Team
```shell
promptfoo view
```
This opens a local web UI showing side-by-side model outputs, pass/fail status, and cost/latency metrics. When you’re ready to run a security scan:
```shell
npx promptfoo@latest redteam setup
```
This launches an interactive setup to configure your target application, select vulnerability types to probe, and generate your first red team config. Then run promptfoo redteam run to execute.
Final Verdict
Promptfoo is the best open-source tool for LLM security testing on the market — and it’s not particularly close. The combination of automated red teaming, 60+ provider support, 100% local execution, and CI/CD integration at zero cost is genuinely hard to compete with. The tool has been production-validated by 127 Fortune 500 companies, and the 300,000-developer community is a real signal, not marketing noise.
Should you use it today? Yes — unambiguously. The acquisition doesn’t change what the tool does right now, and the open-source version is available on GitHub regardless of what OpenAI does with the enterprise product. If you’re building anything with LLMs and you’re not testing for prompt injection and jailbreak vulnerabilities, you’re flying blind. (New to AI tools entirely? Our ChatGPT review covers what most teams are building on.)
Should you build your entire security stack around it long-term? Watch the next 12 months carefully. If OpenAI starts routing critical red-teaming capabilities into Frontier exclusives and the Community tier stagnates, you’ll want alternatives evaluated and ready. Giskard is the most credible open-source fallback. But that’s a 2027 problem, not a today problem.
The 8.7/10 reflects a tool that’s excellent at what it does, genuinely free, and proven at scale — with one legitimate asterisk: the acquisition creates uncertainty that deserves honest acknowledgment. If OpenAI keeps its open-source commitment, this tool gets more capable over time with OpenAI’s resources behind it. If they don’t, the community will fork it. Either way, the value is real today.
Frequently Asked Questions
What is Promptfoo and what does it do?
Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM applications. It automates the process of testing AI applications for security vulnerabilities (prompt injections, jailbreaks, data leaks, tool misuse), comparing model outputs across multiple providers, and integrating AI security testing into CI/CD pipelines. It supports 60+ LLM providers and runs evals completely locally.
Why did OpenAI acquire Promptfoo?
OpenAI acquired Promptfoo to integrate its AI security testing and red-teaming capabilities directly into OpenAI Frontier, their enterprise platform for deploying AI coworkers. As enterprises deploy agentic AI systems in real workflows, evaluation and security compliance become critical requirements. OpenAI is bundling Promptfoo’s technology to make security testing a native part of Frontier rather than a third-party add-on.
Is Promptfoo still open source after the OpenAI acquisition?
Yes — as of the acquisition announcement on March 10, 2026, OpenAI explicitly committed to continuing the open-source Promptfoo project. The GitHub repository remains active under the MIT license. However, the acquisition hasn’t closed yet, and there is legitimate uncertainty about whether enterprise-grade features will remain part of the open-source build or migrate to OpenAI Frontier exclusives over time.
How much does Promptfoo cost?
Promptfoo’s Community tier is free forever and includes all LLM evaluation features, all 60+ provider integrations, vulnerability scanning, CI/CD integration, and up to 10,000 red-team probes per month. Enterprise pricing is custom (contact sales) and adds team collaboration, continuous monitoring, compliance dashboards, SSO, and unlimited probes. On-premise deployment is also available at custom pricing.
How do I install and run Promptfoo?
Install via npm: npm install -g promptfoo, or run without installing via npx promptfoo@latest init --example getting-started. Mac users can use Homebrew: brew install promptfoo. Set your LLM provider API key as an environment variable, configure your prompts and test cases in a YAML file, then run promptfoo eval to execute and promptfoo view to see results in a web UI.
What types of vulnerabilities does Promptfoo test for?
Promptfoo tests for 50+ LLM vulnerability types including direct and indirect prompt injections, jailbreaks tailored to your system’s guardrails, data and PII leaks, business rule violations, insecure tool use in agentic systems, and toxic content generation. Custom attack profiles can be created for organization-specific threat models. Attack patterns are updated automatically from threat intelligence generated by its 300,000+ user community.
How does Promptfoo compare to LangSmith?
Promptfoo and LangSmith serve overlapping but different needs. LangSmith is built primarily for LangChain developers and focuses on tracing, debugging, and evaluating LLM chains — it’s excellent for observability in LangChain applications but has limited red-teaming capabilities. Promptfoo is provider-agnostic (60+ providers), focused on security testing and vulnerability scanning, supports full local execution, and has a more complete free tier. If you’re not using LangChain, Promptfoo is generally the stronger choice for security-focused evaluation.
Can Promptfoo run completely locally without sending data to third parties?
Yes. Promptfoo can run 100% locally — your prompts, test cases, and model outputs never leave your machine. You provide your own LLM API keys and the eval engine runs on your infrastructure. This makes Promptfoo suitable for regulated industries (healthcare, finance, legal) where data sovereignty requirements prevent using cloud-only testing tools. The only external calls are to the LLM providers whose APIs you configure.
Is Promptfoo worth using if I’m not on OpenAI’s platform?
Yes — Promptfoo’s value is explicitly provider-agnostic. It works with Anthropic Claude, Google Gemini, Meta Llama, Mistral, local Ollama models, custom APIs, and over 60 total providers. The OpenAI acquisition raises legitimate questions about long-term roadmap neutrality, but as of today, the tool works identically across all providers. Teams running Anthropic or open-source model stacks benefit from Promptfoo just as much as OpenAI users.
What should I do if I’m concerned about the OpenAI acquisition affecting Promptfoo?
Use the current open-source version — it’s MIT licensed and fully functional. Watch the GitHub repository’s activity and commit history over the next 6-12 months; if the OSS version stops receiving meaningful updates while enterprise features expand, that’s your signal. Giskard (Apache 2.0, open source) is the closest open-source alternative to evaluate as a contingency. If OpenAI honors its commitment, you benefit from more resources behind Promptfoo’s development. The situation warrants monitoring, not immediate action.