MetaGPT
Multi-agent framework that simulates a software company with role-based agents
MetaGPT is an open-source multi-agent framework that structures LLM-powered agents into a simulated software company. Each agent holds a specific role (Product Manager, Architect, Project Manager, Engineer, or QA), and the team collaborates using encoded standard operating procedures to turn a single natural-language requirement into a full set of software artifacts. The framework gained over 67,000 GitHub stars after a viral 2023 demo that showed it producing PRDs, system designs, and working code from one sentence. In 2026 it remains a serious research framework and educational reference, though its practical use as a daily coding tool is limited. DeepWisdom, the maintainer, has since built the commercial Atoms platform (formerly MGX) on top of these ideas for users who want a hosted, product-ready experience.
When MetaGPT hit GitHub in August 2023, it spread fast. The demo was hard to ignore: one sentence in, and out came a product requirements document, a system design, an API spec, and working code, all structured as if a real software team had spent a week on it. The project crossed 10,000 stars in days and became one of the defining examples of what multi-agent AI systems could look like. Three years later, MetaGPT has over 67,000 GitHub stars, a research portfolio that includes papers at ICLR and NeurIPS, and a commercial successor product. It has also settled into a role that’s different from what the original demo implied. Understanding that gap is the honest starting point for any evaluation.
Quick verdict
MetaGPT is one of the most important demonstrations of multi-agent AI architecture, and it’s genuinely useful for a specific set of tasks. For day-to-day coding, it’s the wrong tool. The setup is involved, the generated code needs real review, and faster alternatives exist. For generating structured project documentation, learning how role-based agent pipelines work, or running structured data analysis, it earns its place.
What is MetaGPT, exactly?
MetaGPT is an open-source Python framework that structures multiple LLM-powered agents into a simulated software company. Each agent occupies a named role: Product Manager, Architect, Project Manager, Engineer, or QA Engineer. When you give the system a requirement, those agents collaborate through a defined standard operating procedure, passing structured outputs to each other, to produce a full set of software artifacts.
The core thesis, stated plainly in the repository, is that Code = SOP(Team). Standard operating procedures, the kinds of documented processes that real software organizations use, can be encoded into the behavior of LLM agents. Instead of a single model trying to be a full-stack developer all at once, you distribute the work across specialized roles with defined inputs and outputs at each stage.
That design choice produces something genuinely different from other coding tools. When you run MetaGPT on a requirement, you don’t just get code. You get a PRD that lists user stories and acceptance criteria. You get a system design document. You get data structures and API specifications. And then, based on those artifacts, you get code. The chain of documents is not decoration. Each artifact constrains the next one, which is exactly how structured software development is supposed to work.
DeepWisdom, the Chinese AI research company that maintains MetaGPT, published the foundational ideas in a paper in 2023 and has continued producing research around multi-agent coordination. The AFlow paper, which describes automated workflow optimization for agent pipelines, was accepted for oral presentation at ICLR 2025, where it ranked second in the LLM-based Agent category. The SPO and AOT papers followed in early 2025. This is a team that thinks seriously about the problems it’s working on, and the academic output is a meaningful part of what makes MetaGPT worth understanding even if you never deploy it in production.
The latest stable release of the open-source framework is v0.8.1, from April 2024. In early 2025, DeepWisdom launched MGX (MetaGPT X), a natural-language programming product built on these ideas, which has since evolved into the Atoms platform. The commercial product is where active feature development is happening. The open-source framework is maintained but no longer the company’s primary focus.
The features that defined the multi-agent moment
Role-based agents and SOPs
The role assignment is the thing MetaGPT is most known for, and it’s worth understanding how it actually works. Each role is not just a label. It carries a defined set of responsibilities, a set of actions it can take, and a set of outputs it produces. The Product Manager agent generates user stories and requirement specifications. The Architect produces system design and technology stack recommendations. The Project Manager breaks work into tasks. The Engineer writes code. The QA Engineer writes tests.
Each role communicates with the others through a message queue, passing structured documents rather than free-form chat. This is the SOP in action. The Product Manager doesn’t hand a vague requirement to the Architect; it hands a formatted PRD with defined fields. The Architect doesn’t hand a prose description to the Engineer; it hands an API spec.
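The pattern of roles consuming and emitting typed documents through a queue can be sketched in a few lines of Python. This is a hypothetical illustration of the architecture, not MetaGPT's actual classes or API; all names here are invented for the sketch.

```python
from dataclasses import dataclass, field
from collections import deque

# Illustrative sketch of the role/SOP hand-off pattern. These are toy
# classes, not MetaGPT's real ProductManager/Architect implementations.

@dataclass
class Document:
    kind: str                              # e.g. "requirement", "prd", "design"
    content: dict = field(default_factory=dict)

class Role:
    consumes: str = ""                     # document kind this role waits for
    produces: str = ""                     # document kind this role emits
    def act(self, doc: Document) -> Document:
        raise NotImplementedError

class ProductManager(Role):
    consumes, produces = "requirement", "prd"
    def act(self, doc):
        # Turn a free-form requirement into a structured PRD, never raw prose.
        return Document("prd", {"stories": [f"As a user, I want {doc.content['text']}"]})

class Architect(Role):
    consumes, produces = "prd", "design"
    def act(self, doc):
        # Derive an API surface from the PRD's structured fields.
        return Document("design", {"apis": [f"endpoint for: {s}" for s in doc.content["stories"]]})

def run_pipeline(roles, requirement: str):
    """Route each document to whichever role consumes its kind."""
    queue = deque([Document("requirement", {"text": requirement})])
    artifacts = []
    while queue:
        doc = queue.popleft()
        artifacts.append(doc)
        for role in roles:
            if role.consumes == doc.kind:
                queue.append(role.act(doc))
    return artifacts

artifacts = run_pipeline([ProductManager(), Architect()], "convert CSV files to JSON")
print([d.kind for d in artifacts])   # ['requirement', 'prd', 'design']
```

The point of the sketch is the constraint: the Architect only ever sees a structured PRD, so a vague requirement cannot leak past the first role unexamined.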
This structure makes the system more predictable than a single-agent loop that can go off in any direction. It also makes it more constrained. The pipeline doesn’t adapt well to tasks that don’t fit the software-company template. If you’re trying to build something that doesn’t require a PRD and a system design, the role structure can feel like overhead rather than help.
Structured deliverables, not just code
This is the feature that actually distinguishes MetaGPT from nearly everything else in the coding agent space. Most AI coding tools produce code. MetaGPT produces a project.
A typical run on a non-trivial requirement generates a folder of artifacts: a markdown PRD with user stories, a system design document with class diagrams and sequence diagrams rendered in Mermaid, an API definition, and a set of source files with tests. The documents are formatted, cross-referenced, and structured in a way that reflects real software development practice.
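An illustrative workspace layout from such a run might look like this. The exact paths and filenames vary by framework version and project name, so treat this as a sketch of the shape of the output, not a guaranteed listing:

```text
workspace/<project_name>/
├── docs/
│   ├── prd.md                # user stories, acceptance criteria
│   ├── system_design.md      # class and sequence diagrams (Mermaid)
│   └── api_spec_and_task.md  # API definitions, task breakdown
├── <project_name>/           # generated source files
└── tests/                    # generated test suite
```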
For anyone who needs to generate project documentation alongside a prototype, this is genuinely valuable. Consulting teams, hackathon participants, and developers building MVPs that need to be presented to stakeholders have real uses for this output. You get a specification and an implementation in one run, which is more than any single-agent tool produces.
The quality of the code is more variable than the quality of the documentation. The PRD and design doc structures are consistent and readable. The code generation quality depends heavily on the model you’re using and the complexity of the requirement, and it frequently needs review before it’s usable.
Multi-model support
MetaGPT is not locked to OpenAI. Configuration happens through a YAML file at ~/.metagpt/config2.yaml, where you specify your model provider, API key, and any model-specific settings. The framework supports GPT-4o, Claude models through the Anthropic API, DeepSeek, Google Gemini, and local models through Ollama.
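A minimal config2.yaml for OpenAI looks roughly like the following. Field names reflect the v0.8.x documentation; check docs.deepwisdom.ai for the exact schema your version expects:

```yaml
# ~/.metagpt/config2.yaml — minimal single-provider setup (illustrative)
llm:
  api_type: "openai"                      # or "anthropic", "ollama", ...
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key: "YOUR_API_KEY"
```

Swapping providers is a matter of changing `api_type`, `model`, and `base_url`; the rest of the pipeline is unchanged.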
This matters for cost control. Running a full MetaGPT pipeline on a complex requirement is token-intensive. Each role produces structured output that the next role reads, and those structured documents consume context. Using a cheaper model like DeepSeek for the full pipeline can reduce costs substantially compared to running everything on GPT-4o, though output quality will vary by model.
In practice, models that follow structured output formats reliably tend to produce better results in MetaGPT. The framework expects each role to produce outputs in defined formats, and models that deviate from those formats cause downstream agents to receive malformed inputs. Claude and GPT-4o handle this better than some alternatives.
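One practical mitigation for format drift is to validate each role's output against the fields the next role expects before passing it downstream. This is our own sketch of such a guard, not a feature MetaGPT ships; the field names are illustrative:

```python
import json

# Hypothetical guard for structured hand-offs between agents. Required
# fields per document kind are assumptions for the sketch.
REQUIRED_FIELDS = {
    "prd": {"user_stories", "acceptance_criteria"},
    "design": {"apis", "data_structures"},
}

def validate_handoff(kind: str, raw_output: str) -> dict:
    """Parse a role's raw model output and reject it if malformed,
    so a downstream agent never sees a broken document."""
    try:
        doc = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{kind}: output is not valid JSON") from exc
    missing = REQUIRED_FIELDS[kind] - doc.keys()
    if missing:
        raise ValueError(f"{kind}: missing fields {sorted(missing)}")
    return doc

ok = validate_handoff("prd", '{"user_stories": [], "acceptance_criteria": []}')
```

Failing fast at the hand-off boundary localizes the error to the role that produced it, instead of letting a malformed PRD surface as a confusing failure three roles later.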
Sandboxed code execution
MetaGPT includes a code execution environment that lets agents run the code they generate and verify the results. This is used most prominently by the Data Interpreter agent, which can take a data analysis task, write Python code, execute it in a sandbox, observe the output, and iterate on the code if the results aren’t right.
The sandbox addresses one of the obvious failure modes of pure code generation: code that looks correct but fails when run. By giving the agent a feedback loop through execution, MetaGPT’s data analysis workflows are more reliable than static generation. The Data Interpreter can work through a multi-step analysis task, generating visualizations and statistical outputs, in a way that’s genuinely useful for structured data work.
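The execute-observe-retry loop at the heart of this design can be sketched purely illustratively. The "model" below is a stub standing in for an LLM call, and `exec` in a bare namespace stands in for a real sandbox, which would isolate far more:

```python
from typing import Optional

def execute(code: str) -> tuple[bool, str]:
    """Run generated code in a fresh namespace; report success or the error."""
    scope = {}
    try:
        exec(code, scope)                 # a real sandbox would isolate far more
        return True, str(scope.get("result"))
    except Exception as exc:
        return False, repr(exc)

def stub_model(task: str, feedback: Optional[str]) -> str:
    # Stands in for an LLM: the first attempt has a bug, the retry fixes it.
    if feedback is None:
        return "result = sum(values)"     # NameError: 'values' is undefined
    return "values = [1, 2, 3]\nresult = sum(values)"

def solve(task: str, max_attempts: int = 3) -> str:
    """Generate code, run it, and feed any error back into the next attempt."""
    feedback = None
    for _ in range(max_attempts):
        code = stub_model(task, feedback)
        ok, output = execute(code)
        if ok:
            return output
        feedback = output                 # the execution error becomes context
    raise RuntimeError("no working code within budget")

print(solve("sum the values"))  # 6
```

The value of the loop is exactly this: the first attempt's runtime error is observed and fed back, so the system recovers from failures that pure one-shot generation would silently ship.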
For the main software-company pipeline, code execution plays a more limited role. The generated application code runs in the sandbox for basic validation, but the execution environment is not designed to replicate production conditions, so successful sandbox execution doesn’t guarantee a working application.
Recent additions and active development
While the open-source framework’s release cadence slowed through 2025, the research output from DeepWisdom continued. The AFlow work introduced automated optimization of agent workflow structures, essentially a system for discovering better multi-agent pipelines rather than hand-coding them. The SPO paper addressed prompt optimization across agent systems. These contributions are meaningful for anyone building in the multi-agent space.
The commercial Atoms platform, which won Product of the Week on Product Hunt in March 2025, represents where the team is putting its product energy. Atoms positions itself less as a developer tool and more as a way to turn business ideas into working products with minimal technical friction. It’s a commercial evolution of the MetaGPT thesis: structured multi-agent collaboration, productized for non-developers.
Pricing
MetaGPT the open-source framework is free. The MIT license covers commercial use, modification, and distribution with no restrictions. You install it with pip, configure your model in a YAML file, and pay only for the API calls you make to your chosen provider.
Those API costs are worth thinking about carefully. MetaGPT’s multi-agent pipeline is more token-intensive than single-agent tools because each role produces a full structured document that subsequent roles consume as context. Running a full pipeline on a moderately complex requirement with GPT-4o can cost between $0.50 and $3.00 per run depending on the size of the generated artifacts. Using DeepSeek or a local Ollama model cuts that dramatically, sometimes to near zero for local inference.
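A back-of-the-envelope estimate makes the comparison concrete. The token counts and per-million-token prices below are illustrative assumptions only, not quoted rates; check your provider's current pricing before budgeting:

```python
# Rough per-run cost estimate for a multi-role pipeline. All figures
# below are assumed for illustration, not real measurements.

def run_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost in dollars given token counts and per-million-token prices."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000

# Suppose a five-role pipeline reads ~150k tokens of accumulated context
# and writes ~40k tokens of artifacts in total (assumed figures).
frontier_cost = run_cost(150_000, 40_000, price_in_per_m=2.50, price_out_per_m=10.00)
budget_cost = run_cost(150_000, 40_000, price_in_per_m=0.27, price_out_per_m=1.10)

print(f"frontier model: ${frontier_cost:.2f}, budget model: ${budget_cost:.2f}")
```

Because every role re-reads the accumulated documents, input tokens dominate, which is why a cheaper model with strong structured-output behavior can cut the bill by an order of magnitude.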
The commercial Atoms platform (atoms.dev) has its own pricing structure that is separate from the open-source framework. Specific tier details were not publicly listed at the time of writing, but the platform is a hosted product with sign-up required. If you want the hosted experience without managing infrastructure and API keys yourself, that’s the path DeepWisdom is steering toward.
For the open-source path, the cost model is the same as any bring-your-own-key tool: set up API access for the model you want, run the framework, and the charges appear on your model provider’s bill. There are no surprise costs, no seat licenses, and no rate limits beyond what your chosen model provider imposes.
Where MetaGPT wins and where it doesn’t
MetaGPT’s strongest use case is generating structured project documentation alongside a working prototype. No other open-source tool does this as well. If you need a PRD, a system design, and a first-draft implementation for a new feature or small project, MetaGPT produces all three in one run. The documents are formatted, internally consistent, and genuinely useful as a starting point for a real specification.
The Data Interpreter is a second genuine strength. For structured data analysis tasks where you have a clear analytical goal, the combination of code generation and sandboxed execution produces results that are more reliable than asking a general-purpose model to write analysis code. The feedback loop through execution catches obvious errors that pure generation misses.
Where MetaGPT falls short is everywhere the software-company framing doesn’t fit. Iterative development, where you’re building on an existing codebase and need to make targeted changes, is not what the system was designed for. The pipeline works best on greenfield tasks with a clear one-sentence requirement. On real codebases with existing patterns, dependencies, and constraints, the output quality drops and the overhead of the multi-role pipeline doesn’t pay off.
The setup friction is also a real limitation in 2026, when many competing tools are much easier to start using. The Python version constraint (3.9 to 3.11), the Node.js and pnpm dependencies, and the YAML configuration file are all manageable, but they add up to a first-run experience that’s rougher than tools designed with onboarding as a priority.
Who MetaGPT is built for
Researchers and engineers who want to understand multi-agent architectures will get genuine value from MetaGPT. The codebase is well-organized, the role and SOP abstractions are clearly implemented, and the academic papers give real theoretical grounding. If you’re building your own agent system and want to study a mature reference implementation, MetaGPT is worth running and reading.
Product teams doing early-stage prototyping, where the deliverable is a specification and a demo rather than production code, fit the tool’s output profile well. A product manager who wants to generate a PRD alongside a working prototype for a stakeholder presentation is in MetaGPT’s wheelhouse.
Data scientists and analysts who want an agent-driven analysis workflow can use the Data Interpreter productively without needing to engage with the full software-company pipeline. It’s a capable standalone module.
What MetaGPT is not well-suited for: engineers who need to make targeted changes to existing codebases, teams that need production-quality output without heavy post-processing, or anyone who wants a fast-feedback iterative development loop. For those cases, faster and more focused coding agents are the better choice.
MetaGPT vs the alternatives
MetaGPT vs AutoGPT
AutoGPT is a general-purpose autonomous agent designed to complete open-ended tasks using tools like web search, file management, and code execution. MetaGPT is a specialized framework for software development workflows with fixed roles and structured outputs.
AutoGPT is more flexible: give it a goal and it figures out how to accomplish it using whatever tools it has. MetaGPT is more structured: it applies a specific multi-role pipeline to a software development task. For tasks that fit the software-company template, MetaGPT’s structured approach produces more predictable outputs. For everything else, AutoGPT’s general-purpose architecture handles a much broader range of work. Neither is a reliable autonomous developer in the unsupervised sense, but they fail in different ways: AutoGPT can go off in unexpected directions on open-ended tasks, while MetaGPT produces well-structured but sometimes incorrect outputs within its pipeline.
MetaGPT vs OpenHands
OpenHands (formerly OpenDevin) is a more direct competitor for software engineering tasks. It’s an open-source autonomous coding agent that can browse the web, run a terminal, edit files, and work through multi-step engineering tasks with a real development environment.
For iterative coding on existing projects, OpenHands is the better tool. It’s designed for the kind of task where you want an agent to take a GitHub issue, reproduce it, write a fix, and run the tests. MetaGPT is better for the earlier phase: generating the initial specification and greenfield structure. The two tools are complementary more than competitive for teams that do both kinds of work.
MetaGPT vs GPT Engineer
GPT Engineer (now evolved into Lovable for the hosted product) has the most similar surface area to MetaGPT. Both take a natural-language description and produce a full codebase. GPT Engineer’s approach is simpler: it engages in a clarification dialogue and then generates code, without the structured role-based pipeline MetaGPT uses.
The practical difference is in output structure. GPT Engineer produces code faster with less ceremony. MetaGPT produces code plus a documented artifact trail. If the documents matter to you, MetaGPT’s overhead is worth it. If you want a code prototype as fast as possible, GPT Engineer’s simpler approach usually gets there quicker with less setup.
Getting started
Install MetaGPT with pip: pip install --upgrade metagpt. You’ll need Python 3.9 or later (but below 3.12), and Node.js with pnpm for the full feature set.
Configure your model in ~/.metagpt/config2.yaml. For OpenAI, set your API key and model name. For Claude, set the Anthropic API key and model. The documentation at docs.deepwisdom.ai covers configuration options for each supported provider.
Run your first task from the command line:
metagpt "Write a CLI tool that converts CSV files to JSON"
MetaGPT will create a workspace directory, run the agent pipeline, and deposit the generated artifacts there. Read through the PRD and system design before looking at the code. The documents give you a better sense of what the system understood about your requirement than the code alone does.
For data analysis tasks, the Data Interpreter is accessible separately and worth experimenting with on a dataset you know well. It lets you evaluate the quality of the generated analysis against ground truth before trusting it on new data.
The bottom line
MetaGPT earned its reputation. The 2023 demo was not hype for the sake of it; the system genuinely does something novel by producing structured project artifacts through a multi-role agent pipeline rather than just generating code. The research behind it is solid, the academic output has been recognized at top venues, and the framework remains one of the clearest reference implementations of how role-based agent systems can be structured.
The honest 2026 read is that it works best as an educational reference and a documentation-generation tool, not as a daily coding driver. For teams that want an autonomous coding agent to work on real tasks, OpenHands or GPT Engineer are more practical starting points. For engineers who want to understand how multi-agent systems are built and want a production-backed codebase to study, MetaGPT is still the best place to start. The stars on the repository are not an accident, and neither are the papers. It’s worth knowing what it does and where it fits.
Key features
- Assigns distinct roles to agents: Product Manager, Architect, Project Manager, Engineer, QA
- Generates structured deliverables including PRDs, design docs, API specs, and test suites
- Runs on any OpenAI-compatible model: GPT-4o, Claude, DeepSeek, Ollama, and more
- Sandboxed code execution environment for running and verifying generated code
- Data Interpreter agent for structured data analysis and visualization tasks
- Standard Operating Procedures (SOPs) encode best practices into the agent pipeline
Pros and cons
Pros
- + Genuinely novel architecture that encodes software-company SOPs into agent behavior
- + Produces structured deliverables (PRDs, design docs, tests) not just raw code
- + Free and open source under the MIT license with 67k+ GitHub stars
- + Broad model support including OpenAI, Claude, DeepSeek, and local models via Ollama
- + Active research output with papers at ICLR and NeurIPS explaining the methodology
- + Data Interpreter agent adds a capable data-science workflow on top of the core framework
Cons
- − High setup friction compared to modern coding agents: Python 3.9-3.11 required, plus Node.js and pnpm
- − Generated code quality is inconsistent and rarely production-ready without significant review
- − The software-company simulation is better for demos than for iterative daily development
- − Slower and more token-expensive than direct single-agent coding tools for most tasks
- − Last stable release (v0.8.1) was April 2024; development momentum has shifted to the commercial Atoms product
Who is MetaGPT for?
- Rapidly prototyping a full software specification from a single requirement statement
- Generating structured project documentation (PRDs, API specs, architecture diagrams) alongside code
- Research and experimentation with multi-agent coordination patterns and SOPs
- Data analysis workflows using the built-in Data Interpreter agent on structured datasets
Alternatives to MetaGPT
If MetaGPT isn't quite the right fit, the closest alternatives are AutoGPT, OpenHands, and GPT Engineer. See our full MetaGPT alternatives page for side-by-side comparisons.