Agentbrisk
Tags: coding, autonomous, multi-agent, open-source · Status: active

MetaGPT

Multi-agent framework that simulates a software company with role-based agents


MetaGPT is an open-source multi-agent framework that structures LLM-powered agents into a simulated software company. Each agent holds a specific role (Product Manager, Architect, Project Manager, Engineer, or QA), and the team collaborates using encoded standard operating procedures to turn a single natural-language requirement into a full set of software artifacts. The framework gained over 67,000 GitHub stars after a viral 2023 demo showed it producing PRDs, system designs, and working code from one sentence. In 2026 it remains a serious research framework and educational reference, though its practical use as a daily coding tool is limited. DeepWisdom, the maintainer, has since built the commercial Atoms platform (formerly MGX) on top of these ideas for users who want a hosted, product-ready experience.

When MetaGPT hit GitHub in August 2023, it spread fast. The demo was hard to ignore: one sentence in, and out came a product requirements document, a system design, an API spec, and working code, all structured as if a real software team had spent a week on it. The project crossed 10,000 stars in days and became one of the defining examples of what multi-agent AI systems could look like. Three years later, MetaGPT has over 67,000 GitHub stars, a research portfolio that includes papers at ICLR and NeurIPS, and a commercial successor product. It has also settled into a role that’s different from what the original demo implied. Understanding that gap is the honest starting point for any evaluation.

Quick verdict

MetaGPT is one of the most important demonstrations of multi-agent AI architecture, and it’s genuinely useful for a specific set of tasks. For day-to-day coding, it’s the wrong tool. The setup is involved, the generated code needs real review, and faster alternatives exist. For generating structured project documentation, learning how role-based agent pipelines work, or running structured data analysis, it earns its place.

What is MetaGPT, exactly?

MetaGPT is an open-source Python framework that structures multiple LLM-powered agents into a simulated software company. Each agent occupies a named role: Product Manager, Architect, Project Manager, Engineer, or QA Engineer. When you give the system a requirement, those agents collaborate through a defined standard operating procedure, passing structured outputs to each other, to produce a full set of software artifacts.

The core thesis, stated plainly in the repository, is that Code = SOP(Team). Standard operating procedures, the kinds of documented processes that real software organizations use, can be encoded into the behavior of LLM agents. Instead of a single model trying to be a full-stack developer all at once, you distribute the work across specialized roles with defined inputs and outputs at each stage.

That design choice produces something genuinely different from other coding tools. When you run MetaGPT on a requirement, you don’t just get code. You get a PRD that lists user stories and acceptance criteria. You get a system design document. You get data structures and API specifications. And then, based on those artifacts, you get code. The chain of documents is not decoration. Each artifact constrains the next one, which is exactly how structured software development is supposed to work.

DeepWisdom, the Chinese AI research company that maintains MetaGPT, published the foundational ideas in a paper in 2023 and has continued producing research around multi-agent coordination. The AFlow paper, which describes automated workflow optimization for agent pipelines, was accepted for oral presentation at ICLR 2025, where it ranked second in the LLM-based Agent category. The SPO and AOT papers followed in early 2025. This is a team that thinks seriously about the problems it’s working on, and the academic output is a meaningful part of what makes MetaGPT worth understanding even if you never deploy it in production.

The latest stable release of the open-source framework is v0.8.1, from April 2024. In early 2025, DeepWisdom launched MGX (MetaGPT X), a natural-language programming product built on these ideas, which has since evolved into the Atoms platform. The commercial product is where active feature development is happening. The open-source framework is maintained but no longer the company’s primary focus.

The features that defined the multi-agent moment

Role-based agents and SOPs

The role assignment is the thing MetaGPT is most known for, and it’s worth understanding how it actually works. Each role is not just a label. It carries a defined set of responsibilities, a set of actions it can take, and a set of outputs it produces. The Product Manager agent generates user stories and requirement specifications. The Architect produces system design and technology stack recommendations. The Project Manager breaks work into tasks. The Engineer writes code. The QA Engineer writes tests.

Each role communicates with the others through a message queue, passing structured documents rather than free-form chat. This is the SOP in action. The Product Manager doesn’t hand a vague requirement to the Architect; it hands a formatted PRD with defined fields. The Architect doesn’t hand a prose description to the Engineer; it hands an API spec.
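The handoff discipline can be sketched in a few lines of plain Python. This is a toy illustration of the pattern, not MetaGPT’s actual classes (role names aside, every identifier here is invented), with the LLM calls stubbed as deterministic functions so the structure stays visible:

```python
from dataclasses import dataclass, field

# Toy sketch of the role-to-role handoff pattern: each role consumes the
# previous role's structured document and emits its own, so downstream
# agents never receive free-form chat.

@dataclass
class Document:
    role: str                      # which role produced this artifact
    kind: str                      # e.g. "PRD", "Design", "Code"
    fields: dict = field(default_factory=dict)

def product_manager(requirement: str) -> Document:
    # In MetaGPT this would be an LLM call; here it's stubbed.
    return Document("ProductManager", "PRD",
                    {"requirement": requirement,
                     "user_stories": [f"As a user, I want {requirement}"]})

def architect(prd: Document) -> Document:
    assert prd.kind == "PRD", "Architect only accepts a formatted PRD"
    return Document("Architect", "Design",
                    {"api": ["convert(input_path, output_path)"],
                     "source_prd": prd.fields["requirement"]})

def engineer(design: Document) -> Document:
    assert design.kind == "Design", "Engineer only accepts a Design doc"
    return Document("Engineer", "Code",
                    {"files": {"main.py": f"# implements {design.fields['api'][0]}"}})

# The SOP is simply the fixed composition of roles.
def run_pipeline(requirement: str) -> list:
    prd = product_manager(requirement)
    design = architect(prd)
    code = engineer(design)
    return [prd, design, code]

artifacts = run_pipeline("CSV to JSON conversion")
print([d.kind for d in artifacts])  # → ['PRD', 'Design', 'Code']
```

The asserts stand in for the format checks that make the pipeline predictable: an agent that receives the wrong document kind fails loudly rather than improvising.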

This structure makes the system more predictable than a single-agent loop that can go off in any direction. It also makes it more constrained. The pipeline doesn’t adapt well to tasks that don’t fit the software-company template. If you’re trying to build something that doesn’t require a PRD and a system design, the role structure can feel like overhead rather than help.

Structured deliverables, not just code

This is the feature that actually distinguishes MetaGPT from nearly everything else in the coding agent space. Most AI coding tools produce code. MetaGPT produces a project.

A typical run on a non-trivial requirement generates a folder of artifacts: a markdown PRD with user stories, a system design document with class diagrams and sequence diagrams rendered in Mermaid, an API definition, and a set of source files with tests. The documents are formatted, cross-referenced, and structured in a way that reflects real software development practice.
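For a requirement like a CSV-to-JSON converter, the artifact folder might look roughly like this (a hypothetical layout for illustration; exact paths and file names vary by version):

```
workspace/
└── csv_to_json/              # project name derived from the requirement
    ├── docs/
    │   ├── prd.md            # user stories, acceptance criteria
    │   ├── system_design.md  # Mermaid class and sequence diagrams
    │   └── api_spec.md       # data structures and API definitions
    ├── csv_to_json/          # generated source files
    │   └── main.py
    └── tests/
        └── test_main.py
```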

For anyone who needs to generate project documentation alongside a prototype, this is genuinely valuable. Consulting teams, hackathon participants, and developers building MVPs that need to be presented to stakeholders have real uses for this output. You get a specification and an implementation in one run, which is more than any single-agent tool produces.

The quality of the code is more variable than the quality of the documentation. The PRD and design doc structures are consistent and readable. The code generation quality depends heavily on the model you’re using and the complexity of the requirement, and it frequently needs review before it’s usable.

Multi-model support

MetaGPT is not locked to OpenAI. Configuration happens through a YAML file at ~/.metagpt/config2.yaml, where you specify your model provider, API key, and any model-specific settings. The framework supports GPT-4o, Claude models through the Anthropic API, DeepSeek, Google Gemini, and local models through Ollama.
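A minimal configuration might look like the following (the llm block shape follows the v0.8-era documentation; the values are placeholders, and provider-specific options are covered at docs.deepwisdom.ai):

```yaml
# ~/.metagpt/config2.yaml — minimal example
llm:
  api_type: "openai"          # or "anthropic", "ollama", ...
  model: "gpt-4o"
  base_url: "https://api.openai.com/v1"
  api_key: "YOUR_API_KEY"
```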

This matters for cost control. Running a full MetaGPT pipeline on a complex requirement is token-intensive. Each role produces structured output that the next role reads, and those structured documents consume context. Using a cheaper model like DeepSeek for the full pipeline can reduce costs substantially compared to running everything on GPT-4o, though output quality will vary by model.

In practice, models that follow structured output formats reliably tend to produce better results in MetaGPT. The framework expects each role to produce outputs in defined formats, and models that deviate from those formats cause downstream agents to receive malformed inputs. Claude and GPT-4o handle this better than some alternatives.

Sandboxed code execution

MetaGPT includes a code execution environment that lets agents run the code they generate and verify the results. This is used most prominently by the Data Interpreter agent, which can take a data analysis task, write Python code, execute it in a sandbox, observe the output, and iterate on the code if the results aren’t right.

The sandbox addresses one of the obvious failure modes of pure code generation: code that looks correct but fails when run. By giving the agent a feedback loop through execution, MetaGPT’s data analysis workflows are more reliable than static generation. The Data Interpreter can work through a multi-step analysis task, generating visualizations and statistical outputs, in a way that’s genuinely useful for structured data work.
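The generate-execute-observe loop can be sketched with nothing but the standard library. This is the pattern, not MetaGPT’s actual sandbox implementation; the step where an LLM revises failing code is stubbed as a prepared list of candidate attempts:

```python
import subprocess
import sys

def run_snippet(code: str, timeout: int = 10):
    """Run a code string in a fresh interpreter; return (ok, output)."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    ok = proc.returncode == 0
    return ok, (proc.stdout if ok else proc.stderr)

def generate_and_verify(candidates, max_tries=3):
    """Toy execute-observe-retry loop. In a real agent the next candidate
    comes from re-prompting the model with the captured traceback; here
    we simply walk a prepared list of attempts."""
    for code in candidates[:max_tries]:
        ok, output = run_snippet(code)
        if ok:
            return output
        # A real agent would feed `output` (the error text) back to the LLM.
    return None

# First attempt has a runtime bug; the "revised" second attempt succeeds.
attempts = [
    "print(sum([1, 2, '3']))",   # TypeError at runtime
    "print(sum([1, 2, 3]))",     # corrected version
]
print(repr(generate_and_verify(attempts)))  # → '6\n'
```

Execution is what turns a one-shot guess into a feedback loop: the first candidate looks plausible but fails when run, and only the run reveals it.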

For the main software-company pipeline, code execution plays a more limited role. The generated application code runs in the sandbox for basic validation, but the execution environment is not designed to replicate production conditions, so successful sandbox execution doesn’t guarantee a working application.

Recent additions and active development

While the open-source framework’s release cadence slowed through 2025, the research output from DeepWisdom continued. The AFlow work introduced automated optimization of agent workflow structures, essentially a system for discovering better multi-agent pipelines rather than hand-coding them. The SPO paper addressed prompt optimization across agent systems. These contributions are meaningful for anyone building in the multi-agent space.

The commercial Atoms platform, which won Product of the Week on ProductHunt in March 2025, represents where the team is putting its product energy. Atoms positions itself less as a developer tool and more as a way to turn business ideas into working products with minimal technical friction. It’s a commercial evolution of the MetaGPT thesis: structured multi-agent collaboration, productized for non-developers.

Pricing

MetaGPT the open-source framework is free. The MIT license covers commercial use, modification, and distribution with no restrictions. You install it with pip, configure your model in a YAML file, and pay only for the API calls you make to your chosen provider.

Those API costs are worth thinking about carefully. MetaGPT’s multi-agent pipeline is more token-intensive than single-agent tools because each role produces a full structured document that subsequent roles consume as context. Running a full pipeline on a moderately complex requirement with GPT-4o can cost between $0.50 and $3.00 per run depending on the size of the generated artifacts. Using DeepSeek or a local Ollama model cuts that dramatically, sometimes to near zero for local inference.
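A back-of-envelope estimate makes the multiplier concrete. Every number below is an illustrative assumption (per-million-token prices and per-role token budgets both vary); plug in your provider’s current rates and your own observed usage:

```python
# Rough pipeline cost model. All figures are assumed for illustration.

PRICE_PER_M = {            # USD per 1M tokens: (input, output) — assumed
    "gpt-4o":   (2.50, 10.00),
    "deepseek": (0.27, 1.10),
}

# Assumed token budget per role: each role reads the upstream artifacts
# (input) and emits its own structured document (output), so input
# context grows as the pipeline progresses.
ROLES = {
    "ProductManager": (2_000, 3_000),
    "Architect":      (5_000, 4_000),
    "ProjectManager": (9_000, 2_000),
    "Engineer":       (11_000, 15_000),
    "QA":             (26_000, 6_000),
}

def pipeline_cost(model: str) -> float:
    inp_price, out_price = PRICE_PER_M[model]
    total = sum(i * inp_price + o * out_price for i, o in ROLES.values())
    return total / 1_000_000

for m in PRICE_PER_M:
    print(f"{m}: ${pipeline_cost(m):.2f}")
```

Even with these modest assumed budgets, the cheaper model comes out roughly an order of magnitude less per run, which is the whole argument for multi-model support on a token-hungry pipeline.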

The commercial Atoms platform (atoms.dev) has its own pricing structure that is separate from the open-source framework. Specific tier details were not publicly listed at the time of writing, but the platform is a hosted product with sign-up required. If you want the hosted experience without managing infrastructure and API keys yourself, that’s the path DeepWisdom is steering toward.

For the open-source path, the cost model is the same as any bring-your-own-key tool: set up API access for the model you want, run the framework, and the charges appear on your model provider’s bill. There are no surprise costs, no seat licenses, and no rate limits beyond what your chosen model provider imposes.

Where MetaGPT wins and where it doesn’t

MetaGPT’s strongest use case is generating structured project documentation alongside a working prototype. No other open-source tool does this as well. If you need a PRD, a system design, and a first-draft implementation for a new feature or small project, MetaGPT produces all three in one run. The documents are formatted, internally consistent, and genuinely useful as a starting point for a real specification.

The Data Interpreter is a second genuine strength. For structured data analysis tasks where you have a clear analytical goal, the combination of code generation and sandboxed execution produces results that are more reliable than asking a general-purpose model to write analysis code. The feedback loop through execution catches obvious errors that pure generation misses.

Where MetaGPT falls short is everywhere the software-company framing doesn’t fit. Iterative development, where you’re building on an existing codebase and need to make targeted changes, is not what the system was designed for. The pipeline works best on greenfield tasks with a clear one-sentence requirement. On real codebases with existing patterns, dependencies, and constraints, the output quality drops and the overhead of the multi-role pipeline doesn’t pay off.

The setup friction is also a real limitation in 2026, when many competing tools are much easier to start using. The Python version constraint (3.9 to 3.11), the Node.js and pnpm dependencies, and the YAML configuration file are all manageable, but they add up to a first-run experience that’s rougher than tools designed with onboarding as a priority.

Who MetaGPT is built for

Researchers and engineers who want to understand multi-agent architectures will get genuine value from MetaGPT. The codebase is well-organized, the role and SOP abstractions are clearly implemented, and the academic papers give real theoretical grounding. If you’re building your own agent system and want to study a mature reference implementation, MetaGPT is worth running and reading.

Product teams doing early-stage prototyping, where the deliverable is a specification and a demo rather than production code, fit the tool’s output profile well. A product manager who wants to generate a PRD alongside a working prototype for a stakeholder presentation is in MetaGPT’s wheelhouse.

Data scientists and analysts who want an agent-driven analysis workflow can use the Data Interpreter productively without needing to engage with the full software-company pipeline. It’s a capable standalone module.

What MetaGPT is not well-suited for: engineers who need to make targeted changes to existing codebases, teams that need production-quality output without heavy post-processing, or anyone who wants a fast-feedback iterative development loop. For those cases, faster, more focused coding agents are the better choice.

MetaGPT vs the alternatives

MetaGPT vs AutoGPT

AutoGPT is a general-purpose autonomous agent designed to complete open-ended tasks using tools like web search, file management, and code execution. MetaGPT is a specialized framework for software development workflows with fixed roles and structured outputs.

AutoGPT is more flexible: give it a goal and it figures out how to accomplish it using whatever tools it has. MetaGPT is more structured: it applies a specific multi-role pipeline to a software development task. For tasks that fit the software-company template, MetaGPT’s structured approach produces more predictable outputs. For everything else, AutoGPT’s general-purpose architecture handles a much broader range of work. Neither is a reliable autonomous developer in the unsupervised sense, but they fail in different ways: AutoGPT can go off in unexpected directions on open-ended tasks, while MetaGPT produces well-structured but sometimes incorrect outputs within its pipeline.

MetaGPT vs OpenHands

OpenHands (formerly OpenDevin) is a more direct competitor for software engineering tasks. It’s an open-source autonomous coding agent that can browse the web, run a terminal, edit files, and work through multi-step engineering tasks with a real development environment.

For iterative coding on existing projects, OpenHands is the better tool. It’s designed for the kind of task where you want an agent to take a GitHub issue, reproduce it, write a fix, and run the tests. MetaGPT is better for the earlier phase: generating the initial specification and greenfield structure. The two tools are complementary more than competitive for teams that do both kinds of work.

MetaGPT vs GPT Engineer

GPT Engineer (now evolved into Lovable for the hosted product) has the most similar surface area to MetaGPT. Both take a natural-language description and produce a full codebase. GPT Engineer’s approach is simpler: it engages in a clarification dialogue and then generates code, without the structured role-based pipeline MetaGPT uses.

The practical difference is in output structure. GPT Engineer produces code faster with less ceremony. MetaGPT produces code plus a documented artifact trail. If the documents matter to you, MetaGPT’s overhead is worth it. If you want a code prototype as fast as possible, GPT Engineer’s simpler approach usually gets there quicker with less setup.

Getting started

Install MetaGPT with pip: pip install --upgrade metagpt. You’ll need Python 3.9 or later (but below 3.12), and Node.js with pnpm for the full feature set.

Configure your model in ~/.metagpt/config2.yaml. For OpenAI, set your API key and model name. For Claude, set the Anthropic API key and model. The documentation at docs.deepwisdom.ai covers configuration options for each supported provider.

Run your first task from the command line:

metagpt "Write a CLI tool that converts CSV files to JSON"

MetaGPT will create a workspace directory, run the agent pipeline, and deposit the generated artifacts there. Read through the PRD and system design before looking at the code. The documents give you a better sense of what the system understood about your requirement than the code alone does.

For data analysis tasks, the Data Interpreter is accessible separately and worth experimenting with on a dataset you know well. It lets you evaluate the quality of the generated analysis against ground truth before trusting it on new data.

The bottom line

MetaGPT earned its reputation. The 2023 demo was not hype for the sake of it; the system genuinely does something novel by producing structured project artifacts through a multi-role agent pipeline rather than just generating code. The research behind it is solid, the academic output has been recognized at top venues, and the framework remains one of the clearest reference implementations of how role-based agent systems can be structured.

The honest 2026 read is that it works best as an educational reference and a documentation-generation tool, not as a daily coding driver. For teams that want an autonomous coding agent to work on real tasks, OpenHands or GPT Engineer are more practical starting points. For engineers who want to understand how multi-agent systems are built and want a production-backed codebase to study, MetaGPT is still the best place to start. The stars on the repository are not an accident, and neither are the papers. It’s worth knowing what it does and where it fits.

Key features

  • Assigns distinct roles to agents: Product Manager, Architect, Project Manager, Engineer, QA
  • Generates structured deliverables including PRDs, design docs, API specs, and test suites
  • Runs on any OpenAI-compatible model: GPT-4o, Claude, DeepSeek, Ollama, and more
  • Sandboxed code execution environment for running and verifying generated code
  • Data Interpreter agent for structured data analysis and visualization tasks
  • Standard Operating Procedures (SOPs) encode best practices into the agent pipeline

Pros and cons

Pros

  • + Genuinely novel architecture that encodes software-company SOPs into agent behavior
  • + Produces structured deliverables (PRDs, design docs, tests) not just raw code
  • + Free and open source under the MIT license with 67k+ GitHub stars
  • + Broad model support including OpenAI, Claude, DeepSeek, and local models via Ollama
  • + Active research output with papers at ICLR and NeurIPS explaining the methodology
  • + Data Interpreter agent adds a capable data-science workflow on top of the core framework

Cons

  • − High setup friction compared to modern coding agents: Python 3.9-3.11, Node.js, and pnpm are all required
  • − Generated code quality is inconsistent and rarely production-ready without significant review
  • − The software-company simulation is better for demos than for iterative daily development
  • − Slower and more token-expensive than direct single-agent coding tools for most tasks
  • − Last stable release (v0.8.1) was April 2024; development momentum has shifted to the commercial Atoms product

Who is MetaGPT for?

  • Rapidly prototyping a full software specification from a single requirement statement
  • Generating structured project documentation (PRDs, API specs, architecture diagrams) alongside code
  • Research and experimentation with multi-agent coordination patterns and SOPs
  • Data analysis workflows using the built-in Data Interpreter agent on structured datasets

Alternatives to MetaGPT

If MetaGPT isn't quite the right fit, the closest alternatives are AutoGPT, OpenHands, and GPT Engineer. See our full MetaGPT alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is MetaGPT?
MetaGPT is an open-source multi-agent framework built by DeepWisdom that assigns LLM-powered agents to specific software-company roles: Product Manager, Architect, Project Manager, Engineer, and QA. When you give it a requirement, the agents collaborate through a defined standard operating procedure to produce a full suite of outputs including a PRD, system design, code, and tests. It gained widespread attention in mid-2023 after a demo showed it generating a working Snake game from a single sentence. The framework is free and open source on GitHub with over 67,000 stars.
Is MetaGPT free?
The MetaGPT framework itself is free and open source under the MIT license. You pay only for the API calls to whatever model you configure, billed directly by your model provider. DeepWisdom's commercial product, Atoms (formerly MGX), builds on MetaGPT's ideas and has its own pricing, but using the open-source framework directly costs nothing beyond API usage.
How does MetaGPT compare to AutoGPT?
MetaGPT and AutoGPT are both multi-agent frameworks, but they solve different problems. AutoGPT is a general-purpose autonomous agent that can browse the web, run code, and complete open-ended tasks using tools. MetaGPT is specifically designed to simulate a software company, with fixed roles and structured deliverables baked into its architecture. If you want to automate arbitrary tasks, AutoGPT is more flexible. If you want a pipeline that produces PRDs and design documents alongside code, MetaGPT's structured approach is more appropriate.
Can MetaGPT actually build a complete app?
It depends on what you mean by complete. MetaGPT can generate a working prototype for simple, well-defined tasks like a snake game, a basic CRUD API, or a data analysis script. For anything beyond that, the generated code typically needs significant review, debugging, and restructuring before it's production-ready. The real output MetaGPT is better at is the structured documentation around the code: the PRD, the architecture spec, the API design. Treat it as a project scaffolding tool that happens to also write first-draft code rather than a fully autonomous developer.
What models does MetaGPT support?
MetaGPT supports any OpenAI-compatible API, which means you can configure it to use GPT-4o, Claude models via the Anthropic API, DeepSeek, Google Gemini, and local models through Ollama. You set your model in a YAML configuration file at ~/.metagpt/config2.yaml. The framework was originally built around the OpenAI API but has added broader provider support over time. Some advanced features may behave differently across models depending on how well they follow structured output formats.
Should I use MetaGPT in 2026?
For most day-to-day coding tasks, probably not as your primary tool. MetaGPT is better understood as a research framework and an influential demonstration of multi-agent coordination than as a daily driver. The setup is more involved than modern coding agents, the output quality is inconsistent, and the development activity on the open-source repo has slowed as DeepWisdom's focus shifted to the commercial Atoms platform. It is genuinely worth using if you want to understand how role-based agent systems work, generate structured project documentation, or run the Data Interpreter for data analysis workflows. For pure coding tasks, tools like OpenHands or GPT Engineer offer a more direct path.
