Agentbrisk
Categories: coding, cli, autonomous · Featured · Status: active

OpenAI Codex

OpenAI's terminal-based coding agent powered by GPT-5


OpenAI Codex is a terminal-native coding agent that runs on your machine, reads your actual files, and executes shell commands under your supervision. Shipped in 2025 as OpenAI's direct answer to Claude Code, it has since matured into one of the most capable agents in the space. It runs on GPT-5 and the o-series reasoning models, supports a plan-and-approve workflow, and can delegate longer tasks to cloud agents at chatgpt.com/codex when you don't want to stay at your keyboard. The CLI is open-source, installable via npm or Homebrew, and bundled with every ChatGPT paid plan from Plus upward. By mid-2026 it sits comfortably alongside Claude Code as a first-choice option for engineers who want a serious autonomous coding workflow in their terminal.

When OpenAI shipped the Codex CLI in April 2025, the framing was obvious to anyone who had spent time with Claude Code. This was OpenAI’s answer to Anthropic’s terminal agent, arriving roughly six months after Claude Code had established what a serious coding agent could look like. The question at launch was whether OpenAI could match the depth of Anthropic’s first-mover tool or whether they’d ship something that looked similar but didn’t hold up under real workloads. Now, fourteen months into its life, the answer is more interesting than either camp initially expected. OpenAI Codex has matured into a genuine alternative, with its own design philosophy and a few capabilities Claude Code still doesn’t have. This is an honest account of where it stands.

Quick verdict

Codex is worth it if you’re already on ChatGPT Plus and want a capable terminal agent at no added cost, or if you specifically want cloud-based async task delegation. The model quality at the GPT-5 tier is competitive with Anthropic’s best, and the three-mode safety system gives you more granular control than most agents offer. Where it trails is MCP integration depth and the maturity of its ecosystem around subagents and hooks. Strong tool. Not obviously better than Claude Code, but not worse either.

What is OpenAI Codex, exactly?

The name carries baggage. From 2021 to 2023, OpenAI ran a product also called Codex: a fine-tuned GPT-3 model exposed via API for code completion. GitHub Copilot was originally powered by it. That model was deprecated in March 2023 and has nothing to do with what ships today. The current Codex is an agentic CLI, closer in concept to a junior contractor than to an autocomplete system.

The 2025 Codex CLI runs as a local process on your machine. You install it, point it at a project directory, and issue tasks in natural language. It reads your files, reasons about what needs to change, proposes a plan, and executes edits and shell commands with whatever level of autonomy you specify. The core loop is: understand the task, read relevant files, propose changes, get approval, apply changes, run verification commands, report results.
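That core loop, sketched as a terminal session. This is purely illustrative: the numbered steps summarize the flow, and Codex's actual prompts and output formatting will differ.

```shell
# Illustrative sketch of the core loop, not literal Codex output.
cd my-project
codex "add pagination to every list endpoint"
# 1. reads the route and handler files it judges relevant
# 2. prints a plan: which files change, in what order
# 3. waits for your approval, then applies the edits
# 4. runs verification commands (tests, linters) and reports results
```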

What makes this categorically different from the 2021 model is agency. The old Codex predicted the next token. The new one maintains a goal, breaks it into steps, tracks intermediate results, and adjusts when something doesn’t work. When you ask it to add pagination to every list endpoint in a REST API, it doesn’t complete the current line you’re writing. It finds all the list endpoints, figures out how your existing pagination logic works (or doesn’t), writes the implementation, updates the tests, and runs them to check.

The CLI is open-source under Apache-2.0, written primarily in Rust, and available via npm (npm install -g @openai/codex) or Homebrew. That open-source commitment matters: you can read what it’s doing, contribute fixes, and in principle fork it. That’s a genuine differentiator against Claude Code, which is not open-source. A desktop app is available via codex app, and there are IDE integrations for VS Code, Cursor, and Windsurf for people who want their agent accessible without leaving their editor. The cloud agent at chatgpt.com/codex rounds out the surface area with a web-based async interface.

Authentication uses your ChatGPT credentials if you’re on a paid plan, or a standard OpenAI API key if you’d rather pay per token. The subscription path is cleaner for most users.

The features that make it the Claude Code rival

Terminal-native multi-file edits

The fundamental value proposition matches Claude Code’s: describe a task, get coherent edits across however many files the task actually touches. Ask Codex to “extract the database layer into a separate service and update all callers” and it will trace the dependency graph, find every file that imports from the data layer, rewrite them to talk to the new service interface, and tell you what it changed and why.

The quality here is high. In our testing across TypeScript, Python, and Go projects, Codex handled multi-file refactors that require understanding of import chains and call signatures with accuracy comparable to Claude Code. It makes fewer “forgot to update this one caller” mistakes than it did at launch in 2025, which was one of the early criticisms. The Rust-based CLI is noticeably fast at file I/O and diff application, which matters less in interactive mode but is very noticeable in longer autonomous runs.

One real advantage: because the CLI is open-source, when something breaks during a multi-file edit, you can actually read what the agent was trying to do at the framework level, not just at the prompt level. That transparency pays dividends when you’re debugging a bad session.

GPT-5 and o-series model picker

Codex lets you select your model per session, and the selection matters. GPT-4o handles routine tasks well and is fast. The o-series reasoning models (o3, o4-mini, o4) are worth enabling for tasks that need genuine step-by-step inference: architectural decisions, complex bug traces, tasks where getting the approach right is more important than getting the first draft fast.

GPT-5 is the top of the stack on Pro plans and it shows. On tasks that require understanding a large, unfamiliar codebase well enough to make a structural change, GPT-5’s context utilization is excellent. The practical difference between GPT-5 and o4 for coding is roughly: GPT-5 is faster and better at writing code, o4 is more methodical and better at reasoning through what the code should do before writing it. Codex lets you pick based on what the current task needs.
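Assuming the per-session model flag is `--model` (the exact flag name is an assumption here; check `codex --help` on your install), picking a model per task looks like this:

```shell
# Flag name is an assumption -- verify against `codex --help`.
codex --model gpt-5 "port the upload service to streaming I/O"    # fast, strong code-writing
codex --model o4 "work out why replica lag spikes during deploys" # slower, deeper reasoning
```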

This model flexibility is something Claude Code doesn’t match directly. Anthropic’s agent is tightly coupled to Claude’s model family, which is excellent but gives you less control over the speed-vs-depth tradeoff per task.

Plan, edit, execute loop

Before touching any file, Codex presents a plan: here’s what I read, here’s what I’m going to change, here’s the order I’m going to do it in. You can approve, revise, or reject before a single edit lands.

The plan quality has improved substantially since launch. Early Codex plans were sometimes vague in ways that made approval feel like signing a blank check. Current plans name specific files, specific functions, and specific logic changes. You know what you’re approving. The plan stage also catches the right class of mistake: when the model has misread which component owns a particular responsibility, it shows up in the plan summary before code gets written.

The execute phase includes running shell commands that you’ve approved: test suites, linters, build checks. Codex reads the output and adapts. If the tests fail, it reads the failure, hypothesizes a cause, and proposes a fix. This iteration loop is the part of the agent experience that still feels closest to magic in 2026, even after a year of using it. The fact that the agent can read a stack trace and revise its own work is not a small thing.

Approvals and safety modes

This is where Codex makes a design choice that Claude Code doesn’t, and it’s a good one. There are three explicit modes:

Suggest mode proposes every change and waits for you to apply them. Nothing happens automatically. This is the right mode when you’re on a production codebase you don’t fully own, or when you’re working with a contractor-style agent you want to supervise closely.

Auto mode applies file edits automatically but pauses for every shell command. You see the edit, the model continues; you see the proposed command, you approve it before it runs. This is the mode most engineers settle into for daily use. It’s fast but keeps humans in the loop for anything with side effects.

Full-auto mode runs everything. File edits, shell commands, test execution, all of it without stopping. This is appropriate for well-scoped tasks in isolated branches, or for cloud agent runs where you’ve deliberately asked Codex to go handle something while you’re away. It’s not the default, and it shouldn’t be, but having it available for the right situations is useful.
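In CLI terms, mode selection happens at launch. The `--suggest` flag appears elsewhere in this review; the other two flag names below are assumptions, so confirm them against `codex --help` before relying on them.

```shell
codex --suggest "swap the JSON config for TOML"        # propose only; you apply changes
codex --auto "rename UserService to AccountService"    # assumed flag: auto-apply edits, gate shell commands
codex --full-auto "fix the flaky date tests"           # assumed flag: no stops; isolated branches only
```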

Cloud agents and async runs

The feature Claude Code doesn’t have a direct answer to: chatgpt.com/codex runs your task in a cloud VM while you’re not at your machine. You submit a task from the web interface, the agent spins up, clones your repo, does the work, and you come back to a diff and a summary. For self-contained tasks where you don’t need to supervise the process, this is genuinely convenient.

The cloud agent respects the same plan-and-approve loop. Before it starts executing, it posts a plan for you to approve. If you’re not watching, it queues there until you respond. In fully autonomous mode it can proceed without approval, but most users will want to see the plan on anything non-trivial.

The practical use case is overnight delegation: queue up three or four well-scoped tasks before you leave, come back to reviewed PRs in the morning. Google Jules works on a similar model, but Jules is GitHub-issue-first. Codex’s cloud agent is more general-purpose and can work on tasks that don’t map to a specific issue.

Pricing

Codex is bundled with ChatGPT paid plans, not sold as a separate product. That framing is worth understanding because it means you’re not evaluating the agent on its own economics; you’re evaluating whether the ChatGPT plan you’d buy anyway is worth it, and whether Codex makes it more so.

ChatGPT Plus at $20 per month is the entry point. It gives you Codex with GPT-4o and the o-series models. For engineers using Codex as an occasional tool (a few tasks a week), the Plus tier is sufficient. Rate limits are real but not punishing for moderate use.

ChatGPT Pro at $200 per month is a significant jump. It adds GPT-5 access, higher rate limits, and priority access to the strongest o-series models. For engineers running Codex for several hours a day, or using cloud agents to parallelize work across multiple tasks, Pro can pay for itself in recovered time. For occasional users, it’s hard to justify. The price gap between Plus and Pro is steeper than the equivalent gap in Claude’s plan structure, where Claude Pro sits at $20/month ($17 on annual billing) and Claude Max starts at $100.

Business, Edu, and Enterprise plans include Codex access with shared rate limits and admin-level controls. Enterprise pricing is negotiated and includes SSO, audit logging, and data processing agreements that Plus and Pro don’t provide.

The API key path is available for users who want pay-as-you-go billing rather than a subscription. At GPT-5 pricing this can get expensive quickly for heavy agentic use, since long sessions consume significant context. The subscription path is almost always cheaper for daily use.
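To see why pay-as-you-go adds up, here is a back-of-envelope estimate for one long agentic session. The per-million-token prices below are placeholders for illustration, not OpenAI's actual GPT-5 rates; substitute the current figures from OpenAI's pricing page.

```shell
# Back-of-envelope API cost for one long agentic session.
# All prices are PLACEHOLDERS, not OpenAI's published rates.
input_tokens=400000   # context re-sent across many agent turns
output_tokens=60000   # generated diffs and explanations
price_in_per_m=10     # assumed $ per 1M input tokens -- placeholder
price_out_per_m=30    # assumed $ per 1M output tokens -- placeholder

# cost = (input/1M)*in_price + (output/1M)*out_price
cost=$(awk -v i="$input_tokens" -v o="$output_tokens" \
           -v pi="$price_in_per_m" -v po="$price_out_per_m" \
           'BEGIN { printf "%.2f", (i/1e6)*pi + (o/1e6)*po }')
echo "estimated session cost: \$$cost"
```

Even at these modest placeholder rates, a handful of long sessions per day quickly exceeds the $20/month Plus subscription, which is why the subscription path wins for daily use.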

There’s no free tier for Codex. You need a paid ChatGPT plan to use it, which puts it in the same position as Claude Code in terms of barrier to entry. Google Jules has a free tier with 15 tasks per day; for developers who want to try an agent before committing money, Jules is currently the better entry point.

Where Codex wins and where it doesn’t

Codex wins on model flexibility. The ability to dial between fast GPT-4o for routine work and deliberate o4 reasoning for architectural decisions in the same tool is genuinely useful. Claude Code’s model selection is improving but still less granular.

Codex wins on cloud agents. Async task delegation with a proper plan-and-approve loop is a complete feature, not a beta. If overnight autonomous work is a workflow you want, Codex has the best implementation of it among terminal agents.

Codex wins on transparency. Open-source code under Apache-2.0 means you can read the framework, submit issues, and in extreme cases fork it. For teams with specific security or compliance requirements, that auditability matters.

Where Codex trails: MCP integration. Claude Code’s MCP support is more mature, with a richer ecosystem of pre-built servers for databases, browsers, and external APIs. Codex has tool-use support but the connector ecosystem is smaller. If your workflow depends on wiring the agent into your actual infrastructure, Claude Code is currently the better choice.

Codex also trails on hook-based automation. Claude Code’s lifecycle hooks that fire shell commands on events like post-edit or pre-commit are well-tested and documented. Codex’s equivalent is functional but less flexible for power users who want to automate their own post-processing.

Who Codex is built for

The clearest fit is an engineer who already pays for ChatGPT Plus and wants to start using a coding agent at no additional cost. The on-ramp is frictionless: you’re already authenticated, the model quality is excellent, and the tool is a one-command install away. That “free upgrade” framing is Codex’s strongest acquisition advantage.

The second strong fit is anyone who wants async task delegation. If your workflow includes handing off well-scoped tasks to run while you focus on higher-priority work, the cloud agent is the most polished implementation of that pattern in the market. You queue work, approve plans, review results. The overhead per task is low.

Open-source-minded teams will also find Codex compelling in ways Claude Code can’t match. Being able to read the source, submit patches, and know exactly how the agent processes your files and credentials is a meaningful comfort for security-conscious organizations.

For engineers who want the deepest integration with external tools via MCP, or who are already heavily invested in Anthropic’s model ecosystem, Claude Code is still the better fit today. For a wider comparison of coding agents, the best AI agent for coding guide maps the full field.

Codex vs the alternatives

Codex vs Claude Code

Claude Code came first and it shows in the ecosystem. MCP integration is richer, the hook system is more powerful for automation-minded users, and the CLAUDE.md persistent memory system has had more time to mature. On raw multi-file reasoning quality, the two tools are close enough that which one wins depends heavily on the specific task and codebase.

The real differentiators run in both directions. Codex has cloud agents and an open-source CLI; Claude Code has a more mature tool ecosystem and tighter model-agent integration. Pricing is similar at the entry tier: $20 for ChatGPT Plus, $20 for Claude Pro (or $17 annual). The Pro tiers diverge sharply: Claude Max at $100 versus ChatGPT Pro at $200.

If you’re already in the OpenAI ecosystem and pay for ChatGPT, Codex is the obvious call. If you want the deepest agentic tooling and the most mature hooks and MCP support, Claude Code still leads. The Claude Code vs Cursor head-to-head covers Claude Code’s positioning in the editor-versus-agent debate in more detail.

Codex vs Google Jules

Google Jules takes a fundamentally different approach. It’s GitHub-issue-first: you assign an issue to Jules, it plans the work, executes it in a cloud VM, and opens a PR. That workflow is elegant for teams whose work lives in GitHub issues. Jules has a free tier at 15 tasks per day, which Codex and Claude Code don’t offer.

Where Jules loses is flexibility. It doesn’t run locally, so you can’t use it for tasks that require your local environment, private credentials, or work that isn’t structured as a GitHub issue. Codex’s cloud agent is more general-purpose and Codex’s local CLI covers everything Jules can’t. Jules is a better entry point for developers who want to try an agent without spending money. Codex is the better tool for engineers who need flexibility across local and async workflows.

Codex vs Devin

Devin is the fully autonomous end of the spectrum, starting at $500 per month for team plans. The bet Devin makes is that you hand it a ticket and review a pull request; you don’t supervise the middle. That’s genuinely useful for certain organizational workflows, but at $500 the economics only work if the tasks are substantial and the overhead of reviewing the agent’s choices is genuinely lower than the cost of doing the work yourself.

Codex at $20 with cloud agents gets you most of the async delegation story for a fraction of the price, with the tradeoff that you’re approving plans rather than reviewing final PRs. For most solo developers and small teams, that’s the right tradeoff. Devin makes more sense for larger engineering organizations that have enough volume of routine tickets to justify the cost and the management overhead. See the best AI agent for coding comparison for a side-by-side of the full pricing and capability range.

Getting started

Install via npm or Homebrew. With npm: npm install -g @openai/codex. With Homebrew: brew install --cask codex. Both install the same binary; use whichever fits your package management preferences.
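The same steps as a copy-paste block, using the commands above (the project path is a placeholder):

```shell
# Install -- both package managers ship the same binary; pick one.
npm install -g @openai/codex
# ...or:
brew install --cask codex

# First run: cd into any project directory, launch, and authenticate
# with your ChatGPT account or an OpenAI API key when prompted.
cd ~/projects/my-app
codex
```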

After installation, cd into a project directory and run codex. On first launch you’ll authenticate with your ChatGPT account or enter an API key. The setup takes about two minutes.

Start with a read-only task to build trust. Try codex "explain the main data flow in this codebase" before you ask it to change anything. This shows you how it reads your project and gives you a sense of whether its model of your architecture matches yours. Fix any gaps before you let it write.

When you’re ready to run a real task, use suggest mode first: codex --suggest "add rate limiting to the authentication endpoints". Read what it proposes. If the plan looks right, approve and watch it work. If something is off, tell it specifically what’s wrong. Precise correction gets precise results.

For the cloud agent, visit chatgpt.com/codex, connect your repository, and submit a task. Approve the plan when it posts. Come back to the result. That full cycle, from task submission to diff review, is the clearest argument for Codex’s place in a modern engineering workflow.

The bottom line

OpenAI Codex in mid-2026 is not the me-too product it risked being at launch. It has a coherent identity: open-source CLI, flexible model selection, the best cloud agent implementation in the terminal-agent category, and a safety mode system that’s more granular than most competitors. It’s also honest about its current gaps: MCP integration and lifecycle automation are still behind Claude Code, and the Pro tier price jump is steep.

For engineers already on ChatGPT Plus, it’s an obvious tool to add. For teams who want async delegation without Devin’s pricing, it’s the strongest option available. And for developers who want to know exactly what their coding agent is doing at the framework level, the open-source code is the argument that nothing else in the category can currently make.

It’s not obviously the best terminal coding agent in every dimension. But the dimensions where it wins are real, and the gap is narrow enough that your existing toolchain and model preferences will likely be the deciding factor.

Key features

  • Multi-file edits across your entire local repository
  • GPT-5 and o-series model selection per session
  • Plan-edit-execute loop with step-by-step approval
  • Suggest, auto, and full-auto safety modes
  • Cloud agent runs via chatgpt.com/codex for async tasks
  • IDE integrations for VS Code, Cursor, and Windsurf
  • Open-source CLI under Apache-2.0 license

Pros and cons

Pros

  • + GPT-5 model quality is competitive with top Anthropic models for most coding tasks
  • + Three safety modes give precise control over how much the agent acts on its own
  • + Cloud agents at chatgpt.com/codex enable genuine async task delegation
  • + Open-source under Apache-2.0 so you can inspect and modify the CLI itself
  • + IDE integrations for VS Code, Cursor, and Windsurf for non-terminal workflows
  • + Bundled with ChatGPT Plus at $20/month, keeping entry cost reasonable

Cons

  • − Cloud agent runs execute only on OpenAI's infrastructure, with no self-hosted equivalent
  • − MCP ecosystem is less mature than Claude Code's as of mid-2026
  • − Full-auto mode requires real trust in the model; misfire recovery can be slow
  • − API key path adds cost complexity that subscription users don't expect

Who is OpenAI Codex for?

  • Backend engineers running multi-file refactors without leaving the terminal
  • Teams delegating async ticket work to cloud agents overnight
  • Developers who already pay for ChatGPT Plus and want to add a coding agent at no extra cost
  • Open-source contributors who want to inspect and customize their agent tooling

Alternatives to OpenAI Codex

If OpenAI Codex isn't quite the right fit, the closest alternatives are Claude Code, Google Jules, and Devin. See our full OpenAI Codex alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is OpenAI Codex?
OpenAI Codex is a terminal-based coding agent released by OpenAI in 2025. It runs as a CLI on your local machine, reads and writes files across your project, executes shell commands, and can carry out multi-step coding tasks from a single natural-language instruction. It's distinct from the original 2021 Codex model (the GPT-3-based code completion API that was deprecated in 2023) and is instead a full agentic tool powered by GPT-5 and the o-series reasoning models. It also offers a cloud agent version accessible at chatgpt.com/codex for async task execution.
How does Codex compare to Claude Code?
Both are terminal-native coding agents with plan-and-approve workflows and multi-file editing. Claude Code has a more mature MCP integration ecosystem and slightly stronger multi-file reasoning on very large codebases in our testing. Codex counters with native cloud agent runs and a more granular three-mode safety system; entry pricing is a wash, with ChatGPT Plus and Claude Pro both at $20/month. For pure coding tasks the gap is narrow. The deciding factor for most engineers is which model family they trust more and which ecosystem (OpenAI vs Anthropic) they're already in.
How much does Codex cost?
Codex is bundled with every paid ChatGPT plan. ChatGPT Plus costs $20/month and gives you Codex access with GPT-4o and o-series models. ChatGPT Pro at $200/month adds higher rate limits and priority access to GPT-5 and the more powerful o-series models. Business, Edu, and Enterprise plans include Codex at their respective pricing tiers. There's no standalone Codex subscription. You can also authenticate with an OpenAI API key directly, in which case usage is billed at standard API rates.
Is Codex the same as the old GPT-3 Codex?
No. The original OpenAI Codex was a fine-tuned GPT-3 model for code completion, launched in 2021 and deprecated in March 2023. The current Codex is an entirely different product: an agentic CLI tool that runs locally, edits files, executes shell commands, and uses GPT-5 and the o-series reasoning models. The name is a deliberate callback to OpenAI's early code work, but the underlying technology and the user experience have nothing in common with the 2021 API.
What models does Codex use?
Codex defaults to GPT-4o for everyday tasks and lets you select from the o-series reasoning models (o3, o4-mini, and o4) for tasks that benefit from deeper step-by-step reasoning. GPT-5 is available on Pro and higher plans and is the recommended model for complex multi-file work. You can switch models per session with a flag or set a default in your config file. API key users have access to the same model list subject to their API tier.
Can Codex run autonomously?
Yes, with conditions. Codex has three operating modes: suggest mode (proposes changes, you apply them), auto mode (applies edits autonomously but pauses for shell commands), and full-auto mode (runs everything without stopping). For most production codebases, auto mode is the practical choice: it moves fast but keeps a human in the loop for anything destructive. The cloud agent at chatgpt.com/codex adds a fourth option: submit a task and let it run in a cloud VM while you do something else, then review the result.
