GitHub - ggwhite/4x: Agentic AI development loop that splits Design, Code, Review, and Test into isolated roles with deterministic guardrails

4x is an open-source CLI that orchestrates AI coding agents into a multi-role development loop — each role (Design, Code, Review, Test) runs in isolation with deterministic guardrails, so features survive contact with production. Like 4X strategy games (eXplore, eXpand, eXploit, eXterminate), the name reflects a system where distinct roles with distinct strengths converge to conquer complexity.

Key Features

Category	Highlights
Multi-Role Loop	Design → Code → Review → Test → Deep Review → Accept, with role isolation. Adaptive pipeline selects profile (full / mini / quick) by feature complexity.
6 AI Runners	Claude Code · Codex · Gemini CLI · Antigravity · Copilot · Cursor — same `.4x/` file protocol, mix and match per role.
Dashboard (4x Live)	macOS native (Swift) + Windows / Linux (Tauri). Real-time SSE monitoring, dependency graph, runner log streaming, screenshot gallery, settings UI, batch monitoring. 6-language i18n, system notifications, menu bar integration.
Deterministic Guardrails	State machine, scope lock, baseline snapshots, evidence-based testing gate, dependency gate — enforced by the Go CLI, not by prompting an LLM.
Crash Recovery	Runner crash → auto-resume from last saved state. Transient API errors (network, rate limits) → automatic backoff retry.
Batch Mode	Dependency-aware DAG scheduling, auto-merge on completion, batch reports, graceful stop. Queue dozens of features and review in the morning.
MCP Server	Model Context Protocol server for integration with MCP-compatible clients.
20+ CLI Commands	`run`, `batch`, `live`, `doctor`, `clean`, `verify`, `mcp`, phase hooks, health checks, structured logging, and more.
Self-Evolution	History mining from past runs, auto-discovered feature enrichment, evolution value gate with anti-hack, self-modification scope guard, and continuous improvement driver (`4x evolve`). 4x learns from its own failures and iterates itself.

Why 4x?

Single-agent coding is fast but fragile. You ask one AI to design, implement, review, and test — all in the same breath, with the same biases. It works for small tasks. It falls apart on real features.

4x splits the loop. Each role has a focused job, limited scope, and no access to the others' reasoning. The Designer doesn't write code. The Coder doesn't judge its own work. The Reviewer is adversarial by design. The Tester validates against criteria written before implementation.

The result: features that survive contact with production.

Trade-offs

Choosing 4x means trading speed and cost for structure and correctness. Be honest about whether your project needs that trade.

Strengths

Role isolation eliminates self-review bias. The Coder never judges its own work. The Reviewer is adversarial by design. Single-agent workflows let the same model write and approve code — 4x doesn't.
Deterministic guardrails don't depend on AI judgment. Scope lock, state machine, evidence requirements — these are enforced by the CLI in Go, not by prompting an LLM to "please stay in scope."
File-based protocol makes it LLM-agnostic. Switch between Claude, Gemini, Codex, or mix them per role. No vendor lock-in, no SDK dependency.
Crash-resistant state. Everything lives in .4x/ files. Session dies, machine reboots — 4x run picks up exactly where it stopped.
Human stays in the loop. The pending-review gate ensures a human always reviews AI work before it's marked done. The AI proposes, you dispose.
Tames large-scale refactoring. Changes too big for a single AI session — splitting god objects, extracting packages, migrating APIs — can be broken into dependent features with appropriate profiles. 4x handles the sequencing, review, and verification across multiple phases that would overwhelm a single context window.
Batch mode scales. Dependency-aware scheduling lets you queue dozens of features overnight and review them in the morning.

Weaknesses

Significantly higher token cost. Every feature runs through at least four separate LLM calls, and a review failure doubles that. Expect 3-10x the token cost of a single-agent approach for the same task. See Usage Tips for cost estimates.
Slower for simple tasks. A one-line bug fix doesn't need a Designer, Reviewer, and Tester. The overhead of the full loop is wasted on trivial changes. Use single-agent tools for quick fixes.
Setup cost. 4x init, feature YAML, settings configuration — there's ceremony before you start. Not worth it for a throwaway script.
Rigid loop structure. The Design → Code → Review → Test sequence is fixed. If your workflow doesn't fit four roles, you'll fight the framework instead of using it.
Quality depends on prompt quality. Vague feature descriptions produce vague specs, which produce wrong code. 4x adds structure, but garbage in still means garbage out — just with more steps.

When to use 4x

Features that need to be correct (payments, auth, data pipelines)
Work that benefits from adversarial review (security-sensitive code)
Batch processing of a feature backlog
Teams that want audit trails of AI-generated code

When NOT to use 4x

Quick one-off fixes or exploratory prototyping
Tasks where speed matters more than correctness
Projects where token budget is tight
Solo hacking sessions where you'd review the code yourself anyway

Architecture

 You
  |
  v
+--------------------------------------------------+
|  4x CLI (Go)                                     |
|  Deterministic guardrails. No LLM calls.         |
|  Scope checks, protocol, state machine, batch    |
+--------+-----------------------------------------+
         |  .4x/ directory (file-based protocol)
         v
+--------------------------------------------------+
|  Runners                                         |
|  Claude Code | Codex | Gemini | Antigravity      |
|  Copilot | Cursor                                |
|  Each uses native platform capabilities          |
+--------+-----------------------------------------+
         |  SSE events
         v
+--------------------------------------------------+
|  4x Live (Dashboard)                             |
|  Multi-project real-time monitoring              |
+--------------------------------------------------+

Layer 1 — CLI handles everything deterministic: scope validation, state transitions, baseline snapshots, evidence collection. It never calls an LLM. Guardrails don't depend on AI judgment.

Layer 2 — Runners bridge the CLI protocol to your AI tool of choice. Claude Code, Codex, Gemini, Antigravity, Copilot, Cursor — each speaks the same .4x/ file protocol but uses native platform capabilities.

Layer 3 — Live is the multi-project dashboard. Watch your AI agents work in real-time, see phase transitions, stream logs. REST + SSE API.
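As a rough illustration of the SSE side of Layer 3, the sketch below formats a single Server-Sent Events frame the way the SSE wire format expects: an optional `event:` line, one `data:` line per line of payload, and a blank line to end the frame. The function name `formatSSE` and the event name used in the usage note are assumptions for illustration; the README does not document the actual event names or payloads that 4x Live emits.

```go
package main

import "strings"

// formatSSE renders one Server-Sent Events frame.
// Hypothetical helper: not taken from the 4x codebase.
func formatSSE(event, data string) string {
	var b strings.Builder
	if event != "" {
		b.WriteString("event: " + event + "\n")
	}
	// Each line of the payload needs its own "data:" prefix.
	for _, line := range strings.Split(data, "\n") {
		b.WriteString("data: " + line + "\n")
	}
	// A blank line terminates the frame.
	b.WriteString("\n")
	return b.String()
}
```

A server would write the result of something like `formatSSE("phase", "F001 entered review")` to a response with `Content-Type: text/event-stream` and flush after each frame.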

Installation

Homebrew (macOS / Linux)

brew install ggwhite/tap/fourx

Go Install

go install github.com/ggwhite/4x/cmd/4x@latest

Shell Script

curl -sSfL https://raw.githubusercontent.com/ggwhite/4x/main/install.sh | sh

Download Binary

Pre-built binaries for macOS, Linux, and Windows (amd64 / arm64) are available on the Releases page.

Quick Start

# Initialize in your project
cd my-project
4x init

# Create a feature
4x new "User authentication with OAuth2"
# => Created: F001-user-authentication-w

# Run the full loop
4x run F001 --runner claude

# Check status
4x status

# Review and complete
4x done F001

# Or watch it live
4x live -w

4x run drives the Design-Code-Review-Test loop automatically. If Review finds issues, Code gets another pass. If Test fails, the loop iterates. You stay in control with --max-rounds and --timeout flags.

The Four Roles

Role	Job	Outputs
Designer	Analyze requirements, produce spec + acceptance criteria	`task-brief.md`, `acceptance-criteria.md`
Coder	Implement exactly what the spec says	Source code, `coder-report.md`
Reviewer	Catch bugs and spec violations (checklist + adversarial)	`review-report.md` with verdict
Tester	Validate against acceptance criteria with evidence	`test-report.md`, `verify.json`

Each role is isolated. The Coder never sees the Reviewer's prior feedback. The Tester validates against criteria written by the Designer, not the Coder. This separation prevents the blind spots that plague single-agent workflows.

How the Loop Works

Designer → Coder → Reviewer → Tester → Accept → Pending Review → Done
                      ↓           ↓                                 ↑
                   amending ←─────┘                          human sign-off

Review failure (verdict FAIL or CRITICAL findings) sends code back for amending
Test failure (verify not passed) sends code back for amending
Escalation (spec mismatch, criteria wrong) routes back to Designer
Pending review gate ensures a human always reviews before marking done
Round budget (default 5) prevents infinite loops
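The rules above can be read as a small state machine. The following is a minimal sketch of that idea, not the code in `internal/state`: the phase identifiers, the assumption that amended code goes back to Review, and the choice to count one round per amending pass are all illustrative guesses.

```go
package main

import "fmt"

// Phase names mirror the loop diagram; the identifiers are illustrative.
type Phase string

const (
	Design        Phase = "design"
	Code          Phase = "code"
	Review        Phase = "review"
	Test          Phase = "test"
	Amending      Phase = "amending"
	Accept        Phase = "accept"
	PendingReview Phase = "pending-review"
	Done          Phase = "done"
)

// legal lists the transitions allowed from each phase.
var legal = map[Phase][]Phase{
	Design:        {Code},
	Code:          {Review},
	Review:        {Test, Amending, Design}, // pass, fail, escalation
	Test:          {Accept, Amending, Design},
	Amending:      {Review}, // assumption: amended code is re-reviewed
	Accept:        {PendingReview},
	PendingReview: {Done}, // reached only after human sign-off
}

// Machine tracks the current phase and how many amending rounds were used.
type Machine struct {
	Current   Phase
	Round     int
	MaxRounds int
}

// Advance moves to next if the transition is legal and the round
// budget allows it; otherwise the state is left unchanged.
func (m *Machine) Advance(next Phase) error {
	allowed := false
	for _, p := range legal[m.Current] {
		if p == next {
			allowed = true
			break
		}
	}
	if !allowed {
		return fmt.Errorf("illegal transition %s -> %s", m.Current, next)
	}
	if next == Amending {
		if m.Round >= m.MaxRounds {
			return fmt.Errorf("round budget of %d exhausted", m.MaxRounds)
		}
		m.Round++
	}
	m.Current = next
	return nil
}
```

With `MaxRounds: 5`, matching the default round budget, a sixth attempt to enter amending returns an error instead of looping again.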

Deterministic Guardrails

Enforced by the CLI, not AI judgment:

Guardrail	What it does
Scope check	Changed files must be within declared repos
Baseline snapshot	Pre-coding state captured for safe rollback
State machine	Phases must proceed in legal order
Evidence requirement	Tester must provide verify.json with command output
Testing gate	verify.json + test-report + final-report required
Dependency gate	Features with unmet dependencies cannot start
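To make the scope check in the table concrete, here is a hedged sketch of how "changed files must be within declared repos" could be tested with path arithmetic alone. The function names and the idea of passing absolute paths are assumptions; the actual implementation in `internal/guard` may work differently (for example, by inspecting version-control diffs).

```go
package main

import (
	"path/filepath"
	"strings"
)

// outOfScope returns the changed files that fall outside every
// declared repo root. Hypothetical helper, not the 4x API.
func outOfScope(changed, repos []string) []string {
	var violations []string
	for _, f := range changed {
		if !withinAny(f, repos) {
			violations = append(violations, f)
		}
	}
	return violations
}

// withinAny reports whether file is located under at least one root.
func withinAny(file string, roots []string) bool {
	for _, root := range roots {
		rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(file))
		if err != nil {
			continue // e.g. mixing absolute and relative paths
		}
		// A relative path that climbs out of root starts with "..".
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

Comparing cleaned relative paths rather than raw string prefixes avoids two common mistakes: treating `/work/api-extra` as inside `/work/api`, and missing `..` segments that escape the root.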

Batch Mode

4x batch plan            # generate dependency-aware execution plan
4x batch run --runner claude  # run all eligible features in order
4x batch stop            # graceful shutdown after current feature
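Dependency-aware planning of this kind is usually a topological sort. The sketch below uses Kahn's algorithm to order features so that each one comes after everything it depends on; it illustrates the general technique and is not the scheduler in `internal/batch`. The `planOrder` name, the map-based input, and the alphabetical tie-breaking are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// planOrder returns feature IDs ordered so that every feature appears
// after its dependencies. deps maps a feature ID to the IDs it depends on.
func planOrder(deps map[string][]string) ([]string, error) {
	indegree := make(map[string]int, len(deps))
	dependents := make(map[string][]string)
	for id, ds := range deps {
		indegree[id] = len(ds)
		for _, d := range ds {
			if _, ok := deps[d]; !ok {
				return nil, fmt.Errorf("feature %s depends on unknown feature %s", id, d)
			}
			dependents[d] = append(dependents[d], id)
		}
	}

	// Start with features that have no unmet dependencies.
	var ready []string
	for id, n := range indegree {
		if n == 0 {
			ready = append(ready, id)
		}
	}
	sort.Strings(ready) // deterministic tie-breaking

	var order []string
	for len(ready) > 0 {
		id := ready[0]
		ready = ready[1:]
		order = append(order, id)
		for _, next := range dependents[id] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
		sort.Strings(ready)
	}

	// Any feature left unscheduled is part of a cycle.
	if len(order) != len(deps) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}
```

Sorting the ready list keeps the plan stable between runs, which matters when the same plan must be reproducible after a restart.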

MCP Server

Start the Model Context Protocol (MCP) server:

4x mcp

Permission Model

4x runs AI agents in non-interactive mode. During 4x init, runners are configured with flags that skip permission prompts (--dangerously-skip-permissions, -y, approval: full-auto) so the loop runs autonomously.

The CLI's deterministic guardrails (scope lock, baseline snapshots, state machine) provide the safety boundary.

Run 4x only in projects where you are comfortable with autonomous AI agent execution.

Documentation

Document	Description
User Guide	Complete usage documentation
Getting Started	Installation and first run
CLI Reference	All commands and flags
Core Concepts	Roles, state machine, protocol, guardrails
Configuration	Settings, models, locale, runners
Runners & Plugins	Supported runners and plugin contract
Dashboard	4x Live multi-project dashboard
Batch Mode	Dependency-aware batch execution

Project Structure

4x/
  cmd/4x/              CLI entry point (Cobra)
  internal/
    protocol/           .4x/ file format, workspace, types
    state/              State machine (phase transitions)
    guard/              Guardrail checks (scope, baseline, evidence)
    batch/              Dependency DAG, batch scheduler
    runner/             Subprocess runner interface
    server/             SSE + REST server for Live dashboard
  plugins/
    claude-code/        Claude Code skill + workflow
    codex/              Codex runner instructions
    gemini/             Gemini runner instructions
    agy/                Antigravity runner instructions
    copilot/            Copilot runner instructions + workflow
    cursor/             Cursor rules
    embed.go            go:embed plugin files into binary
  dashboard/
    macos/              Swift native app (planned)
  docs/
    guide/              User documentation
    architecture/       System-level design docs
    design/             Mechanism design docs
    reference/          Plugin contract

FAQ

Q: Does 4x call any LLM APIs directly?
A: No. The CLI is pure Go with zero LLM dependencies. Runners handle all AI interaction using their native platform capabilities.

Q: Can I use different LLMs for different roles?
A: Yes. Configure per-role models in .4x/settings.json. For example, use Claude for Design and Gemini for Code; each reads the same .4x/ files.

Q: How is this different from Devin / SWE-agent / OpenHands?
A: Those are autonomous agents that do everything in one shot. 4x is a framework that structures multi-role collaboration with deterministic guardrails. It's closer to a CI pipeline for AI than a single autonomous agent.

Origin Story

4x was born inside a production system called DCT (Designer-Coder-Tester) that shipped 60+ features for a large-scale platform rewrite. The patterns that survived — role isolation, file-based protocol, deterministic scope checking, evidence-based testing — became 4x. The parts that didn't survive — LLM-specific hacks, shared context assumptions, trust-based guardrails — were deliberately left out.

Contributing

git clone https://github.com/ggwhite/4x.git
cd 4x
go build ./cmd/4x
go test ./...

License

MIT

Stop hoping your AI writes correct code. Start verifying it.

