
ggwhite/4x




4X — Design. Code. Review. Test.


4x is an open-source CLI that orchestrates AI coding agents into a multi-role development loop — each role (Design, Code, Review, Test) runs in isolation with deterministic guardrails, so features survive contact with production. Like 4X strategy games (eXplore, eXpand, eXploit, eXterminate), the name reflects a system where distinct roles with distinct strengths converge to conquer complexity.

Key Features

  • Multi-Role Loop: Design → Code → Review → Test → Deep Review → Accept, with role isolation. An adaptive pipeline selects a profile (full / mini / quick) based on feature complexity.
  • 6 AI Runners: Claude Code, Codex, Gemini CLI, Antigravity, Copilot, Cursor. All share the same .4x/ file protocol, so you can mix and match them per role.
  • Dashboard (4x Live): macOS native (Swift) plus Windows / Linux (Tauri). Real-time SSE monitoring, dependency graph, runner log streaming, screenshot gallery, settings UI, and batch monitoring. 6-language i18n, system notifications, and menu bar integration.
  • Deterministic Guardrails: state machine, scope lock, baseline snapshots, evidence-based testing gate, and dependency gate, all enforced by the Go CLI rather than by prompting an LLM.
  • Crash Recovery: if a runner crashes, 4x resumes automatically from the last saved state. Transient API errors (network failures, rate limits) are retried automatically with backoff.
  • Batch Mode: dependency-aware DAG scheduling, auto-merge on completion, batch reports, and graceful stop. Queue dozens of features and review them in the morning.
  • MCP Server: a Model Context Protocol server for integration with MCP-compatible clients.
  • 20+ CLI Commands: run, batch, live, doctor, clean, verify, mcp, phase hooks, health checks, structured logging, and more.
  • Self-Evolution: history mining from past runs, auto-discovered feature enrichment, an evolution value gate with anti-hack checks, a self-modification scope guard, and a continuous improvement driver (4x evolve). 4x learns from its own failures and iterates on itself.

Why 4x?

Single-agent coding is fast but fragile. You ask one AI to design, implement, review, and test — all in the same breath, with the same biases. It works for small tasks. It falls apart on real features.

4x splits the loop. Each role has a focused job, limited scope, and no access to the others' reasoning. The Designer doesn't write code. The Coder doesn't judge its own work. The Reviewer is adversarial by design. The Tester validates against criteria written before implementation.

The result: features that survive contact with production.

Trade-offs

Choosing 4x means trading speed and cost for structure and correctness. Be honest about whether your project needs that trade.

Strengths

  • Role isolation eliminates self-review bias. The Coder never judges its own work. The Reviewer is adversarial by design. Single-agent workflows let the same model write and approve code — 4x doesn't.
  • Deterministic guardrails don't depend on AI judgment. Scope lock, state machine, evidence requirements — these are enforced by the CLI in Go, not by prompting an LLM to "please stay in scope."
  • File-based protocol makes it LLM-agnostic. Switch between Claude, Gemini, Codex, or mix them per role. No vendor lock-in, no SDK dependency.
  • Crash-resistant state. Everything lives in .4x/ files. Session dies, machine reboots — 4x run picks up exactly where it stopped.
  • Human stays in the loop. The pending-review gate ensures a human always reviews AI work before it's marked done. The AI proposes, you dispose.
  • Tames large-scale refactoring. Changes too big for a single AI session — splitting god objects, extracting packages, migrating APIs — can be broken into dependent features with appropriate profiles. 4x handles the sequencing, review, and verification across multiple phases that would overwhelm a single context window.
  • Batch mode scales. Dependency-aware scheduling lets you queue dozens of features overnight and review them in the morning.

Weaknesses

  • Significantly higher token cost. Every feature runs through at least four separate LLM calls, and a review failure doubles that. Expect 3-10x the token cost of a single-agent approach for the same task. See Usage Tips for cost estimates.
  • Slower for simple tasks. A one-line bug fix doesn't need a Designer, Reviewer, and Tester. The overhead of the full loop is wasted on trivial changes. Use single-agent tools for quick fixes.
  • Setup cost. 4x init, feature YAML, settings configuration — there's ceremony before you start. Not worth it for a throwaway script.
  • Rigid loop structure. The Design → Code → Review → Test sequence is fixed. If your workflow doesn't fit four roles, you'll fight the framework instead of using it.
  • Quality depends on prompt quality. Vague feature descriptions produce vague specs, which produce wrong code. 4x adds structure, but garbage in still means garbage out — just with more steps.

When to use 4x

  • Features that need to be correct (payments, auth, data pipelines)
  • Work that benefits from adversarial review (security-sensitive code)
  • Batch processing of a feature backlog
  • Teams that want audit trails of AI-generated code

When NOT to use 4x

  • Quick one-off fixes or exploratory prototyping
  • Tasks where speed matters more than correctness
  • Projects where token budget is tight
  • Solo hacking sessions where you'd review the code yourself anyway

Architecture

 You
  |
  v
+--------------------------------------------------+
|  4x CLI (Go)                                     |
|  Deterministic guardrails. No LLM calls.         |
|  Scope checks, protocol, state machine, batch    |
+--------+-----------------------------------------+
         |  .4x/ directory (file-based protocol)
         v
+--------------------------------------------------+
|  Runners                                         |
|  Claude Code | Codex | Gemini | Antigravity      |
|  Copilot | Cursor                                |
|  Each uses native platform capabilities          |
+--------+-----------------------------------------+
         |  SSE events
         v
+--------------------------------------------------+
|  4x Live (Dashboard)                             |
|  Multi-project real-time monitoring              |
+--------------------------------------------------+

Layer 1 — CLI handles everything deterministic: scope validation, state transitions, baseline snapshots, evidence collection. It never calls an LLM. Guardrails don't depend on AI judgment.

Layer 2 — Runners bridge the CLI protocol to your AI tool of choice. Claude Code, Codex, Gemini, Antigravity, Copilot, Cursor — each speaks the same .4x/ file protocol but uses native platform capabilities.

Layer 3 — Live is the multi-project dashboard. Watch your AI agents work in real-time, see phase transitions, stream logs. REST + SSE API.

Installation

Homebrew (macOS / Linux)

brew install ggwhite/tap/fourx

Go Install

go install github.com/ggwhite/4x/cmd/4x@latest

Shell Script

curl -sSfL https://raw.githubusercontent.com/ggwhite/4x/main/install.sh | sh

Download Binary

Pre-built binaries for macOS, Linux, and Windows (amd64 / arm64) are available on the Releases page.

Quick Start

# Initialize in your project
cd my-project
4x init

# Create a feature
4x new "User authentication with OAuth2"
# => Created: F001-user-authentication-w

# Run the full loop
4x run F001 --runner claude

# Check status
4x status

# Review and complete
4x done F001

# Or watch it live
4x live -w

4x run drives the Design-Code-Review-Test loop automatically. If Review finds issues, Code gets another pass. If Test fails, the loop iterates. You stay in control with --max-rounds and --timeout flags.

The Four Roles

  • Designer: analyzes requirements and produces the spec plus acceptance criteria. Outputs: task-brief.md, acceptance-criteria.md.
  • Coder: implements exactly what the spec says. Outputs: source code, coder-report.md.
  • Reviewer: catches bugs and spec violations (checklist + adversarial). Outputs: review-report.md with a verdict.
  • Tester: validates against the acceptance criteria with evidence. Outputs: test-report.md, verify.json.

Each role is isolated. The Coder never sees the Reviewer's prior feedback. The Tester validates against criteria written by the Designer, not the Coder. This separation prevents the blind spots that plague single-agent workflows.

How the Loop Works

Designer → Coder → Reviewer → Tester → Accept → Pending Review → Done
                      ↓           ↓                                 ↑
                   amending ←─────┘                          human sign-off
  • Review failure (verdict FAIL or CRITICAL findings) sends code back for amending
  • Test failure (verify not passed) sends code back for amending
  • Escalation (spec mismatch, criteria wrong) routes back to Designer
  • Pending review gate ensures a human always reviews before marking done
  • Round budget (default 5) prevents infinite loops
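The transitions in the diagram can be encoded as a table of legal moves, which is roughly what the State machine guardrail enforces. Several details in the following sketch are assumptions:

- the phase names;
- which phases may escalate back to Designing;
- counting the round budget in amend passes.

Only the default budget of 5 comes from the text above.

```go
package main

import "fmt"

type phase string

// Hypothetical phase names modeled on the loop diagram.
const (
	Designing     phase = "designing"
	Coding        phase = "coding"
	Reviewing     phase = "reviewing"
	Testing       phase = "testing"
	Amending      phase = "amending"
	Accepting     phase = "accepting"
	PendingReview phase = "pending-review"
	Done          phase = "done"
)

// legal lists the allowed next phases. Failures in Reviewing or Testing go to
// Amending, escalation goes back to Designing, and only PendingReview (human
// sign-off) can reach Done.
var legal = map[phase][]phase{
	Designing:     {Coding},
	Coding:        {Reviewing, Designing},
	Amending:      {Reviewing, Designing},
	Reviewing:     {Testing, Amending, Designing},
	Testing:       {Accepting, Amending, Designing},
	Accepting:     {PendingReview},
	PendingReview: {Done},
}

// machine tracks the current phase and how many amend rounds have been used.
type machine struct {
	current   phase
	rounds    int
	maxRounds int
}

// transition moves to next if the edge is legal and, when entering Amending,
// the round budget is not yet spent. On error the phase is left unchanged.
func (m *machine) transition(next phase) error {
	allowed := false
	for _, p := range legal[m.current] {
		if p == next {
			allowed = true
			break
		}
	}
	if !allowed {
		return fmt.Errorf("illegal transition %s -> %s", m.current, next)
	}
	if next == Amending {
		if m.rounds >= m.maxRounds {
			return fmt.Errorf("round budget of %d exhausted", m.maxRounds)
		}
		m.rounds++
	}
	m.current = next
	return nil
}

func main() {
	m := &machine{current: Designing, maxRounds: 5}
	fmt.Println(m.transition(Coding)) // <nil>
	fmt.Println(m.transition(Done))   // illegal transition coding -> done
}
```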

Deterministic Guardrails

Enforced by the CLI, not AI judgment:

  • Scope check: changed files must be within the declared repos.
  • Baseline snapshot: the pre-coding state is captured for safe rollback.
  • State machine: phases must proceed in a legal order.
  • Evidence requirement: the Tester must provide verify.json with command output.
  • Testing gate: verify.json, test-report, and final-report are all required.
  • Dependency gate: features with unmet dependencies cannot start.

Batch Mode

4x batch plan            # generate dependency-aware execution plan
4x batch run --runner claude  # run all eligible features in order
4x batch stop            # graceful shutdown after current feature

MCP Server

Start the Model Context Protocol (MCP) server:

4x mcp

Permission Model

4x runs AI agents in non-interactive mode. During 4x init, runners are configured with flags that skip permission prompts (--dangerously-skip-permissions, -y, approval: full-auto) so the loop runs autonomously.

The CLI's deterministic guardrails (scope lock, baseline snapshots, state machine) provide the safety boundary.

Run 4x only in projects where you are comfortable with autonomous AI agent execution.

Documentation

  • User Guide: complete usage documentation
  • Getting Started: installation and first run
  • CLI Reference: all commands and flags
  • Core Concepts: roles, state machine, protocol, guardrails
  • Configuration: settings, models, locale, runners
  • Runners & Plugins: supported runners and plugin contract
  • Dashboard: 4x Live multi-project dashboard
  • Batch Mode: dependency-aware batch execution

Project Structure

4x/
  cmd/4x/              CLI entry point (Cobra)
  internal/
    protocol/           .4x/ file format, workspace, types
    state/              State machine (phase transitions)
    guard/              Guardrail checks (scope, baseline, evidence)
    batch/              Dependency DAG, batch scheduler
    runner/             Subprocess runner interface
    server/             SSE + REST server for Live dashboard
  plugins/
    claude-code/        Claude Code skill + workflow
    codex/              Codex runner instructions
    gemini/             Gemini runner instructions
    agy/                Antigravity runner instructions
    copilot/            Copilot runner instructions + workflow
    cursor/             Cursor rules
    embed.go            go:embed plugin files into binary
  dashboard/
    macos/              Swift native app (planned)
  docs/
    guide/              User documentation
    architecture/       System-level design docs
    design/             Mechanism design docs
    reference/          Plugin contract

FAQ

Q: Does 4x call any LLM APIs directly? No. The CLI is pure Go with zero LLM dependencies. Runners handle all AI interaction using their native platform capabilities.

Q: Can I use different LLMs for different roles? Yes. Configure per-role models in .4x/settings.json. Use Claude for Design, Gemini for Code — each reads the same .4x/ files.

Q: How is this different from Devin / SWE-agent / OpenHands? Those are autonomous agents that do everything in one shot. 4x is a framework that structures multi-role collaboration with deterministic guardrails. It's closer to a CI pipeline for AI than a single autonomous agent.

Origin Story

4x was born inside a production system called DCT (Designer-Coder-Tester) that shipped 60+ features for a large-scale platform rewrite. The patterns that survived — role isolation, file-based protocol, deterministic scope checking, evidence-based testing — became 4x. The parts that didn't survive — LLM-specific hacks, shared context assumptions, trust-based guardrails — were deliberately left out.

Contributing

git clone https://github.com/ggwhite/4x.git
cd 4x
go build ./cmd/4x
go test ./...

License

MIT


Stop hoping your AI writes correct code. Start verifying it.

About

Agentic AI development loop that splits Design, Code, Review, and Test into isolated roles with deterministic guardrails
