forge-e2e

📌 R&D Prototype — Interpret claims as hypotheses, not proven facts.

End-to-end validation suite for the Forge spreadsheet engine.

Overview

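forge-e2e is an external validation framework for the forge spreadsheet engine. It checks forge's calculation accuracy against Gnumeric, pre-computed Excel 365 values, and R. The current suite reports a 100% pass rate across 1592 Tier 1 tests (810 Gnumeric, 782 pre-computed) covering ~106 functions, plus 78 R-validated analytics tests.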

Why forge-e2e Exists

As forge has grown to support many Excel-compatible functions and enterprise features (Monte Carlo, Bayesian networks, decision trees), comprehensive testing became critical. forge-e2e exists as a separate project for several key reasons:

External Validation: Validates forge against Gnumeric and R (independent engines) to ensure calculation accuracy
Scale Management: forge's internal test suite is already large; separating E2E tests keeps each codebase focused
Independent Testing: External validation provides confidence that forge matches Excel/Gnumeric behavior
Enterprise Features: Dedicated testing for advanced statistical and financial features that require specialized R validation

Features

100% Pass Rate: All 1592 tests pass with smart routing
Smart Test Routing: Automatically routes each test to the appropriate authority (Gnumeric, pre-computed Excel 365 values, or R)
TUI Mode: Interactive terminal UI for running and managing tests
Parallel Execution: Uses rayon for fast parallel test execution
Multiple Validation Modes:
- Gnumeric: Validates Excel-compatible functions
- Forge: Checks forge-specific features (table references, LET, LAMBDA) against pre-computed expected values
- R: Validates statistical/analytics functions
3219 E2E Tests: Comprehensive coverage across all function categories
7 R Validators: Monte Carlo, Bootstrap, Tornado, Sensitivity, Decision Trees, Real Options, Bayesian

Validation Strategy

forge-e2e uses smart routing to automatically validate each test with the appropriate engine:

Smart Test Routing

When you run ./run-e2e.sh --all, tests are automatically classified and routed:

| Tier | Authority | Tests | Description |
|------|-----------|-------|-------------|
| 1a | Gnumeric | ~810 | Excel-compatible (runtime validation) |
| 1b | Excel 365 | ~780 | Pre-computed expected values |
| 2 | R | 78 | Analytics & statistics (runtime validation) |

Forge is NEVER the authority. Every test validates against an external source.

Pattern Detection: The script examines each test's formula content to detect features Gnumeric can't parse:

LET(), LAMBDA(), XLOOKUP(), SWITCH() - Excel 365+ functions
table.column syntax - Forge table references
BREAKEVEN_*, VARIANCE_* - Forge-native functions
DATEDIF(), boolean equality - Behavior differences
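
The pattern detection above can be sketched in Rust as follows. This is a minimal illustration and not the logic in run-e2e.sh: the names `Authority`, `route`, `contains_token`, and `has_table_reference` are hypothetical, the boolean-equality check and Tier 2 (R) routing are left out, and string literals inside formulas are not skipped.

```rust
/// Authority a Tier 1 test is routed to (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Authority {
    Gnumeric,    // Tier 1a: runtime validation
    Precomputed, // Tier 1b: Excel 365 expected values
}

/// Tokens taken from the list above; matched against the upper-cased formula.
const NON_GNUMERIC_TOKENS: [&str; 7] = [
    "LET(", "LAMBDA(", "XLOOKUP(", "SWITCH(", "DATEDIF(", "BREAKEVEN_", "VARIANCE_",
];

fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// True if `needle` occurs at a position not preceded by an identifier
/// character or a dot, so "LET(" does not match inside a longer name.
fn contains_token(haystack: &str, needle: &str) -> bool {
    haystack.match_indices(needle).any(|(i, _)| {
        haystack[..i]
            .chars()
            .next_back()
            .map_or(true, |c| !is_ident_char(c) && c != '.')
    })
}

/// Heuristic for `table.column`: identifier, dot, identifier, where neither
/// side starts with a digit and the right side is not a function call.
fn has_table_reference(formula: &str) -> bool {
    let chars: Vec<char> = formula.chars().collect();
    for (i, &c) in chars.iter().enumerate() {
        if c != '.' {
            continue;
        }
        let mut start = i;
        while start > 0 && is_ident_char(chars[start - 1]) {
            start -= 1;
        }
        let mut end = i + 1;
        while end < chars.len() && is_ident_char(chars[end]) {
            end += 1;
        }
        let left_ok = start < i && !chars[start].is_ascii_digit();
        let right_ok = end > i + 1 && !chars[i + 1].is_ascii_digit();
        let is_call = end < chars.len() && chars[end] == '(';
        if left_ok && right_ok && !is_call {
            return true;
        }
    }
    false
}

/// Route a formula: anything Gnumeric cannot parse goes to pre-computed values.
pub fn route(formula: &str) -> Authority {
    let upper = formula.to_ascii_uppercase();
    let needs_precomputed = NON_GNUMERIC_TOKENS
        .iter()
        .any(|token| contains_token(&upper, token))
        || has_table_reference(formula);
    if needs_precomputed {
        Authority::Precomputed
    } else {
        Authority::Gnumeric
    }
}
```

Under these assumptions, `route("=SUM(1,2,3)")` yields `Authority::Gnumeric`, while `route("=LET(x, 2, x*3)")` and `route("=sales.revenue * 2")` yield `Authority::Precomputed`. The word-boundary check is a design choice for this sketch; a production router would more likely tokenize the formula.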

Tier 1a: Gnumeric (~120 functions)

Excel-compatible functions (SUM, IF, VLOOKUP, DATE, etc.)
Runtime validation via ssconvert
Ensures compatibility with standard spreadsheet behavior

Tier 1b: Pre-computed (~40 functions)

Excel 365+ functions (LET, LAMBDA, XLOOKUP, table references)
Expected values derived from Excel 365 (see ADR-006)
Tests features Gnumeric can't parse

Tier 2: R (~50 functions)

Monte Carlo simulations (MC.Normal, MC.Uniform, MC.PERT, etc.)
Bootstrap resampling and confidence intervals
Tornado and sensitivity analysis
Decision trees and real options valuation
Bayesian network inference
R validators located in validators/r/

Architecture

forge-e2e
├── Tier 1: Gnumeric ─── Excel-compatible (~120 functions)
└── Tier 2: R ─────────── Analytics & Statistics (~50 functions)
    ├── monte_carlo_validator.R    # MC distributions (mc2d)
    ├── bootstrap_validator.R      # Bootstrap CI (boot)
    ├── tornado_validator.R        # Tornado charts
    ├── sensitivity_validator.R    # Sensitivity analysis
    ├── decision_tree_validator.R  # EMV calculation (data.tree)
    ├── real_options_validator.R   # Black-Scholes (derivmkts)
    └── bayesian_validator.R       # Bayesian networks (bnlearn)

See ADR-002 for detailed validation engine documentation.

Validation Engines

Gnumeric: Primary validation for Excel-compatible functions (~120)
R: Statistical and analytics function validation (7 validators, 78 tests)
- Packages: mc2d, boot, data.tree, derivmkts, bnlearn, sensitivity, jsonlite

Property-Based Fuzzing

forge-e2e includes a property-based fuzzing system that uses Gnumeric as an oracle to find bugs through differential testing.

Quick Start

# Fuzz a specific function
./scripts/fuzz_oracle.sh --function ABS --iterations 1000

# Fuzz with reproducible seed
./scripts/fuzz_oracle.sh --seed 12345 --iterations 500

# Run interactive demo
./examples/fuzzing_demo.sh

How It Works

1. Generate random inputs (numbers, booleans, edge cases)
2. Run through both engines (forge and Gnumeric)
3. Compare results (differential testing)
4. Report mismatches as potential bugs
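
The loop above can be sketched as a small differential tester. This is an illustration under stated assumptions, not the code behind fuzz_oracle.sh: both engines are stand-in closures (the real suite would invoke forge and Gnumeric), the names `SplitMix64`, `Mismatch`, and `fuzz_unary` are hypothetical, and the input range and relative tolerance of 1e-9 are arbitrary choices.

```rust
/// Small deterministic PRNG (SplitMix64) so that a seed reproduces a run.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f64 in [lo, hi).
    pub fn next_f64(&mut self, lo: f64, hi: f64) -> f64 {
        let unit = (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64;
        lo + unit * (hi - lo)
    }
}

/// One input on which the candidate and the oracle disagree.
#[derive(Debug, Clone, PartialEq)]
pub struct Mismatch {
    pub input: f64,
    pub candidate: f64,
    pub oracle: f64,
}

/// Fixed edge cases that are always tried before the random inputs.
const EDGE_CASES: [f64; 6] = [0.0, -0.0, 1.0, -1.0, 1e15, -1e15];

/// Run `candidate` and `oracle` on the same inputs and collect disagreements.
pub fn fuzz_unary<C, O>(seed: u64, iterations: usize, candidate: C, oracle: O) -> Vec<Mismatch>
where
    C: Fn(f64) -> f64,
    O: Fn(f64) -> f64,
{
    let mut rng = SplitMix64::new(seed);
    let mut inputs: Vec<f64> = EDGE_CASES.to_vec();
    inputs.extend((0..iterations).map(|_| rng.next_f64(-1e6, 1e6)));

    inputs
        .into_iter()
        .filter_map(|input| {
            let (got, want) = (candidate(input), oracle(input));
            // Relative tolerance (assumed), with a floor for values near zero.
            let tolerance = 1e-9 * want.abs().max(1.0);
            let differs = (got - want).abs() > tolerance || got.is_nan() != want.is_nan();
            differs.then(|| Mismatch { input, candidate: got, oracle: want })
        })
        .collect()
}
```

Calling `fuzz_unary(12345, 500, f64::abs, reference_abs)` with a correct hand-written `reference_abs` should return an empty list, and the same seed always replays the same inputs, which is what makes a reported mismatch reproducible.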

Supported Functions

Math: ABS, SQRT, POWER, MOD, ROUND, FLOOR, CEILING
Aggregation: SUM, AVERAGE, MIN, MAX
Logical: IF, AND, OR

Generating Reports

# Auto-detect latest bug file
./scripts/fuzz_report.sh --summary

# Generate markdown report
./scripts/fuzz_report.sh --format markdown --output bug_report.md

# Generate JSON report
./scripts/fuzz_report.sh --format json --output bugs.json

Documentation

See ADR-007: Fuzzing Strategy for:

Input generation strategies
Which functions to fuzz
How to interpret results
Reproducibility with seeds
Future enhancements

Test Structure

tests/e2e/
├── functions/         # 113 function tests across 6 categories
│   ├── math.yaml      # 50 tests: ABS, SQRT, POWER, SIN, COS, etc.
│   ├── date.yaml      # 12 tests: DATE, DATEVALUE, DAY, MONTH, YEAR, etc.
│   ├── text.yaml      # 14 tests: CONCAT, LEFT, RIGHT, TRIM, UPPER, etc.
│   ├── logical.yaml   # 11 tests: AND, OR, NOT, IF, etc.
│   ├── lookup.yaml    # 3 tests: VLOOKUP, INDEX, MATCH
│   └── aggregation.yaml # 23 tests: SUM, AVERAGE, COUNT, MIN, MAX, etc.
│
├── edge/              # 120 edge case tests across 8 categories
│   ├── edge_type_coercion.yaml  # 8 tests: boolean arithmetic, TRUE/FALSE coercion
│   ├── edge_arithmetic.yaml     # 13 tests: MOD edge cases, power operators
│   ├── edge_comparison.yaml     # 13 tests: equality, inequality edge cases
│   ├── edge_string_ops.yaml     # 16 tests: string concatenation, edge cases
│   ├── edge_errors.yaml         # 12 tests: #DIV/0!, #VALUE!, #REF! handling
│   ├── edge_numeric.yaml        # 20 tests: precision, overflow, underflow
│   ├── edge_dates.yaml          # 18 tests: leap years, date arithmetic
│   └── edge_logical_agg.yaml    # 20 tests: aggregate function edge cases
│
└── enterprise/        # Enterprise features (planned)
    ├── monte_carlo.yaml    # Monte Carlo simulations
    ├── scenarios.yaml      # Scenario analysis
    ├── decision_trees.yaml # Decision tree analysis
    ├── real_options.yaml   # Real options valuation
    ├── bootstrap.yaml      # Bootstrap resampling
    └── bayesian.yaml       # Bayesian networks

Total: 1592 tests validating ~106 functions (100% pass rate)

Usage

Quick Start

# Run all tests (recommended for first-time setup)
./run-e2e.sh --all

# Run in interactive TUI mode
./run-e2e.sh

Specifying Forge Binary Location

The run script will automatically search for the forge binary in this order:

1. --forge-path argument
2. FORGE_BIN environment variable
3. ../forge/target/release/forge
4. System PATH

# Option 1: Use --forge-path argument
./run-e2e.sh --forge-path /path/to/forge --all

# Option 2: Set environment variable
export FORGE_BIN=/path/to/forge
./run-e2e.sh --all

# Option 3: Build forge locally (recommended)
cd ../forge && cargo build --release
cd ../forge-e2e && ./run-e2e.sh --all
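
For readers who want the lookup order spelled out, here is a minimal Rust sketch of the same precedence. It is not taken from run-e2e.sh: the function names `resolve_forge_bin` and `resolve_from_environment` are hypothetical, and falling through to the next candidate when an earlier one does not exist is an assumption (the script may instead fail on a bad --forge-path).

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the first existing candidate, in the documented order:
/// --forge-path, FORGE_BIN, ../forge/target/release/forge, then PATH.
/// `exists` is injected so the order can be checked without touching disk.
pub fn resolve_forge_bin<F>(
    cli_path: Option<PathBuf>,
    env_bin: Option<PathBuf>,
    path_dirs: &[PathBuf],
    exists: F,
) -> Option<PathBuf>
where
    F: Fn(&Path) -> bool,
{
    let sibling = PathBuf::from("../forge/target/release/forge");
    cli_path
        .into_iter()
        .chain(env_bin)
        .chain(std::iter::once(sibling))
        .chain(path_dirs.iter().map(|dir| dir.join("forge")))
        .find(|candidate| exists(candidate.as_path()))
}

/// Convenience wrapper that reads FORGE_BIN and PATH from the real environment.
pub fn resolve_from_environment(cli_path: Option<PathBuf>) -> Option<PathBuf> {
    let env_bin = env::var_os("FORGE_BIN").map(PathBuf::from);
    let path_dirs: Vec<PathBuf> = env::var_os("PATH")
        .map(|raw| env::split_paths(&raw).collect())
        .unwrap_or_default();
    resolve_forge_bin(cli_path, env_bin, &path_dirs, |p| p.is_file())
}
```

Injecting `exists` keeps the precedence logic pure, so each step in the order can be asserted independently of the machine the check runs on.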

Running Tests

# Smart routing mode - automatically routes tests to correct validator (recommended)
./run-e2e.sh --all

# Interactive TUI mode
./run-e2e.sh

# Force specific validation mode (advanced)
./run-e2e.sh --gnumeric    # Only Gnumeric validation
./run-e2e.sh --forge       # Only Forge validation
./run-e2e.sh --r           # Only R validators

# Specify custom test directory
./run-e2e.sh --tests /path/to/tests --all

# Direct binary usage (advanced)
cargo build --release
export FORGE_BIN=/path/to/forge
./target/release/forge-e2e --all

How to Add New Tests

1. Adding Function Tests

Create or edit a YAML file in tests/e2e/functions/:

_forge_version: 1.0.0
assumptions:
  test_function_name:
    value: 42.0
    formula: =FUNCTION(args)
    expected: 42

Example: Adding a SUMIF test to aggregation.yaml:

  test_sumif_basic:
    value: 15.0
    formula: =SUMIF(A1:A5, ">10", B1:B5)
    expected: 15

2. Adding Edge Case Tests

Create or edit a YAML file in tests/e2e/edge/ with descriptive edge case scenarios:

_forge_version: 1.0.0
assumptions:
  test_division_by_zero:
    value: "#DIV/0!"
    formula: =1/0
    expected: "#DIV/0!"

  test_overflow_handling:
    value: 1.0e+308
    formula: =POWER(10, 308)
    expected: 1.0e+308

3. Adding Roundtrip Validation

Roundtrip tests validate that forge can:

Export to XLSX format
Re-import the XLSX file
Preserve all values and formulas accurately

See src/runner.rs for roundtrip test examples (currently in unit tests, will be moved to integration tests).

4. Test File Naming Conventions

Function tests: {category}.yaml (e.g., math.yaml, financial.yaml)
Edge tests: edge_{category}.yaml (e.g., edge_arithmetic.yaml)
Enterprise tests: {feature}.yaml (e.g., monte_carlo.yaml)

5. YAML Test Format

Every test file must follow this structure:

_forge_version: 1.0.0
assumptions:
  test_descriptive_name:
    value: <expected_value>
    formula: =FORMULA()
    expected: <expected_value>

Key points:

_forge_version: Schema version (currently 1.0.0)
assumptions: Map of test cases
test_*: Test names must start with "test_"
value: Expected result (can be number, string, or error like "#DIV/0!")
formula: The Excel-compatible formula to test
expected: Expected result (should match value)

See docs/CONTRIBUTING.md for detailed contribution guidelines.
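
The key points above translate into a few mechanical checks. The sketch below is one hedged way to lint a test case before committing it. The types `Value` and `TestCase` and the function `check_case` are hypothetical and are not the structures in src/types.rs, the formula check (leading "=") is inferred from the examples, and the numeric tolerance is an assumption.

```rust
/// A test value: a number, or text such as an error literal like "#DIV/0!".
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Text(String),
}

/// One entry under `assumptions:` in a test file (hypothetical shape).
#[derive(Debug, Clone)]
pub struct TestCase {
    pub name: String,
    pub formula: String,
    pub value: Value,
    pub expected: Value,
}

/// Numbers compare with a small relative tolerance; text compares exactly.
fn values_match(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::Number(x), Value::Number(y)) => {
            (x - y).abs() <= 1e-9 * x.abs().max(y.abs()).max(1.0)
        }
        (Value::Text(x), Value::Text(y)) => x == y,
        _ => false,
    }
}

/// Return a list of human-readable problems; empty means the case looks fine.
pub fn check_case(case: &TestCase) -> Vec<String> {
    let mut problems = Vec::new();
    if !case.name.starts_with("test_") {
        problems.push(format!("{}: name must start with \"test_\"", case.name));
    }
    if !case.formula.starts_with('=') {
        problems.push(format!("{}: formula must start with '='", case.name));
    }
    if !values_match(&case.value, &case.expected) {
        problems.push(format!("{}: value and expected differ", case.name));
    }
    problems
}
```

With this sketch, a case written as `value: 42.0` and `expected: 42` passes, because both parse as the same number, while a case named `sumif_basic` is flagged for the missing `test_` prefix.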

Test Results

When running ./run-e2e.sh --all, you'll see colored output with smart routing results:

═══════════════════════════════════════════════════════════════════════
  COMBINED RESULTS (Smart Routing)
═══════════════════════════════════════════════════════════════════════

  Tier 1a (Gnumeric):   810 passed, 0 failed
  Tier 1b (Expected):   782 passed, 0 failed
  Tier 2  (R):           78 passed, 0 failed

  Total:               1670 passed, 0 failed
═══════════════════════════════════════════════════════════════════════

Smart routing ensures 100% pass rate by sending each test to the appropriate validator.

Requirements

System Dependencies

forge: Enterprise spreadsheet engine (proprietary)
- Build from source: cd ../forge && cargo build --release
- Or specify path: ./run-e2e.sh --forge-path /path/to/forge

Gnumeric: Primary validation engine (Excel function accuracy)

# macOS
brew install gnumeric

# Ubuntu/Debian
sudo apt install gnumeric

R Setup

Required for Tier 2 validation (statistical distributions, bootstrap, Monte Carlo, financial analytics).

macOS:

brew install r
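R -e 'install.packages(c("mc2d", "boot", "data.tree", "derivmkts", "bnlearn", "sensitivity", "jsonlite"), repos = "https://cloud.r-project.org")'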

Ubuntu/Debian:

sudo apt install r-base
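R -e 'install.packages(c("mc2d", "boot", "data.tree", "derivmkts", "bnlearn", "sensitivity", "jsonlite"), repos = "https://cloud.r-project.org")'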

See ADR-002 for detailed validation engine documentation.

Running R Validators

Run all R validators:

./validators/r/run_all.sh

Or run individual validators:

Rscript validators/r/bootstrap_validator.R
Rscript validators/r/distribution_validator.R
Rscript validators/r/financial_validator.R
Rscript validators/r/monte_carlo_validator.R

Rust Dependencies

See Cargo.toml for full dependency list. Key dependencies:

ratatui: Terminal UI framework
rayon: Parallel test execution
serde_yaml_ng: YAML test file parsing
calamine / rust_xlsxwriter: Excel file handling

Development

Building

# Debug build
cargo build

# Release build (recommended)
cargo build --release

# Run tests
cargo test

Project Structure

forge-e2e/
├── src/
│   ├── main.rs       # CLI entry point
│   ├── engine.rs     # Spreadsheet engine abstraction (Gnumeric)
│   ├── runner.rs     # Test execution and validation
│   ├── excel.rs      # XLSX file handling
│   ├── types.rs      # Test types and schemas
│   └── tui/          # Terminal UI components
├── tests/e2e/        # Test specifications
├── docs/             # Documentation
├── run-e2e.sh        # Convenience script
└── Cargo.toml        # Rust project configuration

Roadmap

Current version: 9.10.0 - 100% E2E coverage achieved!

See .asimov/roadmap.yaml for detailed roadmap.

Completed (9.10.0)

Smart test routing (100% pass rate)
7 R validators for analytics
1592 Tier 1 tests plus 78 R tests across 3 validation tiers
ADR-009: Smart Routing documentation

Next (9.11.0)

MCP/API E2E tests
REST API endpoint validation
JSON-RPC MCP tool validation
OpenAPI spec compliance tests

Contributing

See docs/CONTRIBUTING.md for detailed guidelines on:

Adding function tests
Adding edge case tests
Test naming conventions
YAML format specifications

License

Elastic License 2.0 - See LICENSE

Forge-e2e is Source Available - the code is open for inspection, but commercial production use requires a license. See the forge repository for licensing details.
