forge-e2e

📌 R&D Prototype — Interpret claims as hypotheses, not proven facts.

End-to-end validation suite for the Forge spreadsheet engine.

Overview

forge-e2e is an external validation framework for the forge spreadsheet engine. It checks forge's calculation results against Gnumeric and R, and currently reports a 100% pass rate across 1592 tests covering ~106 functions.

Why forge-e2e Exists

As forge has grown to support many Excel-compatible functions and enterprise features (Monte Carlo, Bayesian networks, decision trees), comprehensive testing has become critical. forge-e2e is kept as a separate project for several reasons:

  1. External Validation: Validates forge against Gnumeric and R (independent engines) to ensure calculation accuracy
  2. Scale Management: forge's internal test suite is already large; separating E2E tests keeps each codebase focused
  3. Independent Testing: External validation provides confidence that forge matches Excel/Gnumeric behavior
  4. Enterprise Features: Dedicated testing for advanced statistical and financial features that require specialized R validation

Features

  • 100% Pass Rate: All 1592 tests pass with smart routing
  • Smart Test Routing: Automatically routes tests to the correct validator (Gnumeric, Forge, or R)
  • TUI Mode: Interactive terminal UI for running and managing tests
  • Parallel Execution: Uses rayon for fast parallel test execution
  • Multiple Validation Modes:
    • Gnumeric: Validates Excel-compatible functions
    • Forge: Validates forge-specific features (table references, LET, LAMBDA)
    • R: Validates statistical/analytics functions
  • 3219 E2E Tests: Comprehensive coverage across all function categories
  • 7 R Validators: Monte Carlo, Bootstrap, Tornado, Sensitivity, Decision Trees, Real Options, Bayesian

Validation Strategy

forge-e2e uses smart routing to automatically validate each test with the appropriate engine:

Smart Test Routing

When you run ./run-e2e.sh --all, tests are automatically classified and routed:

Tier  Authority  Tests  Description
1a    Gnumeric   ~810   Excel-compatible (runtime validation)
1b    Excel 365  ~780   Pre-computed expected values
2     R          78     Analytics & statistics (runtime validation)

Forge is NEVER the authority. Every test validates against an external source.

Pattern Detection: The script examines each test's formula content to detect features Gnumeric can't parse:

  • LET(), LAMBDA(), XLOOKUP(), SWITCH() - Excel 365+ functions
  • table.column syntax - Forge table references
  • BREAKEVEN_*, VARIANCE_* - Forge-native functions
  • DATEDIF(), boolean equality - Behavior differences
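The pattern detection described above can be sketched in a few lines of Rust. This is an illustration only, not the code in run-e2e.sh: the names `Tier`, `NON_GNUMERIC_MARKERS`, `has_table_reference`, and `classify_formula` are hypothetical, the marker list is copied from the bullets above, and boolean-equality detection is left out because the source does not say how it is recognized.

```rust
/// Validation tier a formula is routed to (names mirror the routing table).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Tier {
    /// Tier 1a: validated at runtime against Gnumeric.
    Gnumeric,
    /// Tier 1b: compared against pre-computed expected values.
    PreComputed,
}

/// Hypothetical marker list taken from the bullets above: calls and
/// prefixes that Gnumeric cannot parse or evaluates differently.
pub const NON_GNUMERIC_MARKERS: [&str; 7] = [
    "LET(", "LAMBDA(", "XLOOKUP(", "SWITCH(", "DATEDIF(", "BREAKEVEN_", "VARIANCE_",
];

/// Returns true if the formula contains `name.name`, which this sketch
/// treats as a Forge table reference. Decimal literals such as `2.5`
/// do not match because the character after the dot must be a letter.
pub fn has_table_reference(formula: &str) -> bool {
    let chars: Vec<char> = formula.chars().collect();
    chars.windows(3).any(|w| {
        (w[0].is_alphanumeric() || w[0] == '_')
            && w[1] == '.'
            && (w[2].is_alphabetic() || w[2] == '_')
    })
}

/// Routes a formula by plain substring matching (case-insensitive).
/// Limitation: markers inside string literals are also matched.
pub fn classify_formula(formula: &str) -> Tier {
    let upper = formula.to_uppercase();
    let uses_marker = NON_GNUMERIC_MARKERS.iter().any(|m| upper.contains(*m));
    if uses_marker || has_table_reference(formula) {
        Tier::PreComputed
    } else {
        Tier::Gnumeric
    }
}
```

Under these assumptions, `classify_formula("=SUM(1, 2.5)")` stays on the Gnumeric tier, while `classify_formula("=SUM(sales.revenue)")` is routed to the pre-computed tier. Substring matching is deliberately crude; a real router would more likely tokenize the formula first.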

Tier 1a: Gnumeric (~120 functions)

  • Excel-compatible functions (SUM, IF, VLOOKUP, DATE, etc.)
  • Runtime validation via ssconvert
  • Ensures compatibility with standard spreadsheet behavior

Tier 1b: Pre-computed (~40 functions)

  • Excel 365+ functions (LET, LAMBDA, XLOOKUP, table references)
  • Expected values derived from Excel 365 (see ADR-006)
  • Tests features Gnumeric can't parse

Tier 2: R (~50 functions)

  • Monte Carlo simulations (MC.Normal, MC.Uniform, MC.PERT, etc.)
  • Bootstrap resampling and confidence intervals
  • Tornado and sensitivity analysis
  • Decision trees and real options valuation
  • Bayesian network inference
  • R validators located in validators/r/

Architecture

forge-e2e
├── Tier 1: Gnumeric ─── Excel-compatible (~120 functions)
└── Tier 2: R ─────────── Analytics & Statistics (~50 functions)
    ├── monte_carlo_validator.R    # MC distributions (mc2d)
    ├── bootstrap_validator.R      # Bootstrap CI (boot)
    ├── tornado_validator.R        # Tornado charts
    ├── sensitivity_validator.R    # Sensitivity analysis
    ├── decision_tree_validator.R  # EMV calculation (data.tree)
    ├── real_options_validator.R   # Black-Scholes (derivmkts)
    └── bayesian_validator.R       # Bayesian networks (bnlearn)

See ADR-002 for detailed validation engine documentation.

Validation Engines

  • Gnumeric: Primary validation for Excel-compatible functions (~120)
  • R: Statistical and analytics function validation (7 validators, 78 tests)
    • Packages: mc2d, boot, data.tree, derivmkts, bnlearn, sensitivity, jsonlite

Property-Based Fuzzing

forge-e2e includes a property-based fuzzing system that uses Gnumeric as an oracle to find bugs through differential testing.

Quick Start

# Fuzz a specific function
./scripts/fuzz_oracle.sh --function ABS --iterations 1000

# Fuzz with reproducible seed
./scripts/fuzz_oracle.sh --seed 12345 --iterations 500

# Run interactive demo
./examples/fuzzing_demo.sh

How It Works

  1. Generate random inputs (numbers, booleans, edge cases)
  2. Run through both engines (forge and Gnumeric)
  3. Compare results (differential testing)
  4. Report mismatches as potential bugs
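The four steps above amount to a standard differential-testing loop. The sketch below shows that loop in self-contained Rust; it is not the implementation behind fuzz_oracle.sh. The names `Lcg`, `Mismatch`, and `differential_fuzz` are hypothetical, the two engines are stand-in closures rather than real forge and Gnumeric invocations, and the tolerance-based comparison is an assumption.

```rust
/// Minimal deterministic generator (a linear congruential generator) so
/// that the same seed reproduces the same inputs. Demo quality only.
pub struct Lcg(u64);

impl Lcg {
    pub fn new(seed: u64) -> Self {
        Lcg(seed)
    }

    /// Next pseudo-random value in the range [-1000.0, 1000.0).
    pub fn next_f64(&mut self) -> f64 {
        // Multiplier and increment from Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        let unit = (self.0 >> 11) as f64 / (1u64 << 53) as f64; // [0, 1)
        unit * 2000.0 - 1000.0
    }
}

/// One disagreement between the engine under test and the oracle.
#[derive(Debug, PartialEq)]
pub struct Mismatch {
    pub input: f64,
    pub actual: f64,
    pub expected: f64,
}

/// Runs `engine` and `oracle` on the same inputs and collects disagreements.
pub fn differential_fuzz<A, O>(
    seed: u64,
    iterations: usize,
    engine: A,
    oracle: O,
    tolerance: f64,
) -> Vec<Mismatch>
where
    A: Fn(f64) -> f64,
    O: Fn(f64) -> f64,
{
    let mut rng = Lcg::new(seed);
    // Step 1: fixed edge cases first, then seeded random inputs.
    let mut inputs: Vec<f64> = vec![0.0, -0.0, 1.0, -1.0, 1e15, -1e15];
    inputs.extend((0..iterations).map(|_| rng.next_f64()));

    let mut mismatches = Vec::new();
    for input in inputs {
        // Step 2: run the same input through both engines.
        let actual = engine(input);
        let expected = oracle(input);
        // Step 3: compare; two NaN results count as agreement.
        let agree = (actual - expected).abs() <= tolerance
            || (actual.is_nan() && expected.is_nan());
        // Step 4: record a mismatch as a potential bug.
        if !agree {
            mismatches.push(Mismatch { input, actual, expected });
        }
    }
    mismatches
}
```

For example, `differential_fuzz(12345, 500, |x| x, f64::abs, 1e-9)` compares a deliberately broken ABS (the identity function) against a correct one and reports every negative input as a mismatch. Re-running with the same seed yields the same list, which is the property the `--seed` option relies on.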

Supported Functions

  • Math: ABS, SQRT, POWER, MOD, ROUND, FLOOR, CEILING
  • Aggregation: SUM, AVERAGE, MIN, MAX
  • Logical: IF, AND, OR

Generating Reports

# Auto-detect latest bug file
./scripts/fuzz_report.sh --summary

# Generate markdown report
./scripts/fuzz_report.sh --format markdown --output bug_report.md

# Generate JSON report
./scripts/fuzz_report.sh --format json --output bugs.json

Documentation

See ADR-007: Fuzzing Strategy for:

  • Input generation strategies
  • Which functions to fuzz
  • How to interpret results
  • Reproducibility with seeds
  • Future enhancements

Test Structure

tests/e2e/
├── functions/         # 113 function tests across 6 categories
│   ├── math.yaml      # 50 tests: ABS, SQRT, POWER, SIN, COS, etc.
│   ├── date.yaml      # 12 tests: DATE, DATEVALUE, DAY, MONTH, YEAR, etc.
│   ├── text.yaml      # 14 tests: CONCAT, LEFT, RIGHT, TRIM, UPPER, etc.
│   ├── logical.yaml   # 11 tests: AND, OR, NOT, IF, etc.
│   ├── lookup.yaml    # 3 tests: VLOOKUP, INDEX, MATCH
│   └── aggregation.yaml # 23 tests: SUM, AVERAGE, COUNT, MIN, MAX, etc.
│
├── edge/              # 120 edge case tests across 8 categories
│   ├── edge_type_coercion.yaml  # 8 tests: boolean arithmetic, TRUE/FALSE coercion
│   ├── edge_arithmetic.yaml     # 13 tests: MOD edge cases, power operators
│   ├── edge_comparison.yaml     # 13 tests: equality, inequality edge cases
│   ├── edge_string_ops.yaml     # 16 tests: string concatenation, edge cases
│   ├── edge_errors.yaml         # 12 tests: #DIV/0!, #VALUE!, #REF! handling
│   ├── edge_numeric.yaml        # 20 tests: precision, overflow, underflow
│   ├── edge_dates.yaml          # 18 tests: leap years, date arithmetic
│   └── edge_logical_agg.yaml    # 20 tests: aggregate function edge cases
│
└── enterprise/        # Enterprise features (planned)
    ├── monte_carlo.yaml    # Monte Carlo simulations
    ├── scenarios.yaml      # Scenario analysis
    ├── decision_trees.yaml # Decision tree analysis
    ├── real_options.yaml   # Real options valuation
    ├── bootstrap.yaml      # Bootstrap resampling
    └── bayesian.yaml       # Bayesian networks

Total: 1592 tests validating ~106 functions (100% pass rate)

Usage

Quick Start

# Run all tests (recommended for first-time setup)
./run-e2e.sh --all

# Run in interactive TUI mode
./run-e2e.sh

Specifying Forge Binary Location

The run script will automatically search for the forge binary in this order:

  1. --forge-path argument
  2. FORGE_BIN environment variable
  3. ../forge/target/release/forge
  4. System PATH

# Option 1: Use --forge-path argument
./run-e2e.sh --forge-path /path/to/forge --all

# Option 2: Set environment variable
export FORGE_BIN=/path/to/forge
./run-e2e.sh --all

# Option 3: Build forge locally (recommended)
cd ../forge && cargo build --release
cd ../forge-e2e && ./run-e2e.sh --all
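The search order listed above can be expressed as a small first-match lookup. The Rust sketch below is an illustration of that order, not the logic in run-e2e.sh: `resolve_forge_bin` and its parameters are hypothetical, and it assumes that a candidate which does not exist is skipped rather than treated as an error. The existence check is passed in as a closure so the order can be tested without touching the file system.

```rust
use std::path::{Path, PathBuf};

/// Returns the first existing candidate in the documented order:
/// `--forge-path`, `FORGE_BIN`, the sibling release build, then PATH.
///
/// * `cli_path`  - value of `--forge-path`, if given
/// * `env_path`  - value of the `FORGE_BIN` variable, if set
/// * `path_dirs` - directories from the system PATH, already split
/// * `exists`    - predicate deciding whether a candidate is usable
pub fn resolve_forge_bin(
    cli_path: Option<&str>,
    env_path: Option<&str>,
    path_dirs: &[&str],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let mut candidates: Vec<PathBuf> = Vec::new();
    // 1. Explicit command-line argument.
    if let Some(p) = cli_path {
        candidates.push(PathBuf::from(p));
    }
    // 2. Environment variable.
    if let Some(p) = env_path {
        candidates.push(PathBuf::from(p));
    }
    // 3. Sibling checkout built with `cargo build --release`.
    candidates.push(PathBuf::from("../forge/target/release/forge"));
    // 4. Every directory on the system PATH.
    for dir in path_dirs {
        candidates.push(Path::new(dir).join("forge"));
    }
    // Assumption: missing candidates are skipped, not reported as errors.
    candidates.into_iter().find(|c| exists(c.as_path()))
}
```

In real use the closure would typically be `|p| p.is_file()`, and `path_dirs` would come from splitting the `PATH` environment variable.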

Running Tests

# Smart routing mode - automatically routes tests to correct validator (recommended)
./run-e2e.sh --all

# Interactive TUI mode
./run-e2e.sh

# Force specific validation mode (advanced)
./run-e2e.sh --gnumeric    # Only Gnumeric validation
./run-e2e.sh --forge       # Only Forge validation
./run-e2e.sh --r           # Only R validators

# Specify custom test directory
./run-e2e.sh --tests /path/to/tests --all

# Direct binary usage (advanced)
cargo build --release
export FORGE_BIN=/path/to/forge
./target/release/forge-e2e --all

How to Add New Tests

1. Adding Function Tests

Create or edit a YAML file in tests/e2e/functions/:

_forge_version: 1.0.0
assumptions:
  test_function_name:
    value: 42.0
    formula: =FUNCTION(args)
    expected: 42

Example: Adding a SUMIF test to aggregation.yaml:

  test_sumif_basic:
    value: 15.0
    formula: =SUMIF(A1:A5, ">10", B1:B5)
    expected: 15

2. Adding Edge Case Tests

Create or edit a YAML file in tests/e2e/edge/ with descriptive edge case scenarios:

_forge_version: 1.0.0
assumptions:
  test_division_by_zero:
    value: "#DIV/0!"
    formula: =1/0
    expected: "#DIV/0!"

  test_overflow_handling:
    value: 1.0e+308
    formula: =POWER(10, 308)
    expected: 1.0e+308

3. Adding Roundtrip Validation

Roundtrip tests validate that forge can:

  1. Export to XLSX format
  2. Re-import the XLSX file
  3. Preserve all values and formulas accurately

See src/runner.rs for roundtrip test examples (currently in unit tests, will be moved to integration tests).

4. Test File Naming Conventions

  • Function tests: {category}.yaml (e.g., math.yaml, financial.yaml)
  • Edge tests: edge_{category}.yaml (e.g., edge_arithmetic.yaml)
  • Enterprise tests: {feature}.yaml (e.g., monte_carlo.yaml)

5. YAML Test Format

Every test file must follow this structure:

_forge_version: 1.0.0
assumptions:
  test_descriptive_name:
    value: <expected_value>
    formula: =FORMULA()
    expected: <expected_value>

Key points:

  • _forge_version: Schema version (currently 1.0.0)
  • assumptions: Map of test cases
  • test_*: Test names must start with "test_"
  • value: Expected result (can be number, string, or error like "#DIV/0!")
  • formula: The Excel-compatible formula to test
  • expected: Expected result (should match value)
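The key points above translate into a few mechanical checks on each test case. The Rust sketch below applies them to an already-parsed case; it is not forge-e2e's own schema code. `Value`, `TestCase`, and `lint_test_case` are hypothetical names, YAML parsing is omitted, and the rule that a formula starts with `=` is an assumption drawn from the examples rather than a documented requirement.

```rust
/// A test value: a number, or text such as an error code like "#DIV/0!".
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Text(String),
}

/// One entry under `assumptions:` after parsing (hypothetical shape).
#[derive(Debug, Clone)]
pub struct TestCase {
    pub name: String,
    pub value: Value,
    pub formula: String,
    pub expected: Value,
}

/// Returns a human-readable message for every rule the case violates.
/// An empty vector means the case follows the documented format.
pub fn lint_test_case(case: &TestCase) -> Vec<String> {
    let mut problems = Vec::new();
    // Documented rule: test names must start with "test_".
    if !case.name.starts_with("test_") {
        problems.push(format!("{}: name must start with \"test_\"", case.name));
    }
    // Assumption: every formula begins with '=' as in the examples.
    if !case.formula.starts_with('=') {
        problems.push(format!("{}: formula should start with '='", case.name));
    }
    // Documented rule: `expected` should match `value`.
    if case.value != case.expected {
        problems.push(format!("{}: value and expected differ", case.name));
    }
    problems
}
```

A check like this could run before any engine is invoked, so that a malformed file fails fast with a readable message instead of surfacing as a validation mismatch.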

See docs/CONTRIBUTING.md for detailed contribution guidelines.

Test Results

When running ./run-e2e.sh --all, you'll see colored output with smart routing results:

═══════════════════════════════════════════════════════════════════════
  COMBINED RESULTS (Smart Routing)
═══════════════════════════════════════════════════════════════════════

  Tier 1a (Gnumeric):   810 passed, 0 failed
  Tier 1b (Expected):   782 passed, 0 failed
  Tier 2  (R):           78 passed, 0 failed

  Total:               1670 passed, 0 failed
═══════════════════════════════════════════════════════════════════════

Smart routing ensures 100% pass rate by sending each test to the appropriate validator.
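The combined total in the sample output is the sum of the per-tier counts (810 + 782 + 78 = 1670). A minimal sketch of that aggregation, using the hypothetical names `TierResult`, `combine`, and `pass_rate` rather than the runner's actual types:

```rust
/// Pass/fail counts reported by one validation tier.
#[derive(Debug, Clone, Copy)]
pub struct TierResult {
    pub tier: &'static str,
    pub passed: u32,
    pub failed: u32,
}

/// Sums the per-tier counts into a combined (passed, failed) pair.
pub fn combine(results: &[TierResult]) -> (u32, u32) {
    results
        .iter()
        .fold((0, 0), |(passed, failed), r| (passed + r.passed, failed + r.failed))
}

/// Pass rate in percent, or `None` when no tests were run.
pub fn pass_rate(passed: u32, failed: u32) -> Option<f64> {
    let total = passed + failed;
    if total == 0 {
        None
    } else {
        Some(100.0 * f64::from(passed) / f64::from(total))
    }
}
```

Fed the three rows from the sample output, `combine` returns `(1670, 0)`.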

Requirements

System Dependencies

  • forge: Enterprise spreadsheet engine (proprietary)

    • Build from source: cd ../forge && cargo build --release
    • Or specify path: ./run-e2e.sh --forge-path /path/to/forge
  • Gnumeric: Primary validation engine (Excel function accuracy)

    # macOS
    brew install gnumeric
    
    # Ubuntu/Debian
    sudo apt install gnumeric

R Setup

Required for Tier 2 validation (statistical distributions, bootstrap, Monte Carlo, financial analytics).

macOS:

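brew install r
R -e 'install.packages(c("mc2d", "boot", "data.tree", "derivmkts", "bnlearn", "sensitivity", "jsonlite"))'

Ubuntu/Debian:

sudo apt install r-base
R -e 'install.packages(c("mc2d", "boot", "data.tree", "derivmkts", "bnlearn", "sensitivity", "jsonlite"))'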

See ADR-002 for detailed validation engine documentation.

Running R Validators

Run all R validators:

./validators/r/run_all.sh

Or run individual validators:

Rscript validators/r/bootstrap_validator.R
Rscript validators/r/distribution_validator.R
Rscript validators/r/financial_validator.R
Rscript validators/r/monte_carlo_validator.R

Rust Dependencies

See Cargo.toml for full dependency list. Key dependencies:

  • ratatui: Terminal UI framework
  • rayon: Parallel test execution
  • serde_yaml_ng: YAML test file parsing
  • calamine / rust_xlsxwriter: Excel file handling

Development

Building

# Debug build
cargo build

# Release build (recommended)
cargo build --release

# Run tests
cargo test

Project Structure

forge-e2e/
├── src/
│   ├── main.rs       # CLI entry point
│   ├── engine.rs     # Spreadsheet engine abstraction (Gnumeric)
│   ├── runner.rs     # Test execution and validation
│   ├── excel.rs      # XLSX file handling
│   ├── types.rs      # Test types and schemas
│   └── tui/          # Terminal UI components
├── tests/e2e/        # Test specifications
├── docs/             # Documentation
├── run-e2e.sh        # Convenience script
└── Cargo.toml        # Rust project configuration

Roadmap

Current version: 9.10.0 - 100% E2E coverage achieved!

See .asimov/roadmap.yaml for detailed roadmap.

Completed (9.10.0)

  • Smart test routing (100% pass rate)
  • 7 R validators for analytics
  • 1592 tests across 3 validation tiers
  • ADR-009: Smart Routing documentation

Next (9.11.0)

  • MCP/API E2E tests
  • REST API endpoint validation
  • JSON-RPC MCP tool validation
  • OpenAPI spec compliance tests

Contributing

See docs/CONTRIBUTING.md for detailed guidelines on:

  • Adding function tests
  • Adding edge case tests
  • Test naming conventions
  • YAML format specifications

License

Elastic License 2.0 - See LICENSE

Forge-e2e is Source Available - the code is open for inspection, but commercial production use requires a license. See the forge repository for licensing details.
