📌 R&D Prototype — Interpret claims as hypotheses, not proven facts.
End-to-end validation suite for the Forge spreadsheet engine.
forge-e2e is an external validation framework for the forge spreadsheet engine. It validates forge's calculation accuracy against Gnumeric and R, achieving 100% pass rate across 1592 tests covering ~106 functions.
As forge has grown to support many Excel-compatible functions and enterprise features (Monte Carlo, Bayesian networks, decision trees), comprehensive testing became critical. forge-e2e exists as a separate project for several key reasons:
- External Validation: Validates forge against Gnumeric and R (independent engines) to ensure calculation accuracy
- Scale Management: forge's internal test suite is already large; separating E2E tests keeps each codebase focused
- Independent Testing: External validation provides confidence that forge matches Excel/Gnumeric behavior
- Enterprise Features: Dedicated testing for advanced statistical and financial features that require specialized R validation
- 100% Pass Rate: All 1592 tests pass with smart routing
- Smart Test Routing: Automatically routes tests to the correct validator (Gnumeric, Forge, or R)
- TUI Mode: Interactive terminal UI for running and managing tests
- Parallel Execution: Uses rayon for fast parallel test execution
- Multiple Validation Modes:
- Gnumeric: Validates Excel-compatible functions
- Forge: Validates forge-specific features (table references, LET, LAMBDA)
- R: Validates statistical/analytics functions
- 3219 E2E Tests: Comprehensive coverage across all function categories
- 7 R Validators: Monte Carlo, Bootstrap, Tornado, Sensitivity, Decision Trees, Real Options, Bayesian
forge-e2e uses smart routing to validate each test against the appropriate engine.
When you run ./run-e2e.sh --all, tests are automatically classified and routed:
| Tier | Authority | Tests | Description |
|---|---|---|---|
| 1a | Gnumeric | ~810 | Excel-compatible (runtime validation) |
| 1b | Excel 365 | ~780 | Pre-computed expected values |
| 2 | R | 78 | Analytics & statistics (runtime validation) |
Forge is NEVER the authority. Every test validates against an external source.
Pattern Detection: The script examines each test's formula content to detect features Gnumeric can't parse:
- LET(), LAMBDA(), XLOOKUP(), SWITCH(): Excel 365+ functions
- table.column syntax: Forge table references
- BREAKEVEN_*, VARIANCE_*: Forge-native functions
- DATEDIF(), boolean equality: Behavior differences
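The pattern check described above can be sketched in Rust. This is a minimal illustration under stated assumptions, not the script's actual logic: the `Route` enum, the function names, and the lowercase-only `table.column` heuristic are all hypothetical, and boolean-equality detection is left out.

```rust
/// Where a test is validated. Names are illustrative, not forge-e2e's own types.
#[derive(Debug, PartialEq)]
enum Route {
    Gnumeric,    // Tier 1a: validated at runtime
    PreComputed, // Tier 1b: compared against stored expected values
}

/// True if `name` is called in `formula`: it is followed by '(' and is not the
/// tail of a longer identifier, so "LET(" does not match inside "OUTLET(".
fn has_call(formula: &str, name: &str) -> bool {
    let upper = formula.to_ascii_uppercase();
    let needle = format!("{}(", name);
    upper.match_indices(needle.as_str()).any(|(i, _)| {
        // Byte before the match; None when the match starts the string.
        let prev = i.checked_sub(1).map(|j| upper.as_bytes()[j]);
        !matches!(prev, Some(b) if b.is_ascii_alphanumeric() || b == b'_')
    })
}

/// Assumed heuristic: a '.' with a lowercase letter or '_' on both sides,
/// which matches `sales.revenue` but not `1.5` or `MC.Normal`.
fn has_table_ref(formula: &str) -> bool {
    let b = formula.as_bytes();
    let is_ident = |c: u8| c.is_ascii_lowercase() || c == b'_';
    (1..b.len().saturating_sub(1))
        .any(|i| b[i] == b'.' && is_ident(b[i - 1]) && is_ident(b[i + 1]))
}

/// Route a formula away from Gnumeric when it uses a feature from the list above.
fn route(formula: &str) -> Route {
    const UNSUPPORTED_CALLS: [&str; 5] = ["LET", "LAMBDA", "XLOOKUP", "SWITCH", "DATEDIF"];
    const FORGE_PREFIXES: [&str; 2] = ["BREAKEVEN_", "VARIANCE_"];
    let upper = formula.to_ascii_uppercase();
    let needs_precomputed = UNSUPPORTED_CALLS.iter().any(|&name| has_call(formula, name))
        || FORGE_PREFIXES.iter().any(|&p| upper.contains(p))
        || has_table_ref(formula);
    if needs_precomputed {
        Route::PreComputed
    } else {
        Route::Gnumeric
    }
}
```

Under these assumptions, `route("=SUM(A1:A5)")` yields `Route::Gnumeric` and `route("=LET(x, 1, x+1)")` yields `Route::PreComputed`. A string literal that happens to contain `a.b` would also trip the table heuristic, so a real classifier would need to skip quoted text.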
- Excel-compatible functions (SUM, IF, VLOOKUP, DATE, etc.)
- Runtime validation via ssconvert
- Ensures compatibility with standard spreadsheet behavior
- Excel 365+ functions (LET, LAMBDA, XLOOKUP, table references)
- Expected values derived from Excel 365 (see ADR-006)
- Tests features Gnumeric can't parse
- Monte Carlo simulations (MC.Normal, MC.Uniform, MC.PERT, etc.)
- Bootstrap resampling and confidence intervals
- Tornado and sensitivity analysis
- Decision trees and real options valuation
- Bayesian network inference
- R validators located in validators/r/
forge-e2e
├── Tier 1: Gnumeric ─── Excel-compatible (~120 functions)
└── Tier 2: R ─────────── Analytics & Statistics (~50 functions)
    ├── monte_carlo_validator.R # MC distributions (mc2d)
    ├── bootstrap_validator.R # Bootstrap CI (boot)
    ├── tornado_validator.R # Tornado charts
    ├── sensitivity_validator.R # Sensitivity analysis
    ├── decision_tree_validator.R # EMV calculation (data.tree)
    ├── real_options_validator.R # Black-Scholes (derivmkts)
    └── bayesian_validator.R # Bayesian networks (bnlearn)
See ADR-002 for detailed validation engine documentation.
- Gnumeric: Primary validation for Excel-compatible functions (~120)
- R: Statistical and analytics function validation (7 validators, 78 tests)
- Packages: mc2d, boot, data.tree, derivmkts, bnlearn, sensitivity, jsonlite
forge-e2e includes a property-based fuzzing system that uses Gnumeric as an oracle to find bugs through differential testing.
# Fuzz a specific function
./scripts/fuzz_oracle.sh --function ABS --iterations 1000
# Fuzz with reproducible seed
./scripts/fuzz_oracle.sh --seed 12345 --iterations 500
# Run interactive demo
./examples/fuzzing_demo.sh
- Generate random inputs (numbers, booleans, edge cases)
- Run through both engines (forge and Gnumeric)
- Compare results (differential testing)
- Report mismatches as potential bugs
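The four steps above amount to a differential-testing loop, sketched here in Rust. Everything in it is an assumption made for illustration: the two engines are plain closures standing in for forge and Gnumeric, the xorshift generator stands in for whatever seeded generator the script uses (the seed must be non-zero), and the 1e-9 relative tolerance is a guess.

```rust
/// Tiny deterministic PRNG (xorshift64) so that a seed reproduces a run.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// About one input in ten is an edge case; the rest fall in [-1000, 1000).
    fn next_input(&mut self) -> f64 {
        const EDGE_CASES: [f64; 4] = [0.0, -0.0, 1.0, -1.0];
        let r = self.next_u64();
        if r % 10 == 0 {
            EDGE_CASES[(r / 10 % 4) as usize]
        } else {
            (r >> 11) as f64 / (1u64 << 53) as f64 * 2000.0 - 1000.0
        }
    }
}

/// One disagreement between the engine under test and the oracle.
struct Mismatch {
    input: f64,
    forge: f64,
    oracle: f64,
}

/// Feed the same random inputs to both engines and collect disagreements.
fn fuzz(
    seed: u64,
    iterations: usize,
    forge: impl Fn(f64) -> f64,
    oracle: impl Fn(f64) -> f64,
) -> Vec<Mismatch> {
    let mut rng = Rng(seed);
    let mut mismatches = Vec::new();
    for _ in 0..iterations {
        let input = rng.next_input();
        let (a, b) = (forge(input), oracle(input));
        // Equal, both NaN, or within the assumed relative tolerance.
        let agree = a == b
            || (a.is_nan() && b.is_nan())
            || (a - b).abs() <= 1e-9 * a.abs().max(b.abs()).max(1.0);
        if !agree {
            mismatches.push(Mismatch { input, forge: a, oracle: b });
        }
    }
    mismatches
}
```

For example, `fuzz(12345, 500, |x: f64| x, f64::abs)` models an ABS implementation that forgets to flip the sign: every reported mismatch has a negative input, and rerunning with the same seed reproduces the same list.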
- Math: ABS, SQRT, POWER, MOD, ROUND, FLOOR, CEILING
- Aggregation: SUM, AVERAGE, MIN, MAX
- Logical: IF, AND, OR
# Auto-detect latest bug file
./scripts/fuzz_report.sh --summary
# Generate markdown report
./scripts/fuzz_report.sh --format markdown --output bug_report.md
# Generate JSON report
./scripts/fuzz_report.sh --format json --output bugs.json
See ADR-007: Fuzzing Strategy for:
- Input generation strategies
- Which functions to fuzz
- How to interpret results
- Reproducibility with seeds
- Future enhancements
tests/e2e/
├── functions/ # 113 function tests across 6 categories
│ ├── math.yaml # 50 tests: ABS, SQRT, POWER, SIN, COS, etc.
│ ├── date.yaml # 12 tests: DATE, DATEVALUE, DAY, MONTH, YEAR, etc.
│ ├── text.yaml # 14 tests: CONCAT, LEFT, RIGHT, TRIM, UPPER, etc.
│ ├── logical.yaml # 11 tests: AND, OR, NOT, IF, etc.
│ ├── lookup.yaml # 3 tests: VLOOKUP, INDEX, MATCH
│ └── aggregation.yaml # 23 tests: SUM, AVERAGE, COUNT, MIN, MAX, etc.
│
├── edge/ # 120 edge case tests across 8 categories
│ ├── edge_type_coercion.yaml # 8 tests: boolean arithmetic, TRUE/FALSE coercion
│ ├── edge_arithmetic.yaml # 13 tests: MOD edge cases, power operators
│ ├── edge_comparison.yaml # 13 tests: equality, inequality edge cases
│ ├── edge_string_ops.yaml # 16 tests: string concatenation, edge cases
│ ├── edge_errors.yaml # 12 tests: #DIV/0!, #VALUE!, #REF! handling
│ ├── edge_numeric.yaml # 20 tests: precision, overflow, underflow
│ ├── edge_dates.yaml # 18 tests: leap years, date arithmetic
│ └── edge_logical_agg.yaml # 20 tests: aggregate function edge cases
│
└── enterprise/ # Enterprise features (planned)
    ├── monte_carlo.yaml # Monte Carlo simulations
    ├── scenarios.yaml # Scenario analysis
    ├── decision_trees.yaml # Decision tree analysis
    ├── real_options.yaml # Real options valuation
    ├── bootstrap.yaml # Bootstrap resampling
    └── bayesian.yaml # Bayesian networks
Total: 1592 tests validating ~106 functions (100% pass rate)
# Run all tests (recommended for first-time setup)
./run-e2e.sh --all
# Run in interactive TUI mode
./run-e2e.sh
The run script will automatically search for the forge binary in this order:
- --forge-path argument
- FORGE_BIN environment variable
- ../forge/target/release/forge
- System PATH
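The lookup order above can be expressed as a chain of candidates where the first existing path wins. The Rust sketch below is illustrative only: `resolve_forge` and its parameters are hypothetical, the existence check is injected so the logic can be exercised without touching the filesystem, and treating a missing explicit path as "try the next candidate" is an assumption (the real script may stop with an error instead).

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Return the first candidate for which `exists` is true, in documented order.
fn resolve_forge(
    cli_arg: Option<&str>,       // value of --forge-path, if given
    forge_bin_env: Option<&str>, // value of FORGE_BIN, if set
    path_env: &str,              // value of PATH
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    // 1 and 2: explicit settings come first.
    let explicit = cli_arg.into_iter().chain(forge_bin_env).map(PathBuf::from);
    // 3: a sibling checkout built with `cargo build --release`.
    let local = std::iter::once(PathBuf::from("../forge/target/release/forge"));
    // 4: every directory on PATH, in order.
    let on_path = env::split_paths(path_env).map(|dir| dir.join("forge"));

    explicit
        .chain(local)
        .chain(on_path)
        .find(|candidate| exists(candidate.as_path()))
}
```

A caller would typically pass `std::env::var("FORGE_BIN").ok().as_deref()` for the environment value and `|p: &Path| p.is_file()` as the existence check.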
# Option 1: Use --forge-path argument
./run-e2e.sh --forge-path /path/to/forge --all
# Option 2: Set environment variable
export FORGE_BIN=/path/to/forge
./run-e2e.sh --all
# Option 3: Build forge locally (recommended)
cd ../forge && cargo build --release
cd ../forge-e2e && ./run-e2e.sh --all
# Smart routing mode - automatically routes tests to correct validator (recommended)
./run-e2e.sh --all
# Interactive TUI mode
./run-e2e.sh
# Force specific validation mode (advanced)
./run-e2e.sh --gnumeric # Only Gnumeric validation
./run-e2e.sh --forge # Only Forge validation
./run-e2e.sh --r # Only R validators
# Specify custom test directory
./run-e2e.sh --tests /path/to/tests --all
# Direct binary usage (advanced)
cargo build --release
export FORGE_BIN=/path/to/forge
./target/release/forge-e2e --all
Create or edit a YAML file in tests/e2e/functions/:
_forge_version: 1.0.0
assumptions:
  test_function_name:
    value: 42.0
    formula: =FUNCTION(args)
    expected: 42
Example: Adding a SUMIF test to aggregation.yaml:
  test_sumif_basic:
    value: 15.0
    formula: =SUMIF(A1:A5, ">10", B1:B5)
    expected: 15
Create or edit a YAML file in tests/e2e/edge/ with descriptive edge case scenarios:
_forge_version: 1.0.0
assumptions:
  test_division_by_zero:
    value: "#DIV/0!"
    formula: =1/0
    expected: "#DIV/0!"
  test_overflow_handling:
    value: 1.0e+308
    formula: =POWER(10, 308)
    expected: 1.0e+308
Roundtrip tests validate that forge can:
- Export to XLSX format
- Re-import the XLSX file
- Preserve all values and formulas accurately
See src/runner.rs for roundtrip test examples (currently in unit tests, will be moved to integration tests).
- Function tests: {category}.yaml (e.g., math.yaml, financial.yaml)
- Edge tests: edge_{category}.yaml (e.g., edge_arithmetic.yaml)
- Enterprise tests: {feature}.yaml (e.g., monte_carlo.yaml)
Every test file must follow this structure:
_forge_version: 1.0.0
assumptions:
  test_descriptive_name:
    value: <expected_value>
    formula: =FORMULA()
    expected: <expected_value>
Key points:
- _forge_version: Schema version (currently 1.0.0)
- assumptions: Map of test cases
- test_*: Test names must start with "test_"
- value: Expected result (can be number, string, or error like "#DIV/0!")
- formula: The Excel-compatible formula to test
- expected: Expected result (should match value)
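The key points above translate into a few mechanical checks on each test case. The Rust sketch below is a hypothetical linter, not part of forge-e2e: `TestCase` and `check` are made-up names, scalar fields are kept as text rather than parsed from YAML, and requiring a leading `=` on formulas is an assumption drawn from the examples. Numeric values are compared as numbers so that `42.0` and `42` count as matching.

```rust
/// One entry under `assumptions:`, with scalar fields kept as text.
struct TestCase<'a> {
    name: &'a str,
    value: &'a str,
    formula: &'a str,
    expected: &'a str,
}

/// Compare as numbers when both sides parse, otherwise as exact strings
/// (which covers error values such as "#DIV/0!").
fn values_match(value: &str, expected: &str) -> bool {
    match (value.parse::<f64>(), expected.parse::<f64>()) {
        (Ok(v), Ok(e)) => v == e,
        _ => value == expected,
    }
}

/// Return a message for each rule the test case breaks.
fn check(case: &TestCase) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if !case.name.starts_with("test_") {
        problems.push("test name must start with \"test_\"");
    }
    if !case.formula.starts_with('=') {
        problems.push("formula should start with '='");
    }
    if !values_match(case.value, case.expected) {
        problems.push("expected should match value");
    }
    problems
}
```

A case named `sumif_basic` with formula `SUMIF(A1:A5)`, value `15.0`, and expected `16` would break all three rules, while the `test_function_name` template above passes cleanly.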
See docs/CONTRIBUTING.md for detailed contribution guidelines.
When running ./run-e2e.sh --all, you'll see colored output with smart routing results:
═══════════════════════════════════════════════════════════════════════
COMBINED RESULTS (Smart Routing)
═══════════════════════════════════════════════════════════════════════
Tier 1a (Gnumeric): 810 passed, 0 failed
Tier 1b (Expected): 782 passed, 0 failed
Tier 2 (R): 78 passed, 0 failed
Total: 1670 passed, 0 failed
═══════════════════════════════════════════════════════════════════════
With smart routing, each test goes to a validator that can parse and evaluate it, which is how the suite reaches a 100% pass rate.
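The Total line in the sample output is the sum of the per-tier counts (810 + 782 + 78 = 1670). A small Rust sketch of that aggregation follows; the `TierResult` struct and `summarize` function are illustrative names, not the runner's actual types.

```rust
/// Pass/fail counts for one validation tier.
struct TierResult {
    tier: &'static str,
    passed: u32,
    failed: u32,
}

/// Return (total passed, total failed, pass rate in percent).
fn summarize(results: &[TierResult]) -> (u32, u32, f64) {
    let passed: u32 = results.iter().map(|r| r.passed).sum();
    let failed: u32 = results.iter().map(|r| r.failed).sum();
    let total = passed + failed;
    let rate = if total == 0 {
        0.0
    } else {
        100.0 * f64::from(passed) / f64::from(total)
    };
    (passed, failed, rate)
}
```

Fed the three tiers from the sample output, `summarize` returns 1670 passed, 0 failed, and a pass rate of 100.0.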
- forge: Enterprise spreadsheet engine (proprietary)
  - Build from source: cd ../forge && cargo build --release
  - Or specify path: ./run-e2e.sh --forge-path /path/to/forge
- Gnumeric: Primary validation engine (Excel function accuracy)
  # macOS
  brew install gnumeric
  # Ubuntu/Debian
  apt install gnumeric
R is required for Tier 2 validation (statistical distributions, bootstrap, Monte Carlo, financial analytics).
macOS:
brew install r
R -e 'install.packages("boot")'
Ubuntu/Debian:
sudo apt install r-base
R -e 'install.packages("boot")'
See ADR-002 for detailed validation engine documentation.
Run all R validators:
./validators/r/run_all.sh
Or run individual validators:
Rscript validators/r/bootstrap_validator.R
Rscript validators/r/distribution_validator.R
Rscript validators/r/financial_validator.R
Rscript validators/r/monte_carlo_validator.R
See Cargo.toml for full dependency list. Key dependencies:
- ratatui: Terminal UI framework
- rayon: Parallel test execution
- serde_yaml_ng: YAML test file parsing
- calamine / rust_xlsxwriter: Excel file handling
# Debug build
cargo build
# Release build (recommended)
cargo build --release
# Run tests
cargo test
forge-e2e/
├── src/
│ ├── main.rs # CLI entry point
│ ├── engine.rs # Spreadsheet engine abstraction (Gnumeric)
│ ├── runner.rs # Test execution and validation
│ ├── excel.rs # XLSX file handling
│ ├── types.rs # Test types and schemas
│ └── tui/ # Terminal UI components
├── tests/e2e/ # Test specifications
├── docs/ # Documentation
├── run-e2e.sh # Convenience script
└── Cargo.toml # Rust project configuration
Current version: 9.10.0 - 100% E2E coverage achieved!
See .asimov/roadmap.yaml for detailed roadmap.
- Smart test routing (100% pass rate)
- 7 R validators for analytics
- 1592 tests across 3 validation tiers
- ADR-009: Smart Routing documentation
- MCP/API E2E tests
- REST API endpoint validation
- JSON-RPC MCP tool validation
- OpenAPI spec compliance tests
See docs/CONTRIBUTING.md for detailed guidelines on:
- Adding function tests
- Adding edge case tests
- Test naming conventions
- YAML format specifications
Elastic License 2.0 - See LICENSE
Forge-e2e is Source Available - the code is open for inspection, but commercial production use requires a license. See the forge repository for licensing details.