Skip to content

Fail closed on checkpoint schema validation before load #6938

@davidahmann

Description

Problem

Invalid checkpoint payloads can reach load paths without a strict fail-closed schema gate, creating undefined runtime behavior and brittle recovery semantics.

Why now

LangGraph durability depends on trustworthy checkpoint restoration; accepting malformed checkpoint state undermines replay correctness.

Evidence Packet

  • Version/commit under test: origin/main at 48167d7fec9c
  • Runtime environment: macOS 26.3 (arm64), Python 3.14.0
  • Minimal repro:
    1. Construct malformed checkpoint payload (missing required fields / wrong types).
    2. Route it through checkpoint load path.
    3. Observe behavior under backend implementations.
  • Expected behavior: deterministic schema validation failure before load.
  • Actual behavior: validation strictness is not consistently fail-closed at load boundary.

Why code change (not docs)

Schema enforcement must happen in runtime load codepaths and backend adapters.

Scope / Codepaths

  • libs/checkpoint
  • libs/checkpoint-postgres
  • libs/checkpoint-sqlite

Acceptance Criteria

  • Pre-load schema validation is mandatory and fail-closed.
  • Invalid payloads produce deterministic error class/code.
  • Backend parity tests enforce identical validation semantics.

Validation Plan

  • Add malformed checkpoint fixtures.
  • Add backend-matrix tests for deterministic failure behavior.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions