
perf: optimize hot paths in apply_writes, RunnableCallable, hash functions, and add_messages #6969

Open
John Kennedy (jkennedyvz) wants to merge 8 commits into main from jk/optimizations

Conversation

Contributor

@jkennedyvz John Kennedy (jkennedyvz) commented Feb 28, 2026

Summary

Performance optimizations targeting hot paths identified via cProfile profiling of the benchmark suite. Focuses on eliminating redundant work in the graph execution loop.

  • Track available channels incrementally in apply_writes — replace O(n) scan of all channels calling is_available() every step with an incrementally maintained _available_channels: set[str] on PregelLoop. For sequential_1000 (~1003 channels, ~1000 steps), this eliminates 1M+ is_available() calls. The set is updated as a side-effect of consume(), update(), and finish() via a local _track() helper.
  • Cache the UntrackedValue isinstance scan — any(isinstance(ch, UntrackedValue) ...) scanned all channels on every put_writes call. The result is now cached as a _has_untracked_channels bool once in __enter__/__aenter__.
  • Cache inspect.signature in RunnableCallable.__init__ — the same functions (e.g., ChannelWrite._write) were inspected thousands of times. Added a module-level _SIGNATURE_CACHE dict keyed by function object with graceful fallback for unhashable callables.
  • Remove isinstance from hash functions — _xxhash_str and _uuid5_str checked isinstance(p, str) for every part despite all call sites always passing strings. Narrowed the type signature from str | bytes to str and removed the check.
  • Flatten task_path_str — was fully recursive, now iterates elements directly and only recurses for nested tuples (rare case). Saves function call overhead for the common case.
  • Remove unnecessary typing.cast in add_messages — eliminated ~92K no-op cast() function calls per react_agent_100x run.
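
The task_path_str flattening described above can be sketched as follows. This is a hypothetical reconstruction (the separator and formatting are assumptions; the real function lives in langgraph), showing the shape of the change: iterate top-level elements directly and recurse only when an element is itself a tuple.

```python
def task_path_str(tup: tuple) -> str:
    """Join path elements into a string, recursing only for the rare
    nested-tuple case instead of making a recursive call per element."""
    parts = []
    for el in tup:
        if isinstance(el, tuple):
            parts.append(task_path_str(el))  # rare: nested tuple
        else:
            parts.append(str(el))  # common: str or int, no extra frame
    return "|".join(parts)  # separator is an assumption for illustration
```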

Test plan

  • Full test suite passes (1036 passed, 4 skipped)
  • make lint clean (ruff check, ruff format, mypy)

🤖 Generated with Claude Code

Add --profile flag to bench/__main__.py that bypasses pyperf and runs
each benchmark under cProfile, printing per-benchmark hotspot summaries
and writing .prof files for later analysis. Add benchmark-profile and
benchmark-profile-spy Makefile targets.

Fix O(n^2) performance regression in _get_model_input_state where
f-strings eagerly evaluated repr(state) on every call, triggering
pydantic __repr__ across all accumulated messages. Move error message
construction into the error path so repr is only called when needed.
This yields a 3-5x speedup on react_agent_100x benchmarks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `put_writes` method was scanning all channels with
`any(isinstance(ch, UntrackedValue) ...)` on every call. For the
sequential_1000 benchmark this produced 1M+ isinstance calls through
the ABC machinery, consuming 40% of total runtime.

Cache the result as `_has_untracked_channels` once in __enter__ and
__aenter__, replacing both scan sites (put_writes and checkpoint
sanitization). This yields a 1.8-2.3x speedup on sequential_1000.
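
The caching pattern looks roughly like this (UntrackedValue and the put_writes filtering logic are stand-ins for illustration; the real classes live in langgraph):

```python
class UntrackedValue:
    """Stand-in for langgraph's UntrackedValue channel type."""


class Loop:
    _has_untracked_channels: bool

    def __init__(self, channels: dict) -> None:
        self.channels = channels
        self._has_untracked_channels = False

    def __enter__(self) -> "Loop":
        # One O(n) scan at loop entry replaces an O(n) scan on every
        # put_writes call.
        self._has_untracked_channels = any(
            isinstance(ch, UntrackedValue) for ch in self.channels.values()
        )
        return self

    def __exit__(self, *exc) -> bool:
        return False

    def put_writes(self, writes: dict) -> dict:
        # Hypothetical use of the cached flag: skip filtering entirely
        # in the common case of no untracked channels.
        if not self._has_untracked_channels:
            return writes
        return {
            k: v
            for k, v in writes.items()
            if not isinstance(self.channels.get(k), UntrackedValue)
        }
```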

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add noqa for intentional E402 imports after --profile guard in
bench/__main__.py. Add _has_untracked_channels type annotation to
PregelLoop class for mypy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ites

Maintain a set of currently-available channel names, updated incrementally
as channels change state, so the step-bump loop in apply_writes only
iterates available channels instead of scanning all channels with
is_available(). For sequential_1000 this reduces function calls by ~54%
and improves overall runtime by ~26%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tten task_path_str

Three micro-optimizations targeting disproportionate call counts:

1. Cache inspect.signature results in RunnableCallable.__init__ — avoids
   repeated signature introspection for the same function (e.g. 1000
   ChannelWrite instances all inspecting the same _write/_awrite methods).

2. Remove per-element isinstance check in _xxhash_str/_uuid5_str — all
   call sites pass string parts, so encode() directly without checking.

3. Flatten task_path_str to avoid recursive calls for the common case
   of tuple elements being str or int (not nested tuples).

Reduces isinstance calls from ~43K to ~38K and total function calls by
~8K for sequential_1000.
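
The signature cache (item 1) can be sketched like this — a hypothetical reconstruction of _SIGNATURE_CACHE, keyed by the function object with the graceful fallback for unhashable callables mentioned in the summary:

```python
import inspect
from collections.abc import Callable

# Module-level cache: the same function object (e.g. ChannelWrite._write)
# is inspected once instead of once per RunnableCallable instance.
_SIGNATURE_CACHE: dict[Callable, inspect.Signature] = {}


def cached_signature(func: Callable) -> inspect.Signature:
    try:
        sig = _SIGNATURE_CACHE.get(func)
    except TypeError:
        # Unhashable callables cannot be dict keys; fall back to
        # uncached introspection.
        return inspect.signature(func)
    if sig is None:
        sig = inspect.signature(func)
        _SIGNATURE_CACHE[func] = sig
    return sig
```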

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cast(BaseMessageChunk, m) calls in add_messages were no-ops at
runtime but accounted for ~92K function calls per react_agent_100x
benchmark run (~3ms overhead). Remove them since message_chunk_to_message
already handles the type internally.
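
For illustration: typing.cast returns its argument unchanged, existing purely for static type checkers, but each call is still a real Python function call — which adds up inside a per-message loop:

```python
from typing import cast

messages = ["a", "b", "c"]

# Before: one cast() call per element, a runtime no-op.
with_cast = [cast(str, m) for m in messages]
# After: no call at all.
without_cast = [m for m in messages]

assert with_cast == without_cast
assert cast(str, messages[0]) is messages[0]  # identity at runtime
```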

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move --profile flag and Makefile targets (benchmark-profile,
benchmark-profile-spy) to a separate patch for a future PR.
This PR now contains only runtime performance optimizations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…writes

Replace 5 repeated inline blocks that sync available_channels with
a local _track() helper that checks is_available() and updates the
set, returning the availability bool for callers that also need to
update updated_channels.
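
A hypothetical reconstruction of the _track() pattern (Channel here is a minimal stub; the real channels and apply_writes live in langgraph.pregel):

```python
class Channel:
    """Stub channel exposing only the availability interface."""

    def __init__(self, available: bool = False) -> None:
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def update(self, value) -> None:
        self._available = True


def apply_writes(
    channels: dict[str, Channel],
    available_channels: set[str],
    writes: dict[str, object],
) -> set[str]:
    def _track(name: str) -> bool:
        # One local helper replaces the repeated inline sync blocks:
        # refresh the availability set and report the result so callers
        # can also update updated_channels.
        avail = channels[name].is_available()
        if avail:
            available_channels.add(name)
        else:
            available_channels.discard(name)
        return avail

    updated_channels: set[str] = set()
    for name, value in writes.items():
        channels[name].update(value)
        if _track(name):
            updated_channels.add(name)
    return updated_channels
```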

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jkennedyvz John Kennedy (jkennedyvz) changed the title from perf: fix quadratic bottlenecks and add benchmark profiling mode Feb 28, 2026
Contributor Author

John Kennedy (jkennedyvz) commented Feb 28, 2026

From CI

Benchmark main changes
react_agent_100x_sync 719 ms 175 ms: 4.11x faster
react_agent_100x_checkpoint_sync 722 ms 183 ms: 3.96x faster
react_agent_100x 756 ms 214 ms: 3.53x faster
react_agent_100x_checkpoint 764 ms 219 ms: 3.48x faster
sequential_1000_sync 373 ms 155 ms: 2.41x faster
sequential_1000 439 ms 218 ms: 2.01x faster
react_agent_10x_checkpoint_sync 22.4 ms 14.5 ms: 1.54x faster
react_agent_10x_sync 20.7 ms 13.6 ms: 1.52x faster
react_agent_10x_checkpoint 26.6 ms 18.4 ms: 1.44x faster
react_agent_10x 25.2 ms 17.8 ms: 1.42x faster
wide_state_25x300_sync 10.4 ms 9.39 ms: 1.11x faster
pydantic_state_9x1200_checkpoint_sync 61.7 ms 55.7 ms: 1.11x faster
pydantic_state_25x300_sync 18.6 ms 16.8 ms: 1.10x faster
wide_dict_25x300_sync 10.2 ms 9.26 ms: 1.10x faster
pydantic_state_9x1200_checkpoint 67.6 ms 61.4 ms: 1.10x faster
fanout_to_subgraph_10x_checkpoint 34.5 ms 31.4 ms: 1.10x faster
fanout_to_subgraph_10x 32.8 ms 29.9 ms: 1.10x faster
pydantic_state_15x600_checkpoint_sync 71.7 ms 65.7 ms: 1.09x faster
wide_dict_15x600_sync 13.2 ms 12.2 ms: 1.08x faster
pydantic_state_15x600_checkpoint 77.0 ms 71.4 ms: 1.08x faster
pydantic_state_25x300_checkpoint 44.0 ms 40.8 ms: 1.08x faster
pydantic_state_25x300_checkpoint_sync 39.6 ms 37.0 ms: 1.07x faster
pydantic_state_25x300 22.8 ms 21.4 ms: 1.07x faster
fanout_to_subgraph_100x_sync 297 ms 280 ms: 1.06x faster
fanout_to_subgraph_100x_checkpoint_sync 317 ms 301 ms: 1.05x faster