feat(agent): add a configurable response cache#2536
Merged
dgageot merged 12 commits intodocker:mainfrom Apr 28, 2026
Merged
Conversation
Add a per-agent response cache that maps a normalized user question to its previous assistant answer, replaying it verbatim on a hit and skipping the model entirely. - New pkg/cache package exposing a Cache interface backed by either an in-memory map (default) or a JSON file that is loaded on startup and rewritten atomically on every store. - Two normalization options: case_sensitive (default false) and trim_spaces (default false), so "Hello" and " hello " can map to the same cached answer when configured. - New CacheConfig in pkg/config/latest with matching agent-schema.json definition, plus an examples/cached_responses.yaml demonstrating the feature. - Agent gains a Cache() accessor and a WithCache option; teamloader builds the cache from config and resolves a relative path against the agent config directory. - Runtime loop looks up the latest user message at the start of RunStream and replays the cached answer (with AgentInfo, MessageAdded, and stop hooks) on a hit, then stores the final response on the first stop so subsequent identical turns short- circuit the model. - Unit tests for both cache backends and runtime-level tests covering hit/miss, case-insensitive matching, whitespace trimming, and the no-cache path. Assisted-By: docker-agent
Add a "Response Cache" section to the agent configuration reference and strengthen the file-backed cache so writes are not just atomic but also durable. - docs/configuration/agents/index.md: extend the schema example and property table with the new `cache` field, and add a dedicated "Response Cache" section describing the matching options (case_sensitive, trim_spaces), the storage backends (in-memory vs JSON file), and the atomicity guarantees of the file-backed store. - pkg/cache: fsync the temp file before rename and fsync the parent directory after rename so the rename itself is persisted across an OS crash. Concurrent readers are still guaranteed to see either the previous or the new content in full, never a torn write. - pkg/cache: add tests asserting that the success path leaves no leftover temp files and that a parallel reader never observes a partially written cache file under sustained concurrent stores. Assisted-By: docker-agent
Same feature, less code. The cache package and the runtime integration each had visible duplication that made the moving parts harder to follow than they needed to be. - pkg/cache: merge `memoryCache` and `fileCache` into a single `cache` struct with an optional `persist func(map[string]string)` callback. The in-memory backend leaves persist nil; the file-backed backend sets it to a closure that writes the snapshot via the existing atomic-rename helper. Lookup, Store, and the locking pattern now exist in one place instead of two. - pkg/cache: tidy `writeJSON` (drop the dir != "." special case since filepath.Dir always returns a usable directory; pull the directory fsync into a small `syncDir` helper) and use `maps.Clone` for the snapshot. - pkg/runtime: move the cache hit/store logic out of the long `RunStream` body into two named helpers in a new `cache.go` file: `tryReplayCachedResponse` (returns the question + a replayed flag) and `cacheTurnResponse` (stores then clears the question pointer). The main loop now reads as a two-line cache check and a one-line store call. Net change: -45 lines across the two files. No behavior change; lint, existing cache unit tests, and runtime cache tests all pass unchanged. Assisted-By: docker-agent
- Fix race condition in cache.Store: hold lock during persist to ensure snapshot consistency with disk writes - Fix resource leak in syncDir: use defer to ensure directory fd is closed - Add path traversal validation in teamloader: prevent cache paths from escaping parent directory using filepath.Clean and prefix check - Add test for path traversal protection Assisted-By: docker-agent
`mise test` deliberately clears API keys to keep the suite deterministic. Without an OPENAI_API_KEY set, config.CheckRequiredEnvVars fails before LoadWithConfig ever reaches the cache path validation, so the test asserted on the wrong error message. Match the convention used by every other test in this file: pre-set OPENAI_API_KEY to a dummy value so the env-var check passes and the path-traversal validation actually runs. Assisted-By: docker-agent
Now that the hooks system supports builtins and per-event registration, fold the response cache's storage half into the same surface used by add_date / add_environment_info / add_prompt_files. The lookup half stays inline because no hook event currently supports short-circuiting the run with a synthetic response. - pkg/hooks: extend Input with AgentName and LastUserMessage. Both are generally useful for any builtin/closure that needs per-agent state or visibility into the user's latest turn (audit, telemetry, cache). - pkg/runtime/runtime.go: register a new cache_response builtin as a closure in NewLocalRuntime so it can resolve the agent (and therefore its cache instance) via Input.AgentName. Auto-inject it as a stop hook from getHooksExecutor when a.Cache() != nil, mirroring the way AddDate / AddPromptFiles / AddEnvironmentInfo are auto-injected. executeStopHooks now populates AgentName and LastUserMessage from the session. - pkg/runtime/cache.go: add the cacheResponseBuiltin method (storage half) and trim tryReplayCachedResponse (lookup half) so it returns just bool — the caller no longer needs to ferry the cache question through the loop because the storage hook reads it from Input at dispatch time. - pkg/runtime/loop.go: drop the cacheQuestion local variable and the inline cacheTurnResponse call. The body of res.Stopped is now back to its pre-cache shape, and the cache concerns at the top of RunStream collapse to a single 4-line lookup-or-replay block. - The builtin skips Store when the cache already holds the exact same (question, response) pair, which makes the replay path (where tryReplayCachedResponse fires stop hooks for the cached answer) idempotent and prevents a redundant file rewrite on cache hits. Behavior change: on a multi-turn RunStream (e.g. with a follow-up), each stop now caches under whatever the latest user message is at that moment. The previous implementation only cached the very first turn; this is strictly more useful and matches the natural interpretation of "same question -> same response" per turn. Tests: existing cache tests pass unchanged. Adds TestCache_StorageIsAStopHookBuiltin as a regression test that pins the new architectural contract — the runtime registers cache_response on its private hooks registry and auto-injects it for cache-enabled agents. Assisted-By: docker-agent
…helper Three small simplifications, no feature change. - pkg/cache: collapse the single-implementation Cache interface into an exported *Cache struct (Go's "accept interfaces, return structs" guideline). New now returns *cache.Cache, callers use a pointer field/parameter, and the unexported `cache` type goes away. Net: one fewer indirection per call site, ~10 fewer lines. - pkg/cache: push the "skip rewrite when (q, response) already match" optimization down into Cache.Store. The cache itself is the right place for it, and any future caller benefits — the runtime cache_response builtin no longer needs an extra Lookup before Store. Add a unit test (TestFileCache_dedupSkipsRedundantWrite) that asserts the file's mtime is preserved on identical writes. - pkg/runtime/cache.go: as a follow-up to the above, cacheResponseBuiltin now collapses to a 3-line happy path (resolve agent, get cache, Store) — the dedup branch is gone. - pkg/teamloader: extract the 24-line inline cache wiring (path resolution, traversal check, cache.New) into a dedicated buildAgentCache helper in a new cache.go file alongside the loader. The agent loop body now reads as 5 lines. Add unit tests for the helper covering disabled config, in-memory, relative path, absolute path, and the path-traversal rejection. Validation: mise lint clean, full mise test green (incl. existing runtime cache tests, new pkg/cache and pkg/teamloader unit tests, and race detector on pkg/cache/runtime/teamloader). Assisted-By: docker-agent
Targeted follow-ups from a second-pass review. No behavior change for existing scenarios; the only functional delta is that after_llm_call hooks now also receive AgentName / LastUserMessage, matching what the field's doc comment already promised. Bug fixes / contract tightening: - pkg/runtime/runtime.go: populate AgentName and LastUserMessage in executeAfterLLMCallHooks, not just executeStopHooks. The Input doc comment listed both events; this finishes the implementation. - pkg/runtime/cache.go: log a Debug line in cacheResponseBuiltin when team.Agent(in.AgentName) fails so a misconfigured team is at least visible in --debug output instead of silently dropping the write. Comments to prevent future regressions: - pkg/cache/cache.go: Store now explicitly documents WHY the persist callback runs under the write lock. Releasing the lock between cloning the snapshot and writing it would let two concurrent stores persist out-of-order and silently overwrite a newer entry with an older snapshot on the next restart — exactly the lost-write race fixed in 73d806943. The comment makes the trade-off (brief read latency vs. strict serialization) the deliberate choice it is. - pkg/runtime/cache.go: explain why tryReplayCachedResponse treats an empty cached value as a miss instead of replaying it. - pkg/cache/cache.go: document the trim-then-lowercase order in keyNormalizer so a future change doesn't accidentally swap them. Tests: - pkg/cache/cache_test.go::TestFileCache_persistenceFailureKeepsInMemory pins the invariant that a write failure (here: cache directory stripped of write permission after New) does not break Store — the in-memory map still receives the entry, only the disk mirror is skipped. Skipped when running as root. - pkg/runtime/cache_test.go::TestCache_EmptyResponseIsNotCached pins the empty-response semantics: an empty assistant content is dropped from the cache and the model is consulted on every repeat ask. Validation: mise lint clean, full mise test green, go test -race ./pkg/cache/... ./pkg/runtime/ ./pkg/teamloader/ clean. Assisted-By: docker-agent
Mirror the upstream `builtins.ApplyAgentDefaults` shape for the runtime-private cache builtin so that: - All cache-specific behavior lives in pkg/runtime/cache.go: the builtin name constant, the storage closure, the lookup half, AND now the auto-inject helper. - pkg/runtime/hooks.go (an upstream-maintained file) is touched in exactly one new line — `cfg = applyCacheDefault(cfg, a)` — making future rebases of that file mechanical. Net change: 11 lines of inline cache injection in hooksExec become a single helper call, paired with a 12-line `applyCacheDefault` next to its caller's natural reading partner (cacheResponseBuiltin and tryReplayCachedResponse). No behavior change; same tests pass. Validation: mise lint clean, mise test green. Assisted-By: docker-agent
…cesses
Two processes (or two runtimes) sharing the same cache.json could
silently lose entries: each held its own in-memory map, snapshotted
under its own mutex, and wrote the file independently — last writer
won, the other side's just-stored entries vanished from disk.
Add cross-process serialization via an advisory lock on a sibling
`<path>.lock` file (POSIX flock(2) on Unix, LockFileEx on Windows),
and surface external writes through mtime-based reload on Lookup.
- pkg/cache/lock_unix.go, pkg/cache/lock_windows.go: thin wrappers
around unix.Flock and windows.LockFileEx (both already in our
golang.org/x/sys dep tree, no new packages).
- pkg/cache/cache.go: rework Cache.Store from "snapshot under c.mu,
write" to "lock <path>.lock, reload disk, merge our entry, write
atomically". The lock window is short — one read, one
fsync+rename — and serializes the read-modify-write across every
process holding the same path. Add Cache.maybeReload (called by
Lookup) to pick up external writes when the file mtime has
advanced. The persist callback is gone; path/mtime now live on
the struct.
- The lock file is a long-lived sentinel — never renamed or
deleted. Removing it would let two processes lock different
inodes and lose mutual exclusion.
- pkg/cache/cache_test.go: three new tests (and the leftover-files
test learned about the .lock sentinel):
* TestFileCache_crossProcessConcurrentStoresPreserveAllEntries
— two *Cache instances on the same path, 50 distinct keys
each, asserts all 100 entries are on disk after the race.
Without the lock this test loses ~50% of entries.
* TestFileCache_lookupReloadsAfterExternalWrite — one cache
writes, the other's Lookup picks up the change after the
mtime advances.
* TestFileCache_lockFileNeverDeleted — pins the invariant.
- docs/configuration/agents/index.md and examples/cached_responses.yaml
now describe the cross-process semantics so users know they can
point multiple agents (or multiple processes) at the same path.
In-memory dedup is preserved as a cheap short-circuit: when our local
view already matches, Store skips the file lock + disk hit entirely.
After a recent Lookup our view is fresh, so this catches the common
"store the answer we just replayed" path of cache_response.
Validation: cross-platform builds (linux/windows amd64) clean,
mise lint clean, mise test green, go test -race ./pkg/cache/...
./pkg/runtime/ ./pkg/teamloader/ clean.
Assisted-By: docker-agent
…ache method
Three small simplifications in pkg/cache/cache.go that trim the file by
~10% and reduce conceptual surface, with no behavior change:
- Replace the fileLock struct + acquireFileLock + (l).release method
trio with a single `lockFile(path) (func(), error)` that returns a
release closure. The struct existed only to carry an *os.File and
expose a method that called Close — exactly what a closure does.
Saves the type declaration, two separate exports of the same thing,
and the awkward "lock, err := acquireFileLock(...); defer
lock.release()" dance.
- Inline the free-function persistEntry into a *Cache method
persistToDisk that updates c.mtime directly. The old shape had
persistEntry returning (time.Time, error) just so the caller could
do `if !mtime.IsZero() { c.mtime = mtime }`; the IsZero gymnastics
was only there because errors and a missing-stat shared the same
zero return. Method form lets us simply assign and return on error.
- Extract a tiny `mtimeOf(path) time.Time` helper for the three places
that want "current mtime, zero if unavailable" (New, persistToDisk's
dedup branch, persistToDisk's post-write branch). Three identical
three-line stat-or-zero blocks become one one-liner.
- Move the `c.path != ""` short-circuit into Cache.maybeReload itself
so Lookup's body no longer needs to know about the file/in-memory
distinction.
Validation: cross-platform builds (linux/windows amd64) clean, full
mise lint clean, mise test green, go test -race ./pkg/cache/... clean.
Assisted-By: docker-agent
- Fix TOCTOU race condition in Cache.maybeReload by re-stating the file
under the write lock. The previous implementation stat'd the file before
acquiring the lock, then used the stale FileInfo after the lock, which
could lead to storing an outdated mtime that doesn't match the loaded
content. This could cause subsequent Lookups to miss updates in
high-concurrency scenarios.
- Clarify Windows lock semantics: the mandatory LockFileEx lock is on the
separate .lock file (not the data file), so both Unix and Windows achieve
the same advisory-lock effect for the cache operations.
- Document mtimeOf zero-value semantics: the zero time.Time{} is a sentinel
for 'file not found' and is safe to compare with time.Time.Equal.
- Add comment explaining Store's update-after-persist ordering: the in-memory
map is updated after the disk write so that persistence failures still keep
the entry in memory for the current process.
- Document resolveCachePath's lexical-only path traversal check: symlinks are
not resolved, which is acceptable for the threat model (cache paths come
from agent configs, not untrusted user input).
All tests pass with -race. No functional changes to the cache behavior.
Assisted-By: docker-agent
e64407c to
c4dc536
Compare
trungutt
approved these changes
Apr 28, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add a configurable response cache for agents
Adds an opt-in, per-agent response cache that maps a normalized user question to a previously produced assistant response, so asking the exact same question again skips the model entirely and replays the stored answer.
What the user gets
enabledfalseor absent, no caching.case_sensitivefalse(default),"Hello"and"hello"match.trim_spacestrue, leading/trailing whitespace is stripped before comparison.pathHow sharing works
*cache.Cacheper agent in the team.path:)New, mirrored on everyStore.path:<path>.lock(flock(2)/LockFileEx) serializes the read-modify-write window; mtime-based reload onLookuppropagates external writes.Architecture
The cache integrates through the new hook system rather than as bespoke runtime code:
cache_responsebuiltin auto-injected as astophook whena.Cache() != nil— mirroring the auto-injection pattern used byadd_date/add_environment_info/add_prompt_files. The runtime registers the closure once (it capturesrto resolve agents viaInput.AgentName);applyCacheDefaultappends the hook to the per-agenthooks.ConfiginsidebuildHooksExecutors.tryReplayCachedResponse(called once at the start ofRunStream) because no current hook event supports short-circuiting the run with a synthetic response. The lookup half is small (~30 lines) and the alternative would require extending the hook output protocol with aSkipWithResponsefield that today has no other consumers — happy to revisit when one shows up.File-backed storage details
<path>.locksentinel (POSIXflock(2)on Unix,LockFileExwithLOCKFILE_EXCLUSIVE_LOCKon Windows). The lock file is intentionally never renamed or deleted — doing so would let two processes lock different inodes for the same logical resource and lose mutual exclusion.Lookupchecks the file's mtime and reloads the in-memory map when it's advanced since the last load, so cross-process writes become visible without a restart.path:whose..resolution lands outside the agent's config directory. Lexical only — no symlink resolution — which is acceptable because cache paths come from agent configs, not untrusted input.Files touched
Test coverage
pkg/cache: in-memory + file-backed paths, both normalization options (case_sensitive,trim_spaces), atomic-write leaves no leftover temp files, dedup-on-rewrite preserves mtime, persistence-failure keeps in-memory state, two cross-process tests (TestFileCache_crossProcessConcurrentStoresPreserveAllEntriessimulates two*Cacheinstances racing 50 keys each on the same path; without the file lock this test loses ~50% of entries),TestFileCache_lookupReloadsAfterExternalWrite,TestFileCache_lockFileNeverDeleted.pkg/runtime:TestCache_StoresAndReplaysAnswer(cache miss → model called once; same question again → cached replay, no model call),TestCache_StorageIsAStopHookBuiltin(pins the architectural contract that storage IS a hook),TestCache_CaseInsensitive,TestCache_TrimSpaces,TestCache_DisabledHasNoEffect,TestCache_EmptyResponseIsNotCached(empty assistant content is dropped on the floor; never replayed).pkg/teamloader:TestBuildAgentCache_*(disabled, in-memory, relative path, absolute path, traversal rejected),TestLoadWithConfig_CachePathTraversal.go test -race.Validation
mise lint: 0 issuesmise test: full suite greengo test -race ./pkg/cache/... ./pkg/runtime/ ./pkg/teamloader/: cleanGOOS=linux/windows GOARCH=amd64cleanNotes for reviewers
Cache.Store, fd leak insyncDir, path traversal inresolveCachePath, empty-response handling, persistence-failure resilience, TOCTOU inmaybeReload) are all addressed and pinned by tests.examples/cached_responses.yaml) and the schema (agent-schema.json) were validated against each other during development.Closes #N/A — feature ticket lives outside this repo.