
feat(agent): add a configurable response cache#2536

Merged
dgageot merged 12 commits into docker:main from
dgageot:board/configurable-response-caching-for-agent-accd7756
on Apr 28, 2026

Conversation

@dgageot
Member

@dgageot dgageot commented Apr 27, 2026

Add a configurable response cache for agents

Adds an opt-in, per-agent response cache that maps a normalized user question to a previously produced assistant response, so asking the exact same question again skips the model entirely and replays the stored answer.

What the user gets

```yaml
agents:
  root:
    model: openai/gpt-5-mini
    cache:
      enabled: true
      case_sensitive: false   # default: false (case-insensitive matching)
      trim_spaces: true       # default: false
      path: ./cache.json      # optional: persist to disk; omit for in-memory
```
| Knob | Effect |
| --- | --- |
| `enabled` | Turns the cache on. When `false` or absent, no caching. |
| `case_sensitive` | When `false` (default), "Hello" and "hello" match. |
| `trim_spaces` | When `true`, leading/trailing whitespace is stripped before comparison. |
| `path` | When set, persists entries to JSON on disk; multiple processes can safely share the same file. When empty, in-memory only. |
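The two matching knobs compose into a single key-normalization step before lookup. A minimal sketch of that step (function name hypothetical; the PR documents that trimming runs before lowercasing):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeKey maps a raw user question to a cache key.
// Trim runs before lowercasing, matching the documented order.
func normalizeKey(q string, caseSensitive, trimSpaces bool) string {
	if trimSpaces {
		q = strings.TrimSpace(q)
	}
	if !caseSensitive {
		q = strings.ToLower(q)
	}
	return q
}

func main() {
	fmt.Println(normalizeKey("  Hello  ", false, true)) // "hello"
	fmt.Println(normalizeKey("Hello", true, false))     // "Hello"
}
```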

How sharing works

| Scenario | Shared? | How |
| --- | --- | --- |
| Multiple sessions in the same process | ✅ Yes | Singleton `*cache.Cache` per agent in the team. |
| Sessions across process restarts | ✅ Yes (with `path:`) | JSON file loaded on `New`, mirrored on every `Store`. |
| Sessions in two simultaneously running processes sharing the same `path:` | ✅ Yes | Advisory lock on `<path>.lock` (`flock(2)` / `LockFileEx`) serializes the read-modify-write window; mtime-based reload on `Lookup` propagates external writes. |

Architecture

The cache integrates through the new hook system rather than as bespoke runtime code:

  • Storage half is a cache_response builtin auto-injected as a stop hook when a.Cache() != nil — mirroring the auto-injection pattern used by add_date / add_environment_info / add_prompt_files. The runtime registers the closure once (it captures r to resolve agents via Input.AgentName); applyCacheDefault appends the hook to the per-agent hooks.Config inside buildHooksExecutors.
  • Lookup half stays inline in tryReplayCachedResponse (called once at the start of RunStream) because no current hook event supports short-circuiting the run with a synthetic response. The lookup half is small (~30 lines) and the alternative would require extending the hook output protocol with a SkipWithResponse field that today has no other consumers — happy to revisit when one shows up.
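The auto-injection shape described above can be sketched as follows (types heavily simplified and hypothetical; the real helper lives in `pkg/runtime/cache.go` and operates on `hooks.Config`):

```go
package main

import "fmt"

// HookConfig is a stand-in for the runtime's per-agent hooks config.
type HookConfig struct {
	StopHooks []string // names of builtins dispatched on the stop event
}

// Agent is a stand-in: a nil cache means caching is disabled.
type Agent struct {
	cache *struct{}
}

func (a *Agent) Cache() *struct{} { return a.cache }

// applyCacheDefault appends the cache_response builtin as a stop hook
// only for cache-enabled agents, mirroring add_date-style auto-injection.
func applyCacheDefault(cfg HookConfig, a *Agent) HookConfig {
	if a.Cache() != nil {
		cfg.StopHooks = append(cfg.StopHooks, "cache_response")
	}
	return cfg
}

func main() {
	cached := &Agent{cache: &struct{}{}}
	fmt.Println(applyCacheDefault(HookConfig{}, cached).StopHooks) // [cache_response]
	fmt.Println(applyCacheDefault(HookConfig{}, &Agent{}).StopHooks)
}
```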

File-backed storage details

  • Writes are atomic: temp file → fsync → rename over the destination, so concurrent readers and crashed writers can never observe a partial JSON file. The parent directory is fsync'd after the rename so the rename itself is durable.
  • Concurrent-write-safe across processes via an advisory lock on a sibling <path>.lock sentinel (POSIX flock(2) on Unix, LockFileEx with LOCKFILE_EXCLUSIVE_LOCK on Windows). The lock file is intentionally never renamed or deleted — doing so would let two processes lock different inodes for the same logical resource and lose mutual exclusion.
  • Lookup checks the file's mtime and reloads the in-memory map when it's advanced since the last load, so cross-process writes become visible without a restart.
  • A path-traversal check rejects any path: whose .. resolution lands outside the agent's config directory. Lexical only — no symlink resolution — which is acceptable because cache paths come from agent configs, not untrusted input.

Files touched

agent-schema.json                  | +30
docs/configuration/agents/index.md | +47
examples/cached_responses.yaml     | +30
pkg/agent/agent.go                 | +8     (Cache() accessor)
pkg/agent/opts.go                  | +8     (WithCache)
pkg/cache/cache.go                 | +359   (Cache struct, Store/Lookup, persist, lock, atomic write)
pkg/cache/cache_test.go            | +375
pkg/cache/lock_unix.go             | +22
pkg/cache/lock_windows.go          | +44
pkg/config/latest/types.go         | +21    (CacheConfig)
pkg/hooks/types.go                 | +11    (AgentName + LastUserMessage on Input)
pkg/runtime/cache.go               | +117   (BuiltinCacheResponse, applyCacheDefault, tryReplayCachedResponse, cacheResponseBuiltin)
pkg/runtime/cache_test.go          | +249
pkg/runtime/hooks.go               | +20    (cache hook injection + Input field plumbing in stop / after_llm_call)
pkg/runtime/loop.go                | +8     (single line cache-replay short-circuit)
pkg/runtime/runtime.go             | +8     (cache_response builtin registration)
pkg/teamloader/cache.go            | +61    (buildAgentCache + path traversal check)
pkg/teamloader/cache_test.go       | +64
pkg/teamloader/teamloader.go       | +8     (calls buildAgentCache)
pkg/teamloader/teamloader_test.go  | +28    (path-traversal test)

Test coverage

  • pkg/cache: in-memory + file-backed paths, both normalization options (case_sensitive, trim_spaces), atomic-write leaves no leftover temp files, dedup-on-rewrite preserves mtime, persistence-failure keeps in-memory state, two cross-process tests (TestFileCache_crossProcessConcurrentStoresPreserveAllEntries simulates two *Cache instances racing 50 keys each on the same path; without the file lock this test loses ~50% of entries), TestFileCache_lookupReloadsAfterExternalWrite, TestFileCache_lockFileNeverDeleted.
  • pkg/runtime: TestCache_StoresAndReplaysAnswer (cache miss → model called once; same question again → cached replay, no model call), TestCache_StorageIsAStopHookBuiltin (pins the architectural contract that storage IS a hook), TestCache_CaseInsensitive, TestCache_TrimSpaces, TestCache_DisabledHasNoEffect, TestCache_EmptyResponseIsNotCached (empty assistant content is dropped on the floor; never replayed).
  • pkg/teamloader: TestBuildAgentCache_* (disabled, in-memory, relative path, absolute path, traversal rejected), TestLoadWithConfig_CachePathTraversal.
  • All tests pass under go test -race.

Validation

  • mise lint: 0 issues
  • mise test: full suite green
  • go test -race ./pkg/cache/... ./pkg/runtime/ ./pkg/teamloader/: clean
  • Cross-platform builds: GOOS=linux/windows GOARCH=amd64 clean

Notes for reviewers

  • 12 commits, kept atomic for review. The arc is roughly: feature → docs/durability → restructure → fixes → hooks integration → simplifications → cross-process safety → review polish.
  • Reviewed twice during development; key prior findings (lost-write race in Cache.Store, fd leak in syncDir, path traversal in resolveCachePath, empty-response handling, persistence-failure resilience, TOCTOU in maybeReload) are all addressed and pinned by tests.
  • The example YAML (examples/cached_responses.yaml) and the schema (agent-schema.json) were validated against each other during development.

Closes #N/A — feature ticket lives outside this repo.

dgageot added 12 commits April 28, 2026 09:19
Add a per-agent response cache that maps a normalized user question to
its previous assistant answer, replaying it verbatim on a hit and
skipping the model entirely.

- New pkg/cache package exposing a Cache interface backed by either an
  in-memory map (default) or a JSON file that is loaded on startup and
  rewritten atomically on every store.
- Two normalization options: case_sensitive (default false) and
  trim_spaces (default false), so "Hello" and "  hello  " can map to
  the same cached answer when configured.
- New CacheConfig in pkg/config/latest with matching agent-schema.json
  definition, plus an examples/cached_responses.yaml demonstrating the
  feature.
- Agent gains a Cache() accessor and a WithCache option; teamloader
  builds the cache from config and resolves a relative path against
  the agent config directory.
- Runtime loop looks up the latest user message at the start of
  RunStream and replays the cached answer (with AgentInfo,
  MessageAdded, and stop hooks) on a hit, then stores the final
  response on the first stop so subsequent identical turns short-
  circuit the model.
- Unit tests for both cache backends and runtime-level tests covering
  hit/miss, case-insensitive matching, whitespace trimming, and the
  no-cache path.

Assisted-By: docker-agent
Add a "Response Cache" section to the agent configuration reference and
strengthen the file-backed cache so writes are not just atomic but also
durable.

- docs/configuration/agents/index.md: extend the schema example and
  property table with the new `cache` field, and add a dedicated
  "Response Cache" section describing the matching options
  (case_sensitive, trim_spaces), the storage backends (in-memory vs
  JSON file), and the atomicity guarantees of the file-backed store.
- pkg/cache: fsync the temp file before rename and fsync the parent
  directory after rename so the rename itself is persisted across an
  OS crash. Concurrent readers are still guaranteed to see either the
  previous or the new content in full, never a torn write.
- pkg/cache: add tests asserting that the success path leaves no
  leftover temp files and that a parallel reader never observes a
  partially written cache file under sustained concurrent stores.

Assisted-By: docker-agent
Same feature, less code. The cache package and the runtime integration
each had visible duplication that made the moving parts harder to
follow than they needed to be.

- pkg/cache: merge `memoryCache` and `fileCache` into a single `cache`
  struct with an optional `persist func(map[string]string)` callback.
  The in-memory backend leaves persist nil; the file-backed backend
  sets it to a closure that writes the snapshot via the existing
  atomic-rename helper. Lookup, Store, and the locking pattern now
  exist in one place instead of two.
- pkg/cache: tidy `writeJSON` (drop the dir != "." special case since
  filepath.Dir always returns a usable directory; pull the directory
  fsync into a small `syncDir` helper) and use `maps.Clone` for the
  snapshot.
- pkg/runtime: move the cache hit/store logic out of the long
  `RunStream` body into two named helpers in a new `cache.go` file:
  `tryReplayCachedResponse` (returns the question + a replayed flag)
  and `cacheTurnResponse` (stores then clears the question pointer).
  The main loop now reads as a two-line cache check and a one-line
  store call.

Net change: -45 lines across the two files. No behavior change; lint,
existing cache unit tests, and runtime cache tests all pass unchanged.

Assisted-By: docker-agent
- Fix race condition in cache.Store: hold lock during persist to ensure
  snapshot consistency with disk writes
- Fix resource leak in syncDir: use defer to ensure directory fd is closed
- Add path traversal validation in teamloader: prevent cache paths from
  escaping parent directory using filepath.Clean and prefix check
- Add test for path traversal protection

Assisted-By: docker-agent
`mise test` deliberately clears API keys to keep the suite
deterministic. Without an OPENAI_API_KEY set, config.CheckRequiredEnvVars
fails before LoadWithConfig ever reaches the cache path validation,
so the test asserted on the wrong error message.

Match the convention used by every other test in this file: pre-set
OPENAI_API_KEY to a dummy value so the env-var check passes and the
path-traversal validation actually runs.

Assisted-By: docker-agent
Now that the hooks system supports builtins and per-event registration,
fold the response cache's storage half into the same surface used by
add_date / add_environment_info / add_prompt_files. The lookup half
stays inline because no hook event currently supports short-circuiting
the run with a synthetic response.

- pkg/hooks: extend Input with AgentName and LastUserMessage. Both are
  generally useful for any builtin/closure that needs per-agent state
  or visibility into the user's latest turn (audit, telemetry, cache).
- pkg/runtime/runtime.go: register a new cache_response builtin as a
  closure in NewLocalRuntime so it can resolve the agent (and therefore
  its cache instance) via Input.AgentName. Auto-inject it as a stop
  hook from getHooksExecutor when a.Cache() != nil, mirroring the way
  AddDate / AddPromptFiles / AddEnvironmentInfo are auto-injected.
  executeStopHooks now populates AgentName and LastUserMessage from
  the session.
- pkg/runtime/cache.go: add the cacheResponseBuiltin method (storage
  half) and trim tryReplayCachedResponse (lookup half) so it returns
  just bool — the caller no longer needs to ferry the cache question
  through the loop because the storage hook reads it from Input at
  dispatch time.
- pkg/runtime/loop.go: drop the cacheQuestion local variable and the
  inline cacheTurnResponse call. The body of res.Stopped is now back
  to its pre-cache shape, and the cache concerns at the top of
  RunStream collapse to a single 4-line lookup-or-replay block.
- The builtin skips Store when the cache already holds the exact same
  (question, response) pair, which makes the replay path (where
  tryReplayCachedResponse fires stop hooks for the cached answer)
  idempotent and prevents a redundant file rewrite on cache hits.

Behavior change: on a multi-turn RunStream (e.g. with a follow-up),
each stop now caches under whatever the latest user message is at that
moment. The previous implementation only cached the very first turn;
this is strictly more useful and matches the natural interpretation of
"same question -> same response" per turn.

Tests: existing cache tests pass unchanged. Adds
TestCache_StorageIsAStopHookBuiltin as a regression test that pins the
new architectural contract — the runtime registers cache_response on
its private hooks registry and auto-injects it for cache-enabled
agents.

Assisted-By: docker-agent
…helper

Three small simplifications, no feature change.

- pkg/cache: collapse the single-implementation Cache interface into an
  exported *Cache struct (Go's "accept interfaces, return structs"
  guideline). New now returns *cache.Cache, callers use a pointer
  field/parameter, and the unexported `cache` type goes away. Net: one
  fewer indirection per call site, ~10 fewer lines.
- pkg/cache: push the "skip rewrite when (q, response) already match"
  optimization down into Cache.Store. The cache itself is the right
  place for it, and any future caller benefits — the runtime
  cache_response builtin no longer needs an extra Lookup before Store.
  Add a unit test (TestFileCache_dedupSkipsRedundantWrite) that
  asserts the file's mtime is preserved on identical writes.
- pkg/runtime/cache.go: as a follow-up to the above, cacheResponseBuiltin
  now collapses to a 3-line happy path (resolve agent, get cache,
  Store) — the dedup branch is gone.
- pkg/teamloader: extract the 24-line inline cache wiring (path
  resolution, traversal check, cache.New) into a dedicated
  buildAgentCache helper in a new cache.go file alongside the loader.
  The agent loop body now reads as 5 lines. Add unit tests for the
  helper covering disabled config, in-memory, relative path,
  absolute path, and the path-traversal rejection.

Validation: mise lint clean, full mise test green (incl. existing
runtime cache tests, new pkg/cache and pkg/teamloader unit tests, and
race detector on pkg/cache/runtime/teamloader).

Assisted-By: docker-agent
Targeted follow-ups from a second-pass review. No behavior change for
existing scenarios; the only functional delta is that
after_llm_call hooks now also receive AgentName / LastUserMessage,
matching what the field's doc comment already promised.

Bug fixes / contract tightening:

- pkg/runtime/runtime.go: populate AgentName and LastUserMessage in
  executeAfterLLMCallHooks, not just executeStopHooks. The Input doc
  comment listed both events; this finishes the implementation.
- pkg/runtime/cache.go: log a Debug line in cacheResponseBuiltin when
  team.Agent(in.AgentName) fails so a misconfigured team is at least
  visible in --debug output instead of silently dropping the write.

Comments to prevent future regressions:

- pkg/cache/cache.go: Store now explicitly documents WHY the persist
  callback runs under the write lock. Releasing the lock between
  cloning the snapshot and writing it would let two concurrent stores
  persist out-of-order and silently overwrite a newer entry with an
  older snapshot on the next restart — exactly the lost-write race
  fixed in 73d806943. The comment makes the trade-off (brief read
  latency vs. strict serialization) the deliberate choice it is.
- pkg/runtime/cache.go: explain why tryReplayCachedResponse treats an
  empty cached value as a miss instead of replaying it.
- pkg/cache/cache.go: document the trim-then-lowercase order in
  keyNormalizer so a future change doesn't accidentally swap them.

Tests:

- pkg/cache/cache_test.go::TestFileCache_persistenceFailureKeepsInMemory
  pins the invariant that a write failure (here: cache directory
  stripped of write permission after New) does not break Store —
  the in-memory map still receives the entry, only the disk mirror
  is skipped. Skipped when running as root.
- pkg/runtime/cache_test.go::TestCache_EmptyResponseIsNotCached pins
  the empty-response semantics: an empty assistant content is dropped
  from the cache and the model is consulted on every repeat ask.

Validation: mise lint clean, full mise test green,
go test -race ./pkg/cache/... ./pkg/runtime/ ./pkg/teamloader/ clean.

Assisted-By: docker-agent
Mirror the upstream `builtins.ApplyAgentDefaults` shape for the
runtime-private cache builtin so that:

- All cache-specific behavior lives in pkg/runtime/cache.go: the
  builtin name constant, the storage closure, the lookup half, AND
  now the auto-inject helper.
- pkg/runtime/hooks.go (an upstream-maintained file) is touched in
  exactly one new line — `cfg = applyCacheDefault(cfg, a)` — making
  future rebases of that file mechanical.

Net change: 11 lines of inline cache injection in hooksExec become a
single helper call, paired with a 12-line `applyCacheDefault` next to
its caller's natural reading partner (cacheResponseBuiltin and
tryReplayCachedResponse). No behavior change; same tests pass.

Validation: mise lint clean, mise test green.

Assisted-By: docker-agent
…cesses

Two processes (or two runtimes) sharing the same cache.json could
silently lose entries: each held its own in-memory map, snapshotted
under its own mutex, and wrote the file independently — last writer
won, the other side's just-stored entries vanished from disk.

Add cross-process serialization via an advisory lock on a sibling
`<path>.lock` file (POSIX flock(2) on Unix, LockFileEx on Windows),
and surface external writes through mtime-based reload on Lookup.

- pkg/cache/lock_unix.go, pkg/cache/lock_windows.go: thin wrappers
  around unix.Flock and windows.LockFileEx (both already in our
  golang.org/x/sys dep tree, no new packages).
- pkg/cache/cache.go: rework Cache.Store from "snapshot under c.mu,
  write" to "lock <path>.lock, reload disk, merge our entry, write
  atomically". The lock window is short — one read, one
  fsync+rename — and serializes the read-modify-write across every
  process holding the same path. Add Cache.maybeReload (called by
  Lookup) to pick up external writes when the file mtime has
  advanced. The persist callback is gone; path/mtime now live on
  the struct.
- The lock file is a long-lived sentinel — never renamed or
  deleted. Removing it would let two processes lock different
  inodes and lose mutual exclusion.
- pkg/cache/cache_test.go: three new tests (and the leftover-files
  test learned about the .lock sentinel):
    * TestFileCache_crossProcessConcurrentStoresPreserveAllEntries
      — two *Cache instances on the same path, 50 distinct keys
      each, asserts all 100 entries are on disk after the race.
      Without the lock this test loses ~50% of entries.
    * TestFileCache_lookupReloadsAfterExternalWrite — one cache
      writes, the other's Lookup picks up the change after the
      mtime advances.
    * TestFileCache_lockFileNeverDeleted — pins the invariant.
- docs/configuration/agents/index.md and examples/cached_responses.yaml
  now describe the cross-process semantics so users know they can
  point multiple agents (or multiple processes) at the same path.

In-memory dedup is preserved as a cheap short-circuit: when our local
view already matches, Store skips the file lock + disk hit entirely.
After a recent Lookup our view is fresh, so this catches the common
"store the answer we just replayed" path of cache_response.

Validation: cross-platform builds (linux/windows amd64) clean,
mise lint clean, mise test green, go test -race ./pkg/cache/...
./pkg/runtime/ ./pkg/teamloader/ clean.

Assisted-By: docker-agent
…ache method

Three small simplifications in pkg/cache/cache.go that trim the file by
~10% and reduce conceptual surface, with no behavior change:

- Replace the fileLock struct + acquireFileLock + (l).release method
  trio with a single `lockFile(path) (func(), error)` that returns a
  release closure. The struct existed only to carry an *os.File and
  expose a method that called Close — exactly what a closure does.
  Saves the type declaration, two separate exports of the same thing,
  and the awkward "lock, err := acquireFileLock(...); defer
  lock.release()" dance.

- Inline the free-function persistEntry into a *Cache method
  persistToDisk that updates c.mtime directly. The old shape had
  persistEntry returning (time.Time, error) just so the caller could
  do `if !mtime.IsZero() { c.mtime = mtime }`; the IsZero gymnastics
  was only there because errors and a missing-stat shared the same
  zero return. Method form lets us simply assign and return on error.

- Extract a tiny `mtimeOf(path) time.Time` helper for the three places
  that want "current mtime, zero if unavailable" (New, persistToDisk's
  dedup branch, persistToDisk's post-write branch). Three identical
  three-line stat-or-zero blocks become one one-liner.

- Move the `c.path != ""` short-circuit into Cache.maybeReload itself
  so Lookup's body no longer needs to know about the file/in-memory
  distinction.

Validation: cross-platform builds (linux/windows amd64) clean, full
mise lint clean, mise test green, go test -race ./pkg/cache/... clean.

Assisted-By: docker-agent
- Fix TOCTOU race condition in Cache.maybeReload by re-stating the file
  under the write lock. The previous implementation stat'd the file before
  acquiring the lock, then used the stale FileInfo after the lock, which
  could lead to storing an outdated mtime that doesn't match the loaded
  content. This could cause subsequent Lookups to miss updates in
  high-concurrency scenarios.

- Clarify Windows lock semantics: the mandatory LockFileEx lock is on the
  separate .lock file (not the data file), so both Unix and Windows achieve
  the same advisory-lock effect for the cache operations.

- Document mtimeOf zero-value semantics: the zero time.Time{} is a sentinel
  for 'file not found' and is safe to compare with time.Time.Equal.

- Add comment explaining Store's update-after-persist ordering: the in-memory
  map is updated after the disk write so that persistence failures still keep
  the entry in memory for the current process.

- Document resolveCachePath's lexical-only path traversal check: symlinks are
  not resolved, which is acceptable for the threat model (cache paths come
  from agent configs, not untrusted user input).
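The lexical containment check described here can be sketched as below (function name hypothetical; it mirrors the `resolveCachePath` behavior: `filepath.Clean`-based `..` resolution with a prefix check, no symlink resolution):

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveUnder rejects any relative path whose ".." resolution
// lands outside base. Lexical only: symlinks are not followed.
func resolveUnder(base, p string) (string, error) {
	abs := filepath.Clean(filepath.Join(base, p))
	if abs != base && !strings.HasPrefix(abs, base+string(filepath.Separator)) {
		return "", errors.New("cache path escapes config directory")
	}
	return abs, nil
}

func main() {
	base := "/etc/agents"
	fmt.Println(resolveUnder(base, "cache.json"))
	fmt.Println(resolveUnder(base, "../../etc/passwd"))
}
```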

All tests pass with -race. No functional changes to the cache behavior.

Assisted-By: docker-agent
@dgageot dgageot force-pushed the board/configurable-response-caching-for-agent-accd7756 branch from e64407c to c4dc536 on April 28, 2026 07:26
@dgageot dgageot merged commit 87233ff into docker:main Apr 28, 2026
5 checks passed
