runtime: extract image-stripping into a registered MessageTransform #2573

Merged
dgageot merged 3 commits into docker:main from
dgageot:board/extracting-runtime-features-into-builtin-d52e607b
Apr 28, 2026

@dgageot dgageot commented Apr 28, 2026

Summary

Extracts the inline stripImageContent call from runStreamLoop into a registered, runtime-private message-transform mechanism that opens the door to a family of message-mutating builtins (PII redactors, secret scrubbers, prompt-prefix injectors, …).

Changes

New mechanism — MessageTransform (in-process before_llm_call rewrites)

  • New MessageTransform type and WithMessageTransform("name", fn) option in pkg/runtime/transforms.go.
  • Transforms are intentionally a runtime-private contract: JSON-roundtripping a full conversation through the cross-process hook protocol would be prohibitively expensive, so command/model hooks cannot rewrite messages, by design.
  • Transforms run after the standard before_llm_call gate — a hook that wants to abort the call should target the gate, not a transform.
  • Fail-soft: a transform that returns an error logs at warn level and the chain continues with the previous slice. A transform must never break the run loop.
  • Chain order = registration order. Per-agent scoping (if needed) lives in the transform body via hooks.Input.AgentName.

First built-in transform — strip_unsupported_modalities

  • New pkg/runtime/strip_modalities.go hosts BuiltinStripUnsupportedModalities, the transform body, and the stripImageContent helper (moved from streaming.go).
  • The inline if m != nil && len(m.Modalities.Input) > 0 && !slices.Contains(...) block in runStreamLoop is gone. The loop now calls executeBeforeLLMCallHooks (gate) followed by applyBeforeLLMCallTransforms (rewrite) — so a transform failure cannot waste the gate's allow verdict.
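
As a rough sketch of the gate-then-rewrite ordering and the strip decision described above: the `Part` and `Message` types, the `"image"` modality string, and the `maybeStrip` and `prepareCall` helpers are assumptions for illustration; only the `stripImageContent` name and the shape of the condition come from this PR.

```go
package main

import (
	"fmt"
	"slices"
)

// Part and Message are hypothetical, simplified multi-part message types.
type Part struct {
	Type string // "text" or "image" (assumed values)
	Text string
}

type Message struct {
	Role  string
	Parts []Part
}

// stripImageContent returns a copy of msgs with image parts dropped.
// The input slice is left untouched.
func stripImageContent(msgs []Message) []Message {
	out := make([]Message, 0, len(msgs))
	for _, m := range msgs {
		kept := make([]Part, 0, len(m.Parts))
		for _, p := range m.Parts {
			if p.Type != "image" {
				kept = append(kept, p)
			}
		}
		out = append(out, Message{Role: m.Role, Parts: kept})
	}
	return out
}

// maybeStrip strips only when the model's input modalities are known and
// do not include images; unknown modalities pass through unchanged.
func maybeStrip(inputModalities []string, msgs []Message) []Message {
	if len(inputModalities) > 0 && !slices.Contains(inputModalities, "image") {
		return stripImageContent(msgs)
	}
	return msgs
}

// prepareCall mirrors the loop ordering: consult the gate first, then rewrite.
func prepareCall(gate func() bool, inputModalities []string, msgs []Message) ([]Message, bool) {
	if !gate() {
		return nil, false
	}
	return maybeStrip(inputModalities, msgs), true
}

func main() {
	msgs := []Message{{Role: "user", Parts: []Part{{Type: "text", Text: "look"}, {Type: "image"}}}}
	allow := func() bool { return true }

	textOnly, _ := prepareCall(allow, []string{"text"}, msgs)
	unknown, _ := prepareCall(allow, nil, msgs)
	fmt.Println(len(textOnly[0].Parts), len(unknown[0].Parts)) // prints "1 2"
}
```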

Correctness fix — alloy mode + per-tool model override

  • New ModelID field on hooks.Input, populated by runStreamLoop with the model the loop actually picked (post per-tool override, post alloy-mode random selection).
  • The strip transform now keys its modality lookup off in.ModelID instead of calling agent.Model() again — which would re-randomize the alloy pick or miss a per-tool override and consult the wrong modalities.
  • Pinned by TestStripUnsupportedModalitiesTransform_UsesInputModelID, which uses an ID-keyed model store to prove the lookup keys off ModelID rather than the agent.
  • The same ModelID is now also surfaced to user-authored before_llm_call hooks for free.
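
To illustrate why the lookup keys off the resolved ID, here is a small sketch with an ID-keyed store, similar in spirit to the test mentioned above. The `modelStore` type, the `supportsImages` helper, and the model IDs are hypothetical; only the `ModelID` and `AgentName` field names come from this PR.

```go
package main

import "fmt"

// Input carries the model the loop actually picked. Field names follow the
// PR description; everything else in this sketch is assumed.
type Input struct {
	AgentName string
	ModelID   string
}

// modelStore maps a model ID to its input modalities (hypothetical shape).
type modelStore map[string][]string

// supportsImages reports whether the resolved model accepts images, and
// whether the answer is known at all. An empty or unknown ModelID yields
// known == false, which callers treat as "pass through".
func supportsImages(in Input, store modelStore) (supported, known bool) {
	if in.ModelID == "" {
		return false, false
	}
	mods, ok := store[in.ModelID]
	if !ok || len(mods) == 0 {
		return false, false
	}
	for _, m := range mods {
		if m == "image" {
			return true, true
		}
	}
	return false, true
}

func main() {
	store := modelStore{
		"text-only-model":  {"text"},
		"multimodal-model": {"text", "image"},
	}
	// The same agent can resolve to different models per call (alloy pick or
	// per-tool override); the decision follows ModelID, not the agent.
	for _, id := range []string{"text-only-model", "multimodal-model", ""} {
		supported, known := supportsImages(Input{AgentName: "root", ModelID: id}, store)
		fmt.Printf("%q supported=%v known=%v\n", id, supported, known)
	}
}
```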

What's preserved

All previous user-facing behavior:

  • Strip-when-text-only: identical decision logic.
  • "Unknown model → pass through": identical fall-through.
  • The add_date / add_environment_info / add_prompt_files / cache_response builtins are untouched.
  • hooks.Input field additions are backward-compatible (omitempty JSON tags; existing handlers ignore unknown fields).
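
The backward-compatibility claim in the last bullet rests on standard encoding/json behavior, sketched below. The struct shapes and the JSON key names (`agent_name`, `model_id`) are assumptions for illustration, not the actual hooks.Input definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Input is a hypothetical, trimmed-down version of the hook payload with
// the new field tagged omitempty.
type Input struct {
	AgentName string `json:"agent_name"`
	ModelID   string `json:"model_id,omitempty"`
}

// legacyInput is what a handler written before the new field might decode into.
type legacyInput struct {
	AgentName string `json:"agent_name"`
}

// encode marshals the payload; the struct contains only strings, so
// Marshal cannot fail here and the error is ignored.
func encode(in Input) string {
	b, _ := json.Marshal(in)
	return string(b)
}

// decodeLegacy shows that an older handler silently ignores unknown fields.
func decodeLegacy(payload string) (legacyInput, error) {
	var out legacyInput
	err := json.Unmarshal([]byte(payload), &out)
	return out, err
}

func main() {
	fmt.Println(encode(Input{AgentName: "root"}))                // {"agent_name":"root"}
	fmt.Println(encode(Input{AgentName: "root", ModelID: "m1"})) // {"agent_name":"root","model_id":"m1"}

	old, err := decodeLegacy(encode(Input{AgentName: "root", ModelID: "m1"}))
	fmt.Println(old.AgentName, err) // root <nil>
}
```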

What's not preserved (intentional)

The original PR briefly experimented with auto-injecting transforms as {type: builtin, command: name} entries into agent hook configs (with a no-op BuiltinFunc shim and dedup logic). This was simplified away because users couldn't actually control transforms through YAML — auto-injection always won — so the YAML coupling was internal plumbing for a control surface that didn't exist. The simplification dropped ~340 net lines without losing any user-facing capability.

Why this matters

The payoff isn't in code we deleted today (the strip is the only candidate currently inline). The payoff is shrinking the diff for future message-rewriting features:

  • PII redactor: ~30-line transform + WithMessageTransform("redact_pii", fn). 0 lines in the run loop.
  • "Drop large tool outputs from old turns": same shape.
  • "Inject team-policy prefix": same shape.

Without this mechanism, each of those would have grown a new branch in runStreamLoop. With it, the loop's pre-LLM-call section stays at three logical lines: get gate verdict, run transforms, call model.
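
For a sense of scale, a future redactor body might look roughly like the sketch below. It is purely illustrative: the `Message` type, the `redactPII` function, the function signature, and the email-only pattern are assumptions, and the exact signature expected by WithMessageTransform is not shown in this description.

```go
package main

import (
	"fmt"
	"regexp"
)

// Message is a hypothetical, simplified message type.
type Message struct {
	Role    string
	Content string
}

// emailPattern is a deliberately simple matcher for this sketch; a real
// redactor would cover more kinds of PII.
var emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// redactPII returns a copy of msgs with email addresses replaced.
// It never mutates its input, so a caller can still fall back to the
// original slice.
func redactPII(msgs []Message) ([]Message, error) {
	out := make([]Message, len(msgs))
	for i, m := range msgs {
		out[i] = Message{
			Role:    m.Role,
			Content: emailPattern.ReplaceAllString(m.Content, "[REDACTED]"),
		}
	}
	return out, nil
}

func main() {
	out, _ := redactPII([]Message{{Role: "user", Content: "mail bob@example.com now"}})
	fmt.Println(out[0].Content) // prints "mail [REDACTED] now"
}
```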

Validation

  • mise lint ✓ (golangci-lint run: 0 issues, internal lint checker: no offenses, go mod tidy --diff: clean)
  • mise test ✓ (full suite passes)
  • New tests cover: text-only / multimodal / unknown-model branches, empty ModelID, registration-order chain semantics, fail-soft contract, end-to-end strip via RunStream, end-to-end transform-error survival, input validation, alloy / per-tool override correctness.

Commits

  1. extract strip_unsupported_modalities into a registered before_llm_call transform
  2. simplify message transforms: drop the YAML auto-injection plumbing
  3. fix strip transform reading wrong model in alloy / per-tool override mode

Assisted-By: docker-agent

dgageot added 3 commits April 28, 2026 11:11
…mode

The transform was calling agent.Model() which re-randomizes alloy picks and ignores per-tool overrides — it could end up consulting modalities for a different model than the one the loop was actually about to call. Pass the resolved modelID through hooks.Input.ModelID instead.
@dgageot dgageot merged commit e59e163 into docker:main Apr 28, 2026
9 checks passed