fix(api): sanitize componentState JSON instead of base64 encoding#2359
fix(api): sanitize componentState JSON instead of base64 encoding#2359charliecreates[bot] wants to merge 4 commits intomainfrom
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
There was a problem hiding this comment.
The overall direction is correct (shared sanitizer + readable JSON instead of base64), but a few details are brittle: the “non-empty state” guard can behave unexpectedly for non-plain objects, and several tests assert exact serialized JSON strings (key-order dependent). stringifyJsonForMarkup also has inconsistent error behavior between undefined returns and circular references. Tightening these will make the feature more robust without changing the intended behavior.
Additional notes (1)
- Maintainability |
packages/backend/src/util/thread-to-model-message-conversion.ts:261-265
The wrapper tag name is hard-coded (component_state) in both conversion logic and tests. If any other layer parses/relies on this marker, consider centralizing it as a constant to prevent drift and make future changes safer (e.g., renaming tag, adding attributes, etc.).
Summary of changes
What changed
Backend: include componentState as safe JSON in LLM context
- Updated
packages/backend/src/util/thread-to-model-message-conversion.tsto append an additional text content part:- Wraps state in
<component_state>…</component_state> - Uses shared sanitizer
stringifyJsonForMarkup(...)instead of base64/one-off escaping.
- Wraps state in
Core: add reusable JSON-for-markup stringifier
- Added
stringifyJsonForMarkuptopackages/core/src/json.ts:JSON.stringify+ escapes<,>,&as unicode sequences to remain valid JSON and safe inside markup-like wrappers.
Tests
- Expanded
packages/backend/src/util/thread-to-model-message-conversion.test.tsto cover:- state omitted for
{}andundefined - state included for non-empty objects
- injection attempt can’t break out of wrapper and round-trips via
JSON.parse
- state omitted for
- Added
stringifyJsonForMarkupunit coverage inpackages/core/src/json.test.ts.
| // Include component state so the LLM can see it on follow-up messages | ||
| if ( | ||
| message.componentState && | ||
| Object.keys(message.componentState).length > 0 | ||
| ) { | ||
| const safeJson = stringifyJsonForMarkup(message.componentState); | ||
| content.push({ | ||
| type: "text", | ||
| text: `<component_state>${safeJson}</component_state>`, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Object.keys(message.componentState).length > 0 will treat arrays, Dates, Maps/Sets, and other non-plain objects in surprising ways (e.g., new Date() becomes {} and gets dropped; an array becomes included but might not be intended). If componentState is meant to be a JSON object, this should either enforce “plain object only” or use a more intention-revealing emptiness check.
Right now the condition is also doing work proportional to number of keys; you can avoid Object.keys allocation with an early for…in/Object.values style check, but the bigger issue is semantic correctness/consistency of what qualifies as “empty state”.
Suggestion
Consider tightening the guard to match the domain expectation (likely “plain object with at least one own enumerable property”) and make the intent explicit. For example:
function hasOwnEnumerableKeys(value: unknown): value is Record<string, unknown> {
if (!value || typeof value !== "object" || Array.isArray(value)) return false;
for (const k in value as Record<string, unknown>) {
if (Object.prototype.hasOwnProperty.call(value, k)) return true;
}
return false;
}
// …
if (hasOwnEnumerableKeys(message.componentState)) {
const safeJson = stringifyJsonForMarkup(message.componentState);
content.push({ type: "text", text: `<component_state>${safeJson}</component_state>` });
}If arrays are valid state, adjust accordingly (e.g., allow arrays but check length). Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.
There was a problem hiding this comment.
@CharlieHelps componentState is always Record<string, unknown> - i.e. always an object with user-defined keys. Dates/etc should never show up here (if they do that's real bug)
| // Include component state so the LLM can see it on follow-up messages | ||
| if ( | ||
| message.componentState && | ||
| Object.keys(message.componentState).length > 0 | ||
| ) { | ||
| const safeJson = stringifyJsonForMarkup(message.componentState); | ||
| content.push({ | ||
| type: "text", | ||
| text: `<component_state>${safeJson}</component_state>`, | ||
| }); | ||
| } |
There was a problem hiding this comment.
Appending <component_state>…</component_state> as a plain text content part changes the message semantics and can interfere with downstream logic that expects the last assistant text chunk to be the human-readable response (e.g., truncation, summarization, UI previews, “last message” displays, eval tooling, etc.).
If this is strictly model-context metadata, it should ideally be structured as metadata (if your model API supports it) or clearly separated from user-visible assistant content at call sites. At minimum, consider making the placement configurable or ensuring it’s appended only in the LLM-bound representation and never reused for UI-facing rendering paths.
Suggestion
If possible in your architecture, keep componentState out of the content array used elsewhere and attach it via a dedicated field or an out-of-band “system/context” message specifically for the LLM.
If you must keep it in content, consider:
- prefixing with a sentinel comment (still within the wrapper) and/or
- inserting it as the first part (so the user-facing text remains last), and
- adding a helper for consumers to strip
<component_state>blocks when rendering.
Reply with "@CharlieHelps yes please" if you'd like me to add a commit that moves the injected block to the beginning of content and updates the related tests.
There was a problem hiding this comment.
@CharlieHelps the system message should be explaining/describing this <component_state> tag. We were actually doing this elsewhere in the codebase before we switched to V1 - make sure we're doing the same thing as the pre-v1 code, and follow that pattern (i.e. if the tag is different or we wrap it with something else..)
packages/core/src/json.ts
Outdated
| export function stringifyJsonForMarkup(value: unknown): string { | ||
| const json = JSON.stringify(value); | ||
| if (json === undefined) { | ||
| throw new Error("Value is not JSON-serializable"); | ||
| } | ||
|
|
||
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); |
There was a problem hiding this comment.
stringifyJsonForMarkup currently throws a generic Error("Value is not JSON-serializable") only when JSON.stringify returns undefined. However, the more common non-serializable case is a circular reference, which throws a TypeError directly from JSON.stringify and will bubble up with an inconsistent message.
Given this function is meant as a shared utility, it’s worth normalizing errors so callers get a predictable failure mode (and can log/handle it consistently).
Suggestion
Wrap JSON.stringify in a try/catch and throw a consistent error (optionally preserving the original as cause):
export function stringifyJsonForMarkup(value: unknown): string {
let json: string | undefined;
try {
json = JSON.stringify(value);
} catch (err) {
throw new Error("Value is not JSON-serializable", { cause: err as Error });
}
if (json === undefined) {
throw new Error("Value is not JSON-serializable");
}
return json
.replaceAll("<", "\\u003c")
.replaceAll(">", "\\u003e")
.replaceAll("&", "\\u0026");
}Reply with "@CharlieHelps yes please" if you’d like me to add a commit with this suggestion.
| /** | ||
| * Stringify a value as JSON, but escape characters that can break out of | ||
| * markup-like wrappers (e.g., when embedding JSON inside an XML/HTML tag). | ||
| * | ||
| * This keeps the output valid JSON while preventing `<`, `>`, and `&` from | ||
| * appearing literally. | ||
| * | ||
| * Throws if the value is not JSON-serializable (for example, circular | ||
| * references). | ||
| * | ||
| * @param value - The value to stringify | ||
| * @returns A JSON string safe to embed in markup-like wrappers | ||
| */ | ||
| export function stringifyJsonForMarkup(value: unknown): string { | ||
| const json = JSON.stringify(value); | ||
| if (json === undefined) { | ||
| throw new Error("Value is not JSON-serializable"); | ||
| } | ||
|
|
||
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); | ||
| } |
There was a problem hiding this comment.
The wrapper injection protection here is incomplete for markup-like wrappers. Escaping <, >, and & prevents tag formation and entity injection, but it does not prevent breaking out of contexts where the JSON is embedded inside an attribute or other quoted context (e.g. "), and it also doesn’t cover edge cases like U+2028/U+2029 that can matter if this JSON ever ends up in a <script> context.
Your PR description says “safe for HTML/XML contexts”, but the function name/docs say “markup-like wrappers” (which is broader). Right now the sanitizer is tailored to element text content, not “any markup context”. That mismatch is a correctness/security footgun because future callers may assume stronger guarantees than are actually provided.
Suggestion
Narrow the contract (recommended) or broaden the escaping.
Option A (narrow + safer by design): rename and document explicitly as safe for embedding in element text content (not attributes/scripts).
/**
* Stringify a value as JSON safe to embed in an XML/HTML *element text node*.
* It prevents forming tags/entities by escaping `<`, `>` and `&`.
*
* Not safe for embedding in attributes or script contexts.
*/
export function stringifyJsonForMarkupText(value: unknown): string {
...
}Update callers accordingly.
Option B (broaden): additionally escape \u2028/\u2029 and consider escaping " and ' if this might ever be inserted into quoted contexts.
Reply with "@CharlieHelps yes please" if you'd like me to add a commit implementing Option A (rename + doc updates + callsite/test updates).
packages/core/src/json.ts
Outdated
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); |
There was a problem hiding this comment.
stringifyJsonForMarkup uses replaceAll, which can be a runtime compatibility issue depending on your supported Node/browser matrix. TypeScript can compile this without downleveling; whether it works at runtime depends on the JS engine.
Given this is in packages/core, it’s more likely to be consumed in varied environments than backend-only code.
Suggestion
If you need broader runtime support, replace chained replaceAll with a single regex replace (or a small loop) which works in older engines.
return json.replace(/[<>&]/g, (ch) => {
switch (ch) {
case '<': return '\\u003c';
case '>': return '\\u003e';
case '&': return '\\u0026';
default: return ch;
}
});Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this change.
alecf
left a comment
There was a problem hiding this comment.
@CharlieHelps I'm not sure if some of these already landed on main - can you rebase before applying these changes?
| // Include component state so the LLM can see it on follow-up messages | ||
| if ( | ||
| message.componentState && | ||
| Object.keys(message.componentState).length > 0 | ||
| ) { | ||
| const safeJson = stringifyJsonForMarkup(message.componentState); | ||
| content.push({ | ||
| type: "text", | ||
| text: `<component_state>${safeJson}</component_state>`, | ||
| }); | ||
| } |
There was a problem hiding this comment.
@CharlieHelps componentState is always Record<string, unknown> - i.e. always an object with user-defined keys. Dates/etc should never show up here (if they do that's real bug)
| // Include component state so the LLM can see it on follow-up messages | ||
| if ( | ||
| message.componentState && | ||
| Object.keys(message.componentState).length > 0 | ||
| ) { | ||
| const safeJson = stringifyJsonForMarkup(message.componentState); | ||
| content.push({ | ||
| type: "text", | ||
| text: `<component_state>${safeJson}</component_state>`, | ||
| }); | ||
| } |
There was a problem hiding this comment.
@CharlieHelps the system message should be explaining/describing this <component_state> tag. We were actually doing this elsewhere in the codebase before we switched to V1 - make sure we're doing the same thing as the pre-v1 code, and follow that pattern (i.e. if the tag is different or we wrap it with something else..)
packages/core/src/json.ts
Outdated
| export function stringifyJsonForMarkup(value: unknown): string { | ||
| const json = JSON.stringify(value); | ||
| if (json === undefined) { | ||
| throw new Error("Value is not JSON-serializable"); | ||
| } | ||
|
|
||
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); |
packages/core/src/json.ts
Outdated
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); |
| /** | ||
| * Stringify a value as JSON, but escape characters that can break out of | ||
| * markup-like wrappers (e.g., when embedding JSON inside an XML/HTML tag). | ||
| * | ||
| * This keeps the output valid JSON while preventing `<`, `>`, and `&` from | ||
| * appearing literally. | ||
| * | ||
| * Throws if the value is not JSON-serializable (for example, circular | ||
| * references). | ||
| * | ||
| * @param value - The value to stringify | ||
| * @returns A JSON string safe to embed in markup-like wrappers | ||
| */ | ||
| export function stringifyJsonForMarkup(value: unknown): string { | ||
| const json = JSON.stringify(value); | ||
| if (json === undefined) { | ||
| throw new Error("Value is not JSON-serializable"); | ||
| } | ||
|
|
||
| return json | ||
| .replaceAll("<", "\\u003c") | ||
| .replaceAll(">", "\\u003e") | ||
| .replaceAll("&", "\\u0026"); | ||
| } |
Expand this to see my work.
|
7866e4c to
8585362
Compare
|
Fixes how component state JSON is embedded in LLM context to prevent injection attacks and improve reliability. Changes:
|
There was a problem hiding this comment.
Overall the changes are solid and align with the stated goal (safe JSON in an element text node + less brittle tests). The main concerns are test brittleness from regex-based extraction of <component_state>...</component_state> and a potential runtime compatibility issue from using Error(..., { cause }) while also explicitly optimizing for older runtimes elsewhere. The backend hasOwnKeys helper is acceptable but could be clarified/documented to avoid prototype-chain iteration surprises.
Additional notes (1)
- Maintainability |
packages/backend/src/util/thread-to-model-message-conversion.test.ts:187-198
The regex^<component_state>(.*)</component_state>$is potentially fragile:.*is greedy and (without thesflag) won’t match newlines. While JSON emitted byJSON.stringifyis typically single-line, this is a test and could become flaky if serialization changes (e.g., if pretty-printing is introduced or content includes\n). It also doesn’t assert thatcontent[1].typeis"text", which is part of the structure you’re implicitly relying on.
Summary of changes
Summary of changes
Backend
- Updated the decision-loop system prompt to describe the
'<component_state>...</component_state>'block as system metadata and instruct the model to never echo the tags. - In
convertAssistantMessage(...):- Switched from
stringifyJsonForMarkuptostringifyJsonForMarkupTextwhen serializingmessage.componentState. - Replaced
Object.keys(...).length > 0with a newhasOwnKeys(...)helper to detect non-empty state.
- Switched from
Core
- Introduced
stringifyJsonForMarkupText(...)with a narrower, explicit contract: safe for XML/HTML element text nodes. - Hardened JSON stringification behavior by:
- Normalizing failures from
JSON.stringifyviatry/catch(including circular refs) and throwing a consistent error. - Avoiding
String.prototype.replaceAllin favor of a regex-basedreplacefor runtime compatibility.
- Normalizing failures from
- Kept
stringifyJsonForMarkup(...)as a deprecated alias ofstringifyJsonForMarkupText(...).
Tests
- Made backend tests less brittle by asserting
<component_state>round-trips viaJSON.parseinstead of exact serialized JSON ordering. - Updated core tests to cover:
- Escaping semantics via “does not contain raw
<tag>/&” checks. - Throwing on non-JSON-serializable values (circular reference).
- Escaping semantics via “does not contain raw
| export function stringifyJsonForMarkupText(value: unknown): string { | ||
| let json: string | undefined; | ||
| try { | ||
| json = JSON.stringify(value); | ||
| } catch (error) { | ||
| throw new Error("Value is not JSON-serializable", { | ||
| cause: error, | ||
| }); | ||
| } |
There was a problem hiding this comment.
stringifyJsonForMarkupText now throws new Error(..., { cause }). If this package is consumed in runtimes that don’t support the ErrorOptions cause parameter (older Node/browser), this can be a runtime compatibility problem even though TypeScript compiles.
You already took care to avoid replaceAll for runtime compatibility, so it’s inconsistent to introduce another potentially unsupported feature here.
Suggestion
If you need to support older runtimes, avoid cause and attach the original error in a backwards-compatible way:
} catch (error) {
const err = new Error("Value is not JSON-serializable");
(err as any).cause = error; // optional, best-effort
throw err;
}Or gate on support if you have a runtime feature check. Reply with "@CharlieHelps yes please" if you'd like me to add a commit with this suggestion.
|
#2359 (comment) Rebased this PR onto Key updates are in Changes
Verification# lint
npm run lint
# types
npm run check-types
# tests
npm test |
There was a problem hiding this comment.
The sanitization logic is correct and round-trips cleanly through JSON.parse. The system prompt simplification and "never echo tags" instruction are good improvements. A few actionable items below — mostly around unnecessary complexity.
| /** | ||
| * @deprecated Use stringifyJsonForMarkupText instead. | ||
| * @param value - The value to stringify | ||
| * @returns A JSON string safe to embed in an XML/HTML element text node | ||
| */ | ||
| export function stringifyJsonForMarkup(value: unknown): string { | ||
| return stringifyJsonForMarkupText(value); | ||
| } |
There was a problem hiding this comment.
AGENTS.md explicitly says: "Don't re-export symbols for backwards compatibility. When moving a symbol, update all consumers to import from the new location." A grep confirms zero callers of stringifyJsonForMarkup outside this file — all consumers were already updated in this PR. This deprecated alias is dead code and should be removed per project policy.
| // Intentionally avoid String.prototype.replaceAll for runtime compatibility. | ||
| return json.replace(/[<>&]/g, (character) => { | ||
| switch (character) { | ||
| case "<": | ||
| return "\\u003c"; | ||
| case ">": | ||
| return "\\u003e"; | ||
| case "&": | ||
| return "\\u0026"; | ||
| default: | ||
| return character; | ||
| } | ||
| }); |
There was a problem hiding this comment.
The comment says "Intentionally avoid String.prototype.replaceAll for runtime compatibility" but @tambo-ai-cloud/core is a private server-side package consumed only by apps/api, packages/backend, and packages/db — all running on Node >=22 where replaceAll has been stable since Node 15. The previous .replaceAll() chain was simpler and more readable. Consider reverting to the original approach.
If there's a specific downstream consumer I'm missing that runs in a constrained runtime, a comment calling that out would help.
| function hasOwnKeys(value: Record<string, unknown>): boolean { | ||
| for (const key in value) { | ||
| if (Object.prototype.hasOwnProperty.call(value, key)) { | ||
| return true; | ||
| } | ||
| } | ||
|
|
||
| return false; | ||
| } |
There was a problem hiding this comment.
hasOwnKeys with for...in + hasOwnProperty is more complex than the original Object.keys(obj).length > 0 without a meaningful benefit here. componentState is always a plain object from JSON deserialization, so inherited properties aren't a concern and the early-exit performance advantage is negligible for these small objects. The simpler Object.keys(...).length > 0 idiom is easier to read at a glance.

Summary
Tighten up how
componentStateis embedded as JSON inside the<component_state>...</component_state>wrapper for LLM context: clarify the tag semantics in the system prompt, harden the shared sanitizer, and make the tests less brittle.Changes
stringifyJsonForMarkupText()(and keepstringifyJsonForMarkup()as a deprecated alias) to:<,>, and&as unicode sequences for safe embedding in an XML/HTML element text node.replaceAllfor wider runtime compatibility.stringifyJsonForMarkupText()when appending the<component_state>block.hasOwnKeyscheck instead ofObject.keys(...).length.<component_state>...</component_state>tag and that it is metadata (must not be echoed).JSON.parse) over exact string matches (avoids key-order brittleness).Testing
npm run lintnpm run check-typesnpm testpackages/ui-registry,react-sdk, andscripts/are not part of this PR diff.