fix(transpiler): preserve null bytes in tagged template literals#27554
The `UnsignedCodepointIterator` used `minInt(u32) = 0` as the error sentinel for detecting invalid multibyte UTF-8 sequences. This collided with the valid null byte codepoint (U+0000), causing null bytes to be misidentified as decode errors and replaced with U+FFFD. The printer then emitted the literal 6-character string `\uFFFD` in raw template literals instead of preserving the null byte.

Fix: use `maxInt` instead of `minInt` as the error sentinel. For `u32`, `maxInt` is `0xFFFFFFFF`, which is well beyond the valid Unicode range (max `0x10FFFF`) and can never collide with a valid codepoint.

Closes #27553

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
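The sentinel collision described above can be reproduced in a toy model. This is a simplified TypeScript decoder, not Bun's `NewCodePointIterator`; `decodeOne`, `decodeAll`, and the constants are hypothetical names, and the decoder skips overlong and surrogate validation. It assumes, as the description states, that the caller treats any result equal to the sentinel as a decode error and substitutes U+FFFD.

```typescript
// Hypothetical, simplified model of a UTF-8 codepoint iterator step.
// It is NOT Bun's NewCodePointIterator; it only mirrors the sentinel idea.
const U32_MIN = 0; // plays the role of minInt(u32)
const U32_MAX = 0xffffffff; // plays the role of maxInt(u32)
const REPLACEMENT = 0xfffd; // U+FFFD REPLACEMENT CHARACTER

// Decode one codepoint starting at byte `i`; returns [codepoint, byteLength].
// On an invalid lead byte, truncation, or bad continuation byte it returns
// `errorSentinel` in place of a codepoint. (No overlong/surrogate checks.)
function decodeOne(bytes: Uint8Array, i: number, errorSentinel: number): [number, number] {
  const b0 = bytes[i];
  if (b0 < 0x80) return [b0, 1]; // ASCII, including the NUL byte 0x00
  let len: number;
  let cp: number;
  if ((b0 & 0xe0) === 0xc0) { len = 2; cp = b0 & 0x1f; }
  else if ((b0 & 0xf0) === 0xe0) { len = 3; cp = b0 & 0x0f; }
  else if ((b0 & 0xf8) === 0xf0) { len = 4; cp = b0 & 0x07; }
  else return [errorSentinel, 1];
  if (i + len > bytes.length) return [errorSentinel, 1];
  for (let k = 1; k < len; k++) {
    const b = bytes[i + k];
    if ((b & 0xc0) !== 0x80) return [errorSentinel, 1];
    cp = (cp << 6) | (b & 0x3f);
  }
  return [cp, len];
}

// Walk the buffer, replacing any value equal to the sentinel with U+FFFD,
// the way a caller that trusts the sentinel would.
function decodeAll(bytes: Uint8Array, errorSentinel: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const [cp, len] = decodeOne(bytes, i, errorSentinel);
    out.push(cp === errorSentinel ? REPLACEMENT : cp);
    i += len;
  }
  return out;
}

const input = new Uint8Array([0x61, 0x00, 0x62]); // "a", NUL, "b"
console.log(decodeAll(input, U32_MIN)); // values 97, 65533, 98: NUL misread as an error
console.log(decodeAll(input, U32_MAX)); // values 97, 0, 98: NUL preserved
```

With a sentinel of 0, the decoder cannot distinguish a genuine NUL from an error; with `0xFFFFFFFF`, no decoded value can ever match, so only real decode errors are replaced.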
Updated 8:40 AM PT - Feb 28th, 2026
❌ Your commit
🧪 To try this PR locally: `bunx bun-pr 27554`. That installs a local version of the PR into your `bun-27554 --bun` …
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info. Configuration used: path `.coderabbit.yaml`. Review profile: ASSERTIVE. Plan: Pro. Files selected for processing: 2.
Walkthrough: The pull request modifies Unicode code point error handling and adds regression tests for null byte preservation in template literals. The error sentinel calculation in the Unicode iterator is adjusted, and new tests cover `String.raw` and template literal behavior.
🚥 Pre-merge checks: ✅ 4 of 4 passed.
LGTM: clean one-line fix with a clear root cause and good regression tests.
Extended reasoning...
Overview
This PR fixes a bug where null bytes (U+0000) in tagged template literals were being corrupted by the transpiler. The fix changes one line in src/string/immutable/unicode.zig, switching the error sentinel in NewCodePointIterator.next() from minInt(CodePointType) to maxInt(CodePointType). A new regression test file (test/regression/issue/27553.test.ts) with three test cases is added.
Security risks
No security concerns. The change corrects a data corruption bug by using a proper sentinel value that cannot collide with valid Unicode codepoints.
Level of scrutiny
This is a production-critical code path (the transpiler's Unicode iterator), so it warrants careful review. However, the change is minimal and the logic is clear: minInt(u32) = 0 collided with the null byte codepoint, while maxInt(u32) = 0xFFFFFFFF is well beyond valid Unicode range (max 0x10FFFF). For the signed CodepointIterator (i32), maxInt(i32) is also safely outside the valid range. The fix is correct for both instantiations of the generic.
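The range argument above can be checked numerically. This sketch mirrors Zig's `std.math.minInt`/`maxInt` with hypothetical BigInt helpers (`minIntOf`, `maxIntOf`) for the two instantiations mentioned, and treats a sentinel as safe only if it lies outside 0 through `0x10FFFF`. It also shows that the old signed sentinel, `minInt(i32)`, was already negative, so only the unsigned instantiation could collide with a real codepoint.

```typescript
// Hypothetical helpers mirroring Zig's std.math.minInt / maxInt for the two
// integer types the generic iterator is instantiated with (u32 and i32).
type IntType = { bits: number; signed: boolean };

const u32: IntType = { bits: 32, signed: false };
const i32: IntType = { bits: 32, signed: true };

function minIntOf(t: IntType): bigint {
  return t.signed ? -(1n << BigInt(t.bits - 1)) : 0n;
}

function maxIntOf(t: IntType): bigint {
  return t.signed ? (1n << BigInt(t.bits - 1)) - 1n : (1n << BigInt(t.bits)) - 1n;
}

const MAX_CODEPOINT = 0x10ffffn;

// A sentinel is safe only if no valid Unicode codepoint (0..0x10FFFF) can equal it.
function isSafeSentinel(v: bigint): boolean {
  return v < 0n || v > MAX_CODEPOINT;
}

for (const [name, t] of [["u32", u32], ["i32", i32]] as const) {
  const lo = minIntOf(t);
  const hi = maxIntOf(t);
  console.log(`${name}: minInt=${lo} safe=${isSafeSentinel(lo)}, maxInt=0x${hi.toString(16)} safe=${isSafeSentinel(hi)}`);
}
// u32: minInt=0 safe=false, maxInt=0xffffffff safe=true
// i32: minInt=-2147483648 safe=true, maxInt=0x7fffffff safe=true
```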
Other factors
The PR includes three focused regression tests that verify null bytes are preserved in tagged template literals, untagged template literals, and embedded within other content. The PR description mentions existing tests pass. The change is self-contained with no risk of side effects.
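The three properties these regression tests cover can be stated as plain assertions. This is an engine-level sketch, not the contents of `test/regression/issue/27553.test.ts`, which may be structured quite differently (for example by running source through Bun itself). It builds source text containing a literal U+0000 character and evaluates it with `new Function`, which exercises the JS engine's own parser rather than Bun's transpiler, so it documents the expected behavior rather than reproducing the bug. With the bug, the tagged case would have produced the six characters `\uFFFD` in place of the single NUL.

```typescript
// Source text below contains a real U+0000 character, not the escape "\0".
const NUL = String.fromCharCode(0);

// Evaluate a function body and return its string result.
function evalSource(body: string): string {
  return new Function(body)() as string;
}

// 1. Tagged template: String.raw returns the raw source text, NUL included.
const tagged = evalSource("return String.raw`a" + NUL + "b`;");

// 2. Untagged template: the cooked value also keeps the NUL.
const untagged = evalSource("return `a" + NUL + "b`;");

// 3. NUL embedded next to a substitution and other content.
const embedded = evalSource("return String.raw`x${1}" + NUL + "y`;");

console.log(tagged === "a\u0000b", tagged.length); // true 3
console.log(untagged === "a\u0000b", untagged.length); // true 3
console.log(embedded === "x1\u0000y"); // true
```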
Summary
- `String.raw` was corrupting null bytes (U+0000) in tagged template literals by replacing them with the literal string `\uFFFD`
- `UnsignedCodepointIterator` used `minInt(u32) = 0` as the error sentinel for invalid UTF-8 sequences, which collided with the valid null byte codepoint; it is changed to `maxInt(u32)`, which is beyond the valid Unicode range

Root Cause
In `src/string/immutable/unicode.zig`, `NewCodePointIterator.next()` used `std.math.minInt(CodePointType)` as the error sentinel. For the `UnsignedCodepointIterator` (`u32`), `minInt(u32) = 0`, which collides with the valid null byte codepoint (U+0000). This caused null bytes to be misidentified as decode errors and replaced with `unicode_replacement` (U+FFFD). The printer then emitted the literal 6-character string `\uFFFD` in raw template literals.

The fix uses `maxInt` instead: `maxInt(u32) = 0xFFFFFFFF`, which is well beyond the valid Unicode range (max `0x10FFFF`) and can never collide with a valid codepoint.

Test plan
- `String.raw` preserves null bytes in tagged template literals
- `String.raw` with surrounding content
- Existing tests pass (`test/bundler/transpiler/template-literal.test.ts`)
- Existing tests pass (`test/bundler/bundler_string.test.ts`: 59 tests, including NullByte)

Closes #27553
🤖 Generated with Claude Code