fix(transpiler): preserve null bytes in tagged template literals by robobun · Pull Request #27554 · oven-sh/bun

robobun · 2026-02-28T15:55:33Z

Summary

Fixed String.raw corrupting null bytes (U+0000) in tagged template literals; the transpiler was replacing them with the literal string \uFFFD
The UnsignedCodepointIterator used minInt(u32) = 0 as the error sentinel for invalid UTF-8 sequences, which collided with the valid null byte codepoint — changed to maxInt(u32) which is beyond the valid Unicode range
Added regression tests for null bytes in tagged template literals, untagged template literals, and embedded null bytes

Root Cause

In src/string/immutable/unicode.zig, NewCodePointIterator.next() used std.math.minInt(CodePointType) as the error sentinel. For the UnsignedCodepointIterator (u32), minInt(u32) = 0, which collides with the valid null byte codepoint (U+0000). This caused null bytes to be misidentified as decode errors and replaced with unicode_replacement (U+FFFD). The printer then emitted the literal 6-character string \uFFFD in raw template literals.

The fix uses maxInt instead — maxInt(u32) = 0xFFFFFFFF, which is well beyond the valid Unicode range (max 0x10FFFF) and can never collide with a valid codepoint.
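The collision described above can be modeled in a few lines. This is a minimal TypeScript sketch, not the Zig code from the PR: `decodeOne`, `codepoints`, and the constant names are hypothetical, and the decoder is simplified (it skips overlong and surrogate checks). It only shows why an error sentinel of 0 cannot be told apart from a decoded U+0000, while 0xFFFFFFFF can.

```typescript
// Hypothetical model of the sentinel collision; names are illustrative only.
const REPLACEMENT = 0xfffd; // U+FFFD, substituted for decode errors
const MIN_U32 = 0; // minInt(u32): the old sentinel
const MAX_U32 = 0xffffffff; // maxInt(u32): the new sentinel

// Decode one UTF-8 sequence starting at index i.
// Returns [codepoint, bytesConsumed]; on invalid input returns [errorSentinel, 1].
function decodeOne(bytes: Uint8Array, i: number, errorSentinel: number): [number, number] {
  const b0 = bytes[i];
  if (b0 < 0x80) return [b0, 1]; // ASCII, including the null byte 0x00
  const len =
    b0 >= 0xc0 && b0 < 0xe0 ? 2 : b0 >= 0xe0 && b0 < 0xf0 ? 3 : b0 >= 0xf0 && b0 < 0xf8 ? 4 : 0;
  if (len === 0 || i + len > bytes.length) return [errorSentinel, 1];
  let cp = b0 & (0xff >> (len + 1)); // keep the payload bits of the lead byte
  for (let k = 1; k < len; k++) {
    const b = bytes[i + k];
    if ((b & 0xc0) !== 0x80) return [errorSentinel, 1]; // not a continuation byte
    cp = (cp << 6) | (b & 0x3f);
  }
  return [cp, len];
}

// Iterate all codepoints, replacing anything equal to the sentinel with U+FFFD.
function codepoints(bytes: Uint8Array, errorSentinel: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bytes.length) {
    const [cp, len] = decodeOne(bytes, i, errorSentinel);
    out.push(cp === errorSentinel ? REPLACEMENT : cp);
    i += len;
  }
  return out;
}

const input = new Uint8Array([0x61, 0x00, 0x62]); // "a", NUL, "b"
console.log(codepoints(input, MIN_U32)); // NUL is mistaken for an error
console.log(codepoints(input, MAX_U32)); // NUL survives
```

With the sentinel set to 0, the valid null byte is indistinguishable from a decode failure. With 0xFFFFFFFF, genuinely invalid bytes such as 0xFF are still replaced, but U+0000 passes through unchanged.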

Test plan

Regression test: String.raw preserves null bytes in tagged template literals
Regression test: null bytes in untagged template literals are preserved
Regression test: null bytes in String.raw with surrounding content
Existing template literal tests pass (test/bundler/transpiler/template-literal.test.ts)
Existing bundler string tests pass (test/bundler/bundler_string.test.ts — 59 tests including NullByte)
Existing raw template literal transpiler tests pass
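To make the observable difference concrete, here is a hedged sketch of what a caller would see before and after the fix. It is not the regression test from the PR: the two raw strings are written by hand to model the transpiler output described above, and `dumpCodepoints` is a hypothetical helper. `String.raw` is called with an explicit `raw` array so that the example itself does not need a literal null byte in its source.

```typescript
// Hypothetical illustration: these inputs model transpiler output,
// they are not produced by running Bun's transpiler.
const preserved = String.raw({ raw: ["a\u0000b"] }); // null byte kept as U+0000
const corrupted = String.raw({ raw: ["a\\uFFFDb"] }); // literal backslash-u-F-F-F-D text

// Render each character of a string as U+XXXX for easy comparison.
function dumpCodepoints(s: string): string {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  ).join(" ");
}

console.log(preserved.length, dumpCodepoints(preserved));
console.log(corrupted.length, dumpCodepoints(corrupted));
```

The preserved string has three code units with U+0000 in the middle. The corrupted one has eight, because the six characters `\uFFFD` appear as text rather than as a single codepoint.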

Closes #27553

🤖 Generated with Claude Code

The UnsignedCodepointIterator used `minInt(u32) = 0` as the error sentinel for detecting invalid multibyte UTF-8 sequences. This collided with the valid null byte codepoint (U+0000), causing null bytes to be misidentified as decode errors and replaced with U+FFFD. The printer then emitted the literal 6-character string `\uFFFD` in raw template literals instead of preserving the null byte.

Fix: use `maxInt` instead of `minInt` as the error sentinel. For u32, maxInt is 0xFFFFFFFF which is well beyond the valid Unicode range (max 0x10FFFF) and can never collide with a valid codepoint.

Closes #27553

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

robobun · 2026-02-28T15:55:43Z

Updated 8:40 AM PT - Feb 28th, 2026

❌ Your commit 3e36b88f has 4 failures in Build #38349 (All Failures):

test/js/bun/util/v8-heap-snapshot.test.ts - SIGKILL on 🐧 25.04 x64-baseline
test/integration/vite-build/vite-build.test.ts - 1 failing on 🪟 2019 x64
test/bundler/bundler_compile_autoload.test.ts - 1 failing on 🐧 13 x64-asan
test/integration/next-pages/test/dev-server.test.ts - 1 failing on 🍎 14 aarch64

🧪 To try this PR locally:

bunx bun-pr 27554

That installs a local version of the PR into your bun-27554 executable, so you can run:

bun-27554 --bun

coderabbitai · 2026-02-28T15:58:36Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between a870e7b and 3e36b88.

📒 Files selected for processing (2)

src/string/immutable/unicode.zig
test/regression/issue/27553.test.ts

Walkthrough

The pull request modifies unicode code point error handling and adds regression tests for null byte preservation in template literals. An error sentinel calculation is adjusted in the unicode iterator, alongside comprehensive test coverage for String.raw and template literal behavior.

Changes

Cohort / File(s)	Summary
Unicode Iterator Error Handling `src/string/immutable/unicode.zig`	Modified error sentinel calculation in NewCodePointIterator.next from `minInt` to `maxInt`, affecting how decoded code points are identified as errors and replaced with unicode_replacement character.
Null Byte Template Literal Tests `test/regression/issue/27553.test.ts`	New regression test suite validating preservation of null bytes (0x00) in template literals with String.raw and untagged templates. Three tests verify correct handling of null bytes in different template literal contexts.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: fixing a transpiler issue where null bytes in tagged template literals were being corrupted, which is directly addressed by the codepoint iterator sentinel value change.
Description check	✅ Passed	The PR description comprehensively covers what was changed (root cause analysis, the fix), how it was verified (test plan with multiple regression tests and existing tests), and includes issue closure reference.
Linked Issues check	✅ Passed	The PR fully addresses issue `#27553` by fixing the null byte corruption in String.raw template literals and adding comprehensive regression tests validating the expected behavior.
Out of Scope Changes check	✅ Passed	All changes are directly related to fixing the null byte preservation issue: modifying the error sentinel in the codepoint iterator and adding targeted regression tests for the fix.


claude

LGTM — clean one-line fix with clear root cause and good regression tests.

Extended reasoning...

Overview

This PR fixes a bug where null bytes (U+0000) in tagged template literals were being corrupted by the transpiler. The fix changes one line in src/string/immutable/unicode.zig, switching the error sentinel in NewCodePointIterator.next() from minInt(CodePointType) to maxInt(CodePointType). A new regression test file (test/regression/issue/27553.test.ts) with three test cases is added.

Security risks

No security concerns. The change corrects a data corruption bug by using a proper sentinel value that cannot collide with valid Unicode codepoints.

Level of scrutiny

This is a production-critical code path (the transpiler's Unicode iterator), so it warrants careful review. However, the change is minimal and the logic is clear: minInt(u32) = 0 collided with the null byte codepoint, while maxInt(u32) = 0xFFFFFFFF is well beyond valid Unicode range (max 0x10FFFF). For the signed CodepointIterator (i32), maxInt(i32) is also safely outside the valid range. The fix is correct for both instantiations of the generic.
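The range argument for both instantiations can be checked with plain arithmetic. The sketch below is illustrative only; the constant names and `collidesWithCodepoint` are hypothetical, and the values are the standard bounds of 32-bit integers rather than anything read from the Zig source.

```typescript
// Hypothetical arithmetic check; names are illustrative, values are standard integer bounds.
const MAX_CODEPOINT = 0x10ffff; // highest valid Unicode codepoint

const SENTINELS = {
  minU32: 0, // minInt(u32)
  maxU32: 2 ** 32 - 1, // maxInt(u32) = 0xFFFFFFFF
  minI32: -(2 ** 31), // minInt(i32)
  maxI32: 2 ** 31 - 1, // maxInt(i32) = 0x7FFFFFFF
};

// A sentinel is unsafe if it is also a value a successful decode could return.
function collidesWithCodepoint(sentinel: number): boolean {
  return sentinel >= 0 && sentinel <= MAX_CODEPOINT;
}

for (const [name, value] of Object.entries(SENTINELS)) {
  console.log(name, collidesWithCodepoint(value) ? "collides" : "safe");
}
```

Only the unsigned minimum falls inside the codepoint range, which matches the review's conclusion that `maxInt` is safe for both the u32 and i32 iterators.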

Other factors

The PR includes three focused regression tests that verify null bytes are preserved in tagged template literals, untagged template literals, and embedded within other content. The PR description mentions existing tests pass. The change is self-contained with no risk of side effects.

github-actions bot added the claude label Feb 28, 2026

claude bot reviewed Feb 28, 2026

robobun wants to merge 1 commit into main from claude/fix-string-raw-null-byte-27553
