
fix(transpiler): preserve null bytes in tagged template literals #27554

Open
robobun wants to merge 1 commit into main from claude/fix-string-raw-null-byte-27553


@robobun (Collaborator) commented Feb 28, 2026

Summary

  • Fixed String.raw corrupting null bytes (U+0000) in tagged template literals; the transpiler was replacing each null byte with the literal string \uFFFD
  • The UnsignedCodepointIterator used minInt(u32) = 0 as the error sentinel for invalid UTF-8 sequences, which collided with the valid null byte codepoint — changed to maxInt(u32) which is beyond the valid Unicode range
  • Added regression tests for null bytes in tagged template literals, untagged template literals, and embedded null bytes

Root Cause

In src/string/immutable/unicode.zig, NewCodePointIterator.next() used std.math.minInt(CodePointType) as the error sentinel. For the UnsignedCodepointIterator (u32), minInt(u32) = 0, which collides with the valid null byte codepoint (U+0000). This caused null bytes to be misidentified as decode errors and replaced with unicode_replacement (U+FFFD). The printer then emitted the literal 6-character string \uFFFD in raw template literals.

The fix uses maxInt instead — maxInt(u32) = 0xFFFFFFFF, which is well beyond the valid Unicode range (max 0x10FFFF) and can never collide with a valid codepoint.
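
The collision can be reproduced in miniature. The following is a hypothetical TypeScript model of a code point iterator, not the Zig source: `decodeAt`, `codePoints`, and the two-byte-only decoder are assumptions made for illustration. It shows only how a sentinel of 0 turns a valid U+0000 into U+FFFD, and how a sentinel outside the Unicode range avoids that.

```typescript
// Hypothetical model of a code point iterator; names and structure are
// illustrative and do not mirror src/string/immutable/unicode.zig.
const REPLACEMENT = 0xfffd;

// Decode one code point at `pos`. Malformed input yields `errorSentinel`.
// Only 1- and 2-byte UTF-8 sequences are handled, to keep the sketch short.
function decodeAt(
  bytes: Uint8Array,
  pos: number,
  errorSentinel: number,
): { cp: number; width: number } {
  const b0 = bytes[pos];
  if (b0 < 0x80) return { cp: b0, width: 1 };
  const isTwoByteLead = b0 >= 0xc2 && (b0 & 0xe0) === 0xc0;
  if (isTwoByteLead && pos + 1 < bytes.length && (bytes[pos + 1] & 0xc0) === 0x80) {
    return { cp: ((b0 & 0x1f) << 6) | (bytes[pos + 1] & 0x3f), width: 2 };
  }
  return { cp: errorSentinel, width: 1 };
}

// Iterate all code points, substituting U+FFFD wherever the sentinel appears.
function codePoints(bytes: Uint8Array, errorSentinel: number): number[] {
  const out: number[] = [];
  let pos = 0;
  while (pos < bytes.length) {
    const { cp, width } = decodeAt(bytes, pos, errorSentinel);
    out.push(cp === errorSentinel ? REPLACEMENT : cp);
    pos += width;
  }
  return out;
}

const input = new Uint8Array([0x61, 0x00, 0x62]); // "a", NUL, "b"

// Sentinel 0 (like minInt(u32)): the valid NUL is mistaken for an error.
const withZeroSentinel = codePoints(input, 0);
// Sentinel 0xFFFFFFFF (like maxInt(u32)): the NUL passes through unchanged.
const withMaxSentinel = codePoints(input, 0xffffffff);
// A genuinely invalid byte is still replaced under the new sentinel.
const invalidByte = codePoints(new Uint8Array([0xff]), 0xffffffff);

console.log(withZeroSentinel, withMaxSentinel, invalidByte);
```

In this model the only behavioral difference between the two sentinels is the treatment of U+0000; invalid bytes are replaced in both cases.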

Test plan

  • Regression test: String.raw preserves null bytes in tagged template literals
  • Regression test: null bytes in untagged template literals are preserved
  • Regression test: null bytes in String.raw with surrounding content
  • Existing template literal tests pass (test/bundler/transpiler/template-literal.test.ts)
  • Existing bundler string tests pass (test/bundler/bundler_string.test.ts — 59 tests including NullByte)
  • Existing raw template literal transpiler tests pass
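
As a rough illustration of the behavior the regression tests are meant to protect, the sketch below builds source text containing a real U+0000 character inside a template literal and evaluates it. This is an assumption-laden sketch, not the contents of test/regression/issue/27553.test.ts: it uses `new Function`, which hands the source straight to the JavaScript engine rather than to Bun's transpiler, so it shows the expected result rather than exercising the fixed code path.

```typescript
// Build source text that contains an actual U+0000 character (not the
// two-character escape "\0") inside tagged and untagged template literals.
const NUL = String.fromCharCode(0);
const taggedSource = "return String.raw`a" + NUL + "b`;";
const untaggedSource = "return `a" + NUL + "b`;";

// Evaluate with the JS engine directly. NOTE: this bypasses any transpiler,
// so it demonstrates the expected output only.
const tagged = new Function(taggedSource)() as string;
const untagged = new Function(untaggedSource)() as string;

// Both results should be three characters long with a NUL in the middle.
console.log(tagged.length, tagged.charCodeAt(1));
console.log(untagged.length, untagged.charCodeAt(1));
```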

Closes #27553

🤖 Generated with Claude Code

The UnsignedCodepointIterator used `minInt(u32) = 0` as the error
sentinel for detecting invalid multibyte UTF-8 sequences. This collided
with the valid null byte codepoint (U+0000), causing null bytes to be
misidentified as decode errors and replaced with U+FFFD. The printer
then emitted the literal 6-character string `\uFFFD` in raw template
literals instead of preserving the null byte.

Fix: use `maxInt` instead of `minInt` as the error sentinel. For u32,
maxInt is 0xFFFFFFFF which is well beyond the valid Unicode range
(max 0x10FFFF) and can never collide with a valid codepoint.

Closes #27553

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@robobun (Collaborator, Author) commented Feb 28, 2026

Updated 8:40 AM PT - Feb 28th, 2026

❌ Your commit 3e36b88f has 4 failures in Build #38349 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 27554

That installs a local version of the PR into your bun-27554 executable, so you can run:

bun-27554 --bun

@coderabbitai bot (Contributor) commented Feb 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a870e7b and 3e36b88.

📒 Files selected for processing (2)
  • src/string/immutable/unicode.zig
  • test/regression/issue/27553.test.ts

Walkthrough

The pull request modifies unicode code point error handling and adds regression tests for null byte preservation in template literals. An error sentinel calculation is adjusted in the unicode iterator, alongside comprehensive test coverage for String.raw and template literal behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Unicode Iterator Error Handling: src/string/immutable/unicode.zig | Modified error sentinel calculation in NewCodePointIterator.next from minInt to maxInt, affecting how decoded code points are identified as errors and replaced with unicode_replacement character. |
| Null Byte Template Literal Tests: test/regression/issue/27553.test.ts | New regression test suite validating preservation of null bytes (0x00) in template literals with String.raw and untagged templates. Three tests verify correct handling of null bytes in different template literal contexts. |

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: fixing a transpiler issue where null bytes in tagged template literals were being corrupted, which is directly addressed by the codepoint iterator sentinel value change. |
| Description check | ✅ Passed | The PR description comprehensively covers what was changed (root cause analysis, the fix), how it was verified (test plan with multiple regression tests and existing tests), and includes issue closure reference. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #27553 by fixing the null byte corruption in String.raw template literals and adding comprehensive regression tests validating the expected behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the null byte preservation issue: modifying the error sentinel in the codepoint iterator and adding targeted regression tests for the fix. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@claude bot (Contributor) left a comment


LGTM — clean one-line fix with clear root cause and good regression tests.

Extended reasoning...

Overview

This PR fixes a bug where null bytes (U+0000) in tagged template literals were being corrupted by the transpiler. The fix changes one line in src/string/immutable/unicode.zig, switching the error sentinel in NewCodePointIterator.next() from minInt(CodePointType) to maxInt(CodePointType). A new regression test file (test/regression/issue/27553.test.ts) with three test cases is added.

Security risks

No security concerns. The change corrects a data corruption bug by using a proper sentinel value that cannot collide with valid Unicode codepoints.

Level of scrutiny

This is a production-critical code path (the transpiler's Unicode iterator), so it warrants careful review. However, the change is minimal and the logic is clear: minInt(u32) = 0 collided with the null byte codepoint, while maxInt(u32) = 0xFFFFFFFF is well beyond valid Unicode range (max 0x10FFFF). For the signed CodepointIterator (i32), maxInt(i32) is also safely outside the valid range. The fix is correct for both instantiations of the generic.
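
The range argument above can be checked with a few lines of arithmetic. This is a small sketch with hypothetical names (`UNICODE_MAX`, `collidesWithCodePoint`); the integer limits are written out by hand rather than taken from the Zig standard library.

```typescript
// Largest valid Unicode code point.
const UNICODE_MAX = 0x10ffff;

// A sentinel is unsafe if it falls inside the range of valid code points.
function collidesWithCodePoint(sentinel: number): boolean {
  return sentinel >= 0 && sentinel <= UNICODE_MAX;
}

// Candidate sentinels for the unsigned (u32) and signed (i32) iterators.
const sentinels: Record<string, number> = {
  "minInt(u32)": 0,
  "maxInt(u32)": 0xffffffff,
  "minInt(i32)": -(2 ** 31),
  "maxInt(i32)": 2 ** 31 - 1,
};

for (const [name, value] of Object.entries(sentinels)) {
  console.log(name, collidesWithCodePoint(value) ? "collides" : "safe");
}
```

Only `minInt(u32)` lands inside the valid range. The old signed sentinel, `minInt(i32)`, is negative and so did not collide, which is consistent with the PR attributing the bug to the unsigned iterator.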

Other factors

The PR includes three focused regression tests that verify null bytes are preserved in tagged template literals, untagged template literals, and embedded within other content. The PR description mentions existing tests pass. The change is self-contained with no risk of side effects.
