Skip to content

fix: encode non-ASCII filenames in Content-Disposition headers (RFC 5987)#12918

Open
AntonioABLima wants to merge 9 commits intorelease-1.10.0from
fix/non-ascii-filename-encoding
Open

fix: encode non-ASCII filenames in Content-Disposition headers (RFC 5987)#12918
AntonioABLima wants to merge 9 commits intorelease-1.10.0from
fix/non-ascii-filename-encoding

Conversation

@AntonioABLima
Copy link
Copy Markdown
Member

@AntonioABLima AntonioABLima commented Apr 28, 2026

Closes #8105

Summary

  • Fixes Content-Disposition headers to use RFC 5987 / RFC 6266 dual-param format for file downloads: filename="ascii-fallback"; filename*=UTF-8''<percent-encoded>
  • Applies to single file download (API v1 and v2), batch ZIP download (v2), and flows ZIP download (v1)
  • Updates frontend parsers to prioritize filename*=UTF-8'' over plain filename= when saving downloads
  • Ensures quote() uses safe="" so forward slashes and all special characters are properly percent-encoded in the RFC 5987 value

How to test (end user)

Pre-requisite

Run Langflow locally with make backend + make frontend (or langflow run).

1. Upload and download a single file with a non-ASCII name

  1. Open a flow in the canvas
  2. Add a File component
  3. Click the upload button and upload a file with a non-ASCII name — e.g. rename any file to 龙テスト.txt before uploading
  4. Click the download icon next to the uploaded file
  5. Expected: the browser saves the file as 龙テスト.txt (not ?.txt or a garbled name)

2. Download multiple files as a ZIP (API v2)

  1. Go to File Manager (top menu)
  2. Upload two or more files with accented/CJK names (e.g. résumé.pdf, 测试.txt)
  3. Select all files and click Download
  4. Expected: the ZIP downloads successfully and its name contains a valid timestamp (e.g. 20240428_123456_langflow_files.zip)

3. Download flows as a ZIP

  1. On the home/flows page, select two or more flows (use Shift+click or the checkbox)
  2. Click ExportDownload as ZIP
  3. Expected: the ZIP downloads with a timestamp-based name (e.g. 20240428_123456_langflow_flows.zip) and opens without errors

4. Verify the header directly (optional, for developers)

# Upload a file and grab its download URL, then:
curl -I -H "x-api-key: <your-key>" \
  "http://localhost:7860/api/v1/files/download/<flow-id>/<filename>"
# Expected Content-Disposition:
# attachment; filename="?.txt"; filename*=UTF-8''%E9%BE%99%E3%83%86%E3%82%B9%E3%83%88.txt

Test plan

  • Unit tests added for v1 single file download (Chinese, Japanese, Portuguese, ASCII)
  • Unit tests added for v2 single file download (same character sets)
  • Unit test added for v2 batch ZIP download
  • Unit test added for flows ZIP download with non-ASCII flow names
  • All existing file download tests continue to pass
HTTP headers only support ASCII/latin-1. Placing Chinese or other
non-ASCII characters directly in filename="..." caused a latin-1
serialization error (500) on download.

Fix all four download routes to use an ASCII-safe fallback in
filename= and the URL-encoded form in filename*=UTF-8'':
- api/v1/files: /download/{flow_id}/{file_name}
- api/v1/flows_helpers: /flows/download/ (ZIP)
- api/v2/files: /files/{file_id} and /files/batch/

Also fix the frontend filename extraction regex in both
use-get-download-flows.ts and use-get-download-files.ts to parse
RFC 5987 (filename*=UTF-8''<encoded>) with decodeURIComponent,
falling back to plain filename= for older responses.

Closes #8105
Cover the RFC 5987 Content-Disposition encoding fix across all
affected download endpoints:
- api/v1/files: parametrized test with Chinese, Japanese, accented
  Latin, and ASCII filenames verifying filename*=UTF-8'' encoding
- api/v2/files: same parametrized test for single-file download plus
  a batch download test verifying the ZIP Content-Disposition header
- flows download: test with Chinese flow names verifying the ZIP
  response uses a decodable RFC 5987 filename

Reproduces the regression from issue #8105 where downloading files
with non-ASCII names returned HTTP 500.
…n test assertion

Ensures forward slashes and other special characters are percent-encoded
in Content-Disposition filename* values. Also fixes weak in-operator
assertion in v2 test to check the correct direction.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 28, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2bafa550-9278-40f4-9982-ac976eb50297

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/non-ascii-filename-encoding

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the bug Something isn't working label Apr 28, 2026
@AntonioABLima AntonioABLima changed the base branch from main to release-1.10.0 April 28, 2026 19:46
@github-actions
Copy link
Copy Markdown
Contributor

Migration Validation Passed

All migrations follow the Expand-Contract pattern correctly.

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 28, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.74%. Comparing base (9514c9d) to head (fe18cd4).
⚠️ Report is 1 commits behind head on release-1.10.0.

Files with missing lines Patch % Lines
.../queries/file-management/use-get-download-files.ts 0.00% 5 Missing ⚠️
...ollers/API/queries/flows/use-get-download-flows.ts 40.00% 3 Missing ⚠️

❌ Your project check has failed because the head coverage (49.74%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@                Coverage Diff                 @@
##           release-1.10.0   #12918      +/-   ##
==================================================
- Coverage           53.79%   53.74%   -0.06%     
==================================================
  Files                2047     2048       +1     
  Lines              186342   187069     +727     
  Branches            26535    27766    +1231     
==================================================
+ Hits               100246   100534     +288     
- Misses              84986    85425     +439     
  Partials             1110     1110              
Flag Coverage Δ
backend 55.27% <100.00%> (-1.81%) ⬇️
frontend 54.30% <69.23%> (+0.33%) ⬆️
lfx 49.74% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
src/backend/base/langflow/api/utils/core.py 74.78% <100.00%> (-3.15%) ⬇️
src/backend/base/langflow/api/v1/files.py 75.00% <100.00%> (ø)
src/backend/base/langflow/api/v1/flows_helpers.py 69.29% <100.00%> (-1.54%) ⬇️
src/backend/base/langflow/api/v1/projects_files.py 45.53% <ø> (-0.08%) ⬇️
src/backend/base/langflow/api/v2/files.py 65.64% <100.00%> (+0.69%) ⬆️
...nd/src/utils/parse-content-disposition-filename.ts 100.00% <100.00%> (ø)
...ollers/API/queries/flows/use-get-download-flows.ts 26.66% <40.00%> (+1.35%) ⬆️
.../queries/file-management/use-get-download-files.ts 0.00% <0.00%> (ø)

... and 174 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 28, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 28, 2026

Build successful! ✅
Deploying docs draft.
Deploy successful! View draft

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 28, 2026

Frontend Unit Test Coverage Report

Coverage Summary

Lines Statements Branches Functions
Coverage: 36%
36.37% (42063/115626) 67.98% (5807/8542) 36.3% (968/2666)

Unit Test Results

Tests Skipped Failures Errors Time
4102 0 💤 0 ❌ 0 🔥 8m 35s ⏱️
…r injection

Extract build_content_disposition() helper in api/utils/core.py that:
- Escapes double-quotes in the ASCII fallback (prevents parameter smuggling
  via filenames like evil"; x=injected)
- Uses quote(safe="") to percent-encode forward slashes
- Returns the dual-param format (filename= + filename*=) for broad client
  compatibility

Replace the duplicated 3-line encoding blocks in v1/files.py, v2/files.py,
flows_helpers.py, and projects_files.py with a single call to the helper.

projects_files.py was missing from the original fix: it used quote() without
safe="" and emitted only filename*= with no ASCII fallback.
- v2/test_files.py: expected_rfc5987 now includes the file extension so a
  regression that strips the extension from the RFC 5987 value is detected
- test_database.py: strengthen the dual-flow download assertion to verify
  both filename= and filename*= params are present, not just the prefix
- test_database.py: rename test_download_flows_non_ascii_content_disposition
  to test_download_flows_content_disposition_dual_param and update the
  docstring to accurately reflect that the ZIP filename is always ASCII
  (timestamp-based); the test now verifies the dual-param format
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 28, 2026
Copy link
Copy Markdown
Member

@Cristhianzl Cristhianzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧭 Summary

Fixes #8105 (HTTP 500 when downloading files with non-ASCII names). Root cause
is correct: HTTP headers are latin-1 only, and the previous header builders
embedded raw UTF-8 directly into Content-Disposition.

The fix centralizes the encoding into a single helper
build_content_disposition() in api/utils/core.py and updates four
download routes (v1 single file, v1 flows ZIP, v1 projects-files ZIP, v2 single
file, v2 batch ZIP) plus two frontend parsers to consume the new dual-param
format (filename="ascii-fallback"; filename*=UTF-8''<percent-encoded>).

The diagnosis, scope, and DRY consolidation are good. However, the helper has a
header-injection gap (CRLF / control chars are not stripped), the
backslash-escape rule of RFC 6266 §4.1 is incomplete, the frontend parsing logic
is duplicated verbatim across two hooks with 0 % coverage, and the
adversarial test surface is thin (no quote / CRLF / backslash cases).

Verdict: 🟠 Request changes before merge — none of the issues are
exploitable through the current upstream validators, but the helper is
explicitly billed as the "single source of truth" for this header, so it must
be safe in isolation.


📊 Verdict by Category

Category Status Notes
🔴 PII in logs ✅ Pass No new logging; no user identifiers added
🔴 General security 🟠 Concern Helper does not strip CR/LF/control chars from filename (header injection)
🔴 AI/LLM security ✅ N/A No LLM surface touched
🔴 Third-party integration ✅ N/A No external webhook/signature surface
🔴 DRY (backend) ✅ Pass 4 duplicated header strings collapsed into one helper — explicit DRY win
🔴 DRY (frontend) Fail Parser block duplicated byte-for-byte across 2 hooks
🔴 File structure ⚠️ Acceptable core.py already mixes responsibilities (legacy); PR does not worsen materially
🟠 SOLID — SRP ✅ Pass Helper does one thing
🟠 SOLID — others ✅ N/A No inheritance / interface surface introduced
🟠 YAGNI / KISS ✅ Pass Small, targeted fix with no speculative API
🟠 Law of Demeter ✅ Pass No deep chains added
🟠 Error handling ⚠️ Minor Frontend decodeURIComponent can throw on malformed input — no try/catch
🟠 Strong typing ✅ Pass All new signatures fully typed
🟡 Logging ✅ N/A No logging added/removed
🟡 Comments ✅ Pass Helper docstring explains the why (RFC reference + injection note)
🟢 Testing — backend 🟠 Partial Happy paths covered; no adversarial cases (no ", \\, CR/LF, empty, very-long); one assertion still uses in
🟢 Testing — frontend Fail Codecov reports 0 % coverage on the two changed hooks

🔴 CRITICAL Findings (Blockers)

C1 — build_content_disposition() does not sanitize control characters → header-injection vector

Location: src/backend/base/langflow/api/utils/core.py:464-476

def build_content_disposition(filename: str) -> str:
    ascii_fallback = filename.encode("ascii", "replace").decode("ascii").replace('"', '\\"')
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"

The assumption: the helper assumes non-ASCII is the only problematic class
of character in filename. The docstring even calls out
"prevent header-injection via parameter smuggling" — implying the helper is
the security boundary.

Why it's wrong: encode("ascii", "replace") only replaces bytes ≥ 0x80.
ASCII control chars (\x00\x1F, including \r \n \t) are < 0x80 and
flow through unchanged. The upstream validators are not a safety net here:

  • _get_validated_path_segment (api/utils/core.py:39-46) only rejects ..,
    /, \\not CRLF, not NUL, not other control chars.
  • The v2 download route uses file.name straight from the DB
    (api/v2/files.py:675), bypassing path validation entirely.
  • The v1 projects-files route uses project.name (api/v1/projects_files.py:79)
    — same story.

Concrete payload (works against v2 if a user can store a file row with such
a name):

file.name = 'evil.txt\r\nX-Injected: pwned'
→ Content-Disposition: attachment; filename="evil.txt
X-Injected: pwned"; filename*=UTF-8''evil.txt%0D%0AX-Injected%3A%20pwned

Blast radius: in practice Starlette / h11 / hypercorn refuse to emit
headers containing raw CRLF and the connection 500s — but that is the
behaviour we are fixing in the first place. A future server change, a
non-Starlette deployment, or an HTTP/2 framing edge case turns this into
response-splitting / cache-poisoning / CRLF injection.

Fix (do this in the helper — defense in depth):

import re

_FORBIDDEN_HEADER_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def build_content_disposition(filename: str) -> str:
    safe_filename = _FORBIDDEN_HEADER_CHARS.sub("_", filename)
    ascii_fallback = (
        safe_filename.encode("ascii", "replace")
        .decode("ascii")
        .replace("\\", "\\\\")   # see M1
        .replace('"', '\\"')
    )
    encoded = quote(safe_filename, safe="")
    return f'attachment; filename="{ascii_fallback}"; filename*=UTF-8\'\'{encoded}'

Add a test: test_build_content_disposition_strips_crlf asserting \r\n in
input never appears in output.


🟠 HIGH Findings (Must fix before merge)

H1 — Frontend parser duplicated verbatim across two hooks (DRY)

Locations:

  • src/frontend/src/controllers/API/queries/file-management/use-get-download-files.ts:51-66
  • src/frontend/src/controllers/API/queries/flows/use-get-download-flows.ts:50-65

The 14-line "extract filename from Content-Disposition" block — RFC 5987
regex, fallback regex, decode, trim — is byte-identical in both hooks. The
backend reviewer rule explicitly flags this as a 🔴 CRITICAL DRY violation
(5+ lines × 2+ occurrences = extract).

Fix: create
src/frontend/src/utils/parse-content-disposition-filename.ts:

export function parseContentDispositionFilename(
  header: string | null,
  fallback: string,
): string {
  if (!header) return fallback;
  const rfc5987 = header.match(/filename\*=UTF-8''([^;]+)/i);
  if (rfc5987?.[1]) {
    try {
      return decodeURIComponent(rfc5987[1].trim());
    } catch {
      // fall through to legacy filename=
    }
  }
  const legacy = header.match(/filename="?([^";\n]+)"?/);
  return legacy?.[1]?.trim() ?? fallback;
}

This also addresses M2 (decode failure) in one place.

H2 — Frontend changes have 0 % coverage

Codecov reports:

File Patch coverage
use-get-download-files.ts 0.00 % (13 missing)
use-get-download-flows.ts 7.69 % (12 missing)

The reviewer rule requires ≥ 75 % (target 80 %) on all created tests and
explicit coverage validation for both backend AND frontend. Once the helper
from H1 exists, ship Jest unit tests in
src/frontend/src/utils/__tests__/parse-content-disposition-filename.test.ts
covering:

  • Plain filename="x.txt""x.txt"
  • RFC 5987 priority over legacy when both present
  • Percent-decoded CJK round-trip (e.g. 龙.txt)
  • Missing header → fallback
  • Malformed % sequence → fallback (don't crash)
  • Header with ; inside the encoded value (regex bound check)

H3 — Adversarial test cases missing on the backend helper

The new tests (test_files.py v1 + v2, test_database.py) are entirely
happy-path parametrizations of well-formed CJK / Latin-extended names. The
reviewer rule is explicit: "Both happy path AND adversarial tests exist."

Missing cases — add as parametrized inputs to a direct unit test against
build_content_disposition() (no HTTP round-trip needed):

Input What it proves
'a"b.txt' quote escape produces a parseable header
'a\\b.txt' backslash handled (see M1)
'evil\r\nX: y' CRLF stripped (see C1)
'evil\x00.txt' NUL stripped
'' (empty) helper does not produce malformed filename="" that breaks downstream
'a' * 5000 very long names don't get truncated mid-codepoint
'../etc/passwd' path-segment chars survive percent-encoding intact

🟡 MEDIUM Findings

M1 — RFC 6266 §4.1 quoted-string: backslash is not escaped

api/utils/core.py:474 escapes "\" but not \\\. Per RFC 7230
quoted-pair grammar (referenced from RFC 6266), both must be escaped together,
otherwise pre-existing backslashes corrupt the escape stream:

input:  evil\"x.txt        (7 chars: e v i l \ " x .txt)
current output (filename=): "evil\\"x.txt"
                              ^^^^ ^^      ^ unmatched closing quote
RFC-compliant parse:        "evil\\" → literal "evil\", then x.txt is junk

The fix is .replace("\\", "\\\\").replace('"', '\\"') (note the order — escape
backslashes first, otherwise you double-escape the backslash you just
inserted for the quote). Apply alongside C1.

M2 — Frontend decodeURIComponent is unguarded

use-get-download-files.ts:55 and use-get-download-flows.ts:54 call
decodeURIComponent on the captured group. If a malformed % sequence leaks
through (mismatched server / proxy rewrite / man-in-the-middle), decodeURIComponent
throws URIError and the entire onSuccess handler aborts — the user sees no
download and no error toast.

Wrap in try { ... } catch { /* fall back to legacy filename= */ } (folded
into H1's helper extraction).

M3 — v2 single-download test still uses weak in assertion

tests/unit/api/v2/test_files.py (new test):

assert filename.rsplit(".", 1)[0] in decoded
assert decoded.endswith(".txt")

The commit message for d23d534b says "tighten weak in-operator assertion in
v2 test to check the correct direction". The direction was indeed flipped, but
the test still uses substring containment instead of equality. Because the
storage layer adds a unique prefix, the test should compute the expected
stored name (or read it from upload_response.json()) and assert exact equality:

stored_name = upload_response.json()["name"] + Path(upload_response.json()["path"]).suffix
assert decoded == stored_name

This is the same pattern already used correctly in the v1 test
(assert unquote(rfc5987_value) == stored_name).


🟢 NICE TO HAVE

N1 — core.py is becoming a dumping ground (legacy)

api/utils/core.py is now 476 lines and mixes validate_*, build_*,
has_*, remove_*, check_*, raise_*, plus Annotated type aliases and
an Enum. Per the reviewer rule, validate* and build* must not coexist in
the same file. This is a pre-existing legacy issue; this PR is consistent
with the surrounding style and should not be blocked on it. Worth flagging for
a future cleanup ticket — propose splitting into validators.py,
http_headers.py (where this new helper would live), path_helpers.py, etc.

N2 — Helper return type / test could pin RFC ordering

The helper's contract is "ASCII fallback first, then filename*=". RFC 6266
§4.3 actually recommends filename*= first for clients that prefer it, but
the dual-param order is not strictly defined. Worth a one-line comment
explaining the chosen order (legacy clients that read only the first
filename= see the ASCII fallback; modern clients prefer filename*= regardless).


✅ What this PR does well

  1. Correct root-cause diagnosis. Latin-1 header encoding is the actual
    defect, and the fix targets the encoding boundary, not the symptom.
  2. DRY win on the backend. Four near-identical header strings collapsed
    into one well-documented helper. Found and fixed an uncovered fifth route
    (projects_files.py) along the way — that file emitted only filename*=
    with no ASCII fallback, which would fall back to nothing on legacy clients.
  3. safe="" is correct. The intermediate commit d23d534b correctly
    identified that quote() defaults to safe="/", which would emit raw
    forward slashes inside an RFC 5987 value. Now percent-encoded.
  4. Quote escaping in the ASCII fallback prevents the most obvious
    parameter-smuggling vector (modulo M1 / C1).
  5. Backend coverage report shown (+0.48 % on core.py, +3.51 % on
    v2/files.py) and parametrized tests across CJK / Latin-extended / ASCII.
  6. PR description is excellent — manual test plan, curl reproducer,
    linked issue, all migration / docs / lint CI passing.

📋 Suggested resolution order

  1. C1 — strip control chars in build_content_disposition + add unit test.
  2. M1 — escape backslashes in the same helper (one-line change, one extra test).
  3. H1 — extract the frontend parser to a shared util.
  4. H2 — add Jest tests for the new util.
  5. H3 — add a direct backend unit test for the helper with adversarial inputs (cheaper than HTTP round-trip in test_files.py).
  6. M3 — tighten the v2 download test to equality.
  7. M2 — folded into H1.
  8. N1 / N2 — open follow-up tickets, do not block.

Backend changes 1, 2, 5 are ~20 lines. Frontend changes 3, 4 are ~40 lines
plus a small test file. Total turnaround ≈ 1 hour of work.

…position

Strip ASCII control characters (CR/LF/NUL and others in 0x00-0x1f/0x7f range)
from filename before encoding to prevent header injection. Also escape backslash
before double-quote in the ASCII fallback per RFC 6266 §4.1 quoted-pair grammar.
…hen v2 assertion

Add direct unit tests for the helper covering: quote escaping, backslash escaping,
CRLF/NUL/control char stripping, empty filename, and very long names. Replace
weak substring assertions in the v2 single-file download test with exact equality
against the stored filename returned by the upload response.
Extract the duplicated 14-line RFC 5987 parsing block from use-get-download-files
and use-get-download-flows into a single parseContentDispositionFilename util.
Wraps decodeURIComponent in try/catch to gracefully fall back to the legacy
filename= param on malformed percent-sequences. Adds Jest tests covering all
branches: RFC 5987 priority, legacy fallback, CJK decode, null header, and
malformed input.
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 29, 2026
@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

2 participants