fix: encode non-ASCII filenames in Content-Disposition headers (RFC 5987)#12918
fix: encode non-ASCII filenames in Content-Disposition headers (RFC 5987)#12918AntonioABLima wants to merge 9 commits intorelease-1.10.0from
Conversation
HTTP headers only support ASCII/latin-1. Placing Chinese or other
non-ASCII characters directly in filename="..." caused a latin-1
serialization error (500) on download.
Fix all four download routes to use an ASCII-safe fallback in
filename= and the URL-encoded form in filename*=UTF-8'':
- api/v1/files: /download/{flow_id}/{file_name}
- api/v1/flows_helpers: /flows/download/ (ZIP)
- api/v2/files: /files/{file_id} and /files/batch/
Also fix the frontend filename extraction regex in both
use-get-download-flows.ts and use-get-download-files.ts to parse
RFC 5987 (filename*=UTF-8''<encoded>) with decodeURIComponent,
falling back to plain filename= for older responses.
Closes #8105
Cover the RFC 5987 Content-Disposition encoding fix across all affected download endpoints: - api/v1/files: parametrized test with Chinese, Japanese, accented Latin, and ASCII filenames verifying filename*=UTF-8'' encoding - api/v2/files: same parametrized test for single-file download plus a batch download test verifying the ZIP Content-Disposition header - flows download: test with Chinese flow names verifying the ZIP response uses a decodable RFC 5987 filename Reproduces the regression from issue #8105 where downloading files with non-ASCII names returned HTTP 500.
…n test assertion Ensures forward slashes and other special characters are percent-encoded in Content-Disposition filename* values. Also fixes weak in-operator assertion in v2 test to check the correct direction.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
✅ Migration Validation Passed All migrations follow the Expand-Contract pattern correctly. |
Codecov Report❌ Patch coverage is
❌ Your project check has failed because the head coverage (49.74%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## release-1.10.0 #12918 +/- ##
==================================================
- Coverage 53.79% 53.74% -0.06%
==================================================
Files 2047 2048 +1
Lines 186342 187069 +727
Branches 26535 27766 +1231
==================================================
+ Hits 100246 100534 +288
- Misses 84986 85425 +439
Partials 1110 1110
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Build successful! ✅ |
…r injection Extract build_content_disposition() helper in api/utils/core.py that: - Escapes double-quotes in the ASCII fallback (prevents parameter smuggling via filenames like evil"; x=injected) - Uses quote(safe="") to percent-encode forward slashes - Returns the dual-param format (filename= + filename*=) for broad client compatibility Replace the duplicated 3-line encoding blocks in v1/files.py, v2/files.py, flows_helpers.py, and projects_files.py with a single call to the helper. projects_files.py was missing from the original fix: it used quote() without safe="" and emitted only filename*= with no ASCII fallback.
- v2/test_files.py: expected_rfc5987 now includes the file extension so a regression that strips the extension from the RFC 5987 value is detected - test_database.py: strengthen the dual-flow download assertion to verify both filename= and filename*= params are present, not just the prefix - test_database.py: rename test_download_flows_non_ascii_content_disposition to test_download_flows_content_disposition_dual_param and update the docstring to accurately reflect that the ZIP filename is always ASCII (timestamp-based); the test now verifies the dual-param format
Cristhianzl
left a comment
There was a problem hiding this comment.
🧭 Summary
Fixes #8105 (HTTP 500 when downloading files with non-ASCII names). Root cause
is correct: HTTP headers are latin-1 only, and the previous header builders
embedded raw UTF-8 directly into Content-Disposition.
The fix centralizes the encoding into a single helper
build_content_disposition() in api/utils/core.py and updates four
download routes (v1 single file, v1 flows ZIP, v1 projects-files ZIP, v2 single
file, v2 batch ZIP) plus two frontend parsers to consume the new dual-param
format (filename="ascii-fallback"; filename*=UTF-8''<percent-encoded>).
The diagnosis, scope, and DRY consolidation are good. However, the helper has a
header-injection gap (CRLF / control chars are not stripped), the
backslash-escape rule of RFC 6266 §4.1 is incomplete, the frontend parsing logic
is duplicated verbatim across two hooks with 0 % coverage, and the
adversarial test surface is thin (no quote / CRLF / backslash cases).
Verdict: 🟠 Request changes before merge — none of the issues are
exploitable through the current upstream validators, but the helper is
explicitly billed as the "single source of truth" for this header, so it must
be safe in isolation.
📊 Verdict by Category
| Category | Status | Notes |
|---|---|---|
| 🔴 PII in logs | ✅ Pass | No new logging; no user identifiers added |
| 🔴 General security | 🟠 Concern | Helper does not strip CR/LF/control chars from filename (header injection) |
| 🔴 AI/LLM security | ✅ N/A | No LLM surface touched |
| 🔴 Third-party integration | ✅ N/A | No external webhook/signature surface |
| 🔴 DRY (backend) | ✅ Pass | 4 duplicated header strings collapsed into one helper — explicit DRY win |
| 🔴 DRY (frontend) | ❌ Fail | Parser block duplicated byte-for-byte across 2 hooks |
| 🔴 File structure | core.py already mixes responsibilities (legacy); PR does not worsen materially |
|
| 🟠 SOLID — SRP | ✅ Pass | Helper does one thing |
| 🟠 SOLID — others | ✅ N/A | No inheritance / interface surface introduced |
| 🟠 YAGNI / KISS | ✅ Pass | Small, targeted fix with no speculative API |
| 🟠 Law of Demeter | ✅ Pass | No deep chains added |
| 🟠 Error handling | Frontend decodeURIComponent can throw on malformed input — no try/catch |
|
| 🟠 Strong typing | ✅ Pass | All new signatures fully typed |
| 🟡 Logging | ✅ N/A | No logging added/removed |
| 🟡 Comments | ✅ Pass | Helper docstring explains the why (RFC reference + injection note) |
| 🟢 Testing — backend | 🟠 Partial | Happy paths covered; no adversarial cases (no ", \\, CR/LF, empty, very-long); one assertion still uses in |
| 🟢 Testing — frontend | ❌ Fail | Codecov reports 0 % coverage on the two changed hooks |
🔴 CRITICAL Findings (Blockers)
C1 — build_content_disposition() does not sanitize control characters → header-injection vector
Location: src/backend/base/langflow/api/utils/core.py:464-476
def build_content_disposition(filename: str) -> str:
ascii_fallback = filename.encode("ascii", "replace").decode("ascii").replace('"', '\\"')
encoded = quote(filename, safe="")
return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"The assumption: the helper assumes non-ASCII is the only problematic class
of character in filename. The docstring even calls out
"prevent header-injection via parameter smuggling" — implying the helper is
the security boundary.
Why it's wrong: encode("ascii", "replace") only replaces bytes ≥ 0x80.
ASCII control chars (\x00–\x1F, including \r \n \t) are < 0x80 and
flow through unchanged. The upstream validators are not a safety net here:
_get_validated_path_segment(api/utils/core.py:39-46) only rejects..,
/,\\— not CRLF, not NUL, not other control chars.- The v2 download route uses
file.namestraight from the DB
(api/v2/files.py:675), bypassing path validation entirely. - The v1 projects-files route uses
project.name(api/v1/projects_files.py:79)
— same story.
Concrete payload (works against v2 if a user can store a file row with such
a name):
file.name = 'evil.txt\r\nX-Injected: pwned'
→ Content-Disposition: attachment; filename="evil.txt
X-Injected: pwned"; filename*=UTF-8''evil.txt%0D%0AX-Injected%3A%20pwned
Blast radius: in practice Starlette / h11 / hypercorn refuse to emit
headers containing raw CRLF and the connection 500s — but that is the
behaviour we are fixing in the first place. A future server change, a
non-Starlette deployment, or an HTTP/2 framing edge case turns this into
response-splitting / cache-poisoning / CRLF injection.
Fix (do this in the helper — defense in depth):
import re
_FORBIDDEN_HEADER_CHARS = re.compile(r"[\x00-\x1f\x7f]")
def build_content_disposition(filename: str) -> str:
safe_filename = _FORBIDDEN_HEADER_CHARS.sub("_", filename)
ascii_fallback = (
safe_filename.encode("ascii", "replace")
.decode("ascii")
.replace("\\", "\\\\") # see M1
.replace('"', '\\"')
)
encoded = quote(safe_filename, safe="")
return f'attachment; filename="{ascii_fallback}"; filename*=UTF-8\'\'{encoded}'Add a test: test_build_content_disposition_strips_crlf asserting \r\n in
input never appears in output.
🟠 HIGH Findings (Must fix before merge)
H1 — Frontend parser duplicated verbatim across two hooks (DRY)
Locations:
src/frontend/src/controllers/API/queries/file-management/use-get-download-files.ts:51-66src/frontend/src/controllers/API/queries/flows/use-get-download-flows.ts:50-65
The 14-line "extract filename from Content-Disposition" block — RFC 5987
regex, fallback regex, decode, trim — is byte-identical in both hooks. The
backend reviewer rule explicitly flags this as a 🔴 CRITICAL DRY violation
(5+ lines × 2+ occurrences = extract).
Fix: create
src/frontend/src/utils/parse-content-disposition-filename.ts:
export function parseContentDispositionFilename(
header: string | null,
fallback: string,
): string {
if (!header) return fallback;
const rfc5987 = header.match(/filename\*=UTF-8''([^;]+)/i);
if (rfc5987?.[1]) {
try {
return decodeURIComponent(rfc5987[1].trim());
} catch {
// fall through to legacy filename=
}
}
const legacy = header.match(/filename="?([^";\n]+)"?/);
return legacy?.[1]?.trim() ?? fallback;
}This also addresses M2 (decode failure) in one place.
H2 — Frontend changes have 0 % coverage
Codecov reports:
| File | Patch coverage |
|---|---|
use-get-download-files.ts |
0.00 % (13 missing) |
use-get-download-flows.ts |
7.69 % (12 missing) |
The reviewer rule requires ≥ 75 % (target 80 %) on all created tests and
explicit coverage validation for both backend AND frontend. Once the helper
from H1 exists, ship Jest unit tests in
src/frontend/src/utils/__tests__/parse-content-disposition-filename.test.ts
covering:
- Plain
filename="x.txt"→"x.txt" - RFC 5987 priority over legacy when both present
- Percent-decoded CJK round-trip (e.g.
龙.txt) - Missing header → fallback
- Malformed
%sequence → fallback (don't crash) - Header with
;inside the encoded value (regex bound check)
H3 — Adversarial test cases missing on the backend helper
The new tests (test_files.py v1 + v2, test_database.py) are entirely
happy-path parametrizations of well-formed CJK / Latin-extended names. The
reviewer rule is explicit: "Both happy path AND adversarial tests exist."
Missing cases — add as parametrized inputs to a direct unit test against
build_content_disposition() (no HTTP round-trip needed):
| Input | What it proves |
|---|---|
'a"b.txt' |
quote escape produces a parseable header |
'a\\b.txt' |
backslash handled (see M1) |
'evil\r\nX: y' |
CRLF stripped (see C1) |
'evil\x00.txt' |
NUL stripped |
'' (empty) |
helper does not produce malformed filename="" that breaks downstream |
'a' * 5000 |
very long names don't get truncated mid-codepoint |
'../etc/passwd' |
path-segment chars survive percent-encoding intact |
🟡 MEDIUM Findings
M1 — RFC 6266 §4.1 quoted-string: backslash is not escaped
api/utils/core.py:474 escapes " → \" but not \ → \\. Per RFC 7230
quoted-pair grammar (referenced from RFC 6266), both must be escaped together,
otherwise pre-existing backslashes corrupt the escape stream:
input: evil\"x.txt (7 chars: e v i l \ " x .txt)
current output (filename=): "evil\\"x.txt"
^^^^ ^^ ^ unmatched closing quote
RFC-compliant parse: "evil\\" → literal "evil\", then x.txt is junk
The fix is .replace("\\", "\\\\").replace('"', '\\"') (note the order — escape
backslashes first, otherwise you double-escape the backslash you just
inserted for the quote). Apply alongside C1.
M2 — Frontend decodeURIComponent is unguarded
use-get-download-files.ts:55 and use-get-download-flows.ts:54 call
decodeURIComponent on the captured group. If a malformed % sequence leaks
through (mismatched server / proxy rewrite / man-in-the-middle), decodeURIComponent
throws URIError and the entire onSuccess handler aborts — the user sees no
download and no error toast.
Wrap in try { ... } catch { /* fall back to legacy filename= */ } (folded
into H1's helper extraction).
M3 — v2 single-download test still uses weak in assertion
tests/unit/api/v2/test_files.py (new test):
assert filename.rsplit(".", 1)[0] in decoded
assert decoded.endswith(".txt")The commit message for d23d534b says "tighten weak in-operator assertion in
v2 test to check the correct direction". The direction was indeed flipped, but
the test still uses substring containment instead of equality. Because the
storage layer adds a unique prefix, the test should compute the expected
stored name (or read it from upload_response.json()) and assert exact equality:
stored_name = upload_response.json()["name"] + Path(upload_response.json()["path"]).suffix
assert decoded == stored_nameThis is the same pattern already used correctly in the v1 test
(assert unquote(rfc5987_value) == stored_name).
🟢 NICE TO HAVE
N1 — core.py is becoming a dumping ground (legacy)
api/utils/core.py is now 476 lines and mixes validate_*, build_*,
has_*, remove_*, check_*, raise_*, plus Annotated type aliases and
an Enum. Per the reviewer rule, validate* and build* must not coexist in
the same file. This is a pre-existing legacy issue; this PR is consistent
with the surrounding style and should not be blocked on it. Worth flagging for
a future cleanup ticket — propose splitting into validators.py,
http_headers.py (where this new helper would live), path_helpers.py, etc.
N2 — Helper return type / test could pin RFC ordering
The helper's contract is "ASCII fallback first, then filename*=". RFC 6266
§4.3 actually recommends filename*= first for clients that prefer it, but
the dual-param order is not strictly defined. Worth a one-line comment
explaining the chosen order (legacy clients that read only the first
filename= see the ASCII fallback; modern clients prefer filename*= regardless).
✅ What this PR does well
- Correct root-cause diagnosis. Latin-1 header encoding is the actual
defect, and the fix targets the encoding boundary, not the symptom. - DRY win on the backend. Four near-identical header strings collapsed
into one well-documented helper. Found and fixed an uncovered fifth route
(projects_files.py) along the way — that file emitted onlyfilename*=
with no ASCII fallback, which would fall back to nothing on legacy clients. safe=""is correct. The intermediate commitd23d534bcorrectly
identified thatquote()defaults tosafe="/", which would emit raw
forward slashes inside an RFC 5987 value. Now percent-encoded.- Quote escaping in the ASCII fallback prevents the most obvious
parameter-smuggling vector (modulo M1 / C1). - Backend coverage report shown (
+0.48 %oncore.py,+3.51 %on
v2/files.py) and parametrized tests across CJK / Latin-extended / ASCII. - PR description is excellent — manual test plan, curl reproducer,
linked issue, all migration / docs / lint CI passing.
📋 Suggested resolution order
- C1 — strip control chars in
build_content_disposition+ add unit test. - M1 — escape backslashes in the same helper (one-line change, one extra test).
- H1 — extract the frontend parser to a shared util.
- H2 — add Jest tests for the new util.
- H3 — add a direct backend unit test for the helper with adversarial inputs (cheaper than HTTP round-trip in
test_files.py). - M3 — tighten the v2 download test to equality.
- M2 — folded into H1.
- N1 / N2 — open follow-up tickets, do not block.
Backend changes 1, 2, 5 are ~20 lines. Frontend changes 3, 4 are ~40 lines
plus a small test file. Total turnaround ≈ 1 hour of work.
…position Strip ASCII control characters (CR/LF/NUL and others in 0x00-0x1f/0x7f range) from filename before encoding to prevent header injection. Also escape backslash before double-quote in the ASCII fallback per RFC 6266 §4.1 quoted-pair grammar.
…hen v2 assertion Add direct unit tests for the helper covering: quote escaping, backslash escaping, CRLF/NUL/control char stripping, empty filename, and very long names. Replace weak substring assertions in the v2 single-file download test with exact equality against the stored filename returned by the upload response.
Extract the duplicated 14-line RFC 5987 parsing block from use-get-download-files and use-get-download-flows into a single parseContentDispositionFilename util. Wraps decodeURIComponent in try/catch to gracefully fall back to the legacy filename= param on malformed percent-sequences. Adds Jest tests covering all branches: RFC 5987 priority, legacy fallback, CJK decode, null header, and malformed input.
Closes #8105
Summary
Content-Dispositionheaders to use RFC 5987 / RFC 6266 dual-param format for file downloads:filename="ascii-fallback"; filename*=UTF-8''<percent-encoded>filename*=UTF-8''over plainfilename=when saving downloadsquote()usessafe=""so forward slashes and all special characters are properly percent-encoded in the RFC 5987 valueHow to test (end user)
Pre-requisite
Run Langflow locally with
make backend+make frontend(orlangflow run).1. Upload and download a single file with a non-ASCII name
龙テスト.txtbefore uploading龙テスト.txt(not?.txtor a garbled name)2. Download multiple files as a ZIP (API v2)
résumé.pdf,测试.txt)20240428_123456_langflow_files.zip)3. Download flows as a ZIP
20240428_123456_langflow_flows.zip) and opens without errors4. Verify the header directly (optional, for developers)
Test plan