fix(scrapy): Resolve Crawlee's request data round-trip failure in request conversion #832

Merged
vdusek merged 2 commits into master from fix/scrapy-crawlee-request-data-roundtrip
Mar 16, 2026

Conversation

@vdusek
Contributor

@vdusek vdusek commented Mar 13, 2026

Summary

  • Adds a reproduction test for a bug where to_apify_request() fails with "CrawleeRequestData() argument after ** must be a mapping, not CrawleeRequestData" after two round trips through the Scrapy↔Apify request conversion, when spiders propagate userData to follow-up requests.
  • Root cause: Request.from_url() writes a CrawleeRequestData object into UserData.__pydantic_extra__['__crawlee']. On the next round trip, .get('__crawlee') returns that object, which is then passed to CrawleeRequestData(**obj) instead of CrawleeRequestData(**dict).
  • The test is marked xfail until the fix is applied.
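
The failure mode described in the root-cause bullet can be reproduced in isolation. The sketch below is only an illustration: `CrawleeRequestData` here is a hypothetical dataclass stand-in (the real class is a Pydantic model in Crawlee), the `depth` field and the `rebuild_crawlee_data` helper are invented for the example, and the real code path goes through Request.from_url(). The point is only how `**` unpacking behaves once an object replaces the dict.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CrawleeRequestData:
    """Hypothetical stand-in for Crawlee's model; not the real class."""

    depth: int = 0  # invented field, for illustration only


def rebuild_crawlee_data(user_data: dict[str, Any]) -> CrawleeRequestData:
    """Mimic the buggy pattern: unpack whatever is stored under '__crawlee'."""
    stored = user_data.get("__crawlee") or {}
    return CrawleeRequestData(**stored)


# First round trip: '__crawlee' holds a plain dict, so unpacking works.
user_data: dict[str, Any] = {"__crawlee": {"depth": 1}}
first = rebuild_crawlee_data(user_data)

# The rebuilt object (not a dict) is written back and propagated by the spider.
user_data["__crawlee"] = first

# Second round trip: unpacking a non-mapping object raises TypeError.
error_message = ""
try:
    rebuild_crawlee_data(user_data)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message ends with "argument after ** must be a mapping, not CrawleeRequestData", matching the error quoted above.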

Test plan

  • Reproduction test added and confirmed to trigger the exact error from production
  • Fix to be applied in to_apify_request() to sanitize user_data before passing to Request.from_url()
  • Remove xfail marker after fix
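
One possible shape for the sanitization step in the test plan is sketched here. This is an assumption about the approach, not the change that was merged: `sanitize_user_data` is a hypothetical helper, `CrawleeRequestData` is again a dataclass stand-in with an invented `depth` field, and the `model_dump()` branch assumes the real object is a Pydantic v2 model.

```python
from collections.abc import Mapping
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class CrawleeRequestData:
    """Hypothetical stand-in for Crawlee's model; not the real class."""

    depth: int = 0  # invented field, for illustration only


def sanitize_user_data(user_data: Mapping[str, Any]) -> dict[str, Any]:
    """Return a shallow copy whose '__crawlee' entry is a plain mapping."""
    cleaned = dict(user_data)
    crawlee = cleaned.get("__crawlee")
    if crawlee is None or isinstance(crawlee, Mapping):
        return cleaned  # nothing leaked, keep as-is
    if hasattr(crawlee, "model_dump"):
        # Pydantic v2 models expose model_dump() to get a dict.
        cleaned["__crawlee"] = crawlee.model_dump()
    elif is_dataclass(crawlee) and not isinstance(crawlee, type):
        cleaned["__crawlee"] = asdict(crawlee)
    else:
        # Unknown type: drop the entry rather than guess its shape.
        cleaned.pop("__crawlee")
    return cleaned


# A leaked object is converted back to a dict, so ** unpacking works again.
leaked = {"__crawlee": CrawleeRequestData(depth=2), "category": "books"}
safe = sanitize_user_data(leaked)
restored = CrawleeRequestData(**safe["__crawlee"])
```

The helper copies rather than mutates its input, so the spider's own userData is left untouched.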

🤖 Generated with Claude Code

After two round trips through to_apify_request/to_scrapy_request with
userData propagation, a CrawleeRequestData object leaks into
UserData.__pydantic_extra__['__crawlee'], causing subsequent conversions
to fail with "argument after ** must be a mapping, not CrawleeRequestData".

Marked as xfail until the fix is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vdusek vdusek added the adhoc and t-tooling labels Mar 13, 2026
@vdusek vdusek self-assigned this Mar 13, 2026
@github-actions github-actions Bot added this to the 136th sprint - Tooling team milestone Mar 13, 2026
@github-actions github-actions Bot added the tested label Mar 13, 2026
@codecov

codecov Bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.47%. Comparing base (e343ed5) to head (febd963).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #832      +/-   ##
==========================================
+ Coverage   86.44%   86.47%   +0.02%     
==========================================
  Files          48       48              
  Lines        2899     2905       +6     
==========================================
+ Hits         2506     2512       +6     
  Misses        393      393              
| Flag | Coverage Δ |
|---|---|
| e2e | 38.03% <0.00%> (-0.08%) ⬇️ |
| integration | 59.44% <0.00%> (-0.13%) ⬇️ |
| unit | 74.45% <100.00%> (+0.05%) ⬆️ |

Flags with carried forward coverage won't be shown.
@vdusek vdusek changed the title fix: Resolve CrawleeRequestData roundtrip failure in Scrapy integration Mar 13, 2026
@vdusek vdusek requested a review from Pijukatel March 13, 2026 20:55
@vdusek
Contributor Author

vdusek commented Mar 13, 2026

@honzajavorek Could you please confirm this fix resolves your issue?

@honzajavorek
Contributor

Wow, quick fix! 👀 I'll try next week, too much on my plate now.

@vdusek
Contributor Author

vdusek commented Mar 16, 2026

@honzajavorek try it then with the latest beta and let us know, thanks

@vdusek vdusek merged commit 3b9d588 into master Mar 16, 2026
31 checks passed
@vdusek vdusek deleted the fix/scrapy-crawlee-request-data-roundtrip branch March 16, 2026 09:13
honzajavorek added a commit to juniorguru/plucker that referenced this pull request Mar 24, 2026
@honzajavorek
Contributor

I gave it a go and I think the warnings are gone! Thanks! 🎉

Labels

  • adhoc: Ad-hoc unplanned task added during the sprint.
  • t-tooling: Issues with this label are in the ownership of the tooling team.
  • tested: Temporary label used only programmatically for some analytics.

4 participants