Base switch #1520
base: dependabot/github_actions/actions/setup-python-6
@jarlungoodoo73, please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

Contributor License Agreement

Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
Pull request overview
This pull request addresses CVE-2025-11849, a security vulnerability related to r:link resources in DOCX files. The changes upgrade the mammoth library dependency and remove a previous monkey-patch workaround that disabled r:link processing.
Key changes:
- Upgrades the mammoth dependency from version 1.10.0 to 1.11.0 to rely on the library's own fix for r:link handling
- Removes the monkey-patch code that previously disabled r:link resource processing in DOCX files
- Adds a new test case to verify that r:link resources are not embedded in output, preventing potential path traversal attacks
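The r:link check described above comes down to asserting that the linked file's content never reaches the converted text, either verbatim or base64-encoded (for example inside a data URI). A minimal sketch, assuming the converter output is available as a plain string; `rlink_leaked` is a hypothetical helper, not part of markitdown:

```python
import base64


def rlink_leaked(converted_text: str, secret: str) -> bool:
    """Return True if `secret` appears in `converted_text`, raw or base64-encoded.

    Hypothetical helper for illustration; markitdown does not ship it.
    """
    encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
    return secret in converted_text or encoded in converted_text


secret = "de658225-569e-4e3d-9ed2-cfb6abf927fc"  # same marker value the test uses
safe_output = "# Document\n\nSome text."
leaky_output = "![](data:image/png;base64,ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj)"

print(rlink_leaked(safe_output, secret))   # False
print(rlink_leaked(leaky_output, secret))  # True
```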
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `packages/markitdown/tests/test_module_misc.py` | Adds `test_doc_rlink()` to verify the CVE-2025-11849 fix; comments out the YouTube test |
| `packages/markitdown/tests/test_files/rlink.docx` | New test fixture: a DOCX file containing an r:link resource, for security testing |
| `packages/markitdown/src/markitdown/converters/_docx_converter.py` | Removes the monkey-patch that disabled r:link processing, relying on the library fix instead |
| `packages/markitdown/pyproject.toml` | Updates the mammoth dependency from ~1.10.0 to ~1.11.0 in both optional dependencies |
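If the pyproject specifier is the PEP 440 compatible-release operator (`~=1.11.0`; the overview abbreviates it as `~1.11.0`, so this is an assumption), it admits any 1.11.x release at or above 1.11.0 but nothing from 1.10.x or 1.12. A simplified sketch for plain numeric versions only; real tooling should use `packaging.specifiers.SpecifierSet`:

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric version such as '1.11.0' (no pre/post/dev tags)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate: str, spec: str) -> bool:
    """Approximate PEP 440 `~=spec`: >= spec and sharing spec's prefix minus its last part."""
    cand, base = parse_release(candidate), parse_release(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two release components")
    prefix_len = len(base) - 1
    # Zero-pad so that '1.11' compares equal to '1.11.0'.
    width = max(len(cand), len(base))
    cand += (0,) * (width - len(cand))
    base += (0,) * (width - len(base))
    return cand >= base and cand[:prefix_len] == base[:prefix_len]


for v in ["1.10.0", "1.11.0", "1.11.3", "1.12.0"]:
    print(v, compatible_release(v, "1.11.0"))
# 1.10.0 False
# 1.11.0 True
# 1.11.3 True
# 1.12.0 False
```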

    # Save reporting of any exceptions for later
    _dependency_exc_info = None
    try:
        import mammoth

Copilot (AI), Jan 1, 2026:
Removing the monkey-patch that disabled r:link processing appears to be intentional for addressing CVE-2025-11849, presumably because mammoth 1.11.0 handles this securely. However, since version 1.11.0 doesn't exist yet, this change may introduce a security vulnerability by re-enabling r:link processing with mammoth 1.10.0. The code changes should be synchronized with the actual availability of the secure mammoth version.
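One way to keep the patch removal and the dependency bump in sync, as the comment asks, is to refuse to run the DOCX converter against a mammoth release older than 1.11.0. A minimal sketch using `importlib.metadata`; `MIN_MAMMOTH`, `release_tuple`, and `check_mammoth_version` are hypothetical names, and the 1.11.0 threshold is taken from this PR rather than from mammoth's changelog:

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Hypothetical threshold taken from this PR: the first mammoth release it relies on.
MIN_MAMMOTH: Tuple[int, ...] = (1, 11, 0)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release of a version string, e.g. '1.11.0rc1' -> (1, 11, 0).

    Pre/post/dev suffixes are ignored, so this check is deliberately coarse.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_mammoth_version(installed: Optional[str] = None) -> None:
    """Raise if mammoth is missing or older than MIN_MAMMOTH.

    Pass `installed` explicitly for testing; otherwise it is read from package metadata.
    """
    if installed is None:
        try:
            installed = version("mammoth")
        except PackageNotFoundError:
            raise ImportError("mammoth is not installed") from None
    release = release_tuple(installed)
    # Pad so that '1.11' compares equal to (1, 11, 0).
    release += (0,) * (len(MIN_MAMMOTH) - len(release))
    if release < MIN_MAMMOTH:
        minimum = ".".join(map(str, MIN_MAMMOTH))
        raise RuntimeError(
            f"mammoth {installed} predates {minimum}; r:link resources may still be "
            "processed (CVE-2025-11849)"
        )
```

Calling a guard like this once when the converter module loads would make a stale environment fail loudly instead of silently re-enabling r:link processing.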

    rlink_file_path = os.path.join(rlink_tmp_dir, "test_rlink.txt")
    rlink_content = "de658225-569e-4e3d-9ed2-cfb6abf927fc"
    b64_prefix = (
        "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"  # base64 prefix of rlink_content

Copilot (AI), Jan 1, 2026:
The comment "# base64 prefix of rlink_content" is misleading. The value "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk" is truncated and not a complete base64 encoding. The full base64 encoding of "de658225-569e-4e3d-9ed2-cfb6abf927fc" would be "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj". Consider either using the complete base64 string or clarifying in the comment that this is a partial/prefix match.
Suggested change:

    -    "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"  # base64 prefix of rlink_content
    +    "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj"  # base64 encoding of rlink_content
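The two strings in the suggestion are easy to check with the standard library: `rlink_content` is 36 bytes, so its encoding is 48 characters with no padding, and the value currently in the test is its first 43 characters.

```python
import base64

rlink_content = "de658225-569e-4e3d-9ed2-cfb6abf927fc"
b64_prefix = "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"  # value currently in the test

full = base64.b64encode(rlink_content.encode("utf-8")).decode("ascii")

print(full)                         # ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj
print(len(full), len(b64_prefix))   # 48 43
print(full.startswith(b64_prefix))  # True
```

Because 36 is a multiple of 3, the 48-character encoding stays a prefix of the encoding even if the file gained trailing bytes (such as a newline), so the full string is as robust a substring check as the shorter one.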

    finally:
        os.remove(rlink_file_path)

Copilot (AI), Jan 1, 2026:
The test unconditionally deletes the rlink file in the finally block, even if the file existed before the test ran. This could delete user data if the file was already present. Consider only deleting the file if the test created it (when os.path.exists returned False initially).

    if os.path.exists(rlink_file_path):
        with open(rlink_file_path, "r", encoding="utf-8") as f:
            existing_content = f.read()
        if existing_content != rlink_content:
            raise ValueError(
                f"Existing {rlink_file_path} content does not match expected content."
            )
    else:
        with open(rlink_file_path, "w", encoding="utf-8") as f:
            f.write(rlink_content)

    try:

Copilot (AI), Jan 1, 2026:
The test skips when /tmp doesn't exist, but it doesn't clean up the created file if the test fails between file creation (line 321) and the finally block. If an assertion fails or an exception is raised, the file will be left behind. Consider moving the file creation into the try block, or using a temporary directory with guaranteed cleanup.
Suggested change (moves the existence check and file creation inside `try:`):

    try:
        if os.path.exists(rlink_file_path):
            with open(rlink_file_path, "r", encoding="utf-8") as f:
                existing_content = f.read()
            if existing_content != rlink_content:
                raise ValueError(
                    f"Existing {rlink_file_path} content does not match expected content."
                )
        else:
            with open(rlink_file_path, "w", encoding="utf-8") as f:
                f.write(rlink_content)
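The two cleanup comments (delete only a file the test created, and do the creation inside the guarded region) can be folded into one reusable context manager. Note that moving the check inside `try` on its own would let the unconditional `os.remove` in `finally` delete a pre-existing file whose content did not match, which is why this sketch tracks ownership. `owned_fixture_file` is a hypothetical helper, not part of the test suite; the example uses a temporary directory only so it runs anywhere, while the real test keeps its fixed path:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def owned_fixture_file(path: str, content: str):
    """Ensure `path` holds `content`; remove it afterwards only if this call created it."""
    created = False
    try:
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                if f.read() != content:
                    raise ValueError(
                        f"Existing {path} content does not match expected content."
                    )
        else:
            with open(path, "w", encoding="utf-8") as f:
                created = True  # the file now exists, so a failed write is cleaned up too
                f.write(content)
        yield path
    finally:
        if created and os.path.exists(path):
            os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test_rlink.txt")
    with owned_fixture_file(target, "secret") as p:
        print(os.path.exists(p))   # True
    print(os.path.exists(target))  # False: created here, so removed here

    # A pre-existing file with matching content is left in place.
    with open(target, "w", encoding="utf-8") as f:
        f.write("secret")
    with owned_fixture_file(target, "secret"):
        pass
    print(os.path.exists(target))  # True
```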

        return

Copilot (AI), Jan 1, 2026:
The early return statement after pytest.skip is unnecessary. pytest.skip raises an exception that prevents further execution, so the return on line 304 will never be reached.
Suggested change: delete the `return` line.

    # result = markitdown.convert(YOUTUBE_TEST_URL)
    # for test_string in YOUTUBE_TEST_STRINGS:
    #     assert test_string in result.text_content

Copilot (AI), Jan 1, 2026:
This comment appears to contain commented-out code.
Suggested change: delete the three commented-out lines.