[Draft] Exploring ways to allow Optional dependencies #1079
packages/markitdown/src/markitdown/converters/_pptx_converter.py
@Zahlii LMK what you think about this approach. The main thing is that I want the converters to print a useful message when they think they could handle the file but don't have the right dependencies. (Alternatively, you could simply not register the converters at all... but then I worry about discoverability for people not reading the docs.)
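A minimal sketch of the approach described in that comment, assuming a converter records a failed optional import instead of crashing, keeps claiming the files it recognizes, and only raises (with an install hint) when asked to convert one. `MissingDependencyError`, `try_import`, and `OptionalDependencyConverter` are hypothetical names for illustration, not markitdown's actual API, and the `markitdown[...]` extra in the message is a placeholder.

```python
import importlib
import sys
from types import ModuleType
from typing import Optional, Tuple


class MissingDependencyError(ImportError):
    """Hypothetical error: the converter recognized the file but lacks an optional dependency."""


def try_import(module_name: str) -> Tuple[Optional[ModuleType], Optional[tuple]]:
    """Return (module, None) on success, or (None, exc_info) if the import fails."""
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        return None, sys.exc_info()


class OptionalDependencyConverter:
    """Hypothetical converter that stays registered even when its dependency is missing."""

    def __init__(self, extension: str, module_name: str, extra: str):
        self.extension = extension.lower()
        self.extra = extra
        # Remember the import failure instead of raising at registration time.
        self._module, self._exc_info = try_import(module_name)

    def accepts(self, filename: str) -> bool:
        # Recognizing the file type needs no optional dependency.
        return filename.lower().endswith(self.extension)

    def convert(self, filename: str) -> str:
        if self._exc_info is not None:
            # The file looks convertible, so tell the user exactly what to install,
            # chaining the original ImportError for debugging.
            raise MissingDependencyError(
                f"'{filename}' looks like a {self.extension} file, but the optional "
                f"dependencies for it are not installed. "
                f"Try: pip install 'markitdown[{self.extra}]'"
            ) from self._exc_info[1]
        # Placeholder for real conversion logic that would use self._module.
        return f"converted {filename} using {self._module.__name__}"
```

For example, `OptionalDependencyConverter(".pptx", "definitely_not_installed_xyz", "pptx")` still accepts `deck.pptx`, but converting it raises an error naming the extra to install. Compared with not registering the converter at all, the missing feature stays discoverable.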
I think this will work fine for me. I didn't check all of the dependencies in detail (maybe there are some more that could be marked as optional?), but at least those that currently lead to CVE warnings are now optional.
There's certainly a balance here. Some dependencies are needed just to help identify which file types are in use... and those need to be around in all cases. Others help with some of the fallback converters (plain text, etc.). And more generally, HTML-related dependencies are used by many converters (because the libraries might output HTML rather than Markdown). Plus, it's generally nice to be able to convert web pages out of the box, I'd argue. I think I'm satisfied with this first cut at organization.
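To see how this split between always-installed and optional dependencies plays out in a given environment, one could probe for the optional modules without importing them. The sketch below uses the standard-library `importlib.util.find_spec`; the extra names and module lists in `EXAMPLE_GROUPS` are assumptions for illustration, not markitdown's actual dependency groups.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping from extra name to the top-level modules it would install.
EXAMPLE_GROUPS: Dict[str, List[str]] = {
    "pptx": ["pptx"],
    "audio-transcription": ["speech_recognition", "pydub"],
    "youtube-transcription": ["youtube_transcript_api"],
}


def available_extras(groups: Dict[str, List[str]]) -> Dict[str, bool]:
    """Report, per extra, whether every listed top-level module can be found."""
    return {
        extra: all(importlib.util.find_spec(name) is not None for name in modules)
        for extra, modules in groups.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras(EXAMPLE_GROUPS).items():
        print(f"[{extra}]: {'installed' if ok else 'missing'}")
```

For top-level names, `find_spec` only locates a module without executing it, so the check is cheap and avoids import side effects.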
* `[audio-transcription]` Installs dependencies for audio transcription of wav and mp3 files
* `[youtube-transcription]` Installs dependencies for fetching YouTube video transcription
Personal preference...
Suggested change:
* `[audio]` Installs dependencies for audio transcription of wav and mp3 files
* `[youtube]` Installs dependencies for fetching YouTube video transcription
Agreed, shorter is better. However, we can still get metadata for YouTube (title, video description, etc.) even if the transcription library is not installed. Likewise, we can get metadata for audio files (runtime, track title, artist, album, etc.) even when transcription is not enabled. To this end, I felt the extra characters were worth it for precision.
Strongly disagree; developer precision has nothing to do with user experience.
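The point that metadata remains available without transcription can be sketched as a converter that always emits what it reads with core dependencies and appends a transcript only when a transcription function is available. `audio_to_markdown` and its `transcribe` parameter are hypothetical; in practice `transcribe` would be set only if the optional transcription import succeeded.

```python
from typing import Callable, Mapping, Optional


def audio_to_markdown(
    metadata: Mapping[str, str],
    transcribe: Optional[Callable[[], str]] = None,
) -> str:
    """Render metadata unconditionally; add a transcript section only if possible."""
    # Metadata (title, artist, duration, ...) needs no transcription library.
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    if transcribe is not None:
        # Only reached when the optional transcription dependency is available.
        lines += ["", "### Transcript", transcribe()]
    return "\n".join(lines)
```

Under this design, installing the transcription extra adds a section to the output rather than enabling the converter as a whole.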
I like it.
K, merging to main... but I'll hold off on releasing to PyPI or cutting a release until we've had a chance to shake-test this a little more and address a few other potentially API-breaking changes in the works.