

@pallaprolus

Fixes #68

This PR fixes an issue where partially numbered lists, common in MasterFormat documents (e.g., `.1`, `.2`), are extracted as plain text lines that cannot be distinguished from regular paragraphs.

Changes:

  • Adds a lightweight regex post-processing step in `_pdf_converter.py` that identifies lines starting with `.` followed by a number and converts them into Markdown list items (`- .1 ...`).
  • Keeps the solution dependency-free and lightweight, as requested by the maintainers.
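A minimal sketch of what such a post-processing step could look like, assuming the converter has the extracted text available as a single string. The function name `convert_dotted_numbers_to_list` and the exact pattern are hypothetical and illustrative, not the code in this PR:

```python
import re

# Hypothetical pattern: a line that begins (after optional spaces/tabs) with a
# period, one or more digits, and then whitespace, e.g. ".1 Item" or "  .12 Scope".
# The lookahead leaves the whitespace in place and rejects tokens like ".5mm".
_DOT_NUMBER_LINE = re.compile(r"^([ \t]*)(\.\d+)(?=\s)", re.MULTILINE)


def convert_dotted_numbers_to_list(text: str) -> str:
    """Prefix MasterFormat-style ".N" lines with a Markdown bullet ("- ")."""
    # \1 keeps the original indentation, \2 keeps the ".N" marker itself.
    return _DOT_NUMBER_LINE.sub(r"\1- \2", text)


sample = "PART 1 GENERAL\n.1 Related Sections\n.2 References\nSteel is 0.5 mm thick."
print(convert_dotted_numbers_to_list(sample))
```

Because the pattern requires the line to start with `.` (after indentation), decimals inside regular sentences such as `0.5 mm` are left alone, and running the step twice is harmless: a line already rewritten to `- .1 Item` starts with `-`, so it no longer matches.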

Verification:

  • Verified that lines like `.1 Item` are now converted to `- .1 Item`.
  • Ran the standard tests to check for regressions.
@pallaprolus force-pushed the fix/issue-68-pdf-lists branch from 87d7b54 to 76a674a on December 30, 2025 15:33
