Description
I'd like to use pdfium-render to access all "primitive" elements (characters, paths, images) in the order in which they are rendered, so that I can determine the visibility of each element (accounting for occlusion of earlier-rendered primitives by simple obstruction, clipping paths, etc.).
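To show why render order matters for this, here is a toy sketch that does not use pdfium-render at all. `Rect`, `Primitive`, `visible_ids` and the example data are hypothetical. It assumes each primitive is reduced to a bounding box plus an "opaque" flag, and treats a primitive as hidden only when a single later opaque primitive's box fully contains it. Clipping paths, transparency and coverage by several primitives combined are deliberately ignored.

```rust
// Toy model, independent of pdfium-render: all types and functions here are hypothetical.

/// Axis-aligned bounding box in page coordinates.
#[derive(Clone, Copy, Debug)]
struct Rect {
    left: f32,
    bottom: f32,
    right: f32,
    top: f32,
}

impl Rect {
    /// True if `self` lies entirely within `other`.
    fn is_inside(&self, other: &Rect) -> bool {
        self.left >= other.left
            && self.right <= other.right
            && self.bottom >= other.bottom
            && self.top <= other.top
    }
}

/// A rendered primitive (character, path, or image). Slices of these are assumed
/// to be in paint order: later entries are drawn on top of earlier ones.
#[derive(Debug)]
struct Primitive {
    id: u32,
    bounds: Rect,
    /// Assumption: an opaque primitive fully covers its bounding box.
    opaque: bool,
}

/// Ids of primitives that are NOT entirely covered by a single later opaque primitive.
/// Simplifications: ignores clipping paths, transparency, and coverage by several
/// later primitives combined.
fn visible_ids(primitives: &[Primitive]) -> Vec<u32> {
    primitives
        .iter()
        .enumerate()
        .filter(|&(i, p)| {
            // Only primitives painted after `p` can hide it.
            !primitives[i + 1..]
                .iter()
                .any(|later| later.opaque && p.bounds.is_inside(&later.bounds))
        })
        .map(|(_, p)| p.id)
        .collect()
}

fn rect(left: f32, bottom: f32, right: f32, top: f32) -> Rect {
    Rect { left, bottom, right, top }
}

/// A glyph (id 1), then an opaque box painted over it (id 2), then an unrelated glyph (id 3).
fn example_primitives() -> Vec<Primitive> {
    vec![
        Primitive { id: 1, bounds: rect(10.0, 10.0, 20.0, 20.0), opaque: true },
        Primitive { id: 2, bounds: rect(0.0, 0.0, 50.0, 50.0), opaque: true },
        Primitive { id: 3, bounds: rect(60.0, 60.0, 70.0, 70.0), opaque: true },
    ]
}

fn main() {
    let prims = example_primitives();
    // The glyph with id 1 is hidden because the box with id 2 is painted after it.
    println!("{:?}", visible_ids(&prims)); // prints [2, 3]

    // Reversing the paint order makes the same glyph visible again.
    let reversed: Vec<Primitive> = prims.into_iter().rev().collect();
    println!("{:?}", visible_ids(&reversed)); // prints [3, 2, 1]
}
```

The same geometry yields different visibility depending only on paint order, which is why an order-preserving traversal (or a render-order index) is needed rather than a purely spatial lookup.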
I figured that I could do this by iterating over PdfPage.objects() and, for each text object, iterating over PdfPageTextObject.chars(). However, the latter doesn't retrieve the individual chars that belong to a given text object; instead, it falls back to a bounding-box search:
pdfium-render/src/page_text.rs, lines 95 to 101 in c0038a6:

```rust
pub fn chars_for_object(
    &self,
    object: &PdfPageTextObject,
) -> Result<PdfPageTextChars, PdfiumError> {
    self.chars_inside_rect(object.bounds()?)
        .map_err(|_| PdfiumError::NoCharsInPageObject)
}
```
Of course, this doesn't reflect the original rendering order at all and, ironically, it will visit the same character multiple times when text objects overlap.
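A toy model of the duplicate-visit problem, independent of pdfium-render. Everything here (`PageChar`, its hypothetical `owner` field standing in for "the text object this char actually came from", `chars_in_rect`, `chars_owned_by`, `visit_counts`) is illustrative. It also assumes a char matches the bounding-box search when its box lies fully inside the object's bounds; the real matching rule may differ.

```rust
// Toy model, independent of pdfium-render: all types and functions here are hypothetical.

#[derive(Clone, Copy, Debug)]
struct Rect {
    left: f32,
    bottom: f32,
    right: f32,
    top: f32,
}

impl Rect {
    /// True if `self` lies entirely within `other`.
    fn is_inside(&self, other: &Rect) -> bool {
        self.left >= other.left
            && self.right <= other.right
            && self.bottom >= other.bottom
            && self.top <= other.top
    }
}

/// A character in the page's text layer; `owner` is a hypothetical char-to-object link.
struct PageChar {
    index: usize,
    bounds: Rect,
    owner: usize,
}

/// Bounding-box lookup: every char whose box lies inside `area`, regardless of owner.
fn chars_in_rect(chars: &[PageChar], area: &Rect) -> Vec<usize> {
    chars.iter().filter(|c| c.bounds.is_inside(area)).map(|c| c.index).collect()
}

/// Ownership lookup: only the chars that belong to text object `object`.
fn chars_owned_by(chars: &[PageChar], object: usize) -> Vec<usize> {
    chars.iter().filter(|c| c.owner == object).map(|c| c.index).collect()
}

/// Walks the text objects in order, fetching chars by bounding box, and counts
/// how often each char is visited. Assumes char indices run from 0 to chars.len() - 1.
fn visit_counts(chars: &[PageChar], object_bounds: &[Rect]) -> Vec<usize> {
    let mut counts = vec![0; chars.len()];
    for area in object_bounds {
        for index in chars_in_rect(chars, area) {
            counts[index] += 1;
        }
    }
    counts
}

fn rect(left: f32, bottom: f32, right: f32, top: f32) -> Rect {
    Rect { left, bottom, right, top }
}

/// Two text objects whose bounds overlap between x = 50 and x = 100.
/// Char 1 belongs to object 0 but also lies inside object 1's bounds.
fn example_page() -> (Vec<PageChar>, Vec<Rect>) {
    let chars = vec![
        PageChar { index: 0, bounds: rect(10.0, 5.0, 20.0, 15.0), owner: 0 },
        PageChar { index: 1, bounds: rect(60.0, 5.0, 70.0, 15.0), owner: 0 },
        PageChar { index: 2, bounds: rect(110.0, 5.0, 120.0, 15.0), owner: 1 },
    ];
    let object_bounds = vec![rect(0.0, 0.0, 100.0, 20.0), rect(50.0, 0.0, 150.0, 20.0)];
    (chars, object_bounds)
}

fn main() {
    let (chars, object_bounds) = example_page();
    // Bounding-box lookup: char 1 is visited once per overlapping object.
    println!("{:?}", visit_counts(&chars, &object_bounds)); // prints [1, 2, 1]
    // Ownership lookup: each char is visited exactly once.
    println!("{:?} {:?}", chars_owned_by(&chars, 0), chars_owned_by(&chars, 1)); // prints [0, 1] [2]
}
```

The contrast is the point of the question: a spatial query can only say which chars fall inside a box, while a per-char link to its source object (if one were available) would give each char exactly one owner and, via that object's position in the page's object list, a render order.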
Is there a way to access primitives, down to the character level, in rendered order (or with a render-order property if direct iteration isn't possible)?
(Thanks so much for this library, the work is greatly appreciated. 🙇)