refactor TextToPDF.call method #388

valerybokov · 2026-01-01T15:42:36Z

The current algorithm is:
1 If the charset is UTF-8, read the first 3 bytes.
2 If these 3 bytes have the expected values, set a hasUtf8BOM variable to true.
3 Close the stream.
4 Open a new stream on the same file.
5 If hasUtf8BOM is true, skip 3 bytes.
6 If the 3 bytes could not be skipped, throw an exception.
7 If the bytes were skipped, or there was no need to skip them, read the rest of the file.
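
For readers who want to see the steps above as code, here is a minimal sketch of the two-pass flow. It is not the actual TextToPDF source: the class name `TwoPassBomSketch`, the method `readTwice`, and the use of `java.nio.file.Files` are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration of the two-stream flow described above.
final class TwoPassBomSketch {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static String readTwice(Path file) throws IOException {
        boolean hasUtf8BOM;
        // Steps 1-3: open a first stream only to inspect the first 3 bytes, then close it.
        try (InputStream probe = Files.newInputStream(file)) {
            byte[] head = probe.readNBytes(3);
            hasUtf8BOM = Arrays.equals(head, UTF8_BOM);
        }
        // Steps 4-7: open a second stream on the same file and skip the BOM if one was seen.
        try (InputStream in = Files.newInputStream(file)) {
            if (hasUtf8BOM) {
                long skipped = in.skip(3);
                if (skipped != 3) {
                    throw new IOException("Could not skip the UTF-8 BOM");
                }
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

The second `Files.newInputStream` call is the extra point of failure: the file can be moved, locked, or changed between the two opens.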

These questions are not aimed at you, but at the algorithm:
1 Why do we need to read the file twice, when doing so increases the likelihood that the second read will fail?
2 Opening a stream twice is slower than opening it once.
3 According to the code, the second read of the file can fail. Then why is there no check for a corrupted file? That is, the encoding is UTF-8, but what if one or two of these three bytes differ from the expected values? Perhaps the format allows such combinations, so this cannot be verified.

At this point, I propose using one stream instead of two. There are no other changes.
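
One possible shape of a single-stream version is sketched below. It is an illustration under assumptions, not the code in this PR: the class `SingleStreamBomSketch` and its methods are hypothetical, and `PushbackInputStream` is just one way to peek at the first 3 bytes without reopening the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: detect and drop a UTF-8 BOM on a single stream.
final class SingleStreamBomSketch {

    // Returns a stream positioned just after the BOM, or at the start if there is none.
    static InputStream skipUtf8Bom(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = in.readNBytes(head, 0, 3); // may return fewer than 3 for short input
        boolean hasBom = n == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;
        if (!hasBom && n > 0) {
            // Not a BOM: push the bytes back so the caller still sees them.
            in.unread(head, 0, n);
        }
        return in;
    }

    static String readText(InputStream raw) throws IOException {
        try (InputStream in = skipUtf8Bom(raw)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

With this approach the file is opened once, and a partial match (only one or two of the three bytes) is treated as ordinary content and passed through unchanged.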

refactor TextToPDF.call method

ffc6b82
