Allow option to periodically print last ingested line (line mode) #121

Closed
opened 2024-12-03 11:47:51 +01:00 by meego · 5 comments

The idea in a nutshell:

A -P/--preview option that, when working in line mode, periodically prints the last ingested line to stderr.

Attached is a video of an in-house script implementing this behavior.

Use case:

The use case where I found this desirable was working with logs. Logs are multiline text data, typically of unknown length and potentially very large. When piping logs, progress is difficult to infer from counting lines and/or bytes. It would be useful to print the last ingested line because it typically contains a date, making it possible to estimate progress.

Implementation suggestion:

  • -P/--preview option that, when working in line mode, periodically prints the last ingested line to stderr. Byte mode is unsupported: the -P option does nothing if line mode is not enabled. My thinking is that printing binary data is likely either useless, or requires fine control over the textual representation, for which good defaults are hard to pin down (e.g. hexadecimal?, length of preview?)
  • The existing -i/--interval parameter can be re-used
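
The suggestion above can be sketched roughly as follows. This is not the in-house script from the video and not pv's code; it is a minimal Python illustration of the idea, and the function name `copy_with_preview` and its `clock` parameter are made up for this example.

```python
import time


def copy_with_preview(src, dst, err, interval=1.0, clock=time.monotonic):
    """Copy text lines from src to dst unchanged.

    At most once per `interval` seconds, write the most recently
    copied line to err. Returns the last line seen (or None).
    All names here are hypothetical, chosen for illustration only.
    """
    last_report = clock()
    last_line = None
    for line in src:
        dst.write(line)
        last_line = line
        now = clock()
        if now - last_report >= interval:
            # Preview goes to a separate stream so the data on dst is untouched.
            err.write(line.rstrip("\n") + "\n")
            last_report = now
    return last_line
```

Called as `copy_with_preview(sys.stdin, sys.stdout, sys.stderr)`, it would sit in a pipeline as a pass-through filter. The preview is only checked when a line arrives, so a stalled input produces no output; a real implementation would more likely drive it from a timer.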
meego changed title from Allow option to print sample data at selectable rate to Allow option to periodically print last ingested line (line mode) 2024-12-03 11:53:03 +01:00
Owner

I think this could work as a format option "%nL" to show the first n characters of the most recent line, such as "pv --line-mode --format '%b %80L'" to show the line number and first 80 characters of the newest line. Perhaps the number could be omitted to show the whole line up to the terminal width. For binary we already have the format sequence %nA to show the last n bytes transferred, but for line mode you want to show from the start of the line.
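
To make the proposed semantics concrete, here is a small Python sketch of "first n characters of the most recent line, or up to the terminal width when n is omitted". It is only an illustration of the idea, not how pv does it; `line_preview` is a hypothetical helper, and padding or handling of non-printable characters is deliberately left out.

```python
import shutil


def line_preview(line, width=None):
    """Return the start of `line`, cut to at most `width` characters.

    When `width` is None, the current terminal width is used instead
    (an assumption made for this sketch).
    """
    if width is None:
        width = shutil.get_terminal_size().columns
    # Drop the line terminator so it never ends up in the progress display.
    text = line.rstrip("\r\n")
    return text[:width]
```

For a log line such as `2024-12-03 11:47:51 INFO started`, `line_preview(line, 10)` keeps just the date, which is the part that matters for estimating progress.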

Does that sound like it would meet your use case?

Author

This sounds perfect.

Owner

Hello

An implementation has now been committed. Using a format string sequence of "%L" will use the available space in the progress display to show the most recently written line. It can be given a fixed width such as "%80L".

For example:

pv -F '%a %p : %L' big.log | processing-script

It comes with a couple of caveats. It can't be used with splice(2), since it needs to read from the transfer buffer and that's bypassed when using splice. And with small amounts of data, the buffering the OS does in the pipeline itself will make it ineffective.

This should be in the next release.

a-j-wood referenced this issue from a commit 2024-12-08 00:26:22 +01:00
Owner

Version 1.9.15 has now been released, which includes this feature.

Author

Just gave 1.9.15 a try, it works great, thanks!

Reference
ivarch/pv#121