#100 - Feature request: Allow buffering all input to calculate total size before proceeding - ivarch/pv

michaelmior commented

2024-10-01 20:44:40 +02:00

My use case is that I'm using pv in a long pipeline to process files:

find . -name '*.ext' | pv -l | while read f; do ./something_slow "$f"; done

In this case, the find command is quite fast and doesn't return a ton of output. It's not find that I want to monitor the progress of, but everything downstream. Now, I can work around it by writing the output to a temporary file first.

find . -name '*.ext' > /tmp/files.txt; pv -l /tmp/files.txt | while read f; do ./something_slow "$f"; done

However, it would be nice if there was a flag to pv that would effectively replicate this behavior. That is, write its input to a temporary file and then use that file to calculate the progress through the input to enable showing an accurate total.


a-j-wood commented

2024-10-03 11:28:49 +02:00

Owner

Hello, thanks for this. As you describe and have shown in your workaround, it would need to run in two parts - first it would consume the input until the input is exhausted, and then it would output everything. It's a sort of "store and forward" option.

This could be done either with or without keeping a copy of the input. For your use case, you'd probably not want to keep the input, so you'd want PV to handle the creation and removal of the temporary file. Maybe for some other use cases we'd want to keep the input, so we would tell PV what file to use.

This might look something like this:

$ find . -name '*.ext' | pv -l -U - | while read f; do ./something_slow "$f"; done
  (input): 49.0  0:00:04 [10.0 /s] [    <=>                                           ]
30.0  0:00:03 [10.1 /s] [=========================>                  ]  61% ETA 0:00:02

where the new option is "-U" and it takes a parameter which is the file to write the input to, or "-" to write to a temporary file which is automatically discarded.

It would be nice if it worked properly with the "-D" option so you could make the first line only appear if consuming the input was taking a long time. Let's say the input normally takes less than 2 seconds to receive and store - then we might do this:

$ find . -name '*.ext' | pv -l -D 2 -U - | while read f; do ./something_slow "$f"; done
30.0  0:00:03 [10.1 /s] [=========================>                  ]  61% ETA 0:00:02

That way, we only see the "(input)" progress bar if storing the input takes 2 seconds or more.

Note that I chose "-U" as the new option letter because PV's running out of option letters, and "U" reminded me of UUCP, which is a store-and-forward mechanism. The long option would be "--store-and-forward". Maybe there are better letters and names to choose.

Does what I've described here sound like it matches your use case?


a-j-wood added the enhancement label

2024-10-03 11:29:02 +02:00

michaelmior commented

2024-10-03 13:35:53 +02:00

Author

@a-j-wood That sounds great! I think that would handle exactly what I'm trying to do well.

a-j-wood referenced this issue from a commit

2024-10-05 17:22:14 +02:00

New --store-and-forward option (#100), including refactoring several areas to allow PV to run in two passes, and the ability to look at the output pipe buffer utilisation to more accurately reflect the total data transferred to the receiving end, rather than just the total amount written to the output buffer.

a-j-wood commented

2024-10-05 18:30:24 +02:00

Owner

The latest commit provides this to some degree, but I'm having trouble with line mode.

The reason for this is that between PV's output and the thing reading from it, there is the pipe buffer. PV writes its output, and it goes into the pipe buffer, and PV thinks it's done. But maybe the reader will take a while to consume the whole buffer.

There is a way for PV to look at how much is waiting to be read from the pipe buffer - that is, how much PV has already written that the next command in the pipeline hasn't read yet. So PV now takes that into account when reporting progress.

We take the number of bytes we've written, subtract how many are sitting in the output pipe's buffer unread, and we say the result is how many bytes the next process has actually received - we show that as the amount transferred.
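The byte-counting idea above can be sketched in C. This assumes Linux, where `ioctl(fd, FIONREAD, &n)` on either end of a pipe reports how many bytes are sitting unread in the pipe buffer; the helper name `bytes_delivered()` is hypothetical and not taken from PV's source.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/ioctl.h>
#include <unistd.h>

/* Estimate how many of the bytes we wrote to out_fd the next process has
 * actually read: bytes written minus bytes still waiting in the pipe. */
static long long bytes_delivered(int out_fd, long long bytes_written)
{
    int unread = 0;

    /* If the query fails (e.g. output is not a pipe), fall back to the old
     * behaviour and treat everything written as delivered. */
    if (ioctl(out_fd, FIONREAD, &unread) != 0 || unread < 0)
        return bytes_written;

    /* Another writer could share the pipe, so never report below zero. */
    if ((long long) unread > bytes_written)
        return 0;

    return bytes_written - unread;
}
```

For example, after writing 10 bytes into an empty pipe and letting the reader consume 4, FIONREAD reports 6 unread bytes, so the estimate is 4 bytes delivered rather than 10.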

This works fine when all we're counting is bytes. It gets tricky when we're counting lines. It's tricky because PV can find out how many bytes the receiving process hasn't yet read, but it doesn't have any way to know how many lines that is.

So at the moment here's what it looks like when I use the new "-U" option and interrupt it after 5 seconds.

$ timeout 5 ./pv -C -U - Makefile | { while read -r line; do sleep 0.003; done; }
  (input): 53.7KiB 0:00:00 [ 219MiB/s] [==================================================>] 100%            
17.2KiB 0:00:04 [4.21KiB/s] [==================>                                           ]  32% ETA 0:00:08
Terminated

So far, so good, it's behaving how we want. The input was pretty much instantaneous, then the output proceeds at the rate the receiver is actually reading the lines. But we are counting bytes, not lines, so it's not so good for your use case.

Here's what happens in line mode.

$ timeout 5 ./pv -l -C -U - Makefile | { while read -r line; do sleep 0.003; done; }
  (input): 1.59k 0:00:00 [5.56M/s] [======================================================>] 100%            
1.59k 0:00:04 [0.00 /s] [=================================================================>] 100% ETA 0:00:00
Terminated

In line mode, PV can't take the pipe buffer into account, so what we see is that it immediately shows the output at 100% - because the whole thing fits in the pipe buffer - and then just sits there, while the reader processes it slowly.

So, the latest commit contains this new feature - but it's not quite ready for your use case, until someone comes up with a way of counting the number of lines sitting in the pipe buffer. I'll try to think of something, but suggestions are definitely welcome.


a-j-wood commented

2024-10-06 00:07:39 +02:00

Owner

This should now also be OK in line mode, as PV now keeps track of line positions (up to a certain number of lines).

$ timeout 5 ./pv -l -C -U - Makefile | { while read -r line; do sleep 0.003; done; }
  (input): 1.60k 0:00:00 [5.69M/s] [====================================================>] 100%            
 501  0:00:04 [ 125 /s] [==================>                                             ]  31% ETA 0:00:08
Terminated

Assuming no other problems crop up with this, it will be in the next release.
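One way to keep "track of line positions (up to a certain number of lines)" is to remember the byte offset just past each newline written, then convert the delivered-byte estimate into a line count. The C sketch below is an assumption about the general approach, not PV's code: `struct line_tracker`, its functions and the cap of 1024 are all hypothetical.

```c
#include <stddef.h>

#define MAX_TRACKED_LINES 1024   /* hypothetical cap on remembered lines */

/* Ring buffer of byte offsets just past each newline not yet confirmed
 * as read by the receiver. */
struct line_tracker {
    long long ends[MAX_TRACKED_LINES];
    size_t head;                 /* index of the oldest unconfirmed entry */
    size_t count;                /* number of entries currently stored */
    long long lines_confirmed;   /* lines known to have reached the reader */
};

/* Record a newline that was written, ending at byte offset `end`. */
static void note_line_written(struct line_tracker *t, long long end)
{
    if (t->count == MAX_TRACKED_LINES) {
        /* Full: give up on the oldest line and count it as delivered. */
        t->head = (t->head + 1) % MAX_TRACKED_LINES;
        t->count--;
        t->lines_confirmed++;
    }
    t->ends[(t->head + t->count) % MAX_TRACKED_LINES] = end;
    t->count++;
}

/* Given how many bytes the receiver has consumed, return lines delivered. */
static long long lines_delivered(struct line_tracker *t, long long bytes_delivered)
{
    while (t->count > 0 && t->ends[t->head] <= bytes_delivered) {
        t->head = (t->head + 1) % MAX_TRACKED_LINES;
        t->count--;
        t->lines_confirmed++;
    }
    return t->lines_confirmed;
}
```

The cap trades accuracy for bounded memory: if more lines than the cap are buffered unread, the oldest ones are counted early, so the line count can briefly run ahead of the reader.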


a-j-wood closed this issue

2024-10-06 00:07:41 +02:00

michaelmior commented

2024-10-07 13:24:43 +02:00

Author

@a-j-wood Thanks! Unfortunately, I don't see a progress bar for the output at all when I try the same (regardless of line mode or not). I'll note that I'm on Ubuntu 20.04, which has an old version of gettext, so I made a static build of pv from inside a Docker container.

$ timeout 5 ./pv -C -U - Makefile | { while read -r line; do sleep 0.003; done; }
  (input): 54.0KiB 0:00:00 [ 181MiB/s] [================================================>] 100%


a-j-wood reopened this issue

2024-10-07 20:47:49 +02:00

a-j-wood commented

2024-10-07 20:55:39 +02:00

Owner

Strange. I've just tried on Ubuntu 24.04 and it works for me. I don't have a 20.04 system to test on at the moment.

What do you see if you run this?

$ /sbin/sysctl fs.pipe-max-size
fs.pipe-max-size = 1048576


michaelmior commented

2024-10-07 21:04:46 +02:00

Author

$ /sbin/sysctl fs.pipe-max-size
fs.pipe-max-size = 1048576


a-j-wood commented

2024-10-07 21:25:25 +02:00

Owner

Thanks. That rules that out. I've just tried on Ubuntu 22.04 (in a container, so maybe not the real thing) and it still worked. It also works on my CentOS 5 test system, so it's unlikely to be related to the age of the system. So I'm very much in the dark.

Can you please run this:

./configure --enable-debugging
make clean
make 2>E
./pv --debug A -C -U - Makefile | ./pv --debug B -CqXL 5k
uname -a >> M
cat /proc/cpuinfo >> M
tar czf report.tgz src/include/config.h config.status A B E M

and send me the resulting report.tgz?


michaelmior commented

2024-10-07 22:14:55 +02:00

Author

See attached. Note that I still had to run configure and make in a Docker container, but everything else ran on the host. I also had to add --enable-static to configure, since my glibc is too old otherwise.


Attachment: report.tgz (18 KiB)

a-j-wood commented

2024-10-07 23:34:51 +02:00

Owner

The debug logs suggest that PV wrote the output progress bar, or at least got it ready. I'm not sure why it wouldn't have been displayed. I suppose we could try one more thing and use "strace", if you've got it installed:

strace -ytt -o strace.out ./pv --debug debug.out -C -U - Makefile | ./pv -CqXL 5k
tar czf report2.tgz strace.out debug.out

Do you have any other systems you could try it on? I've tried it in tmux in gnome-terminal, connecting to CentOS 5, Rocky 8, Alma 9, Devuan 5, Debian 12, Alpine, Ubuntu 24.04, and on the console of a container running Ubuntu 22.04.

Also what terminal emulator are you using?


michaelmior commented

2024-10-08 01:11:59 +02:00

Author

The second report is attached. I'll note that this time I did see the progress bar. I'm using iTerm2, and I can test locally on my MacBook at some point.

Attachment: report2.tgz (4.1 KiB)

a-j-wood commented

2024-10-09 19:48:46 +02:00

Owner

I was hoping that we'd capture the progress bar not appearing in the strace output.

If we go back to "timeout 5 ./pv -C -U - Makefile | { while read -r line; do sleep 0.003; done; }" - does that show the expected result if you invoke it with strace, like this?

timeout 5 strace -ytt -o strace2.out ./pv -C -U - Makefile | { while read -r line; do sleep 0.003; done; }

i.e. does running under strace "fix" it, while it still shows no progress bar when you omit strace?


michaelmior commented

2024-10-09 19:58:37 +02:00

Author

@a-j-wood I don't see the progress bar in either case (unlike the previous version that was sent). I attached that strace output here.

Attachment: strace2.out (25 MiB)

a-j-wood commented

2024-10-09 20:03:12 +02:00

Owner

Sorry, can you run that again, and include "--debug debug2.out" in the PV options (with the strace)? From the strace I can see that PV did not attempt to write to stderr at all after reading all the input, so I'd like to see whether the debug info gives any reason for that. Thanks.

michaelmior commented

2024-10-09 20:51:06 +02:00

Author

Debug output attached! Thanks for continuing to dig into this :)

Attachment: debug2.out (247 KiB)

a-j-wood commented

2024-10-09 22:17:34 +02:00

Owner

Great, that's really helpful. So I think I see where the problem is, but I don't see why it happened or how to fix it.

In the past, PV would check whether it's in the foreground (not backgrounded with ^Z followed by "bg") by setting the TOSTOP terminal flag, so that if it tried to write to the terminal while backgrounded, it would catch a SIGTTOU signal, and it would just keep trying to write every second or so until it no longer got SIGTTOU. This was not very robust and caused a few weird issues.

With the new version, before doing all that, PV checks whether it's in the foreground by checking that it is in the process group that its output terminal belongs to. If it's not, it suspends terminal output until that check succeeds again. It still does the TOSTOP/SIGTTOU thing to avoid race conditions (such as if it gets backgrounded just after it's done its check but just before it does the write to the terminal).
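Both mechanisms described above can be sketched with standard POSIX calls: `tcgetpgrp()` and `getpgrp()` for the process-group check, and `tcsetattr()` plus a SIGTTOU handler for the TOSTOP guard. This is an illustrative sketch under those assumptions, not PV's `pv_in_foreground()`; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <termios.h>
#include <unistd.h>

/* Set by the handler when a terminal write was attempted from the
 * background (only happens once TOSTOP is enabled). */
static volatile sig_atomic_t got_sigttou = 0;

static void on_sigttou(int sig)
{
    (void) sig;
    got_sigttou = 1;
}

/* Catch SIGTTOU instead of letting its default action stop the process.
 * Leaving SA_RESTART unset means the interrupted write() fails with EINTR
 * rather than being retried automatically. */
static int install_sigttou_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_sigttou;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTTOU, &sa, NULL);
}

/* Enable TOSTOP so background writes to the terminal raise SIGTTOU.
 * Returns -1 if tty_fd is not a terminal. */
static int enable_tostop(int tty_fd)
{
    struct termios t;

    if (tcgetattr(tty_fd, &t) != 0)
        return -1;
    t.c_lflag |= TOSTOP;
    return tcsetattr(tty_fd, TCSANOW, &t);
}

/* Foreground check: our process group must be the terminal's foreground
 * process group. tcgetpgrp() returns -1 if tty_fd is not a terminal. */
static bool in_foreground(int tty_fd)
{
    pid_t tty_group = tcgetpgrp(tty_fd);

    return tty_group != -1 && tty_group == getpgrp();
}
```

The group comparison avoids most background writes up front, while the TOSTOP/SIGTTOU handling still covers the race where the process is backgrounded between the check and the write.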

Anyway, looking at the debug logs, we can see that in the input phase, the check succeeds - PV's process group ID matches the terminal's owning process group ID:

[2024-10-09 18:50:20] (1567017) pv_in_foreground (src/pv/display.c:84): pv_in_foreground: true: our_process_group == tty_process_group

For reasons unknown, it stops being true in the output phase:

[2024-10-09 18:50:21] (1567017) pv_in_foreground (src/pv/display.c:88): pv_in_foreground: false: our_process_group=1567014, tty_process_group=3149893

I guess we could narrow it down a little by changing your command line:

timeout 5 ./pv --debug debug3.out -C -U - Makefile | { echo $$; while read -r line; do sleep 0.003; done; }

and then "grep process_group debug3.out" to see which parts the number that "echo $$" spits out matches up with, if any.

In the strace you sent this evening I can see similar behaviour. Input phase:

13:56:11.622749 getpgrp()               = 1561723
13:56:11.622761 ioctl(2</dev/pts/49>, TIOCGPGRP, [1561723]) = 0

Output phase - TIOCGPGRP (find out the ID of the foreground process group in the terminal) - shows the foreground process group ID changed:

13:56:12.622983 getpgrp()               = 1561723
13:56:12.622994 ioctl(2</dev/pts/49>, TIOCGPGRP, [3149893]) = 0

I'm guessing that in whichever version of the shell you're using, something odd's going on with process group handling in the pipeline. I'm a bit unsure of how to debug it though. You could try running dash, ash, ksh, or csh, and then re-running the above command within one of those shells, to see if they behave any differently. It would be weird if it was a bash bug, since I've tried this on CentOS 5 through 7, Rocky 8, and Alma 9, and that covers bash 3.2.25 to 5.1.8.

It could be something completely different, not a shell bug. I'm not too familiar with how "sessions" and "process groups" work and how they interact with the idea of a "controlling terminal". It could be that PV is missing an important job control / controlling terminal step I don't know about, not doing something it ought to, and for some reason it hasn't broken on any of my test systems yet.

If you don't find any difference in behaviour when running under other shells, you could also try redirecting stderr in the second part of the pipeline to see if that makes any difference:

timeout 5 ./pv --debug debug4.out -C -U - Makefile | { echo $$; while read -r line; do sleep 0.003; done; } 2>/dev/null

But really I'm grasping at straws there.


a-j-wood referenced this issue from a commit

2024-10-09 23:08:05 +02:00

Added a possible workaround, and extra debugging, for the missing display output issue in #100, but turned off at the moment.

a-j-wood commented

2024-10-09 23:13:42 +02:00

Owner

Further to this, I've committed some extra debugging and a possible workaround, but it's turned off by default because I'm pretty unsure about it. If you feel able to test it out, it would be a case of downloading the latest code (git clone) and editing src/pv/display.c to change line 27 so that instead of this:

/* Use tcsetpgrp() workaround - see issue #100 and comments below. */
/* #define USE_TCSETPGRP 1 */

it looks like this:

/* Use tcsetpgrp() workaround - see issue #100 and comments below. */
#define USE_TCSETPGRP 1

Then do the usual build (./configure --enable-debugging; make) or your local equivalent with --enable-static and whatnot.

The changes I've committed are making the assumption that the terminal process group ID has gone to an invalid value in your case because the terminal has "lost" its previous value, and PV will try to check whether the value it's reading is actually a valid process group ID - if it isn't, then it will try setting it, effectively "stealing" control of the terminal back again. It's a bit of a guess, basically.


michaelmior commented

2024-10-10 17:26:22 +02:00

Author

FWIW, I normally use zsh; when I tried under bash, the progress bar does show correctly. With the fix you suggested, I'm still getting the same behavior under zsh.

Here's the relevant part of the logs:

$ timeout 5 ./pv --debug debug4.out -C -U - Makefile | { echo $$; while read -r line; do sleep 0.003; done; } 2>/dev/null
1670027
  (input): 54.0KiB 0:00:00 [ 470MiB/s] [===================================================================================================================================================================================>] 100%
$ grep process_group debug4.out
[2024-10-10 15:25:41] (1683091) pv_in_foreground (src/pv/display.c:87): true: our_process_group == tty_process_group (1683090)
[2024-10-10 15:25:42] (1683091) pv_in_foreground (src/pv/display.c:102): false: our_process_group=1683090, tty_process_group=(1670027 and confirmed to exist)
[2024-10-10 15:25:43] (1683091) pv_in_foreground (src/pv/display.c:102): false: our_process_group=1683090, tty_process_group=(1670027 and confirmed to exist)
[2024-10-10 15:25:44] (1683091) pv_in_foreground (src/pv/display.c:102): false: our_process_group=1683090, tty_process_group=(1670027 and confirmed to exist)
[2024-10-10 15:25:45] (1683091) pv_in_foreground (src/pv/display.c:102): false: our_process_group=1683090, tty_process_group=(1670027 and confirmed to exist)


a-j-wood commented

2024-10-10 22:31:13 +02:00

Owner

Thanks for that. I think you've found the missing puzzle piece - zsh. I've just tried the above command line under zsh, and PV behaves properly on Debian 12 - that's zsh 5.9 - but misbehaves on Debian 11 - zsh 5.8. (It behaves on CentOS 7 - zsh 5.0.2 - which is interesting).

Which version of zsh ("zsh --version") do you have?


michaelmior commented

2024-10-10 22:40:22 +02:00

Author

zsh 5.8 (x86_64-ubuntu-linux-gnu)

That seems to be the problem :)


a-j-wood commented

2024-10-10 23:36:55 +02:00

Owner

The good news is I can now reproduce this myself and see what you see.

Simplifying the tests a bit, I see no output like this, just a pause while the reads complete, under zsh 5.8:

% seq 1 10 | ./pv | { while read -r line; do sleep 0.2; done; }

If I add a sleep before the loop starts, I see a progress bar with no progress, which sometimes disappears at the end and sometimes doesn't, suggesting that the terminal process group gets changed on the first read, not immediately:

% seq 1 10 | ./pv | { sleep 2; while read -r line; do sleep 0.2; done; }

The zsh release notes for 5.8.1 -> 5.9 mention something that might be relevant:

emulate sh: When zsh emulates sh, the final command in a pipeline is now run in a subshell. This differs from the behavior in the native (zsh) mode, but is consistent with most other sh implementations.

So I tried changing the curly brackets to rounded brackets, to explicitly ask for a subshell, and got a progress bar:

% seq 1 10 | ./pv | ( while read -r line; do sleep 0.2; done; )
21.0  B 0:00:01 [11.4  B/s] [  <=>                                                    ]

And if I run the whole thing with "-c" rather than interactively, it works:

$ zsh -c 'seq 1 10 | ./pv | { while read -r line; do sleep 0.2; done; }'
21.0  B 0:00:01 [11.4  B/s] [  <=>                                                    ]

Running the whole thing under strace, I can see zsh setting the terminal process group when it arguably shouldn't. For now I'll close this issue since the feature has been added, and have raised issue #105 to cover the zsh problems.

I'm afraid this means that the workaround for you is to not use zsh in this particular case.


a-j-wood closed this issue

2024-10-10 23:36:57 +02:00

michaelmior commented

2024-10-11 18:02:44 +02:00

Author

@a-j-wood Thanks for spending so much time debugging. Upgrading zsh seems like a reasonable solution or of course just explicitly asking for a subshell as in your example. Either way, appreciate your help!
