Bash Tips & Tricks: Handling failures in pipes
If you’re using bash, you probably know that pipes are really nice and helpful. Recently I wasted a lot of time trying to figure out why one of our tests failed on data corruption (we are kind of a storage company, so this is bad), and the cause turned out to be amusing (or sad, you decide).
The test is simple:
- Create a random file and calculate its MD5.
- Do some nasty stuff to our system.
- Read the file back, recalculate its MD5 and make sure the two hashes are the same.
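The three steps above can be sketched as a small Bash script. This is a minimal sketch rather than our actual test: the 1 MiB size, the temporary directory and the `do_nasty_stuff` placeholder are hypothetical, and `md5sum` is assumed to be the GNU coreutils version. The first pipeline mirrors the one used in this post, flaw included.

```bash
#!/usr/bin/env bash
# Minimal sketch of the corruption test (not the original test code).
# do_nasty_stuff is a HYPOTHETICAL placeholder for whatever stresses the system.

workdir=$(mktemp -d)
file="$workdir/outputfile"

do_nasty_stuff() {
    :   # hypothetical: kill processes, fail disks, reboot nodes, ...
}

# Step 1: create a random file and record its MD5 while writing it.
before=$(head -c 1048576 < /dev/urandom | tee "$file" | md5sum | awk '{print $1}')

# Step 2: do some nasty stuff to the system.
do_nasty_stuff

# Step 3: read the file back, recalculate its MD5 and compare.
after=$(md5sum "$file" | awk '{print $1}')

if [ "$before" = "$after" ]; then
    result=OK
else
    result=CORRUPTED
fi
echo "$result"

rm -rf "$workdir"
```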
Well, they weren’t.
The first part of the test is done using the following command:
```
[21:00 alexander ~/tmp ]$ head -c 10485760 < /dev/urandom | tee outputfile | md5sum | awk '{print $1}'
fe187b9c6de9403eb493f5bcfc1e1c2c
[21:00 alexander ~/tmp ]$ echo $?
0
```
The verification part is done later on using the following command:
```
[21:00 alexander ~/tmp ]$ md5sum outputfile | awk '{print $1}'
fe187b9c6de9403eb493f5bcfc1e1c2c
```
As you can see, both hashes match, so in this example everything works fine.
Now let’s simulate an error in the middle of the pipe by using a 5 MiB RAM drive (tmpfs), which is too small for our 10 MiB write:
```
[21:01 alexander ~/tmp ]$ mkdir storage
[21:01 alexander ~/tmp ]$ sudo mount -t tmpfs -o size=5m tmpfs storage
[21:01 alexander ~/tmp ]$ head -c 10485760 < /dev/urandom | tee storage/outputfile | md5sum | awk '{print $1}'
tee: storage/outputfile: No space left on device
ce034de6ca47a9e63c8d2fb48de89c86
[21:01 alexander ~/tmp ]$ echo $?
0
[21:01 alexander ~/tmp ]$ md5sum storage/outputfile | awk '{print $1}'
59c6ddea9d614f1fb0df9d145857b246
[21:01 alexander ~/tmp ]$ sudo umount storage
```
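As you can see, the first command wrote “No space left on device” to stderr, but its exit status is 0. We then continue with the regular flow and find that the file’s hash is different: “tee” reported the error but kept passing the whole 10 MiB stream on to “md5sum”, while the file itself only holds the part that fit on the 5 MiB drive.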
This happens because the exit status of a pipeline is the exit status of its last command, which means the $? variable is set to the exit status of “awk”.
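You can see this without root privileges or a tmpfs mount. The sketch below assumes Linux, where every write to `/dev/full` fails with “No space left on device”, so it can stand in for the full RAM drive; it also assumes GNU coreutils and uses a 1 MiB write. Bash records the status of every stage in the `PIPESTATUS` array, which shows that “tee” did fail even though $? only reports “awk”:

```bash
#!/usr/bin/env bash
# Reproduce the masked failure without mounting anything.
# Assumes Linux (/dev/full fails every write with ENOSPC) and GNU coreutils.
# pipefail is OFF here, which is Bash's default.

# 1) The pipeline as used in the test: a hash still comes out and $? is 0.
hash=$(head -c 1048576 < /dev/urandom | tee /dev/full | md5sum | awk '{print $1}')
status=$?

# 2) The same pipeline again, this time inspecting every stage.
head -c 1048576 < /dev/urandom | tee /dev/full | md5sum | awk '{print $1}' > /dev/null
stages=("${PIPESTATUS[@]}")   # copy immediately: the next command overwrites PIPESTATUS

echo "hash=$hash status=$status"
echo "head=${stages[0]} tee=${stages[1]} md5sum=${stages[2]} awk=${stages[3]}"
```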
Bash lets us propagate a failure from any command in the pipeline to the pipeline’s exit status with the “pipefail” option. From the Bash manual:
Each command in a pipeline is executed in its own subshell (see Command Execution Environment). The exit status of a pipeline is the exit status of the last command in the pipeline, unless the pipefail option is enabled (see The Set Builtin). If pipefail is enabled, the pipeline’s return status is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands exit successfully. If the reserved word ‘!’ precedes the pipeline, the exit status is the logical negation of the exit status as described above. The shell waits for all commands in the pipeline to terminate before returning a value.
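The rules in that paragraph can be checked directly. In this sketch `(exit N)` is just a stand-in for any command that fails with status N; it records the pipeline status for each rule (rightmost failure wins, a failure anywhere is no longer masked, success is zero, and `!` negates the result):

```bash
#!/usr/bin/env bash
# Check the pipefail rules quoted from the Bash manual.
# (exit N) is a stand-in for any command that exits with status N.
set -o pipefail

(exit 2) | (exit 3) | true
rightmost=$?   # the last (rightmost) non-zero status wins: 3

(exit 2) | true | true
single=$?      # a failure in the first stage is no longer masked: 2

true | true | true
all_ok=$?      # zero if every command succeeds: 0

! (exit 2) | true
negated=$?     # '!' negates the pipefail status (2), giving 0

echo "$rightmost $single $all_ok $negated"   # prints: 3 2 0 0
```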
Here is a simple example (note that the commands run in a subshell, created by the parentheses, so setting pipefail won’t affect the current shell):
```
[21:02 alexander ~/tmp ]$ false | true ; echo $?
0
[21:02 alexander ~/tmp ]$ (set -o pipefail && false | true ; echo $?)
1
[21:02 alexander ~/tmp ]$ false | true ; echo $?
0
```
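A quick way to confirm that the parentheses really confine the option is to test it with `[[ -o pipefail ]]` inside and after the subshell. The sketch also shows a different technique, not the one used above: in bash 4.4 and later, `local -` inside a function restores options changed with `set` when the function returns. The function name `with_pipefail` is hypothetical.

```bash
#!/usr/bin/env bash
# Confirm that pipefail set inside ( ... ) does not leak into the current shell.

# 1) Subshell: the option is on inside the parentheses only.
(set -o pipefail; [[ -o pipefail ]] && echo "inside subshell: pipefail is on")
[[ -o pipefail ]] && after_subshell=on || after_subshell=off

# 2) Alternative (bash >= 4.4): 'local -' makes option changes local to a function.
#    with_pipefail is a HYPOTHETICAL helper name.
with_pipefail() {
    local -
    set -o pipefail
    false | true
}
with_pipefail
func_status=$?   # 1, because pipefail was active inside the function
[[ -o pipefail ]] && after_function=on || after_function=off

echo "after subshell: $after_subshell, function status: $func_status, after function: $after_function"
```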
Now let’s go back to the original snippet and enable the option there:
```
[21:03 alexander ~/tmp ]$ mkdir storage
[21:03 alexander ~/tmp ]$ sudo mount -t tmpfs -o size=5m tmpfs storage
[21:03 alexander ~/tmp ]$ (set -o pipefail && head -c 10485760 < /dev/urandom | tee storage/outputfile | md5sum | awk '{print $1}')
tee: storage/outputfile: No space left on device
ce034de6ca47a9e63c8d2fb48de89c86
[21:03 alexander ~/tmp ]$ echo $?
1
```
So the test now fails on the real “out of disk space” issue instead of sending us hunting for data corruption (or unicorns)…
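In a script, the option is usually set once near the top. Below is a sketch of the first test step with the check in place; `write_random_file` is a hypothetical helper, the 1 MiB size is arbitrary, and `/dev/full` (Linux only) again stands in for a full disk.

```bash
#!/usr/bin/env bash
# Sketch: make the "create file + record MD5" step fail when any stage fails.
set -o pipefail

# HYPOTHETICAL helper: write SIZE random bytes to TARGET and print their MD5.
# With pipefail, it returns non-zero if head, tee, md5sum or awk fails.
write_random_file() {
    local target=$1 size=$2
    head -c "$size" < /dev/urandom | tee "$target" | md5sum | awk '{print $1}'
}

tmpfile=$(mktemp)

# Healthy target: succeeds and the recorded hash matches the file on disk.
good_hash=$(write_random_file "$tmpfile" 1048576)
good_status=$?
disk_hash=$(md5sum "$tmpfile" | awk '{print $1}')

# Full "disk": tee's write error now becomes the step's exit status.
if bad_hash=$(write_random_file /dev/full 1048576); then
    bad_status=0
else
    bad_status=$?   # status of the failed condition
fi

rm -f "$tmpfile"
echo "healthy: status=$good_status, full disk: status=$bad_status"
```

Capturing the status through `if` also keeps the sketch working if the script later adds `set -e`, since a failing command in an `if` condition does not abort the script.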
– Alexander