A Few Shell Surprises

Shell scripts are infamous for security issues and surprising behavior, so it's better to avoid shell when possible. For instance, we built a container platform on Bottlerocket OS and didn't even install a shell; anyone who needs one must run it inside a container. That said, shell is still handy for ad hoc scripting. In this post, I'll share a few surprising behaviors I ran into recently.

Silent Failures in Command Substitution

When testing our container platform, we launched containers that ran commands and verified the exit code. For example, we had a test that checked whether exactly one process named foo was running:

FOO_PID=$(pgrep -c foo);
if [ ${FOO_PID} -ne 1 ]; then
  exit 1;
fi

This test passed consistently—until one day we manually checked and found no foo process running in the container. Oops.

Here's a simplified demo using Docker:

% docker run --name check-foo -it busybox /bin/sh -c 'FOO_PID=$(pgrep -c foo); if [ ${FOO_PID} -ne 1 ]; then exit 1; fi'
pgrep: invalid option -- 'c'
BusyBox v1.37.0 (2024-09-26 21:31:42 UTC) multi-call binary.

Usage: pgrep [-flanovx] [-s SID|-P PPID|PATTERN]

Display process(es) selected by regex PATTERN

        -l      Show command name too
        -a      Show command line too
        -f      Match against entire command line
        -n      Show the newest process only
        -o      Show the oldest process only
        -v      Negate the match
        -x      Match whole name (not substring)
        -s      Match session ID (0 for current)
        -P      Match parent process ID
sh: 1: unknown operand

% docker inspect check-foo --format='{{.State.ExitCode}}'
0

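So even though pgrep -c failed (BusyBox's pgrep has no -c option), the script exited with code 0. Why? Command substitution ($(...)) runs in a subshell, and without set -e a failure there doesn't stop the outer shell: it only sets the exit status of the FOO_PID=... assignment, which nothing checked. Plus, since pgrep wrote nothing to standard output, FOO_PID became an empty string and the unquoted check expanded to [ -ne 1 ]. That is a malformed test, so [ returns an error status rather than true, the if treats it as false, and exit 1 never runs. An if statement whose branch doesn't run returns 0, so the script succeeds.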
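The exit-status chain can be reproduced in a few lines of POSIX sh. This is a minimal sketch, assuming false is a fair stand-in for a pgrep -c foo that fails without printing anything; the assign_status, test_status and if_status variables are illustrative names, not part of the original test.

```shell
#!/bin/sh
# Sketch: why the check "passes" when the command substitution fails.
# Assumption: 'false' stands in for 'pgrep -c foo' failing with no output.

assign_status=0
FOO_PID=$(false) || assign_status=$?
# assign_status is 1: the assignment does report the failure,
# but the original test never looks at it, and FOO_PID is now empty.

test_status=0
# Unquoted on purpose, as in the original: this expands to '[ -ne 1 ]'.
[ ${FOO_PID} -ne 1 ] 2>/dev/null || test_status=$?
# test_status is greater than 1, which 'test' uses for "an error occurred".

# 'if' treats any non-zero status as false, so the 'exit 1' branch is skipped,
# and an 'if' with no branch executed has exit status 0.
if [ ${FOO_PID} -ne 1 ] 2>/dev/null; then
  exit 1
fi
if_status=$?

echo "assignment: ${assign_status}, test: ${test_status}, if: ${if_status}"
```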

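To catch the error, add set -e to the parent shell so that the failing assignment aborts the script. If your command substitution runs multiple commands, also use set -e inside the substitution: Bash clears the -e option in command-substitution subshells (unless shopt -s inherit_errexit is set or Bash runs in POSIX mode), so a failing foo would otherwise be ignored and only bar's status would count: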

x=$(set -e; foo; bar)
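To see what the inner set -e changes, here is a minimal sketch that runs both forms under Bash with set -e on in the outer script. It assumes false and echo done as stand-ins for the hypothetical foo and bar, and assumes Bash's defaults (not in POSIX mode, inherit_errexit off).

```shell
#!/bin/sh
# Sketch: 'x=$(foo; bar)' versus 'x=$(set -e; foo; bar)' in Bash.
# Assumption: 'false' stands in for the hypothetical foo, 'echo done' for bar.

# Outer set -e, no set -e inside: Bash clears errexit in the substitution's
# subshell, so 'false' is ignored and the substitution succeeds with "done".
loose=$(bash -c 'set -e; x=$(false; echo done); echo "x=${x}"')

# Outer and inner set -e: the subshell stops at 'false' and exits 1, the
# assignment fails, and the outer set -e aborts before the final echo.
strict_status=0
strict=$(bash -c 'set -e; x=$(set -e; false; echo done); echo "x=${x}"') || strict_status=$?

echo "without inner set -e: ${loose}"
echo "with inner set -e: output '${strict}', exit ${strict_status}"
```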

Unexpected Exit Code 141 (SIGPIPE)

Here’s another gotcha involving two scripts. First, a.sh prints a Unix timestamp and its human-readable form:

#!/bin/bash

# a.sh
set -euo pipefail

TIMESTAMP=$(date +%s)
echo ${TIMESTAMP}
echo $(date -u -d @$(echo "${TIMESTAMP} " | bc) +"%Y-%m-%d %H:%M:%S")

Running a.sh outputs something like:

1747029172
2025-05-12 05:52:52

Another script, b.sh, captures just the first line of a.sh's output:

#!/bin/bash

# b.sh
set -euo pipefail

x=$(./a.sh | head -n 1)
echo $x

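I expected b.sh to print the Unix timestamp. Instead, it printed nothing and exited with code 141, which is 128 + 13, the signal number of SIGPIPE. Here, head -n 1 exits after reading the first line, so when a.sh tries to write its second line into the now-closed pipe, it is killed by SIGPIPE. Because of pipefail, a.sh's failure becomes the status of the whole pipeline, the assignment to x fails, and set -e makes b.sh exit before echo $x runs. Whether the write actually hits a closed pipe depends on timing, so this kind of failure can be intermittent.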
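The same failure can be reproduced without a.sh. This sketch assumes yes, which writes lines forever, as a stand-in for a.sh, so the writer is certain to still be writing when head -n 1 exits; the status variable names are illustrative.

```shell
#!/bin/sh
# Sketch: reproduce the SIGPIPE failure without a.sh.
# Assumption: 'yes' (which prints "y" forever) stands in for a.sh; unlike
# a.sh it is certain to still be writing when 'head -n 1' exits.

with_pipefail=0
bash -c 'set -o pipefail; yes 2>/dev/null | head -n 1 >/dev/null' || with_pipefail=$?

without_pipefail=0
bash -c 'yes 2>/dev/null | head -n 1 >/dev/null' || without_pipefail=$?

# With pipefail, the pipeline reports that 'yes' was killed by SIGPIPE:
# 128 + 13 = 141, assuming SIGPIPE has its default action. Without pipefail,
# only the status of the last command, 'head', counts.
echo "with pipefail: ${with_pipefail}, without pipefail: ${without_pipefail}"
```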

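If early termination in the pipeline is expected, you can turn off pipefail just for the substitution. Since the substitution runs in a subshell, the rest of b.sh keeps pipefail, though a genuine failure of a.sh inside the substitution would now go unnoticed too: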

x=$(set +o pipefail; ./a.sh | head -n 1)
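A minimal sketch of b.sh with that change, again assuming yes as a stand-in for a.sh. It also checks that pipefail is still on in the outer script after the substitution.

```shell
#!/bin/sh
# Sketch: b.sh's capture with pipefail switched off only inside the substitution.
# Assumption: 'yes' stands in for a.sh again.

out=$(bash -c '
  set -euo pipefail
  x=$(set +o pipefail; yes 2>/dev/null | head -n 1)
  echo "x=${x}"
  # The substitution ran in a subshell, so pipefail is still on out here.
  if shopt -qo pipefail; then echo "outer pipefail: on"; fi
')
printf '%s\n' "${out}"
```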

"x=0; ((x++))" has exit code 1

Here’s a loop that is expected to run 10 times, but instead the script exits on the first iteration without printing anything:

#!/bin/bash
set -e

count=0
while [ $count -lt 10 ]; do
  ((count++))
  echo ${count}
done

In Bash, the arithmetic command ((...)) returns exit status 0 if the expression evaluates to non-zero, and 1 if it evaluates to zero. ((count++)) is a post-increment: it sets count to 1 but evaluates to the old value, 0. So on the first iteration the command returns 1, and set -e makes the script exit immediately. A simple fix is to ignore the return value:
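The rule is easy to check directly. This sketch records the status of a few arithmetic commands; the pre-increment ((++y)) is included only for contrast, since it evaluates to the new value rather than the old one.

```shell
#!/bin/sh
# Sketch: exit statuses of Bash arithmetic commands.
# ((...)) is Bash syntax, not POSIX sh, so it runs under 'bash -c'.
statuses=$(bash -c '
  x=0
  ((x++)); a=$?   # post-increment evaluates to the old value 0: status 1
  ((x++)); b=$?   # x was 1, so this evaluates to 1: status 0
  y=0
  ((++y)); c=$?   # pre-increment evaluates to the new value 1: status 0
  echo "$a $b $c"
')
echo "${statuses}"   # prints: 1 0 0
```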

((count++)) || true
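A minimal sketch that runs the original loop and the fixed loop under Bash with set -e and compares what each prints. The || true works because set -e ignores a failing command that is part of an && or || list, except the last command in the list.

```shell
#!/bin/sh
# Sketch: the loop from above, before and after the fix, under set -e.

# Broken version: the first ((count++)) returns 1, so set -e exits with
# status 1 before anything is printed.
broken_status=0
broken=$(bash -c 'set -e; count=0
  while [ $count -lt 10 ]; do ((count++)); echo ${count}; done') || broken_status=$?

# Fixed version: "|| true" puts the arithmetic command in an OR list,
# which set -e does not act on, so all 10 iterations run.
fixed=$(bash -c 'set -e; count=0
  while [ $count -lt 10 ]; do ((count++)) || true; echo ${count}; done')

echo "broken: exit ${broken_status}, printed '${broken}'"
echo "fixed: printed $(printf '%s\n' "${fixed}" | paste -s -d ' ' -)"
```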