Spawning a New Process for Socket-Activated Daemons Is Error-Prone
I recently debugged a mysterious latency issue: after migrating a systemd service from path activation to socket activation, the service consistently took about one extra second to become available. The culprit was a bad practice: the socket-activated service spawned the daemon as a new process instead of running it as the main process. Let's dive into the details.
Starting soci-snapshotter On-Demand Using Socket Activation
soci-snapshotter is an open-source containerd snapshotter plugin that enables pulling OCI images in lazy-loading mode using SOCI indexes.
In our container runtime, we start soci-snapshotter only when an image has a SOCI index. Previously, this on-demand service start used path activation: once the runtime found a SOCI index, it would write a sentinel file to disk, triggering systemd to start the soci-snapshotter service. About a year ago, soci-snapshotter added support for systemd socket activation by listening on a file descriptor passed via the --address flag.
Systemd creates and listens on sockets before the service starts. When something connects to the socket, systemd launches the service and hands over the already-open file descriptors.
At a high level:
- Given the following socket unit file soci-snapshotter.socket, systemd creates and listens on /run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock.
[Unit]
Description=Monitor soci snapshotter socket file for changes and start snapshotter

[Socket]
ListenStream=/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock
SocketMode=0660

[Install]
WantedBy=sockets.target
- Register the soci plugin in containerd.
[proxy_plugins]
  [proxy_plugins.soci]
    type = "snapshot"
    address = "/run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock"
- The container runtime finds a SOCI index and asks the containerd client to use the soci-snapshotter plugin to pull the image (see the Go sketch after this list).
- The containerd client sends a gRPC request to the socket.
- Systemd receives the connection from the socket, starts the service of the same name (soci-snapshotter.service), and passes file descriptors to the main process started by the service.
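To make the client side concrete, here is a rough sketch using the containerd 1.x Go client; the socket path, namespace, and image reference are placeholders, and our runtime's actual code differs.

package main

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func pullWithSoci(ref string) error {
	// Connect to the containerd daemon.
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		return err
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "default")
	// Selecting the "soci" proxy plugin makes containerd dial the snapshotter
	// socket, which is what triggers the socket activation described above.
	_, err = client.Pull(ctx, ref,
		containerd.WithPullUnpack,
		containerd.WithPullSnapshotter("soci"),
	)
	return err
}

func main() {
	if err := pullWithSoci("xxx.dkr.ecr.us-west-2.amazonaws.com/telemetry:latest"); err != nil {
		panic(err)
	}
}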
Mysterious One-Second Latency
Soon after deployment, we observed approximately 1 second of additional image pull latency. Enabling debug log level for containerd showed the following:
2025-11-10T07:46:14.609236015Z level=debug msg="create image" name="xxx.dkr.ecr.us-west-2.amazonaws.com/telemetry:latest@sha256:xxx" target="sha256:xxx"
2025-11-10T07:46:15.174746106Z level=debug msg="prepare view snapshot" key=ping parent= snapshotter=soci
2025-11-10T07:46:16.177995366Z level=debug msg="prepare view snapshot" key=ping parent= snapshotter=soci
2025-11-10T07:46:16.184813541Z level=debug msg="prepare view snapshot" key=ping parent= snapshotter=soci
2025-11-10T07:46:16.194634746Z level=debug msg="prepare view snapshot" key=ping parent= snapshotter=soci
... retry every 10ms ...
2025-11-10T07:46:17.184668850Z level=debug msg="prepare view snapshot" key=ping parent= snapshotter=soci
2025-11-10T07:46:17.274659068Z level=debug msg="(*service).Write started" expected="sha256:xxx" ref="manifest-sha256:xxx" total=1577
The "ping" is a Snapshotter.View call with an empty parent argument.
After about 100 retries, the ping finally succeeds, indicating soci-snapshotter is ready.
Interestingly, soci-snapshotter only saw the last ping request—the first request it received after starting, more than 1 second after launch.
2025-11-10T07:46:15.560288947Z "address":"fd://","level":"info","msg":"soci-snapshotter-grpc successfully started"
2025-11-10T07:46:17.187325450Z "key":"fargate.task/103/ping","level":"debug","msg":"view"
Where Did the Failed Ping Requests Go?
Note that the daemon software configured for socket activation with socket units needs to be able to accept sockets from systemd, either via systemd's native socket passing interface (see sd_listen_fds for details about the precise protocol used and the order in which the file descriptors are passed) or via traditional inetd(8)-style socket passing (i.e., sockets passed in via standard input and output, using StandardInput=socket in the service file). --- systemd socket activation documentation
The sd_listen_fds protocol works as follows:
- systemd sets two environment variables: LISTEN_FDS (the number of file descriptors passed) and LISTEN_PID (the PID of the process the descriptors are intended for, so other processes can ignore them).
- systemd passes the file descriptors themselves starting at FD 3; sd_listen_fds checks LISTEN_PID against the caller's own PID and returns how many descriptors were passed.
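For a Go daemon, the receiving side of this protocol looks roughly like the following minimal sketch. It is not soci's actual code; real daemons typically use a helper such as github.com/coreos/go-systemd/activation.

package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// systemdListener wraps the socket systemd passed to this process, following
// the sd_listen_fds protocol described above.
func systemdListener() (net.Listener, error) {
	// The descriptors are only meant for the process named in LISTEN_PID.
	if os.Getenv("LISTEN_PID") != strconv.Itoa(os.Getpid()) {
		return nil, fmt.Errorf("LISTEN_PID does not match this process")
	}
	n, err := strconv.Atoi(os.Getenv("LISTEN_FDS"))
	if err != nil || n < 1 {
		return nil, fmt.Errorf("no sockets passed by systemd")
	}
	// Passed descriptors start at FD 3 (SD_LISTEN_FDS_START).
	return net.FileListener(os.NewFile(3, "systemd-socket"))
}

func main() {
	l, err := systemdListener()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer l.Close()
	fmt.Println("listening on", l.Addr())
}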
However, the environment variables and file descriptors are passed to the main process started by soci-snapshotter.service—a Go program called start-soci-snapshotter.
This Go program first performs some preparation work, then spawns a new process to actually run soci-snapshotter.
package main

import "os/exec"

func main() {
	doSomePreparationWorks()
	cmd := exec.Command("/usr/bin/soci-snapshotter-grpc", "--address=fd://", "--config=/etc/soci-snapshotter-grpc/config.toml")
	cmd.Run()
}
The environment variables and file descriptors are handed to this wrapper process, not to soci-snapshotter itself. exec.Command starts the child via fork and execve: the environment is inherited when cmd.Env is nil, but LISTEN_PID still names the wrapper rather than the child, and os/exec only guarantees that stdin, stdout, stderr, and whatever you list in cmd.ExtraFiles reach the child. So the child either never sees the systemd socket on FD 3 or, if it does, rejects it because LISTEN_PID does not match its own PID. Forwarding the socket yourself means setting cmd.ExtraFiles and fixing up the LISTEN_* variables in cmd.Env.
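For illustration, here is a hypothetical sketch (not soci's or our production code) of what explicit forwarding would involve, and why it still falls short:

package main

import (
	"os"
	"os/exec"
)

func main() {
	// FD 3 is the socket systemd passed to this wrapper (LISTEN_FDS=1).
	socket := os.NewFile(3, "systemd-socket")

	cmd := exec.Command("/usr/bin/soci-snapshotter-grpc",
		"--address=fd://", "--config=/etc/soci-snapshotter-grpc/config.toml")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{socket} // ExtraFiles[0] becomes FD 3 in the child
	// The environment is inherited as-is (cmd.Env == nil), so LISTEN_FDS is
	// still set, but LISTEN_PID names this wrapper; the child's PID is not
	// known before it starts, so a strict sd_listen_fds-style check in the
	// child will still reject the descriptor. This is why exec'ing the
	// daemon (see the conclusion) is the cleaner fix.
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}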
When soci-snapshotter starts in the spawned process, it doesn't find the expected file descriptor, so it removes the socket and rebinds to it. See soci's listenUnix:
1func listenUnix(addr string) (net.Listener, error) {
2 // Try to remove the socket file to avoid EADDRINUSE
3 if err := os.RemoveAll(addr); err != nil {
4 return nil, fmt.Errorf("failed to remove %q: %w", addr, err)
5 }
6 return net.Listen("unix", addr)
7}
Ping requests never reach soci-snapshotter because it deletes the socket systemd created and recreates a new socket at the same path. This explains the retries, but why does the ping request succeed after 1 second?
Containerd uses the default gRPC backoff of 1s with 0.2s jitter. After the 1s backoff, containerd re-dials and connects to the socket that soci-snapshotter now listens on.
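Containerd's dial options are its own, but for reference the 1s comes from gRPC's default connection backoff, backoff.DefaultConfig in google.golang.org/grpc/backoff (BaseDelay 1s, Multiplier 1.6, Jitter 0.2, MaxDelay 120s). A client dialing the snapshotter socket directly could shorten it; the following is an illustrative sketch only:

package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	cfg := backoff.DefaultConfig
	cfg.BaseDelay = 200 * time.Millisecond // retry the connection sooner than the default 1s

	conn, err := grpc.NewClient(
		"unix:///run/soci-snapshotter-grpc/soci-snapshotter-grpc.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithConnectParams(grpc.ConnectParams{Backoff: cfg}),
	)
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}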
Conclusion
The fix is simple. Instead of exec.Command, use unix.Exec, which replaces the current process with a new program using the execve system call.
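A minimal sketch of the fixed wrapper, assuming golang.org/x/sys/unix:

package main

import (
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	// ... same preparation work as before ...

	argv := []string{
		"/usr/bin/soci-snapshotter-grpc",
		"--address=fd://",
		"--config=/etc/soci-snapshotter-grpc/config.toml",
	}
	// unix.Exec wraps execve(2): on success it never returns, the PID stays
	// the same (so LISTEN_PID remains correct), and inherited descriptors
	// such as the systemd socket on FD 3 stay open.
	if err := unix.Exec(argv[0], argv, os.Environ()); err != nil {
		panic(err)
	}
}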
Spawning a new process from a socket-activated service is rarely a good idea. If you must do it, make sure the socket file descriptors and the LISTEN_* environment variables actually reach the spawned process, and remember that LISTEN_PID will still name the parent.