x509: certificate signed by unknown authority? Maybe the cert pool is empty
I recently worked on getting amazon-ssm-agent to run inside containers on Bottlerocket. During that process, I ran into a TLS issue connecting to amazonaws.com. The root cause turned out to be interesting, so this post walks through it.
Running amazon-ssm-agent in a container: why and how?
To enable sessions between a container and the outside world, we followed the same approach as the ECS Execute-Command proposal. The idea is to prepare a directory on the host that contains all the files required by SSM, then bind mount that directory into the container. The shared directory is referred to as "exec-deps" in the diagram below and the rest of this post.
                                    <-HOST->

                        aws ecs run-task ------------------|
                                                           V
                                                     _____________
                                                     |           |
___customer_task___________    |---------------------| ECS Agent |
| _______________________ |    |(1.mount)            |           |
| | customer container  | |    V                     -------------
| | /exec-deps/    *<---|-|--/SSM Agent (Exec Agent)
| |                *<---|-|--/session-worker & logger
| |                *<---|-|--/certs
| |                *<---|-|--/configuration
| |                     | |
| |/var/log/amazon/ssm/-|-|->/var/log/ecs/execAgent/<containerID>/
| ----------------------- |
---------------------------
But copying SSM binaries doesn't work on Bottlerocket.
Bottlerocket is a Linux-based operating system optimized for hosting containers, and it enforces SELinux. That led to the following SELinux denial.
kernel: audit: type=1400 audit(1739175174.799:6): avc: denied { entrypoint } for pid=53503 comm="runc:[2:INIT]" path="/exec-deps/amazon-ssm-agent" dev="nvme0n1p8" ino=202 scontext=system_u:system_r:container_t:s0 tcontext=system_u:object_r:local_t:s0 tclass=file permissive=0
The denial shows that the container’s init process (labeled container_t) is not allowed to execute the amazon-ssm-agent binary, which is labeled local_t. In Bottlerocket, the root filesystem is read-only and there is no package manager to install files. So we put amazon-ssm-agent in the root filesystem, which labels the binary as os_t. Processes running as container_t can execute binaries labeled os_t.
That gives us a cleaner solution: instead of copying the binaries into a folder labeled local_t, we can bind mount the SSM binaries from the host into the container as read-only. In my case, a direct bind mount also works for the cert file used by SSM, because that file is the same as the host's default CA bundle. I ended up with the following bind mounts in the OCI runtime spec.
# similar bind mount for other SSM binaries: ssm-agent-worker and ssm-session-worker
{
  "destination": "/exec-deps/amazon-ssm-agent",
  "type": "bind",
  "source": "/var/lib/exec-deps/amazon-ssm-agent",
  "options": [ "bind", "ro" ]
},
...
# bind mount default CA cert from the host.
{
  "destination": "/exec-deps/certs/amazon-ssm-agent.crt",
  "type": "bind",
  "source": "/etc/pki/tls/certs/ca-bundle.crt",
  "options": [ "bind", "ro" ]
},
...
Why doesn't bind-mounting the host cert always work?
During testing, I found that SSM sessions worked fine for containers based on full operating systems like Amazon Linux, but failed for minimal images like scratch and busybox. The log showed this error.
2025-04-13 22:05:33 ERROR [SetWebSocket @ controlchannel.go.92] [ssm-agent-worker] [MessageService] [MGSInteractor]
Failed to get controlchannel token, error: CreateControlChannel failed with error: createControlChannel request failed: failed to make http client call:
Post "https://ssmmessages.us-west-2.amazonaws.com/v1/control-channel":
tls: failed to verify certificate: x509: certificate signed by unknown authority
That's interesting - especially since I confirmed that the certificate file was correctly bind-mounted, and could be used to verify the server manually:
sudo ctr task exec --exec-id temp-1 ${CONTAINER_ID} \
  /verify-tls https://ssmmessages.us-west-2.amazonaws.com:443 /exec-deps/certs/amazon-ssm-agent.crt

Successfully connected to https://ssmmessages.us-west-2.amazonaws.com:443
So if the certificate file is valid and accessible, what went wrong? In containers based on full OS images, system-wide CA certificates are usually bundled in well-known locations and used automatically by TLS clients. But in minimalist images like scratch or busybox, those defaults are missing. If ssm-agent-worker fails to load the cert file, it ends up making a TLS request without any trusted roots. That leads to the classic: "x509: certificate signed by unknown authority".
Where does SSM read certs from?
SSM supports running inside a container when "ContainerMode": true is set. In this mode, SSM reads certificates from two sources:
- Default system locations, via Go's x509.SystemCertPool(). This is why SSM works out of the box in containers based on Amazon Linux — those images come with pre-installed CA certs that can validate amazonaws.com.
- A custom certificate file located at: {parent-folder-of-ssm-binaries}/certs/amazon-ssm-agent.crt.
But there's a strict requirement on the custom certificate: SSM will only load it if the crt file is owned by root and its permissions are exactly 0400 (readable by owner only). You can see this in certreader_unix.go.
// Get folder resource information
folderSys := getSys(folderStat)
if folderSys.(*syscall.Stat_t).Uid != 0 ||
    folderSys.(*syscall.Stat_t).Gid != 0 {
    return nil, fmt.Errorf("Certificate folder is not owned by root")
}

// Check if certificate has read only permission
if getPerm(fileStat) != 0400 {
    return nil, fmt.Errorf("Certificate does not have only owner read permission: %d", uint32(getPerm(fileStat)))
}
Now, here’s the gotcha. On Bottlerocket, the default CA cert file (/etc/pki/tls/certs/ca-bundle.crt) is owned by root but has mode 0444 (r--r--r--). Because Bottlerocket’s root filesystem is read-only, we can’t change that in place — and bind mounting keeps the file mode unchanged:
ls -lh /exec-deps/certs/amazon-ssm-agent.crt
-r--r--r-- 1 root root 219.2K Apr 8 02:02 /exec-deps/certs/amazon-ssm-agent.crt
To satisfy SSM’s strict permission check, we need to:
- Copy the host cert somewhere writable (/var/lib/exec-deps/certs/),
- Set its mode to 0400, and
- Bind mount that copy into the container:
{
  "destination": "/exec-deps/certs/amazon-ssm-agent.crt",
  "type": "bind",
  "source": "/var/lib/exec-deps/certs/ca-bundle.crt",   <-- this is a copy with perm 0400
  "options": [ "bind", "ro" ]
},
But... why does SSM enforce 0400?
Perhaps 0400 is enforced to avoid accidentally exposing private keys stored in .crt files. But I think this protection is weak and unnecessary. If the concern is exposing private keys, then check explicitly whether the file contains a private key. Also, consider this: if a container and its processes are running as root (i.e., process.user.uid: 0 in the OCI runtime spec), and the file is owned by root, then it doesn’t matter whether the file is 0400 or 0444. The container can read it either way. In that case, enforcing 0400 doesn’t really add security. It just creates friction.
Conclusion
- The error "x509: certificate signed by unknown authority" can sometimes mean that the certificate pool is simply empty. Unfortunately, in Go, there’s no straightforward way to tell whether the pool is empty — so you just get a vague error with little context.
- When in doubt, trace the code that builds the certificate pool, especially when dealing with .crt files in containerized environments. Some images don't come with default certs. In such cases, we need to prepare a crt file on the host and mount it into the container.
- As for SSM’s requirement that a cert file must be 0400 and owned by root, it doesn't really enhance security: only processes running as root can read such a file, and root can read a 0444 file just as well.