Practical Engineering
  • Spawning a New Process for Socket-Activated Daemons is Error-Prone

    Dec 10, 2025 · 4 min read · Linux systemd Container

    I recently debugged a mysterious latency issue: after migrating a systemd service from path-activation to socket-activation, there was a consistent ~1 second time-to-available latency. The culprit was a bad practice—starting the daemon program as a new process in socket-activation. Let's dive into the details. Starting …


  • Be careful making thread-aware syscalls in Go: lock the thread

    Oct 20, 2025 · 10 min read · Go Container Linux

    A bug caused around 0.5% of container workloads to fail to start for one internal customer. This post walks through the bug and its fix, an interesting mix of Linux namespaces, Go concurrency, and syscalls. The need to run a program in its own network namespace and mount namespace: soci-snapshotter is an open-source …


  • Mysterious Image Pull Failures: "401 Unauthorized" and "Not Found" After Migrating Containerd to v2

    Oct 12, 2025 · 7 min read · container AWS

    Early this year, we migrated containerd from v1.7 to v2.0.5. However, we quickly noticed that image pulls from Amazon Elastic Container Registry (ECR) began failing for both public and private ECR repositories. For example: # public ECR FATA[0031] failed to resolve reference …


  • x509: certificate signed by unknown authority? Maybe the cert pool is empty

    Apr 15, 2025 · 6 min read · Linux Container SELinux Bottlerocket

    I recently worked on getting amazon-ssm-agent to run inside containers on Bottlerocket. During that process, I ran into a TLS issue connecting to amazonaws.com. The root cause turned out to be interesting, and we'll walk through it in this post. Running amazon-ssm-agent in a container: why and how? To enable sessions …


  • Missing Container Disk I/O Stats with cgroup v1 on Kernel 6.1

    Nov 9, 2024 · 4 min read · Linux Container

    As Amazon Linux 2 (AL2) approaches its End of Life on June 30, 2025, we have started migrating our container platform from AL2 to Bottlerocket. The migration encountered a few speed bumps. In this post, we'll examine one of them: missing container disk I/O stats. Why are container I/O dashboards blank? Since …



Peng Zhang

Software Engineer

Recent Posts

  • Simplify device path on boot with udev
  • Use KillMode=process with caution: restart loop could deplete resources
  • Spawning a New Process for Socket-Activated Daemons is Error-Prone
  • Be careful making thread-aware syscalls in Go: lock the thread
  • Speed up building Bottlerocket image in AWS CodeBuild
  • Mysterious Image Pull Failures: "401 Unauthorized" and "Not Found" After Migrating Containerd to v2
  • EC2 IMDS is Unstable During Early Boot: Always Retry
  • Who Modified My Program in Bottlerocket?

Tags

LINUX 19 GO 17 ALGORITHMS 8 BOTTLEROCKET 7 INTERVIEW 7 CONTAINER 5 GUIDE 3 DISTRIBUTED-SYSTEM 2 SELINUX 2 SYSTEMD 2 WEB 2 AWS 1 COMPUTER-ARCHITECTURE 1 CONCURRENCY 1
All Tags
ALGORITHMS 8 AWS 1 BOTTLEROCKET 7 COMPUTER-ARCHITECTURE 1 CONCURRENCY 1 CONTAINER 5 CRYPTOGRAPHY 1 DATABASES 1 DISTRIBUTED-SYSTEM 2 DOCKER 1 EC2 1 GO 17 GUIDE 3 INTERVIEW 7 LINUX 19 SELINUX 2 SHELL 1 SYSTEMD 2 TESTING 1 WEB 2

Copyright 2022-  PENG ZHANG. All Rights Reserved
