Engineering
  • Be careful making thread-aware syscalls in Go: lock the thread

    Oct 20, 2025 · 10 min read · Go Container Linux

    A bug caused around 0.5% of container workloads to fail to start during a load test. This post walks through the bug and its fix, an interesting mix of Linux namespaces, Go concurrency, and syscalls. The need to run a program in its own network namespace and mount namespace: soci-snapshotter is an open-source containerd …


  • Speed up building Bottlerocket image in AWS CodeBuild

    Oct 20, 2025 · 4 min read · Bottlerocket Docker

    When I first moved the Bottlerocket AMI build from an EC2 host to AWS CodeBuild, the build became very slow. On EC2, I built both the x86 and Arm versions on x86 instances, and fresh builds finished in 5 minutes. On CodeBuild, even with more vCPUs and memory, the build was painfully slow. The …


  • Mysterious Image Pull Failures: "401 Unauthorized" and "Not Found" After Migrating Containerd to v2

    Oct 12, 2025 · 7 min read · container AWS

    Earlier this year, we migrated containerd from v1.7 to v2.0.5. However, we quickly noticed that image pulls from Amazon Elastic Container Registry (ECR) began failing for both public and private ECR repositories. For example:

        # public ECR
        FATA[0031] failed to resolve reference …


  • EC2 IMDS is Unstable During Early Boot: Always Retry

    Sep 15, 2025 · 2 min read · Linux

    In "Detect and fix rare cases where the primary ENI does not serve default traffic", we used the IMDS "meta-data/mac" path to get the primary ENI's MAC address. However, we encountered the following errors in 0.5% of EC2 ARM instance launches:

        failed to get IMDS /mac: operation error ec2imds: GetMetadata, exceeded …


  • Who Modified My Program in Bottlerocket?

    Sep 11, 2025 · 2 min read · Linux Bottlerocket

    There are a few programs we install in Bottlerocket that cannot be built from source. For these programs, we download the binary from a secure repository and install it using an RPM spec like this:

        # foo.spec
        Name: %{_cross_os}foo

        Source0: foo

        %install
        install -d %{buildroot}%{_cross_sbindir}
        install -D -p -m …


  • Introducing bottlerocket-extra-kit: Essential debugging tools for Bottlerocket

    Sep 1, 2025 · 1 min read · Linux Bottlerocket

    Bottlerocket is a Linux-based operating system optimized for hosting containers. We use Bottlerocket to run millions of containers each day. There are three key differences between Bottlerocket and common Linux distributions like Amazon Linux 2023: The rootfs is read-only. There is no package manager (e.g., yum) in …


  • Tips for Building Bottlerocket AMIs

    Aug 20, 2025 · 6 min read · Linux Bottlerocket

    Bottlerocket is a Linux-based operating system optimized for hosting containers. At work, we migrated from Amazon Linux to Bottlerocket and experienced the following benefits: Developer-friendly: Easy to understand and fast to build. RPM spec and configuration TOML files are all you need. Every developer can build a …


  • Working Knowledge of Linux Memory: Concepts

    Aug 4, 2025 · 6 min read · Linux

    I recently dealt with a server livelock issue caused by memory page thrashing. This post refreshes the Linux memory basics I found useful for debugging the issue. Much of the content is from Chapter 7 of "Systems Performance: Enterprise and the Cloud". Virtual Memory: Virtual memory is an abstraction that provides each …


  • Detect and fix rare cases where the primary ENI does not serve default traffic

    Jul 27, 2025 · 3 min read · Linux EC2

    During testing, we encountered a rare scenario when launching EC2 instances with multiple ENIs: the primary ENI (device index 0) does not serve default network traffic. This occurs in approximately 1 out of 10,000 launches (0.01%). For example, when configuring two ENIs on an instance—ENI-0 (deviceIndex=0) from …


  • SELinux Concepts

    Jun 15, 2025 · 5 min read · Linux SELinux

    Security-Enhanced Linux (SELinux) is a mandatory access control (MAC) system that enhances Linux security. "Mandatory" means access control is strictly enforced by predefined policy rules—users and processes cannot modify these rules at will, ensuring security is not left to individual discretion. SELinux is …



Peng Zhang

Software Engineer



Copyright 2022-  PENG ZHANG. All Rights Reserved
