Practical Engineering
  • Sharp edges of errgroup: Lessons from an errgroup and Context mishap

    Mar 23, 2025 · 8 min read · Go Concurrency

    A recent faulty release disrupted service for some customers. The root cause was a concurrency bug involving x/sync/errgroup and context cancellation. This post shares three practices we learned from the incident. These practices will help us catch similar issues during code review or alert us to problems in …


    Read More
  • Avoid panic on expected errors: lessons from operating journald-to-cwl

    Feb 23, 2025 · 3 min read · Go

    We've been using journald-to-cwl to ship journal logs from EC2 instances to CloudWatch Logs. It is lightweight and reliable. However, we recently started receiving false-positive alarms, which became annoying. This post covers the changes we made and the key lesson learned: panic on expected errors in Go is …


    Read More
  • GPG is still in use to verify downloads

    Feb 23, 2025 · 2 min read · Linux Cryptography

    This week, I needed to install the Amazon SSM Agent and was surprised to find that GPG (GNU Privacy Guard) was the only way to verify the download. I had assumed that software download verification had largely transitioned to PKI (Public Key Infrastructure). This short post is a refresher on GPG. OpenPGP is an open …


    Read More
  • Why does GOMEMLIMIT take up significant physical memory for unused virtual memory?

    Jan 19, 2025 · 4 min read · Go Linux

    While debugging memory bloat in a Go application recently, I found that removing the GOMEMLIMIT soft memory limit and disabling transparent huge pages partially mitigated the issue. However, I couldn't fully explain why these changes worked. So I thought why not ask the internet about it. A simplified memory bloat …


    Read More
  • Don't Use stderr to Determine Process Failure Because Logs Default to stderr

    Nov 30, 2024 · 6 min read · Go Guide

    It's a beautiful day, and it started with a simple code review:

        # tools/foo/main.go
        - fmt.Println("found it")
        + log.Println("found it")

    The author explained the advantages of using a logging library over plain printf. The rationale was straightforward, so I approved the change without hesitation. …


    Read More
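The title's point can be illustrated with a minimal sketch. Go's standard `log` package writes to stderr by default, so a caller that treats any stderr output as failure will misjudge a tool that merely logs. The helper names `logsToStderr` and `runTool`, and the `sh -c` command used as a stand-in for the tool, are hypothetical and not taken from the post; the sketch assumes a POSIX `sh` is available.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"os/exec"
)

// logsToStderr reports whether the standard logger currently writes to os.Stderr.
// This is the default unless the program calls log.SetOutput.
func logsToStderr() bool {
	return log.Writer() == os.Stderr
}

// runTool (hypothetical helper) runs a command and captures its stderr.
// Failure is decided by the returned error, which reflects the exit status,
// not by whether anything was written to stderr.
func runTool(name string, args ...string) (stderr string, err error) {
	var buf bytes.Buffer
	cmd := exec.Command(name, args...)
	cmd.Stderr = &buf
	err = cmd.Run()
	return buf.String(), err
}

func main() {
	fmt.Println("log writes to stderr by default:", logsToStderr())

	// Stand-in for a tool that logs to stderr but exits successfully.
	stderr, err := runTool("sh", "-c", "echo 'found it' >&2")
	fmt.Printf("stderr=%q failed=%v\n", stderr, err != nil)
}
```

Checking the error returned by `cmd.Run` keeps the success decision tied to the exit status, so a switch from `fmt.Println` to `log.Println` inside the tool does not change how the caller classifies the run.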
  • AL2023 vs. AL2: less disk space with ext4?

    Nov 17, 2024 · 7 min read · Linux

    We started migrating from Amazon Linux 2 (AL2) to Amazon Linux 2023 (AL2023) a month ago. While testing workloads on AL2023 in the pre-production environment, I noticed slightly higher disk usage compared to the same workload on AL2. In this post, I'll share my investigation. AL2023 has less free disk space with ext4, …


    Read More
  • Ways Go programs die

    Nov 10, 2024 · 4 min read · Go

    Our Go programs recently triggered an alarm due to excessive panics. Panic is a Go runtime mechanism that halts execution. It got me thinking about different ways a Go program can die. I don't expect many - not like A Million Ways to Die in the West. In this post, we'll go through the various ways Go programs die. …


    Read More
  • Missing Container Disk I/O Stats with cgroup v1 on Kernel 6.1

    Nov 9, 2024 · 4 min read · Linux Container

    As Amazon Linux 2 (AL2) approaches its end of life on 2025-06-30, we have started migrating our container platform from AL2 to Amazon Linux 2023 (AL2023). The migration hit a few speed bumps. In this post, we'll look at one of them: missing container disk I/O stats. Why are container I/O dashboards blank? …


    Read More
  • Mind ordering cycles in systemd: how systemd breaks them can brick the server start up

    Oct 16, 2024 · 3 min read · Linux

    I've been building a service for a month, and the day finally arrived when I had the artifact: an EC2 AMI. The AMI passed my "rigorous" manual tests, and I felt confident on a Ruby Tuesday, so I launched 100 EC2 instances using the AMI. Surprise! Around 28 instances failed to launch. What is going on? All …


    Read More
  • Monotonicity: Find 1-3-2 Pattern

    Oct 14, 2024 · 3 min read · Algorithms Interview

    Given an array of numbers A, find out whether it contains a 1-3-2 pattern. A 1-3-2 pattern is a subsequence of three numbers, A[i], A[j] and A[k], such that i<j<k and A[i] < A[k] < A[j]. For clarity, let's call the 1-3-2 pattern the Bronze-Gold-Silver pattern. If A[j] is Gold, then we should consider the …


    Read More
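The 1-3-2 definition above can be checked with a short sketch. This version uses a common monotonic-stack technique, scanning from right to left; it is one possible approach and not necessarily the one the full post develops. The function name `has132` is an assumption made for illustration.

```go
package main

import "fmt"

// has132 reports whether a contains indices i<j<k with a[i] < a[k] < a[j].
// It scans right to left, keeping a stack of Gold candidates (non-increasing
// from bottom to top) and the largest Silver value that is known to have a
// strictly larger Gold somewhere to its left.
func has132(a []int) bool {
	var stack []int // values to the right of the current index
	silver := 0     // best Silver found so far; valid only when found is true
	found := false

	for i := len(a) - 1; i >= 0; i-- {
		// a[i] can act as Bronze if it is strictly below a known Silver.
		if found && a[i] < silver {
			return true
		}
		// a[i] acts as Gold for every strictly smaller value to its right.
		for len(stack) > 0 && stack[len(stack)-1] < a[i] {
			silver = stack[len(stack)-1]
			found = true
			stack = stack[:len(stack)-1]
		}
		stack = append(stack, a[i])
	}
	return false
}

func main() {
	fmt.Println(has132([]int{3, 1, 4, 2})) // 1, 4, 2 form the pattern
	fmt.Println(has132([]int{1, 2, 3, 4})) // strictly increasing, no pattern
}
```

Each value is pushed and popped at most once, so the scan runs in linear time with linear extra space.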

Peng Zhang

Software Engineer

Recent Posts

  • EC2 IMDS is Unstable During Early Boot: Always Retry
  • Who Modified My Program in Bottlerocket?
  • Introducing bottlerocket-extra-kit: Essential debugging tools for Bottlerocket
  • Tips for Building Bottlerocket AMIs
  • Working Knowledge of Linux Memory: Concepts
  • Detect and fix rare cases where the primary ENI does not serve default traffic
  • SELinux Concepts
  • Modern Go idioms

Tags

GO 16 LINUX 14 ALGORITHMS 8 INTERVIEW 7 BOTTLEROCKET 4 GUIDE 3 CONTAINER 2 DISTRIBUTED-SYSTEM 2 SELINUX 2 WEB 2 COMPUTER-ARCHITECTURE 1 CONCURRENCY 1 CRYPTOGRAPHY 1 DATABASES 1
All Tags
ALGORITHMS 8 BOTTLEROCKET 4 COMPUTER-ARCHITECTURE 1 CONCURRENCY 1 CONTAINER 2 CRYPTOGRAPHY 1 DATABASES 1 DISTRIBUTED-SYSTEM 2 EC2 1 GO 16 GUIDE 3 INTERVIEW 7 LINUX 14 SELINUX 2 SHELL 1 TESTING 1 WEB 2

Copyright 2022-  PENG ZHANG. All Rights Reserved
