Practical Engineering

Recent content on Practical Engineering (https://peng.fyi/), by Peng Zhang. Last updated Wed, 16 Oct 2024.

Mind ordering cycles in systemd: how systemd breaks them can brick the server startup
https://peng.fyi/post/systemd-cycle-dependencies/ (Wed, 16 Oct 2024)
I had been building a service for a month, and the day finally arrived when I had the artifact: an EC2 AMI. The AMI passed my "rigorous" manual tests, and, feeling confident on a Ruby Tuesday, I launched 100 EC2 instances from the AMI for performance testing. Surprise! Around 28 instances failed to launch. What was going on? The failed instances were all stuck in the "initializing" state, and the only way to connect to them was through the EC2 Serial Console.

Monotonicity: Find 1-3-2 Pattern
https://peng.fyi/post/monotonicity-subinterval-132/ (Mon, 14 Oct 2024)
Given an array of numbers A, find out whether it contains a 1-3-2 pattern. A 1-3-2 pattern is a subsequence of three numbers A[i], A[j], and A[k] such that i < j < k and A[i] < A[k] < A[j]. For clarity, let's call the 1-3-2 pattern the Bronze-Gold-Silver pattern. If A[j] is Gold, then we should take the minimum number in A[0:j) as Bronze, because it leaves the widest range from which to pick Silver.

Hoare Partition, one of the simplest and most beautiful algorithms
https://peng.fyi/post/hoare-partition/ (Mon, 17 Jun 2024)
Tony Hoare invented quicksort in his 1961 paper "QuickSort". At the time of its publication, the best comparison-based sorting algorithm was merge sort. Merge sort divides an unordered array into two equally sized subarrays, sorts each subarray, and then merges the two sorted subarrays into one sorted array. Merge sort is simple to understand, but quicksort is just as simple and more elegant.
In quicksort, there is no requirement for the two subarrays to be of equal size, and there is no merging step.

No More Confusion of Upstream and Downstream
https://peng.fyi/post/upstream-and-downstream/ (Thu, 07 Mar 2024)
I often find myself confused by two words in software development: "upstream" and "downstream". They bother me so much that I avoid them in my own writing, and I have to pause whenever I see them. In this post, I'll show a simple rule that helps remember the difference: downstream adds value to the output of upstream. Let's take a break from software development for a moment and look at the oil industry.

False Data Independency: A Look at Cache Line and Write Combining
https://peng.fyi/post/false-data-independency-cacheline-and-write-buffer/ (Fri, 23 Feb 2024)
Modern CPUs operate significantly faster than memory. For instance, a 4.5 GHz x64 CPU operates roughly 30 times faster than 6000 MHz DDR5 memory with a CAS latency of 36. Once the latencies incurred by the bus and the memory coherency protocols are added, a CPU access to memory can be 100 times slower than a CPU access to registers. To mitigate the speed gap, CPUs incorporate layers of caches, typically with a cache line size of 64 bytes.

Binary Search in Go standard library
https://peng.fyi/post/binary-search-in-go-sdk/ (Sun, 11 Feb 2024)
Given a non-decreasing array and a target value, we can find the target in logarithmic time using binary search. My first programming language was C++, and the C++ Standard Template Library (STL) provides two functions for this task. iterator lower_bound(first, last, value) returns the smallest index whose value is greater than or equal to the target.
Put another way, lower_bound is the smallest index at which the target could be inserted while keeping the array sorted.

Monotonic Stack: Steps to Make Array Non-decreasing
https://peng.fyi/post/steps-to-make-array-non-decreasing/ (Sun, 28 Jan 2024)
Problem: Given an integer array A, in one step, remove every element A[i] where A[i-1] > A[i]. Return the number of steps performed until A becomes non-decreasing. See examples at LeetCode 2289.
Solution: The naive approach simulates the steps one by one. Store the integers in a linked list. At each step, find every integer that is smaller than its left neighbor and remove it. However, the time complexity is O(n^2), because each step may remove just one integer.

Calculate number of nodes in a linear network using message passing
https://peng.fyi/post/calculate-linear-network-size-by-message-passing/ (Sat, 20 Jan 2024)
Problem: In a connected network of N nodes, each node is connected to either one or two neighbors, forming a line topology. The task is to write a program that runs on each node and calculates the total number of nodes in the network. Each node knows its neighbors and can exchange messages with them. Note that nodes do not share memory. The only way two neighbors can exchange information is by sending and receiving messages.

Nested Map: Breakdown analysis of events and return result as nested JSON
https://peng.fyi/post/breakdown-analysis-nested-json/ (Fri, 19 Jan 2024)
Problem: An event consists of multiple properties. Each property is a key-value pair where the key is a string and the value is of a primitive type, such as a number or a string. Importantly, every event must include a 'Name' property.
Given a list of events, the task is to count the events broken down by specified properties. To illustrate, let's consider an example with four events and two analyses.

Factorial Growth of Subqueries When Using Nested WITH Clauses in ClickHouse
https://peng.fyi/post/factorial-growth-of-clickhouse-with-clause/ (Fri, 12 Jan 2024)
ClickHouse is a popular OLAP database. It speaks SQL and has earned a reputation for being "fast and resource efficient". But its SQL support comes with surprises if you are not careful. In this post, I show that a simple query with nested WITH clauses makes ClickHouse generate a factorial number of subqueries. The query is short. It reads nothing, processes nothing, and returns nothing. Yet it uses a lot of CPU and memory just to parse the query.

Maglev Hash: Consistent Hash with Guaranteed Even Distribution
https://peng.fyi/post/maglev-hash-alternatives-for-ring-hash/ (Sun, 07 Jan 2024)
In distributed systems, there are too many requests for a single server to handle reliably, so a cluster of servers handles them. For high availability, the technique for distributing requests to servers needs to satisfy the following three requirements. Even distribution: each backend takes about M/N requests, where M is the number of requests and N is the number of servers. Low disruption: adding or removing one server causes only about M/N requests to be redistributed.

Monotonicity: Sliding Window
https://peng.fyi/post/monotonicity-sliding-window/ (Mon, 27 Nov 2023)
A function is monotonic if it preserves the order of its arguments, i.e., if $x \le y$, then $f(x) \le f(y)$. In this post, we look at a class of problems where the argument is an interval.
By identifying monotonic functions, we can reduce the number of intervals to enumerate from $O(n^2)$ to $O(n)$. The algorithm is often called sliding window, because enumerating the intervals looks like sliding a window along the array.

Monotonicity: Find Largest Subarrays for Each Array Element
https://peng.fyi/post/monotonicity-stack/ (Thu, 23 Nov 2023)
Given an array of numbers $A$, for each number $A[i]$, find the largest subarray that contains $A[i]$ and in which $A[i]$ is the minimum. For example, for $A=[2, 0, 3, 5, 1, 1, 0, 2, 1]$, the largest subarray for $A[4]$ is $[3, 5, 1, 1]$. Let's represent the subarray for $A[i]$ by its left boundary $l[i]$ and right boundary $r[i]$:
$$l[i] = \min \lbrace j \le i : A[k] \ge A[i] \text{ for all } j \le k \le i \rbrace$$
$$r[i] = \max \lbrace j \ge i : A[k] \ge A[i] \text{ for all } i \le k \le j \rbrace$$

Set GOMAXPROCS for Go programs in containers
https://peng.fyi/post/gomaxprocs-in-container/ (Wed, 22 Nov 2023)
Every Go program has a runtime. The runtime implements garbage collection, concurrency, stack management, and other critical features. We can configure the runtime by setting environment variables. In this post, we look at GOMAXPROCS, a variable that configures concurrency. You may get a free performance boost by setting GOMAXPROCS when running Go in containers. What is GOMAXPROCS? The GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously.

A simple and customizable load test tool in Go
https://peng.fyi/post/simple-customizable-loadtest/ (Sun, 15 Oct 2023)
We plan to load test our product before the public Beta, with two goals in mind.
Find bottlenecks: figure out the roadmap for performance improvements and prepare for on-call.
Understand how much workload we can support with fixed resources: this shapes the pricing strategy and determines how many Beta partners we can onboard without burning runway.
Because generating meaningful load for our system is complicated, we cannot use general-purpose tools like K6.

Monotonicity: Find the K-th number in two sorted arrays
https://peng.fyi/post/kth-number-in-two-sorted-arrays/ (Tue, 10 Oct 2023)
Given two sorted arrays of integers, $A$ and $B$, find the $K$-th smallest integer among $A$ and $B$ in $O(\min(\log{N}, \log{M}))$ time, where $N$ and $M$ are the sizes of $A$ and $B$ respectively. Without loss of generality, we can assume $A$ and $B$ have the same length $N$. Because both $A$ and $B$ are sorted, a baseline is to merge them into a sorted array $C$ in $O(N)$ time. The answer would then be $C[K-1]$.

Priority Map: A hash map with access to the minimum value
https://peng.fyi/post/priority-map/ (Mon, 09 Oct 2023)
I recently implemented a job scheduler at work. Each job has a unique ID and an expiration time. The job ID is immutable, while the expiration time may change. The scheduler always schedules the job with the earliest expiration time. Go comes with heap.Interface (in container/heap) and the built-in map. My first implementation used a hash map and a slice that implements heap.Interface. The hash map maps each Job to its index in the heap slice.

Gotchas of defer in Go
https://peng.fyi/post/gotchas-of-defer-in-go/ (Sun, 10 Sep 2023)
A defer statement invokes a function just before the surrounding function returns. Multiple defers within a function execute in the reverse order in which they were deferred, following the Last In, First Out (LIFO) principle. Defer is commonly used to ensure resource cleanup, regardless of whether the function succeeds or fails.
For instance:
    func (h *Handler) Handle(ctx context.Context) {
        // Use defer to catch and log panics. This prevents the web server from crashing.
        defer func() {
            if r := recover(); r !…

CORS error with 504 Gateway timeout
https://peng.fyi/post/cors-error-with-504/ (Fri, 01 Sep 2023)
Like many developers, I often leave the browser's Developer Tools open on websites of interest. Last week, while playing with our staging service, I saw repeated CORS errors in the Console tab and 504 Gateway Timeouts in the Network tab. It was my first time seeing these two errors together, so I decided to look into it. In this post, I reproduce the issue and share some good practices for handling CORS.

Deep dive into "context canceled" errors on Go web servers
https://peng.fyi/post/context-cancel-go-web/ (Thu, 29 Jun 2023)
At iheartjane, we use a Go web server to serve ad requests. After some time in production, we noticed a lot of “context canceled” error logs. The following screenshot of a CloudWatch Logs Insights query shows the frequency of “context canceled” errors. It left us puzzled about the underlying causes of these cancellations. Should we worry about them? If so, how should we reduce them? What does “context canceled” mean in Go?

Simplicity By Default
https://peng.fyi/post/simplicity-by-default/ (Tue, 23 May 2023)
I’ve observed quite a few instances where engineers, including me, have created unnecessarily complex solutions. To cultivate a better R&D organization, it is crucial to choose simplicity by default. In this post, we’ll look at three don’ts and three do’s that help make simplicity the default. Don’t invent requirements. Engineers solve problems, and bloated or unclear requirements often make those problems complex. Simplify by reducing the requirements to the minimal set the product needs.

Six Options for Generating Distributed Unique IDs
https://peng.fyi/post/six-options-generating-distributed-ids/ (Thu, 20 Apr 2023)
Identifying unique entities is a frequent requirement in software development. For instance, assigning a unique ID to each ad impression lets us link related events for billing and analysis. However, generating unique IDs becomes challenging in large distributed systems. In this survey, we explore various options and discuss their suitability. We also introduce the snowflake-id service, a distributed system that generates unique IDs based on the snowflake algorithm and uses etcd.

Creating test doubles in Go, manual or auto-generated?
https://peng.fyi/post/creating-test-doubles-in-go-manual-or-generated/ (Thu, 30 Mar 2023)
We set up test doubles when it is difficult or impossible to use real objects due to complexity or external dependencies. In Go, an interface is a collection of method signatures that defines a set of behaviors. Interfaces make it easy to create test doubles in Go. But should you write test doubles by hand, use a test library that supports mocks, or use a tool that generates mocks automatically?
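A hand-written test double in Go can be as small as a struct that satisfies the interface. In the sketch below, UserStore, Greet, and fakeStore are hypothetical names invented for illustration. The post weighs this manual approach against mock libraries and mock generators.

```go
package main

import (
	"errors"
	"fmt"
)

// UserStore (hypothetical) is the behavior the code under test depends on.
// In production it might be backed by a database.
type UserStore interface {
	Name(id int) (string, error)
}

// Greet (hypothetical) is the code under test. It depends only on the
// interface, so any type with a matching Name method can stand in.
func Greet(s UserStore, id int) string {
	name, err := s.Name(id)
	if err != nil {
		return "Hello, guest"
	}
	return "Hello, " + name
}

// fakeStore is a manual test double: an in-memory map that satisfies
// UserStore and records how many times it was called.
type fakeStore struct {
	names map[int]string
	calls int
}

// Compile-time check that *fakeStore implements UserStore.
var _ UserStore = (*fakeStore)(nil)

func (f *fakeStore) Name(id int) (string, error) {
	f.calls++
	if n, ok := f.names[id]; ok {
		return n, nil
	}
	return "", errors.New("not found")
}

func main() {
	f := &fakeStore{names: map[int]string{1: "Ada"}}
	fmt.Println(Greet(f, 1)) // Hello, Ada
	fmt.Println(Greet(f, 2)) // Hello, guest
	fmt.Println(f.calls)     // 2
}
```

The double is ordinary Go code, so it needs no extra dependency. The trade-off is that every behavior it fakes must be written and maintained by hand.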