Fix Rare Case Where Primary ENI Does Not Serve Default Traffic

In recent testing, we saw an unusual scenario when launching EC2 instances with multiple ENIs: the primary ENI (device index 0) did not carry the default network traffic. This happened in approximately 1 out of 10,000 launches. Our use case requires that default traffic route through the primary ENI.

This post demonstrates how to detect and fix the issue in the following simplified environment:

  1. Customized Amazon Linux 2023 (AL2023) AMIs that do not use predictable network interface names
  2. Two ENIs
  3. The fix is shown as shell scripts for readability, although our production instances ship without a shell for security reasons

Find the MAC Address of the Primary ENI

According to the IMDS documentation, the "mac" metadata category returns "the instance's media access control (MAC) address. In cases where multiple network interfaces are present, this refers to the eth0 device (the device for which the device number is 0)." We can therefore ask IMDS for the MAC address of the primary ENI. The imdsv2 script below does this using an IMDSv2 session token:

#!/bin/bash

# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
# Usage: imdsv2 <metadata-path>, for example: imdsv2 mac

fetch_metadata() {
    if [ -z "${1}" ]; then
        echo "Usage: fetch_metadata <metadata-path>" >&2
        return 1
    fi

    local METADATA_PATH="${1}"
    local TOKEN_URL="http://169.254.169.254/latest/api/token"
    local METADATA_URL="http://169.254.169.254/latest/meta-data/${METADATA_PATH}"
    local TOKEN METADATA

    # Fetch the session token. -f makes curl fail on HTTP errors instead of
    # returning the error page as if it were valid data.
    TOKEN=$(curl -sf -X PUT "${TOKEN_URL}" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

    if [ -z "${TOKEN}" ]; then
        echo "Failed to fetch the session token" >&2
        return 1
    fi

    # Fetch the metadata using the token
    METADATA=$(curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" "${METADATA_URL}")

    if [ -z "${METADATA}" ]; then
        echo "Failed to fetch the metadata for path: ${METADATA_PATH}" >&2
        return 1
    fi

    # Only the metadata value goes to stdout; errors above go to stderr so a
    # caller comparing MAC addresses never compares an error message.
    echo "${METADATA}"
}

fetch_metadata "${@}"
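IMDS identifies the primary ENI by MAC address, not by interface name. To see which local name the primary ENI actually received, a small bash helper can scan sysfs for that MAC. This is a sketch: ifname_for_mac is a hypothetical name, and its optional second argument (an alternative sysfs directory) exists only to make it easy to test.

```shell
# Hypothetical helper: print the name of the local interface whose MAC
# address matches $1 (case-insensitive). $2 optionally overrides the
# sysfs directory, which defaults to /sys/class/net.
ifname_for_mac() {
    local want="${1,,}"
    local net_dir="${2:-/sys/class/net}"
    local dev addr

    for dev in "${net_dir}"/*; do
        # Skip entries without a readable address file
        [[ -r "${dev}/address" ]] || continue
        addr=$(<"${dev}/address")
        if [[ "${addr,,}" == "${want}" ]]; then
            basename "${dev}"
            return 0
        fi
    done

    echo "No interface found with MAC ${1}" >&2
    return 1
}
```

For example, ifname_for_mac "$(imdsv2 mac)" should print eth0 on a healthy instance and eth1 in the rare case described above, assuming the imdsv2 script is installed on the PATH under that name.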

Find the MAC Address for Default Network Traffic

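The get-default-route-mac script below finds the interface that holds the default route (IPv4 first, then IPv6) and prints that interface's MAC address: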

#!/bin/bash

# Script to find the default route device and output its MAC address
# 1. Check IPv4 default route first, then IPv6 if needed
#    (if several default routes exist, the first one listed is used)
# 2. Extract device name from the route
# 3. Output the MAC address of that device

set -e

# Try to get IPv4 default route first
default_route=$(ip route show default 2>/dev/null | head -n 1)

# If no IPv4 default route, try IPv6
if [[ -z "${default_route}" ]]; then
    default_route=$(ip -6 route show default 2>/dev/null | head -n 1)
fi

# If still no default route found, log error and exit
if [[ -z "${default_route}" ]]; then
    echo "Error: No default route found" >&2
    exit 1
fi

# Extract device name from default route
# The format is typically: default via <gateway> dev <device> [other params]
device=$(echo "${default_route}" | grep -o 'dev [^ ]*' | cut -d' ' -f2)

if [[ -z "${device}" ]]; then
    echo "Error: Could not extract device name from default route: ${default_route}" >&2
    exit 1
fi

# Get MAC address of the device. "|| true" stops set -e from aborting the
# script before the error message below can be printed.
mac_address=$(cat "/sys/class/net/${device}/address" 2>/dev/null || true)

if [[ -z "${mac_address}" ]]; then
    echo "Error: Could not get MAC address for device ${device}" >&2
    exit 1
fi

# Echo the MAC address
echo "${mac_address}"
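The script relies on head -n 1 to pick whichever default route ip lists first. When each ENI installs its own default route, it may be worth cross-checking against the route the kernel would actually select for outbound traffic, which ip route get reports (it only performs a lookup and sends no packets). The bash sketch below assumes hypothetical helper names route_device and default_traffic_device, and 1.1.1.1 is an arbitrary example destination. Using awk to match "dev" as a whole field also avoids accidental substring matches.

```shell
# Hypothetical helper: read "ip route" style output on stdin and print the
# word that follows the first field equal to "dev".
route_device() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Hypothetical helper: ask the kernel which device it would use to reach an
# outside address. 1.1.1.1 is only an example destination.
default_traffic_device() {
    ip route get 1.1.1.1 2>/dev/null | route_device
}
```

Feeding the device printed by default_traffic_device into the same /sys/class/net/<device>/address lookup yields a MAC address that can be compared with the one from get-default-route-mac.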

Swap Network Interface Names: eth0 ↔ eth1

Now that we have both MAC addresses, we can compare them. If the primary ENI's MAC differs from the MAC of the default-route interface, we run the swap-eth0-eth1 script, which writes udev rules that exchange the names of the two interfaces:

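#!/bin/bash
set -e

# Stock AL2023 manages interfaces with systemd-networkd rather than the
# legacy "network" service; adjust the unit names if your customized AMI
# differs. Stopping the socket too keeps networkd from being reactivated.
systemctl stop systemd-networkd.socket systemd-networkd

ETH0_MAC=$(cat /sys/class/net/eth0/address)
ETH1_MAC=$(cat /sys/class/net/eth1/address)

# Give the current eth1 MAC the name eth0 and vice versa. Matching on the
# MAC address pins the names for subsequent "add" events.
cat > /etc/udev/rules.d/70-rename-interfaces.rules << EOF
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="${ETH1_MAC}", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="${ETH0_MAC}", KERNEL=="eth*", NAME="eth1"
EOF

# Log the generated rules
cat /etc/udev/rules.d/70-rename-interfaces.rules

/sbin/udevadm control --reload-rules

# Replay "add" events for network devices so the new rules are evaluated.
# The kernel refuses to rename an interface that is up or to assign a name
# that is already in use, so confirm the result afterwards (e.g. ip link).
/sbin/udevadm trigger --subsystem-match=net --action=add
/sbin/udevadm settle

systemctl start systemd-networkd.socket systemd-networkd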
20systemctl start network