Fix Rare Case Where Primary ENI Does Not Serve Default Traffic
In recent testing, an interesting scenario appeared when launching EC2 instances with multiple ENIs: the primary ENI (device index 0) does not serve default network traffic. This occurs in approximately 1 out of 10,000 launches. In our use case, we must ensure that default traffic routes through the primary ENI.
This post demonstrates how to detect and fix the issue in the following simplified environment:
- Customized Amazon Linux 2023 (AL2023) AMIs that do not use predictable network interface names
- Two ENIs
- Shell scripts for detection and the fix (production images do not include a shell, for security reasons)
Find the MAC Address of the Primary ENI
According to the IMDS documentation, the instance metadata provides "the instance's media access control (MAC) address. In cases where multiple network interfaces are present, this refers to the eth0 device (the device for which the device number is 0)."
So we can query IMDS for the MAC of the primary ENI. Here is the imdsv2 script:
#!/bin/bash

# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html
# Usage: imdsv2 "mac"

fetch_metadata() {
    if [ -z "${1}" ]; then
        echo "Usage: fetch_metadata <metadata-path>" >&2
        return 1
    fi

    local METADATA_PATH=${1}
    local TOKEN_URL="http://169.254.169.254/latest/api/token"
    local METADATA_URL="http://169.254.169.254/latest/meta-data/${METADATA_PATH}"

    # Fetch the session token (-f makes curl fail on HTTP errors
    # instead of printing the error body)
    local TOKEN
    TOKEN=$(curl -sf -X PUT "${TOKEN_URL}" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

    if [ -z "${TOKEN}" ]; then
        echo "Failed to fetch the session token" >&2
        return 1
    fi

    # Fetch the metadata using the token
    local METADATA
    METADATA=$(curl -sf -H "X-aws-ec2-metadata-token: ${TOKEN}" "${METADATA_URL}")

    if [ -z "${METADATA}" ]; then
        echo "Failed to fetch the metadata for path: ${METADATA_PATH}" >&2
        return 1
    fi

    echo "${METADATA}"
}

fetch_metadata "${@}"
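Before comparing MAC addresses, it may be worth checking that what came back from IMDS actually looks like one. The following is a minimal sketch, not part of the original scripts; the function name is_mac_address is hypothetical, and the check only covers the usual colon-separated, six-octet form.

```shell
#!/bin/bash

# Hypothetical helper (not from the original scripts): succeeds only if
# the argument is six colon-separated pairs of hex digits.
is_mac_address() {
    local h='[0-9A-Fa-f]'
    case "${1}" in
        ${h}${h}:${h}${h}:${h}${h}:${h}${h}:${h}${h}:${h}${h}) return 0 ;;
        *) return 1 ;;
    esac
}
```

Assuming the script above is installed as imdsv2, a caller could then guard the lookup with something like: primary_mac=$(imdsv2 mac) && is_mac_address "${primary_mac}".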
Find the MAC Address for Default Network Traffic
Here is the get-default-route-mac script:
#!/bin/bash

# Script to find the default route device and output its MAC address
# 1. Check IPv4 default route first, then IPv6 if needed
# 2. Extract device name from the route
# 3. Output the MAC address of that device

set -e

# Try to get IPv4 default route first
default_route=$(ip route show default 2>/dev/null | head -n 1)

# If no IPv4 default route, try IPv6
if [[ -z "${default_route}" ]]; then
    default_route=$(ip -6 route show default 2>/dev/null | head -n 1)
fi

# If still no default route found, log error and exit
if [[ -z "${default_route}" ]]; then
    echo "Error: No default route found" >&2
    exit 1
fi

# Extract device name from default route
# The format is typically: default via <gateway> dev <device> [other params]
device=$(echo "${default_route}" | grep -o 'dev [^ ]*' | cut -d' ' -f2)

if [[ -z "${device}" ]]; then
    echo "Error: Could not extract device name from default route: ${default_route}" >&2
    exit 1
fi

# Get MAC address of the device
# "|| true" keeps "set -e" from exiting before the error message below
mac_address=$(cat "/sys/class/net/${device}/address" 2>/dev/null || true)

if [[ -z "${mac_address}" ]]; then
    echo "Error: Could not get MAC address for device ${device}" >&2
    exit 1
fi

# Echo the MAC address
echo "${mac_address}"
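The device-extraction step can also be exercised on its own, without a live routing table. The sketch below is an alternative to the grep and cut pipeline, not the original method: a hypothetical route_device function that prints the token following the word "dev" in a single route line, or nothing if there is none.

```shell
#!/bin/bash

# Hypothetical helper (not from the original script): print the field
# that follows "dev" in one line of "ip route" output.
route_device() {
    printf '%s\n' "${1}" | awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "dev") { print $(i + 1); exit }
        }
    }'
}
```

Matching "dev" as a whole field avoids picking up a substring of some other token, at the cost of depending on awk being available.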
Swap Network Interface Names: eth0 ↔ eth1
Now that we have both MAC addresses, we can compare them. If the primary ENI's MAC differs from the default traffic MAC, we can run the swap-eth0-eth1 script:
#!/bin/bash
set -e

systemctl stop network

# Record the current MAC address of each interface
ETH0_MAC=$(cat /sys/class/net/eth0/address)
ETH1_MAC=$(cat /sys/class/net/eth1/address)

# Write udev rules that give each name to the other interface's MAC
cat > /etc/udev/rules.d/70-rename-interfaces.rules << EOF
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="${ETH1_MAC}", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="${ETH0_MAC}", KERNEL=="eth*", NAME="eth1"
EOF

# Print the generated rules for the log
cat /etc/udev/rules.d/70-rename-interfaces.rules

# Reload the rules and replay "add" events for network devices
/sbin/udevadm control --reload-rules

/sbin/udevadm trigger --attr-match=subsystem=net --action=add

systemctl start network
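The comparison that decides whether to swap is not shown as a script above, so here is one way it could look. This is a sketch under assumptions: the helper names normalize_mac and same_mac are hypothetical, and the case-insensitive comparison is a defensive choice rather than something the post states is required.

```shell
#!/bin/bash

# Hypothetical helpers (names are illustrative, not from the original scripts).

# Lowercase the hex digits of a MAC address.
normalize_mac() {
    printf '%s' "${1}" | tr 'A-F' 'a-f'
}

# Succeed when both arguments are the same non-empty MAC address,
# ignoring case.
same_mac() {
    [ -n "${1}" ] && [ "$(normalize_mac "${1}")" = "$(normalize_mac "${2}")" ]
}

# Example wiring, assuming the three scripts from this post are on PATH
# under the names used here:
#   primary_mac=$(imdsv2 mac)
#   default_mac=$(get-default-route-mac)
#   same_mac "${primary_mac}" "${default_mac}" || swap-eth0-eth1
```

Treating an empty value as a mismatch means a failed lookup never passes as "already correct"; in practice a caller would probably want to stop on a failed lookup rather than proceed to the swap.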