SAA-C03 | Chapter 164 of 189 | Objective 3.1

Elastic Fabric Adapter (EFA) for HPC

This chapter covers AWS Elastic Fabric Adapter (EFA), a network interface that lets high-performance computing (HPC) workloads achieve low-latency, high-throughput communication by bypassing the operating system kernel. For the SAA-C03 exam, EFA is a niche topic: AWS does not publish per-topic question counts, but EFA tends to appear in scenarios involving tightly coupled HPC, computational fluid dynamics, or financial modeling. Understanding EFA's architecture, its differences from Elastic Network Adapter (ENA) and AWS Direct Connect, and its integration with placement groups and MPI is critical for selecting the right networking solution for HPC workloads.

25 min read
Intermediate
Updated May 31, 2026

Dedicated Express Lane for Supercomputing Traffic

Imagine a busy warehouse where workers (compute nodes) need to pass tiny, urgent messages to each other constantly—like passing a single ball bearing every few microseconds. In a normal network (TCP), each message is like sending a package via courier: the package is sealed, addressed, tracked, and the courier waits for a delivery receipt. This overhead (headers, acknowledgments, retransmissions) is fine for large shipments but crippling when you need to pass millions of ball bearings per second. EFA is like installing a set of pneumatic tubes that run directly between each worker's station. When a worker wants to send a ball bearing, they drop it into the tube at their end, and it instantly appears at the destination worker's station—no packaging, no tracking, no receipt. The tube bypasses the warehouse's general-purpose conveyor belt (OS kernel) and operates at near light speed. However, the tube can only carry one ball bearing at a time and is only useful if workers are constantly exchanging small items; for large packages, the courier service remains better. AWS attaches these tubes only to specific worker stations (instances optimized for HPC) and requires the workers to use a special language (libfabric) to shout into the tube. Crucially, the tube is a direct memory-to-memory link: the sender writes data into a shared memory region on the receiver, without the receiver's CPU even being interrupted—that's the magic of OS bypass and RDMA.

How It Actually Works

What is EFA and Why It Exists

Elastic Fabric Adapter (EFA) is a custom-built network interface for Amazon EC2 instances that accelerates High Performance Computing (HPC) and machine learning (ML) applications. Unlike standard Elastic Network Adapters (ENA) which handle TCP/UDP traffic through the kernel, EFA provides OS-bypass capabilities, allowing applications to communicate directly with the hardware. This reduces latency from tens of microseconds to single-digit microseconds and increases message rate from thousands to millions of messages per second.

The primary driver for EFA is the need for tightly coupled parallel computing. In HPC workloads like weather simulation, molecular dynamics, or computational fluid dynamics, thousands of compute nodes must exchange small messages (often less than 100 bytes) at very high frequency. Traditional TCP/IP incurs significant overhead due to context switching, data copying, and protocol processing. EFA eliminates these bottlenecks by providing a reliable, low-latency transport that bypasses the kernel entirely.
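To see why small messages are dominated by per-message overhead rather than bandwidth, a simple latency-plus-serialization model helps. The sketch below is illustrative only: the function names and the 10 microsecond latency figure are assumptions chosen for the arithmetic, not measured EFA values.

```python
def transfer_time_us(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimate one-way transfer time as fixed latency plus serialization time."""
    bits = message_bytes * 8
    serialization_us = bits / (bandwidth_gbps * 1e9) * 1e6
    return latency_us + serialization_us


def latency_share(message_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Fraction of the total transfer time spent on fixed per-message latency."""
    return latency_us / transfer_time_us(message_bytes, latency_us, bandwidth_gbps)


if __name__ == "__main__":
    # Assumed figures: 10 us per-message latency on a 100 Gbps link.
    for size in (100, 1024 * 1024):
        share = latency_share(size, latency_us=10.0, bandwidth_gbps=100.0)
        print(f"{size} bytes: {share:.1%} of the time is fixed latency")
```

Under these assumptions a 100-byte message spends almost all of its time in fixed latency, so cutting per-message overhead matters far more than adding bandwidth.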

How EFA Works Internally

EFA leverages two key technologies:

RDMA (Remote Direct Memory Access): Allows one computer to directly access the memory of another computer over a high-speed network without involving the CPU, cache, or OS of either system. This eliminates data copying and reduces CPU overhead.

OS-bypass: User-space applications can post send/receive operations directly to the EFA hardware, bypassing the kernel network stack. This removes context switches and interrupt handling.

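At the protocol level, EFA uses a custom AWS transport called Scalable Reliable Datagram (SRD). SRD runs over AWS's Ethernet-based data center network rather than a lossless fabric: it spreads packets across multiple network paths and handles retransmission of lost packets in the Nitro hardware instead of in the operating system. SRD guarantees delivery but not ordering, so the libfabric provider reorders messages when the application requires it. Through libfabric, EFA exposes two endpoint types:

Reliable Datagram (FI_EP_RDM): Messages are delivered reliably without the overhead of TCP connection management. This is the endpoint type that MPI and NCCL use.

Unreliable Datagram (FI_EP_DGRAM): For applications that can tolerate occasional loss.

EFA does not offer a connection-oriented (reliable connection) endpoint type.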

From an application perspective, EFA is accessed through libfabric, a user-space library that provides a standardized API for fabric communication. libfabric supports multiple providers; the EFA provider (named efa, as used in fi_info -p efa) is specifically optimized for AWS hardware.

Key Components, Values, Defaults, and Timers

Maximum Transmission Unit (MTU): 9001 bytes (jumbo frames) for data payload. However, EFA's internal protocol uses smaller message sizes for control packets. The maximum payload per RDMA write is 1 GB.

Latency: Sub-10 microseconds for small messages (e.g., 8 bytes) between two instances in the same placement group.

Bandwidth: Up to 100 Gbps per adapter on supported instances (e.g., p4d.24xlarge, p5.48xlarge).

Message Rate: Over 10 million messages per second (Mpps) for small messages.

Supported Instance Types: EFA is available on accelerated computing and HPC families such as p4d, p4de, p5, p5e, trn1, trn2, hpc6a, hpc7a, hpc6id, and hpc7g. It is also available on the larger sizes of some network-optimized families such as c5n, c6gn, and c7gn. In every case EFA must be explicitly enabled on the network interface; otherwise the instance uses ENA only.

Placement Group Requirement: For optimal performance, instances must be in a cluster placement group. This ensures they are placed in a single Availability Zone with low-latency, high-bandwidth connectivity. EFA uses a non-blocking fat-tree topology within a placement group.

Security: EFA traffic is not encrypted by default. If encryption is required, apply it at the application level and expect it to add latency. AWS PrivateLink is not an encryption mechanism and does not apply to EFA traffic.

Elastic Fabric Adapter vs. ENA: ENA supports standard TCP/UDP with kernel involvement; EFA supports OS-bypass for HPC. Both can coexist on the same instance; EFA handles HPC traffic, ENA handles management traffic.

Configuration and Verification

To enable EFA:

1. Launch an EC2 instance from an HPC-optimized AMI (e.g., AWS ParallelCluster AMI) or a custom AMI with the EFA kernel module and libfabric installed.
2. Select an instance type that supports EFA (e.g., p4d.24xlarge).
3. In the launch configuration, under Network interfaces, attach an EFA device. The number of EFAs you can attach depends on the instance type (for example, 4 on p4d.24xlarge).
4. Ensure the instance is in a cluster placement group.
5. Install the EFA software package (e.g., via aws-efa-installer).
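The launch steps above can be sketched as the request parameters for the EC2 RunInstances API. This is a minimal sketch: the helper name and all resource IDs are hypothetical placeholders, and the convention of DeviceIndex 0 on the first network card and 1 on the others is an assumption to check against the instance type's documentation. No AWS call is made here; with boto3 installed, the dictionary could be passed as `boto3.client("ec2").run_instances(**params)`.

```python
def build_efa_launch_params(ami_id, instance_type, subnet_id,
                            security_group_id, placement_group, efa_count=1):
    """Build RunInstances parameters with one EFA interface per network card."""
    interfaces = []
    for card in range(efa_count):
        interfaces.append({
            "NetworkCardIndex": card,
            # Assumption: the primary interface uses device index 0,
            # additional cards use device index 1.
            "DeviceIndex": 0 if card == 0 else 1,
            "InterfaceType": "efa",
            "SubnetId": subnet_id,
            "Groups": [security_group_id],
        })
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "Placement": {"GroupName": placement_group},
        "NetworkInterfaces": interfaces,
    }


if __name__ == "__main__":
    # All identifiers below are placeholders, not real resources.
    params = build_efa_launch_params(
        ami_id="ami-EXAMPLE",
        instance_type="p4d.24xlarge",
        subnet_id="subnet-EXAMPLE",
        security_group_id="sg-EXAMPLE",
        placement_group="hpc-cluster-pg",
        efa_count=4,
    )
    print(len(params["NetworkInterfaces"]), "EFA interfaces requested")
```

Keeping every interface in the same subnet and security group mirrors the requirement that all cluster nodes share one subnet.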

Verification commands:

# Check if EFA device is present
lspci | grep -i efa

# Check EFA driver version
modinfo efa | grep version

# Run libfabric info test
fi_info -p efa -t FI_EP_DGRAM

# Summarize provider, domain, and version using the installer's copy of fi_info
/opt/amazon/efa/bin/fi_info -p efa -t FI_EP_DGRAM | grep -E "(provider|domain|version)"
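When verifying many nodes, the fi_info check can be scripted. The sketch below parses fi_info-style "key: value" output and reports whether the efa provider is listed. The SAMPLE text is an illustrative assumption about the output layout, not captured from a real instance, and the function names are hypothetical.

```python
def parse_fi_info(output):
    """Parse fi_info-style output into one dict per 'provider:' block."""
    entries = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only; fabric names may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "provider":
            current = {"provider": value}
            entries.append(current)
        elif current is not None:
            current[key] = value
    return entries


def has_efa_provider(output):
    """Return True if any parsed block reports the efa provider."""
    return any(e.get("provider") == "efa" for e in parse_fi_info(output))


# Illustrative sample only; real output depends on the installed version.
SAMPLE = """provider: efa
    fabric: EFA-fe80::94:3dff:fe89:1b70
    domain: efa_0-dgrm
    version: 2.0
    type: FI_EP_DGRAM
    protocol: FI_PROTO_EFA
"""

if __name__ == "__main__":
    print("EFA provider found:", has_efa_provider(SAMPLE))
```

In practice the output string would come from running fi_info on each node, for example through `subprocess.run`.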

Interaction with Related Technologies

AWS ParallelCluster: Orchestrates HPC clusters with EFA-enabled instances, automatically configuring placement groups and network interfaces.

MPI (Message Passing Interface): EFA is specifically designed for MPI workloads. OpenMPI and MPICH support the EFA provider via libfabric. Use --mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_include eth0 for TCP fallback.

NVIDIA GPUDirect RDMA: On GPU instances (p4d, p5), EFA can directly access GPU memory, enabling peer-to-peer communication between GPUs across nodes without CPU involvement.

Elastic Network Adapter (ENA): Always present alongside EFA for management traffic; EFA is an additional adapter dedicated to HPC traffic.

AWS Direct Connect: Not a replacement for EFA; Direct Connect provides dedicated on-premises connectivity, not inter-node HPC communication.

Performance Characteristics

EFA achieves its performance through:

Kernel bypass: Application writes directly to hardware queue pairs.

Zero-copy: Data is transferred from application memory to network without intermediate buffers.

Reliable transport with hardware retransmission: The EFA hardware handles retransmissions at the link layer, not the OS.

Adaptive routing: Traffic is dynamically load-balanced across multiple paths within the placement group.

However, EFA's OS-bypass traffic does not cross Availability Zones, so nodes in different AZs fall back to ordinary ENA networking with much higher latency. Performance also degrades when instances are not in the same cluster placement group. In addition, EFA does not support multicast; use MPI collective operations instead.

Limitations

Not routable: EFA traffic is confined to a single VPC subnet and cannot traverse a NAT gateway, internet gateway, or VPN. It is designed for intra-VPC communication only.

No IPv6 support: EFA only supports IPv4.

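Security groups: EFA traffic is subject to security groups. AWS setup guidance calls for a security group that allows all inbound and outbound traffic to and from the security group itself; without that self-referencing rule, EFA communication between nodes fails.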

No flow logs: VPC Flow Logs do not capture EFA traffic.

Limited instance support: Only specific HPC and GPU instance types support EFA.

No Elastic IP: EFA devices cannot have public IP addresses.

No traffic mirroring: Cannot mirror EFA traffic for monitoring.

Use Cases

Computational fluid dynamics (CFD) using ANSYS Fluent or OpenFOAM

Molecular dynamics (NAMD, GROMACS)

Weather forecasting (WRF)

Financial risk modeling (Monte Carlo simulations)

Machine learning training with distributed frameworks (NCCL, Horovod)

Cost

EFA itself is free; you only pay for the EC2 instances and data transfer. However, HPC-optimized instances are expensive (p4d.24xlarge at ~$32/hour). Use Spot Instances for cost savings if workloads are fault-tolerant.
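Because EFA carries no charge of its own, its cost impact comes from shorter runtimes on expensive instances. The arithmetic below is a sketch using the approximate $32/hour figure above with a hypothetical 100-node cluster and hypothetical 4-hour and 20-hour runtimes; actual prices vary by Region and over time.

```python
def on_demand_cost(instances: int, hourly_rate_usd: float, hours: float) -> float:
    """Total On-Demand cost for a cluster running for a fixed number of hours."""
    return instances * hourly_rate_usd * hours


if __name__ == "__main__":
    # Hypothetical cluster: 100 instances at roughly $32/hour each.
    fast = on_demand_cost(100, 32.0, 4)    # run that finishes in 4 hours
    slow = on_demand_cost(100, 32.0, 20)   # same job taking 20 hours
    print(f"4-hour run:  ${fast:,.0f}")
    print(f"20-hour run: ${slow:,.0f}")
    print(f"Difference:  ${slow - fast:,.0f}")
```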

Walk-Through

1. Launch HPC Instance with EFA

Select an EFA-supported instance type (e.g., p4d.24xlarge) and an HPC-optimized AMI. In the EC2 launch wizard, under 'Network interfaces', click 'Add device' and select 'Elastic Fabric Adapter'. You must also create or select a cluster placement group. Ensure the instance is launched into the same subnet as the other cluster nodes, because EFA traffic does not leave the subnet. The instance will have both an ENA (for management) and one or more EFAs. After launch, install the EFA software using the `aws-efa-installer` script. Verify with `lspci | grep -i efa`.

2. Configure libfabric and MPI

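Install libfabric and an MPI implementation (e.g., OpenMPI) built with libfabric support; the `aws-efa-installer` package provides both. Set the environment variable `FI_PROVIDER=efa`, and on GPU instances that support GPUDirect RDMA also set `FI_EFA_USE_DEVICE_RDMA=1`. For OpenMPI, select the libfabric path with `--mca pml cm --mca mtl ofi`. The flags `--mca pml ob1 --mca btl tcp,self --mca btl_tcp_if_include eth0` do the opposite: they force TCP over the ENA interface, which is useful only as a fallback or a baseline for comparison. For NCCL (GPU ML), EFA is used through the aws-ofi-nccl plugin; settings such as `NCCL_PROTO=Simple` and `NCCL_ALGO=Ring` are optional tuning choices rather than requirements. Ensure all nodes have identical configurations. Run a simple `mpi_hello_world` job to verify that processes start on every node, then run a point-to-point latency benchmark to confirm that traffic is actually using EFA.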
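The difference between the libfabric path and the TCP fallback is easiest to see side by side. The sketch below builds an Open MPI command line for either case. The helper name, host file, and program path are hypothetical; `--mca pml cm --mca mtl ofi` is the usual way to select Open MPI's libfabric transport, and the TCP flags are the fallback flags listed under Interaction with Related Technologies.

```python
import shlex

# Open MPI component selection for libfabric (used with FI_PROVIDER=efa).
EFA_MCA = ["--mca", "pml", "cm", "--mca", "mtl", "ofi"]
# TCP fallback over the ENA interface.
TCP_MCA = ["--mca", "pml", "ob1", "--mca", "btl", "tcp,self",
           "--mca", "btl_tcp_if_include", "eth0"]


def build_mpirun_command(program, num_procs, hostfile, use_efa=True):
    """Return the mpirun argument list for an EFA or TCP-fallback run."""
    cmd = ["mpirun", "-np", str(num_procs), "--hostfile", hostfile]
    if use_efa:
        # -x exports the environment variable to every launched process.
        cmd += ["-x", "FI_PROVIDER=efa"] + EFA_MCA
    else:
        cmd += TCP_MCA
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # Placeholder host file and binary names.
    print(shlex.join(build_mpirun_command("./app", 64, "hosts.txt", use_efa=True)))
    print(shlex.join(build_mpirun_command("./app", 64, "hosts.txt", use_efa=False)))
```

Running the same benchmark with both command lines gives a quick baseline for how much the OS-bypass path helps a given workload.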

3. Application Initializes EFA Endpoints

When the MPI application starts, each process calls `fi_getinfo()` to query available fabric interfaces. The libfabric library detects the EFA provider and returns a list of matching interface descriptions. The process then creates an active endpoint using `fi_endpoint()`. The endpoint is bound to a completion queue (CQ) and an address vector (AV) that maps peer addresses. The application posts receive buffers using `fi_recv()` and initiates sends using `fi_send()`. The hardware queues these operations in the EFA device's queue pair (QP). Setup calls such as opening the device and registering memory go through the kernel, but once the endpoint is enabled, posting sends and receives and polling for completions happen entirely in user space.

4. Data Transfer via RDMA Write

For a send operation, the application calls `fi_send()` with a pointer to a user-space buffer. libfabric posts this as a send work request on the device's send queue; for large transfers the provider can use RDMA operations instead. The EFA hardware reads the data directly from the application's memory (via DMA) and transmits it over the fabric. On the receiver side, the EFA hardware writes the data directly into a pre-posted receive buffer in application memory, without interrupting the CPU. The completion is reported via the completion queue. This zero-copy, kernel-bypass mechanism achieves sub-10 microsecond latency.

5. Collective Communication with MPI

MPI collective operations (e.g., MPI_Allreduce) are implemented using optimized algorithms that leverage EFA's reliable datagram. For example, a reduce-scatter operation may use a recursive doubling algorithm. Each node sends/receives messages in parallel. EFA's high message rate (millions per second) allows these operations to complete quickly. The MPI library selects the best algorithm based on message size and node count. For large messages, EFA's RDMA write is used; for small messages, EFA's send/receive semantics are used.
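The recursive doubling pattern mentioned above can be simulated without any MPI library. The sketch below models an allreduce sum across a power-of-two number of ranks: in each round every rank exchanges its partial result with the rank whose index differs in one bit, so the operation finishes in log2(N) rounds. It is a teaching model, not how any particular MPI library implements the collective.

```python
def recursive_doubling_allreduce(values):
    """Simulate an allreduce (sum) over len(values) ranks.

    Returns the per-rank results and the number of communication rounds.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("rank count must be a power of two")
    current = list(values)
    rounds = 0
    distance = 1
    while distance < n:
        # Each rank r pairs with rank r XOR distance and adds the partner's value.
        current = [current[r] + current[r ^ distance] for r in range(n)]
        distance *= 2
        rounds += 1
    return current, rounds


if __name__ == "__main__":
    results, rounds = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(results, "in", rounds, "rounds")
```

Each round is one small message per rank, which is why per-message latency and message rate dominate the cost of collectives on small payloads.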

What This Looks Like on the Job

Scenario 1: Computational Fluid Dynamics at an Aerospace Company

An aerospace company runs ANSYS Fluent simulations on a cluster of 100 p4d.24xlarge instances. Each instance has 4 EFAs, providing 400 Gbps aggregate bandwidth per node. The CFD mesh has 500 million cells, requiring frequent halo exchanges between neighboring nodes. Using EFA, the simulation completes in 4 hours instead of 20 hours with ENA. The cluster is placed in a single cluster placement group in us-east-1a. The company uses AWS ParallelCluster to automate instance provisioning and MPI configuration. Key performance metric: MPI_Allreduce latency for 8-byte messages is ~2 microseconds between any two nodes.
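The per-node bandwidth figure in this scenario is simple multiplication, shown here as a sketch. The function names are hypothetical, and the result is a theoretical peak rather than achieved application throughput.

```python
def aggregate_gbps(efa_count: int, per_adapter_gbps: float) -> float:
    """Peak aggregate network bandwidth of one node in gigabits per second."""
    return efa_count * per_adapter_gbps


def gbps_to_gigabytes_per_second(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


if __name__ == "__main__":
    node_gbps = aggregate_gbps(efa_count=4, per_adapter_gbps=100.0)
    print(node_gbps, "Gbps per node")
    print(gbps_to_gigabytes_per_second(node_gbps), "GB/s per node")
```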

Scenario 2: Financial Risk Modeling at a Bank

A bank runs Monte Carlo simulations for portfolio risk using a custom MPI application. They use 50 hpc7a.48xlarge instances (AMD EPYC) with EFA. The workload is communication-bound rather than compute-bound, so EFA's low latency is critical for synchronizing random number generators across nodes. Without EFA, the simulation would require checkpointing every 10 minutes to recover from stragglers; with EFA, synchronization overhead is negligible. The bank uses Spot Instances to reduce costs, but must handle interruptions via checkpointing. EFA's lack of encryption is acceptable because the VPC is private and data is encrypted at rest.

Scenario 3: Machine Learning Training at a Tech Company

A tech company trains a large language model using 256 p5.48xlarge instances (NVIDIA H100 GPUs). They use NVIDIA NCCL for GPU-to-GPU communication. EFA supports GPUDirect RDMA, allowing GPUs to communicate directly across nodes without CPU involvement. The training throughput is 100 TFLOPS per GPU, and EFA provides 3200 Gbps of network bandwidth per node. The cluster uses a non-blocking fat-tree topology within a single placement group. The company monitors EFA health by publishing the EFA driver's hardware counters (for example, bytes sent and received) to CloudWatch as custom metrics. A common issue is misconfigured NCCL environment variables, leading to fallback to TCP (ENA), which reduces throughput by 10x.

How SAA-C03 Actually Tests This

What SAA-C03 Tests on EFA

The exam tests EFA under Domain 3: Design High-Performing Architectures (tracked in this guide as Objective 3.1). You must understand:

EFA is for HPC/ML workloads requiring low latency and high throughput.

EFA provides OS-bypass and RDMA; ENA does not.

EFA is only available on specific instance types (p4d, p5, trn1, hpc6a, etc.).

EFA requires a cluster placement group for optimal performance.

EFA traffic is not routable across subnets or AZs (must be same AZ).

EFA traffic is not encrypted by default and is not captured by VPC Flow Logs; it does require a security group that allows traffic to and from itself.

EFA is free; you pay for instances and data transfer.

Common Wrong Answers

1. 'Use AWS Direct Connect instead of EFA for HPC': Wrong. Direct Connect connects on-premises to AWS, not inter-node communication. EFA is for intra-VPC HPC.

2. 'EFA supports encryption by default': Wrong. EFA does not encrypt traffic; use application-level encryption.

3. 'EFA can be used with any EC2 instance type': Wrong. Only specific instance types and sizes support EFA, mostly HPC, GPU, and network-optimized families.

4. 'EFA provides higher bandwidth than ENA': Trick. EFA and ENA can have the same bandwidth (e.g., 100 Gbps), but EFA provides lower latency and a higher message rate.

5. 'EFA can route traffic across VPCs via VPC peering': Wrong. EFA traffic is confined to a single VPC subnet.

Specific Numbers and Terms

Latency: sub-10 microseconds

Bandwidth: up to 100 Gbps per adapter

Message rate: over 10 million messages per second

MTU: 9001 bytes (jumbo frames)

Supported instances: p4d, p5, trn1, hpc6a, hpc7a, hpc6id, hpc7g

Software: libfabric, EFA kernel module, aws-efa-installer

Placement: cluster placement group

Protocol: Scalable Reliable Datagram (SRD), a custom reliable transport

Edge Cases

EFA does not support IPv6.

EFA does not support multicast.

EFA can be attached at launch or added to a stopped instance; it cannot be attached to a running instance.

EFA devices can be detached only while the instance is stopped.

EFA traffic is not captured by VPC Flow Logs.

EFA is supported on some bare metal sizes (for example, c5n.metal), but not on every bare metal instance type.

How to Eliminate Wrong Answers

If a scenario describes a tightly coupled HPC workload (e.g., MPI, CFD, molecular dynamics), look for keywords: 'low latency', 'high message rate', 'OS bypass', 'RDMA'. The correct answer will include EFA and a cluster placement group. If the scenario requires built-in encryption in transit, cross-AZ communication, or inter-VPC communication, eliminate EFA. If the instance type is t2.micro or c5.large, EFA is not supported. If the question asks for a solution that connects on-premises, eliminate EFA and choose Direct Connect or VPN.

Key Takeaways

EFA provides OS-bypass and RDMA for HPC workloads, achieving sub-10 microsecond latency and over 10 million messages per second.

EFA is only available on specific instance types: p4d, p5, trn1, hpc6a, hpc7a, hpc6id, hpc7g.

EFA is attached at instance launch (or while the instance is stopped) and needs a cluster placement group for optimal performance.

EFA traffic is confined to a single subnet and Availability Zone; it cannot traverse VPC boundaries.

EFA does not provide flow logs, encryption, or IPv6, and it needs a security group that allows traffic to and from itself.

EFA is free; you only pay for the EC2 instances and data transfer.

For the exam, choose EFA when the scenario involves tightly coupled HPC with MPI or low-latency requirements.

Do not confuse EFA with ENA; ENA is for general networking, EFA is for HPC.

Easy to Mix Up

These come up on the exam all the time. Here's how to tell them apart.

Elastic Fabric Adapter (EFA)

OS-bypass and RDMA support

Sub-10 microsecond latency

Over 10 million messages per second

Performs best in a cluster placement group

Only supported on HPC/GPU instance types

Elastic Network Adapter (ENA)

Kernel-based networking (TCP/UDP)

Latency in tens of microseconds

Thousands of messages per second

Works with any placement group or no placement group

Supported on all current-generation instance types

Watch Out for These

Mistake

EFA is just a faster version of ENA.

Correct

EFA is architecturally different: it provides OS-bypass and RDMA, while ENA is a standard kernel-based network adapter. EFA is not a drop-in replacement; applications must be written to use libfabric or MPI with EFA support.

Mistake

EFA works across Availability Zones and VPCs.

Correct

EFA traffic is confined to a single VPC subnet within one Availability Zone. It cannot traverse AZ boundaries, VPC peering, or transit gateways. For cross-AZ communication, you must use ENA.

Mistake

EFA provides built-in encryption.

Correct

EFA does not encrypt traffic. The hardware does not support inline encryption. For compliance, encrypt data at the application level before it is sent, or route the sensitive traffic over ENA with TLS.

Mistake

EFA can be attached to any EC2 instance after launch.

Correct

EFA can be attached only at launch or while the instance is stopped, and only on supported instance types. It cannot be added to or removed from a running instance.

Mistake

EFA is required for all HPC workloads on AWS.

Correct

EFA is beneficial only for tightly coupled, latency-sensitive, and message-intensive workloads. For embarrassingly parallel workloads (e.g., many independent tasks), ENA with TCP is sufficient and cheaper.

Frequently Asked Questions

What is the difference between EFA and ENA?

EFA (Elastic Fabric Adapter) is a network interface designed for HPC that supports OS-bypass and RDMA, providing sub-10 microsecond latency and millions of messages per second. ENA (Elastic Network Adapter) is a standard network adapter that processes traffic through the kernel, offering tens of microseconds latency and thousands of messages per second. EFA is only available on specific instance types (e.g., p4d, p5) and performs best in a cluster placement group, while ENA is available on all current-generation instance types.

Can I use EFA across different Availability Zones?

No. EFA traffic is confined to a single subnet within one Availability Zone. Using EFA across AZs is not supported. For cross-AZ communication, you must use ENA or other networking technologies. If you need HPC across AZs, consider using a multi-AZ architecture with ENA and accept higher latency.

Does EFA support encryption?

No. EFA does not provide encryption at the hardware or protocol level. All EFA traffic is unencrypted. If encryption is required, you must implement it at the application layer (e.g., using TLS or SSH tunnels) or use a VPN over ENA. For exam purposes, remember that EFA traffic is not encrypted by default.

What is the maximum number of EFAs per instance?

The maximum number of EFAs per instance depends on the instance type. For example, p4d.24xlarge supports up to 4 EFAs, while p5.48xlarge supports up to 32 EFAs, which is how it reaches 3200 Gbps per node. Check the instance documentation for exact limits. Each EFA provides up to 100 Gbps of bandwidth.

Can I attach an EFA to a running instance?

No. An EFA can be specified at launch or attached while the instance is stopped, but it cannot be attached to or detached from a running instance. To change the EFA configuration of a running instance, stop it first.

Do I need a placement group for EFA?

For optimal performance, yes. EFA achieves its low latency and high throughput when instances are in a cluster placement group, which ensures they are physically close together in a single Availability Zone. Without a placement group, performance may degrade, but EFA will still function.

Is EFA compatible with AWS ParallelCluster?

Yes. AWS ParallelCluster natively supports EFA. When you define a cluster with HPC instance types, ParallelCluster automatically configures EFA devices and cluster placement groups. It also installs the necessary drivers and MPI libraries. This is the recommended way to manage large EFA clusters.
