This chapter covers load balancing methods and algorithms, a core topic in Network Implementation for the CompTIA Network+ N10-009 exam (Objective 2.6). Understanding how load balancers distribute traffic across servers is critical for ensuring high availability, scalability, and performance in modern networks. Expect 5-8% of exam questions to touch on load balancing concepts, algorithms, and configuration.
Imagine a hotel with a single front desk and hundreds of guests trying to check in at once. Without a system, guests would crowd the desk and check-in would grind to a halt. So the hotel adds several check-in kiosks and posts a concierge at the entrance to direct each guest to one of them. The concierge follows a rule: she sends the next guest to the kiosk with the fewest people waiting (least connections). She also knows that some returning guests must go back to the same kiosk they used before (session persistence). She periodically checks that each kiosk is still working (health check); if a kiosk breaks (server failure), she stops sending guests there and redistributes them among the rest. In this analogy, the concierge is the load balancer, the kiosks are backend servers, and the guests are client requests. A load balancer works the same way: it distributes incoming traffic across a server pool using an algorithm such as round robin or least connections, while performing health checks and maintaining session stickiness.
What is Load Balancing and Why Does It Exist?
Load balancing is the process of distributing network traffic across multiple servers (or other resources) to ensure no single server becomes overwhelmed, thereby improving responsiveness, reliability, and availability. It is a fundamental component of high-availability architectures and is used in data centers, cloud environments, and enterprise networks.
The primary goals of load balancing are:
- High Availability: If one server fails, traffic is redirected to healthy servers.
- Scalability: New servers can be added without disrupting service.
- Performance: Requests are distributed to avoid bottlenecks.
- Efficient Resource Utilization: Each server handles a fair share of the load.
Load balancers can be hardware appliances (e.g., F5 BIG-IP, Citrix ADC), software (e.g., HAProxy, Nginx), or managed cloud services (e.g., AWS ELB). They typically operate at one of two OSI layers: Layer 4 (transport) or Layer 7 (application).
How Load Balancing Works Internally
A load balancer sits between clients and backend servers. When a client sends a request, the load balancer intercepts it, selects a backend server based on a configured algorithm, and forwards the request. The response flows back through the load balancer (or directly to the client if using Direct Server Return).
Key components:
- Virtual IP (VIP): The IP address clients connect to.
- Real Servers: The backend servers that actually process requests.
- Server Pool: A group of real servers managed by the load balancer.
- Health Monitors: Mechanisms to check server health (e.g., ICMP ping, TCP port check, HTTP GET).
- Persistence: Ensures a client's subsequent requests go to the same server (session stickiness).
Load Balancing Algorithms
Load balancing algorithms determine how to select a server for each new request. The N10-009 exam tests the following:
Round Robin
- Distributes requests sequentially across the server pool.
- Simple and works well when servers have similar capacity.
- Does not account for current load or server performance.
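As a minimal sketch of the rotation (the class name RoundRobin and the server names are illustrative, not taken from any product), a round-robin selector only needs to remember its position in the list:

```python
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotating order, ignoring current load."""

    def __init__(self, servers):
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


rr = RoundRobin(["web1", "web2", "web3"])
picks = [rr.pick() for _ in range(5)]
print(picks)  # ['web1', 'web2', 'web3', 'web1', 'web2']
```

Nothing in pick() looks at how busy a server is, which is exactly the limitation noted above.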
Least Connections
- Sends requests to the server with the fewest active connections.
- More dynamic than round robin; adapts to varying request processing times.
- A common choice when connections are long-lived or uneven in duration (in HAProxy it is selected with balance leastconn; HAProxy's own default is round robin).
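A rough sketch of the bookkeeping this requires, assuming the balancer is told when each connection opens and closes (the class and method names are hypothetical):

```python
class LeastConnections:
    """Track active connections per server and pick the least-loaded one."""

    def __init__(self, servers):
        self.active = {name: 0 for name in servers}

    def pick(self):
        # min() returns the first server with the lowest count,
        # so ties go to the server listed earliest.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name):
        """Call when a connection to `name` closes."""
        self.active[name] -= 1


lc = LeastConnections(["web1", "web2"])
first = lc.pick()    # web1 (tie, first in list)
second = lc.pick()   # web2 (0 active vs web1's 1)
lc.release("web2")   # web2's connection finishes quickly
third = lc.pick()    # web2 again (0 active vs web1's 1)
```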
Weighted Round Robin
- Assigns a weight to each server based on capacity.
- Servers with higher weights receive more requests.
- Useful in heterogeneous environments.
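One simple way to picture the weighting, assuming integer weights, is to repeat each server in the rotation as many times as its weight. This is only an illustrative sketch; production load balancers usually interleave the picks more smoothly rather than sending bursts to one server.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Build an endless rotation where each server appears `weight` times per pass."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


rotation = weighted_rotation({"big": 3, "small": 1})
picks = [next(rotation) for _ in range(8)]
print(picks)
# ['big', 'big', 'big', 'small', 'big', 'big', 'big', 'small']
```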
Weighted Least Connections
- Combines weights with connection counts.
- Servers with higher weight and fewer connections get priority.
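A common way to combine the two inputs is to compare active connections divided by weight. The sketch below assumes that ratio; vendors may use a different formula, and the function name is hypothetical.

```python
def pick_weighted_least_conn(active, weights):
    """Pick the server with the lowest connections-per-unit-of-weight."""
    return min(active, key=lambda name: active[name] / weights[name])


active = {"big": 6, "small": 3}
weights = {"big": 4, "small": 1}

# big: 6 / 4 = 1.5, small: 3 / 1 = 3.0, so the larger server still has more headroom.
choice = pick_weighted_least_conn(active, weights)
print(choice)  # big
```

Plain least connections would have picked "small" here because it has fewer connections in absolute terms.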
Source IP Hash
- Uses a hash of the client's source IP to select a server.
- Ensures the same client always goes to the same server (built-in persistence).
- Useful for session-based applications.
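A minimal sketch of the idea, assuming a SHA-256 hash reduced modulo the pool size (real products choose their own hash function, and the function name here is hypothetical):

```python
import hashlib


def pick_by_source_ip(client_ip, servers):
    """Map a client IP to a server; the same IP and pool always give the same answer."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


pool = ["web1", "web2", "web3"]
first = pick_by_source_ip("203.0.113.7", pool)
again = pick_by_source_ip("203.0.113.7", pool)
print(first == again)  # True
```

Because the index depends on len(servers), adding or removing a server remaps many clients with this simple modulo approach. Every client behind the same NAT address also lands on the same server.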
Least Response Time
- Sends requests to the server with the fastest response time.
- Requires continuous monitoring of server response times.
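The monitoring can be sketched as a running average per server. The example below assumes an exponentially weighted moving average with a smoothing factor of 0.5; both the averaging method and the class name are assumptions for illustration, not a description of any particular product.

```python
class LeastResponseTime:
    """Pick the server with the lowest smoothed response time."""

    def __init__(self, servers, alpha=0.5):
        self.alpha = alpha  # weight given to the newest sample
        self.avg_ms = {name: None for name in servers}

    def record(self, name, millis):
        previous = self.avg_ms[name]
        if previous is None:
            self.avg_ms[name] = float(millis)
        else:
            self.avg_ms[name] = self.alpha * millis + (1 - self.alpha) * previous

    def pick(self):
        # Servers with no samples yet are treated as 0 ms so they get tried first.
        return min(self.avg_ms, key=lambda name: self.avg_ms[name] or 0.0)


lrt = LeastResponseTime(["web1", "web2"])
lrt.record("web1", 100)
lrt.record("web2", 20)
fast = lrt.pick()         # web2 (20 ms vs 100 ms)
lrt.record("web2", 300)   # web2 slows down: average becomes 160 ms
now_fast = lrt.pick()     # web1 (100 ms vs 160 ms)
```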
Session Persistence (Stickiness)
Some applications require that a client's requests go to the same server throughout a session (e.g., shopping cart). Persistence methods include:
- Source IP Affinity: Uses the client's IP address (via hash) to map to a server.
- Cookie Insertion: The load balancer inserts a cookie into the response that identifies the server.
- Cookie Learning: The load balancer reads an existing application cookie to determine the server.
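Cookie insertion can be sketched as follows. The cookie name LB_SERVER and the route() function are hypothetical, and the server name is stored in clear text only to keep the example readable; real load balancers typically encode or obscure the identifier.

```python
COOKIE_NAME = "LB_SERVER"  # hypothetical cookie name, chosen for this example


def route(request_cookies, healthy_servers, choose):
    """Return (server, cookies_to_set) for one HTTP request."""
    pinned = request_cookies.get(COOKIE_NAME)
    if pinned in healthy_servers:
        return pinned, {}                    # honor the existing pin
    server = choose()                        # first visit, or the pinned server is gone
    return server, {COOKIE_NAME: server}     # ask the client to remember the choice


servers = ["web1", "web2"]
first = route({}, servers, choose=lambda: "web1")
repeat = route({"LB_SERVER": "web1"}, servers, choose=lambda: "web2")
repinned = route({"LB_SERVER": "web9"}, servers, choose=lambda: "web2")
```

The choose callback stands in for whatever algorithm the pool uses for new clients, so persistence and the balancing algorithm stay independent.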
Health Checks
Health checks are critical for ensuring traffic is only sent to healthy servers. Types:
- ICMP Ping: Checks basic reachability.
- TCP Port Check: Attempts a TCP connection to a specific port.
- HTTP/HTTPS Check: Sends an HTTP GET request and expects a specific status code (e.g., 200 OK).
- Script-based Check: Runs a custom script that tests the server and reports pass or fail.
Health check parameters:
- Interval: How often the check is performed (default often 5 seconds).
- Timeout: How long to wait for a response (default 2-3 seconds).
- Threshold (fails): Number of consecutive failures before marking server down (default 2-3).
- Threshold (successes): Number of consecutive successes to mark server up (default 2-3).
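The two thresholds work like a small state machine: a server changes state only after enough consecutive results contradict its current state. A minimal sketch, with a hypothetical class name and thresholds of 3 failures and 2 successes chosen for illustration:

```python
class HealthState:
    """Track up/down state using consecutive-failure and consecutive-success thresholds."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures needed to mark the server down
        self.rise = rise      # consecutive successes needed to mark it up again
        self.up = True
        self._streak = 0      # consecutive results that contradict the current state

    def record(self, check_passed):
        if check_passed == self.up:
            self._streak = 0  # result agrees with the current state; reset the streak
        else:
            self._streak += 1
            threshold = self.rise if check_passed else self.fall
            if self._streak >= threshold:
                self.up = check_passed
                self._streak = 0
        return self.up


monitor = HealthState(fall=3, rise=2)
results = [False, False, True, False, False, False, True, True]
states = [monitor.record(r) for r in results]
print(states)
# [True, True, True, True, True, False, False, True]
```

Note that the two early failures do not take the server down, because the success in between resets the streak.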
Load Balancer Configuration and Verification
Example configuration for HAProxy (software load balancer):
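    frontend http-in
        bind *:80                  # client-facing listener on port 80, all addresses
        default_backend servers

    backend servers
        balance roundrobin         # selection algorithm
        # check every 2000 ms; mark down after 3 failures, up after 2 successes
        server web1 192.168.1.10:80 check inter 2000 fall 3 rise 2
        server web2 192.168.1.11:80 check inter 2000 fall 3 rise 2

Verification commands (depending on platform):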
- show stat (HAProxy Runtime API, issued over the stats socket)
- show ip slb vservers (Cisco IOS SLB)
- show ltm pool (F5 BIG-IP, in tmsh)
Interaction with Related Technologies
Load balancers often work with:
- Firewalls: Load balancers may be placed behind firewalls for security.
- DNS: Global Server Load Balancing (GSLB) uses DNS to distribute traffic across geographic locations.
- SSL Termination: Load balancers can offload SSL decryption to reduce server load.
- Content Caching: Some load balancers cache static content to improve performance.
Performance Considerations
Connection Limits: Each server has a maximum number of concurrent connections. Load balancers should respect these.
Timeouts: TCP idle timeouts can cause dropped connections if too short.
Scale: For high traffic, multiple load balancers in active-active or active-standby configuration are used.
Common Defaults and Timers
Health check interval: 5 seconds (common default)
Health check timeout: 2 seconds
Down threshold: 3 consecutive failures
Up threshold: 2 consecutive successes
Persistence timeout: 30 minutes (configurable)
Exam Relevance
The N10-009 exam expects you to:
Identify the correct algorithm for a given scenario.
Understand the difference between Layer 4 and Layer 7 load balancing.
Know that round robin is simple but doesn't consider load, while least connections adapts.
Recognize that source IP hash provides persistence but can cause uneven distribution.
Understand health checks and their parameters.
Client Sends Request
A client (e.g., web browser) initiates a TCP connection to the load balancer's virtual IP (VIP) on port 80 or 443. The client is unaware of the backend servers; it only knows the VIP. The load balancer receives the SYN packet and must decide which backend server will handle the request.
Load Balancer Selects a Server
The load balancer applies its configured algorithm (e.g., round robin, least connections) to select a backend server. For round robin, it maintains a pointer that cycles through the server list. For least connections, it compares the current active connection count of each server. At Layer 4 the decision can be made as soon as the first packet (the SYN) arrives; at Layer 7 the load balancer must first complete the TCP handshake (and any TLS handshake) with the client so that it can read the request before choosing.
Forward Request to Server
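The load balancer rewrites the destination IP address from the VIP to the selected server's IP (Destination NAT) and forwards the packet. It also records the mapping in a connection table so that subsequent packets from the same TCP session go to the same server. The source IP may be rewritten to the load balancer's own IP (Source NAT), which forces the server's reply back through the load balancer, or left as the client's address. Direct Server Return (DSR) is a separate mode: the load balancer typically leaves the IP header untouched and rewrites only the destination MAC address, and each server is configured with the VIP (usually on a loopback interface) so that it accepts the packet and can reply to the client directly.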
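The connection table can be sketched as a dictionary keyed by the flow's 5-tuple. The addresses and the forward() helper below are illustrative assumptions; real tables also expire entries when a connection closes or sits idle.

```python
from itertools import cycle

conn_table = {}  # flow 5-tuple -> chosen backend


def forward(flow, choose):
    """Return the backend for a flow, choosing one only for the flow's first packet."""
    if flow not in conn_table:
        conn_table[flow] = choose()
    return conn_table[flow]


backends = cycle(["192.168.1.10", "192.168.1.11"])
choose = lambda: next(backends)

# (protocol, client IP, client port, VIP, VIP port); addresses are placeholders
flow_a = ("tcp", "198.51.100.5", 50000, "203.0.113.1", 80)
flow_b = ("tcp", "198.51.100.5", 50001, "203.0.113.1", 80)

a_first = forward(flow_a, choose)   # new flow: picks 192.168.1.10
b_first = forward(flow_b, choose)   # new flow: picks 192.168.1.11
a_again = forward(flow_a, choose)   # known flow: stays on 192.168.1.10
```

The algorithm runs once per connection, not once per packet, which is why a mid-stream packet never lands on a different server.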
Server Processes Request
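The backend server receives the request, processes it (e.g., fetches a web page), and sends the response back. The response returns through the load balancer (if SNAT is used, or if the load balancer is the server's default gateway) or goes directly to the client (if DSR is used). When the response passes through the load balancer, it reverses the earlier translations: the source IP is rewritten from the server's address back to the VIP and, if SNAT was applied, the destination is rewritten back to the client's address before the packet is forwarded.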
Health Monitor Checks Server
Periodically (e.g., every 5 seconds), the load balancer sends health checks to each server. For an HTTP check, it sends a GET / and expects a 200 OK. If the server fails to respond within the timeout (e.g., 2 seconds) for three consecutive intervals, the load balancer marks the server as down and stops sending traffic to it until it passes health checks again.
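These parameters set a rough upper bound on how long a dead server keeps receiving traffic. The sketch below assumes checks start at a fixed interval and that a dead server answers no probe at all, so every check runs to its full timeout; the function name is hypothetical.

```python
def worst_case_detection_seconds(interval, timeout, fall):
    """Upper bound on the time from a server dying to it being marked down."""
    # The server dies just after passing a check, so `fall` further checks
    # must each start one interval apart, and the last one must time out.
    return fall * interval + timeout


detect = worst_case_detection_seconds(interval=5, timeout=2, fall=3)
print(detect)  # 17
```

Shortening the interval or lowering the failure threshold reduces this window, at the cost of more probe traffic and a higher chance of marking a briefly slow server as down.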
Scenario 1: E-commerce Website with Session Persistence
An online retailer uses a load balancer in front of three web servers. Each server maintains shopping carts in memory, so the load balancer must ensure that a user's requests go to the same server throughout their session. The first design uses the Source IP Hash algorithm, which hashes the client's IP to select a server and so provides built-in persistence without cookies. However, if many users share the same public IP (e.g., behind a corporate NAT), they all hash to the same server, causing imbalance. To mitigate this, the retailer switches to Cookie Insertion: the load balancer inserts a cookie named BIGipServer into the HTTP response, which encodes the server identifier. On subsequent requests, the load balancer reads the cookie and forwards to the correct server. Misconfiguration example: if the persistence timeout is set too low (e.g., 1 minute), users may lose their session during checkout. A reasonable starting point is a persistence timeout of 30 minutes, with health checks at a 5-second interval and a 2-second timeout.
Scenario 2: Video Streaming Service with Least Connections
A video streaming platform uses a load balancer with the Least Connections algorithm. Each streaming request holds a TCP connection for a long duration (e.g., 30 minutes). A round-robin algorithm would keep assigning new streams to servers that are still holding many long-lived connections, because it does not consider current load. Least Connections sends new requests to the server with the fewest active connections, balancing the load more evenly. The load balancer also performs TCP port health checks on port 443. In production, the load balancer handles 100,000 concurrent connections across 10 servers. The team observes that one server occasionally has 30% more connections than others due to variable stream lengths. Because the servers also differ in CPU resources, they switch to Weighted Least Connections, assigning higher weights to the more powerful servers, and achieve better balance.
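The difference between the two algorithms under mixed connection lengths can be shown with a small simulation. This is a toy model under stated assumptions: one request arrives per time tick, long (30-tick) and short (1-tick) requests alternate, and the function and variable names are made up for the example.

```python
def peak_connections(durations, servers, choose):
    """One request arrives per tick; return each server's peak active connections."""
    ends = {s: [] for s in servers}   # finish times of connections still open
    peak = {s: 0 for s in servers}
    for now, duration in enumerate(durations):
        for s in servers:
            ends[s] = [t for t in ends[s] if t > now]   # drop finished connections
        active = {s: len(ends[s]) for s in servers}
        target = choose(now, active)
        ends[target].append(now + duration)
        peak[target] = max(peak[target], len(ends[target]))
    return peak


servers = ["web1", "web2"]
durations = [30, 1] * 5   # long and short requests alternate

round_robin = lambda now, active: servers[now % len(servers)]
least_conn = lambda now, active: min(active, key=active.get)

rr_peak = peak_connections(durations, servers, round_robin)
lc_peak = peak_connections(durations, servers, least_conn)
print(rr_peak)  # {'web1': 5, 'web2': 1}
print(lc_peak)  # {'web1': 3, 'web2': 3}
```

With round robin, web1 happens to receive every long request and ends up holding five connections while web2 holds one. Least connections spreads the same ten requests so that neither server exceeds three.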
Scenario 3: Global Server Load Balancing (GSLB) for Disaster Recovery
A multinational corporation uses DNS-based GSLB to distribute traffic across data centers in New York, London, and Tokyo. Each data center has a local load balancer that distributes traffic within the site. The GSLB uses Round Robin across sites but also considers proximity using GeoIP. If the New York site fails a health check (e.g., HTTP 503), the GSLB removes its IP from DNS responses. However, DNS caching can cause clients to hit the failed site for up to the TTL (e.g., 300 seconds). To reduce failover time, the TTL is set to 30 seconds. The local load balancers use Least Connections internally. Common pitfall: misconfigured health checks that mark a site as down due to a single failed server, causing unnecessary failover. Best practice: health checks should target a synthetic endpoint that verifies overall site health.
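The failover behavior can be sketched as a DNS answer that lists only healthy sites, plus a rough estimate of how long stale answers can linger. The site addresses are documentation-range placeholders, the 15-second detection time is an assumed value, and the sketch leaves out the round-robin and GeoIP ordering described above.

```python
SITES = {  # site -> VIP; placeholder addresses from documentation ranges
    "new-york": "198.51.100.10",
    "london": "203.0.113.20",
    "tokyo": "192.0.2.30",
}


def gslb_answer(site_is_healthy):
    """Return the addresses of sites currently passing health checks."""
    return [ip for site, ip in SITES.items() if site_is_healthy.get(site, False)]


def worst_case_stale_seconds(detection_seconds, ttl_seconds):
    """Longest a client could keep using a failed site's cached address."""
    return detection_seconds + ttl_seconds


answer = gslb_answer({"new-york": False, "london": True, "tokyo": True})
print(answer)  # ['203.0.113.20', '192.0.2.30']

# Assuming the GSLB needs about 15 seconds to detect the failure:
slow = worst_case_stale_seconds(15, 300)   # 315 seconds with a 300-second TTL
quick = worst_case_stale_seconds(15, 30)   # 45 seconds with a 30-second TTL
```

A lower TTL shortens failover but increases DNS query volume, and some resolvers do not honor very low TTL values.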
What N10-009 Tests on Load Balancing (Objective 2.6)
The exam focuses on:
Identifying the correct load balancing algorithm for a given scenario (e.g., round robin for equal-capacity servers, least connections for variable-length requests, source IP hash for session persistence).
Understanding the difference between Layer 4 (transport) and Layer 7 (application) load balancing.
Knowing that health checks are used to determine server availability.
Recognizing that persistence methods include source IP affinity and cookie insertion.
Common Wrong Answers and Why Candidates Choose Them
Round robin is always the best choice – Candidates assume it's simplest and therefore best. Reality: Round robin doesn't consider server load or capacity; it can cause overload if servers have different capabilities.
Least connections is the same as round robin – No, least connections actively monitors connections; round robin cycles blindly.
Source IP hash provides no persistence – Actually, it does provide persistence by design, but candidates confuse it with round robin.
Health checks are optional – They are essential; without them, traffic may be sent to failed servers.
Specific Numbers and Terms on the Exam
Default health check interval: 5 seconds (common, but not always exact – focus on concept).
Down threshold: 3 consecutive failures.
Persistence timeout: 30 minutes (common default).
Algorithms: Round Robin, Least Connections, Weighted, Source IP Hash.
Terms: Virtual IP (VIP), real server, server pool, health monitor, stickiness.
Edge Cases and Exceptions
If all servers are marked down, the load balancer may return an error or use a fallback server.
Source IP hash can cause uneven distribution if many clients share the same IP (NAT).
Cookie persistence may fail if the client blocks cookies.
Layer 7 load balancing allows content-based routing (e.g., /images to one server, /api to another).
How to Eliminate Wrong Answers
If a question mentions "session persistence" or "sticky sessions," eliminate round robin and least connections (unless combined with persistence mechanism).
If a question mentions "servers with different capacities," look for weighted algorithm.
If a question mentions "variable request processing time," least connections is often correct.
Health checks matter in almost every scenario; be suspicious of any answer that assumes traffic can safely go to a server whose health is never verified.
Load balancing distributes traffic across multiple servers to improve availability and performance.
Common algorithms: Round Robin, Least Connections, Weighted, Source IP Hash.
Round Robin is simple but does not consider server load; Least Connections adapts to current load.
Source IP Hash provides session persistence but can cause uneven distribution.
Health checks (ICMP, TCP, HTTP) are essential to detect server failures.
Default health check interval is typically 5 seconds; down threshold is often 3 failures.
Session persistence (stickiness) can be achieved via source IP affinity or cookie insertion.
Layer 4 load balancing uses IP/port; Layer 7 can inspect application content.
These come up on the exam all the time. Here's how to tell them apart.
Round Robin
Distributes requests sequentially without considering current load.
Simple to implement and understand.
Works well when all servers have equal capacity and request processing time is uniform.
Does not require monitoring of connection counts.
Can cause uneven load if request durations vary.
Least Connections
Sends requests to the server with the fewest active connections.
Requires tracking of active connections per server.
Adapts to varying request processing times.
Better suited for environments with heterogeneous server capabilities.
May not be ideal for very short-lived connections (e.g., HTTP/1.0) where connection counts fluctuate rapidly.
Mistake
Round robin is the most efficient algorithm because it distributes evenly.
Correct
Round robin distributes requests sequentially without considering server load. If one server is slow or has many long-lived connections, it can become overloaded. Least connections is often more efficient for variable workloads.
Mistake
Load balancing only works at Layer 4 (transport).
Correct
Load balancing can operate at both Layer 4 (IP/port) and Layer 7 (application content). Layer 7 allows for content-based routing, header inspection, and SSL termination.
Mistake
Health checks are only needed for servers, not for the load balancer itself.
Correct
Load balancers themselves can fail. High-availability configurations use redundant load balancers that monitor each other with heartbeats (e.g., VRRP advertisements) so that a standby can take over the VIP on failure.
Mistake
Source IP hash provides the same distribution as round robin.
Correct
Source IP hash maps clients to servers based on their IP address. This can lead to uneven distribution if some IPs generate more traffic, and it does not balance based on server load.
Mistake
Least connections always sends requests to the server with the fewest active connections.
Correct
True for standard least connections, but weighted least connections also considers server capacity. Also, least connections may not be optimal if connections have very short lifetimes (e.g., HTTP/1.0).
The default algorithm varies by vendor, but round robin is the most common default: both HAProxy and F5 BIG-IP pools use it unless configured otherwise. Least connections is a popular alternative because it adapts to load. On the exam, if not specified, round robin is a safe default assumption, but always read the context.
Source IP hash uses a hash function on the client's IP address to determine which server handles the request. The same IP always hashes to the same server, ensuring all requests from that client go to the same server. However, if many clients share a single public IP (e.g., via NAT), they all map to the same server, causing imbalance.
Layer 4 load balancing operates at the transport layer (TCP/UDP). It makes routing decisions based on IP address and port, without inspecting packet contents. Layer 7 load balancing operates at the application layer (HTTP, HTTPS, etc.) and can inspect headers, cookies, and payload to make routing decisions (e.g., directing /images to one server and /api to another).
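Content-based routing at Layer 7 can be sketched as a prefix match on the request path. The pool names and the pick_pool() function are hypothetical; a Layer 4 balancer could not do this because it never reads the HTTP request.

```python
ROUTES = [  # (path prefix, pool name); pool names are made up for this example
    ("/images", "static-pool"),
    ("/api", "api-pool"),
]


def pick_pool(path, default="web-pool"):
    """Choose a server pool from the URL path of an HTTP request."""
    for prefix, pool in ROUTES:
        # Match the prefix itself or anything beneath it, but not "/imagesx".
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return default


print(pick_pool("/images/logo.png"))  # static-pool
print(pick_pool("/api/v1/orders"))    # api-pool
print(pick_pool("/checkout"))         # web-pool
```

Once a pool is chosen, an algorithm such as round robin or least connections still selects the individual server within it.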
Common default health check interval is 5 seconds. The timeout for a response is often 2 seconds. A server is marked down after 3 consecutive failures (down threshold) and marked up after 2 consecutive successes (up threshold). These values are configurable.
Load balancers can also handle UDP traffic (e.g., DNS, VoIP). Because UDP is connectionless, there is no TCP session to track, so persistence mechanisms differ; source IP hash is commonly used. Health checks for UDP services may involve sending a protocol-specific query and expecting a response.
If all servers in a pool are marked down, the load balancer may return an error page (e.g., HTTP 503 Service Unavailable) or, if configured, redirect to a maintenance page. Some load balancers allow a fallback server or a static response.
When a client makes the first request, the load balancer selects a server and inserts a cookie (e.g., 'BIGipServer') into the HTTP response. The cookie contains an encoded server identifier. On subsequent requests, the load balancer reads the cookie and forwards the request to the same server. This method works even if the client's IP changes.