All you have to do is walk into any data center and you'll see them: server load balancers. These devices are integral parts of today's Web sites and cornerstones of their scalability and reliability. Perhaps the most challenging aspect of SLB (server load balancing) for network administrators is figuring out the best way to implement the device in a given network. This is probably the least understood aspect of SLB, and the plethora of SLB products and vendors, each with its own installation methods, only compounds the issue. Most network administrators understand the basic premise of SLB, which is to make many servers appear as one to an end user by distributing the traffic load across multiple servers. What is not as well understood is how load balancers fit into a given network architecture. There are many different ways of implementing SLB in a network, and a given product is usually capable of several implementation methods.

There is method to the madness, however, since most SLB implementations can be simplified to fall under two categories: bridge-path and route-path. Of those two, I think route-path is the better way to go. Before I go into why I prefer route-path, let's take a look at how they differ.

A load balancer works by accepting traffic on a VIP (Virtual IP), sending that traffic to an available server, and then relaying the server's response back over the Internet to its destination. The critical part is that, in most cases, traffic must traverse the load balancer on the way back out to the Internet.

Figure 1-1: VIP traffic and the load balancer

In this diagram, the traffic from an Internet user hits the VIP on the load balancer in step 1. In step 2, the traffic is diverted to an available Web server. The Web server responds and sends the traffic back to the user in step 3, passing through the load balancer on the way out. In step 4, the traffic leaves the load balancer on its final journey to the user.
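
To make those four steps concrete, here is a minimal Python sketch of the idea. The addresses, the round-robin pick, and the packet dictionaries are all invented for illustration; this shows the general flow, not any particular product's implementation.

```python
import itertools

# Minimal sketch of the four steps in Figure 1-1 (all addresses invented).
# The load balancer owns the VIP, picks a real server for inbound traffic,
# and restores the VIP as the source on the reply's way back out.

VIP = "208.20.20.100"
REAL_SERVERS = itertools.cycle(["208.20.20.11", "208.20.20.12", "208.20.20.13"])

def inbound(packet: dict) -> dict:
    """Steps 1 and 2: traffic hits the VIP and is diverted to an available server."""
    assert packet["dst"] == VIP
    return {**packet, "dst": next(REAL_SERVERS)}    # simple round-robin selection

def outbound(packet: dict) -> dict:
    """Steps 3 and 4: the reply passes back through the load balancer, which
    restores the VIP as the source before the packet heads out to the user."""
    return {**packet, "src": VIP}

if __name__ == "__main__":
    request = {"src": "198.51.100.7", "dst": VIP, "payload": "GET /"}
    to_server = inbound(request)
    reply = {"src": to_server["dst"], "dst": request["src"], "payload": "200 OK"}
    print(outbound(reply))    # src is 208.20.20.100 again, dst is the user
```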

That return trip in step 3 of Figure 1-1 is the fundamental difference between route-path and bridge-path, and it is a major factor in how load balancers are integrated into a network. With the bridge-path method, the load balancer sits in the Layer 2 path of traffic bound for the Internet, acting as a bridge between two separate networks. With route-path, the load balancer sits in the Layer 3 path of outbound server traffic, acting as the servers' default route.

Route-path has several advantages over bridge-path that make it much more attractive: it is more flexible, easier to integrate into a given network, and offers a number of different configurations. Bridge-path, on the other hand, has several limitations that restrict the available configurations in a given network and cause issues with redundancy.

One of the biggest limitations of bridge-path stems from the basic nature of Layer 2 traffic: you cannot have more than one active path to a given target. If there is more than one path, one of two things will most likely happen: a bridging loop will be created, flooding the network with continuously amplified Layer 2 frames (if you've ever seen this happen, you know it's a hoot); or STP (Spanning Tree Protocol) will shut down one of those paths, which could also shut down other portions of the network. In a scenario where two Layer 2 devices are deployed for redundancy, one of them must remain inactive to prevent a bridging loop.
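
If it helps to see the loop mechanic in the abstract, here is a small Python toy model. It is my own illustration, not any vendor's behaviour: every active bridge floods broadcasts onto the other segment, so with one forwarding unit the frame count stays flat, while with two active units the copies never die and keep piling up.

```python
from dataclasses import dataclass

# Toy model of a bridging loop between two segments joined by two bridges.
# A bridge floods every broadcast it hears onto the other segment, except
# for copies it produced itself. The bridge names, tick counts, and segment
# labels are all invented for illustration.

@dataclass(frozen=True)
class Frame:
    frame_id: int
    forwarded_by: str | None = None     # which bridge produced this copy

OTHER = {"public": "server", "server": "public"}

def tick(segments: dict[str, list[Frame]], bridges: list[str]) -> dict[str, list[Frame]]:
    new: dict[str, list[Frame]] = {"public": [], "server": []}
    for segment, frames in segments.items():
        for frame in frames:
            for bridge in bridges:
                if bridge != frame.forwarded_by:   # bridges don't re-flood their own copies
                    new[OTHER[segment]].append(Frame(frame.frame_id, bridge))
    return new

def storm(bridges: list[str], ticks: int = 5) -> None:
    segments: dict[str, list[Frame]] = {"public": [], "server": []}
    for i in range(ticks):
        segments["public"].append(Frame(i))        # a host broadcasts once per tick
        segments = tick(segments, bridges)
        in_flight = sum(len(frames) for frames in segments.values())
        print(f"{len(bridges)} active bridge(s), tick {i + 1}: {in_flight} frames in flight")

if __name__ == "__main__":
    storm(["lb-a"])            # one forwarding path: each broadcast crosses once and dies
    storm(["lb-a", "lb-b"])    # two paths: copies circulate and the count climbs every tick
```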

Let's take the following example of a bridge-path load balancing implementation (Figure 1-2). In this example, we have a pair of load balancers that sit in front of the servers, and all traffic must traverse the load balancers on the way in and out. The load balancers act as a bridge between the public and server networks, and everything is on the same IP subnet (208.20.20.0/24). One unit is active while the other sits in standby mode, not forwarding Layer 2 packets.

Figure 1-2: Bridge-path flat-based SLB implementation example

In this redundant bridge-path scenario, only one unit can actively forward Layer 2 packets at a time. Because of this limitation, only one pair of load balancers can be employed in a given network configuration: two pairs would mean two active units in a group of four, which would again create multiple Layer 2 paths and thus a bridging loop.

Fail-over speed is also an issue with the bridge-path method. Depending on the vendor and the overall network infrastructure, STP is usually part of the Layer 2 redundancy scheme, and STP is not known for its speed. Fail-over time for STP, depending on the implementation, can take well over ten seconds, which is a veritable eternity where an Internet site is concerned. It should be mentioned, however, that many SLB products that employ the bridge-path method also offer some other redundancy mechanism of their own, which is usually somewhat quicker.

With the route-path method, traffic flow is controlled simply by setting the default route on the servers, so the load balancers do not need to be in the direct Layer 2 path as they do with bridge-path. Let's take the following example of a route-path SLB implementation (Figure 1-3), which provides the same SLB functionality as the bridge-path example.

Figure 1-3: Route-path flat-based SLB implementation example

The load balancers hang off the Layer 2 switches rather than sitting in the servers' Layer 2 path. Instead, the default route of the servers is set to a floating IP address that lives on the active load balancer, which ensures that traffic passes through the load balancers on the way out. As with the bridge-path example, everything is on the same subnet, although now we are dealing with only one VLAN, whereas with bridge-path we were dealing with two network segments (the public network and the server network).
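
A quick Python sketch of the routing decision shows why that single setting is enough to pull return traffic through the active unit. The subnet matches the earlier examples, but the floating IP address is invented for illustration.

```python
import ipaddress

# Rough sketch of the next-hop decision a server makes in the flat
# route-path layout. Subnet from the article's examples; floating IP invented.

SERVER_SUBNET = ipaddress.ip_network("208.20.20.0/24")
FLOATING_IP = "208.20.20.1"      # lives on whichever load balancer is active

def next_hop(destination: str) -> str:
    """On-subnet destinations are delivered directly; everything else, which
    in practice means Internet users, follows the default route to the
    load balancers' floating IP."""
    if ipaddress.ip_address(destination) in SERVER_SUBNET:
        return destination                  # local delivery, no load balancer involved
    return FLOATING_IP                      # default route -> active load balancer

if __name__ == "__main__":
    print(next_hop("208.20.20.50"))   # another server: reached directly
    print(next_hop("198.51.100.7"))   # an Internet user: reply goes via 208.20.20.1
```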

Redundancy is much simpler and easier to implement at Layer 3. VRRP (Virtual Router Redundancy Protocol) or a similar protocol is the usual method: a floating IP is shared between the two units, with one unit holding it while the other stands by, ready to take it over if the active unit fails. Fail-over can typically occur in five seconds or less.
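
Conceptually, the fail-over logic looks something like the following Python sketch. This is a rough model of the floating-IP takeover idea, not an implementation of the real VRRP protocol; the class, names, and timers are all invented, and real units exchange advertisements over the wire.

```python
import time

# Conceptual floating-IP fail-over in the spirit of VRRP (not the actual
# protocol). The standby watches for the active unit's advertisements and
# claims the floating IP when they stop arriving.

DEAD_AFTER = 3.0    # standby claims the floating IP after this much silence

class LoadBalancer:
    def __init__(self, name: str, active: bool) -> None:
        self.name = name
        self.owns_floating_ip = active
        self.last_advert_seen = time.monotonic()

    def hear_advertisement(self) -> None:
        """Called whenever the peer's 'I'm alive' advertisement arrives."""
        self.last_advert_seen = time.monotonic()

    def check_peer(self) -> None:
        """Standby logic: take over the floating IP if the peer has gone quiet."""
        silent_for = time.monotonic() - self.last_advert_seen
        if not self.owns_floating_ip and silent_for > DEAD_AFTER:
            self.owns_floating_ip = True     # bring up the floating IP and VIPs here
            print(f"{self.name}: peer silent for {silent_for:.1f}s, taking over")

if __name__ == "__main__":
    standby = LoadBalancer("lb-b", active=False)
    time.sleep(DEAD_AFTER + 0.5)     # simulate the active unit going silent
    standby.check_peer()             # lb-b: peer silent for 3.5s, taking over
```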

With the route-path method, you have the choice of using either one subnet for the VIPs and real servers (called flat-based SLB), as in the two previous examples, or two subnets, with the VIPs on a public network and the real servers on a separate, usually private subnet (called NAT-based SLB). With the bridge-path method, the VIPs and real servers must be on the same subnet. Since the load balancer acts as a router in the route-path method, it can perform the router function of NAT (Network Address Translation) from one subnet to another. Figure 1-4 is an example of a NAT-based SLB implementation:

Figure 1-4: NAT-based SLB implementation

In this scenario, the load balancers connect to two separate VLANs over separate links, one being the public network and the other the private network, for a total of two ports (even on a switch-based load balancer). The public network uses public IP address space, while the private server network uses private, nonroutable RFC 1918 address space. The VIPs live on the public network, with the servers exclusively on the private network. The load balancer performs NAT on the inbound traffic and NATs it back on the way out. Floating IPs are shared between the load balancers, serving as VIPs that accept traffic on the public network and as the default route for the servers on the private network. This NAT-based scenario has several security advantages, including the ability to turn the load balancers into firewalls by employing packet filtering.
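
To illustrate the address rewriting in Figure 1-4, here is a minimal Python sketch with invented public and RFC 1918 addresses. Real products track connections far more carefully, so treat this only as the shape of the idea.

```python
import itertools

# Sketch of NAT-based SLB: inbound traffic to the public VIP is NATed to a
# private real server, and the reply is NATed back to the VIP on the way out,
# so the client never sees the RFC 1918 address. All addresses are invented.

VIP = "208.20.20.100"                                        # public side
REAL_SERVERS = itertools.cycle(["10.0.0.11", "10.0.0.12"])   # private side
connections: dict[tuple, str] = {}                           # client flow -> chosen server

def nat_in(packet: dict) -> dict:
    """Destination NAT: public VIP -> private real server."""
    flow = (packet["src"], packet["sport"])
    server = connections.setdefault(flow, next(REAL_SERVERS))
    return {**packet, "dst": server}

def nat_out(packet: dict) -> dict:
    """Source NAT on the reply: private real server -> public VIP."""
    return {**packet, "src": VIP}

if __name__ == "__main__":
    request = {"src": "198.51.100.7", "sport": 40001, "dst": VIP, "dport": 80}
    inside = nat_in(request)                 # dst becomes 10.0.0.11
    reply = {"src": inside["dst"], "dst": request["src"], "sport": 80, "dport": 40001}
    print(nat_out(reply))                    # src restored to 208.20.20.100
```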

There is a third way to handle outbound traffic, called DSR (Direct Server Return). With DSR, traffic doesn't travel through the load balancer on the way out at all. Through some networking trickery involving loopback interface configuration and a process known as MAT (MAC Address Translation), the server sends traffic to the end user with the source address already rewritten to the VIP on the load balancer. With that step handled by the server, traffic can go unabated to the Internet. Since traffic for most Internet sites is mostly outbound rather than inbound, this represents a significant saving of resources for the load balancer.

Normally, if a site's traffic ratio is one packet in for every ten packets out, the load balancer handles all eleven packets. With DSR, the outbound traffic never hits the load balancer, which therefore handles only one out of every eleven packets. The configuration is much like route-path, except that the default route of the servers is not the load balancer; rather, it's the IP address of the router servicing that subnet (and the load balancer itself). Configuring DSR requires some expertise in Layer 2 and Layer 3 dynamics and is considerably more complicated, so it's generally a good idea to use it only when there is a specific need. DSR does not usually work in a bridge-path scenario.
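
For a rough picture of the MAT step, here is a Python sketch with invented MAC and IP addresses. It only models the header handling, and it assumes the VIP has already been configured on the servers' loopback interfaces as described above.

```python
# Sketch of DSR with MAC Address Translation (all addresses invented). The
# load balancer leaves the destination IP (the VIP) alone and only swaps the
# destination MAC to that of a real server; because the server holds the VIP
# on a loopback interface, it accepts the packet and replies straight to the
# client with the VIP as its source, never touching the load balancer again.

VIP = "208.20.20.100"
SERVER_MACS = ["00:00:5e:00:53:11", "00:00:5e:00:53:12"]

def mac_translate(frame: dict, server_mac: str) -> dict:
    """Load balancer step: rewrite Layer 2 only -- the IP header is untouched."""
    assert frame["ip"]["dst"] == VIP
    return {**frame, "eth": {**frame["eth"], "dst": server_mac}}

def server_reply(frame: dict) -> dict:
    """Server step: answer with the VIP as the source and send the reply
    directly toward the client via the subnet's router."""
    return {
        "eth": {"src": frame["eth"]["dst"], "dst": "router-mac"},
        "ip": {"src": VIP, "dst": frame["ip"]["src"]},
    }

if __name__ == "__main__":
    inbound = {
        "eth": {"src": "router-mac", "dst": "lb-mac"},
        "ip": {"src": "198.51.100.7", "dst": VIP},
    }
    to_server = mac_translate(inbound, SERVER_MACS[0])
    print(server_reply(to_server))    # reply sourced from the VIP, bypassing the LB
```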

In general, switch-based products on the market tend to be more bridge-path oriented, while the PC-based appliance load balancers tend to support only route-path. However, most of the switch-based vendors also support the route-path method, and one PC product that I know of does bridge-path exclusively. Of course, these features vary depending on the vendor.

Given the easier redundancy, more flexible configurations, and added functionality, it's easy to see why I prefer the route-path method. Your specific requirements will dictate how load balancing is implemented, and there may be cases where bridge-path is more appropriate than route-path, but the majority of the time route-path will be the optimal configuration. As with any network setup, your mileage will vary.