Proxies and Load Balancers
These notes cover proxies and load balancers: what they are, how they differ, and why they matter for system design. As a developer, you are unlikely to implement any of these yourself — the goal is to understand the concepts well enough to reason about architecture and communicate clearly in system design contexts.
The good news is that you already know more about these than you might think. A CDN, a VPN, a load balancer: these are all proxies operating at different layers.
Contents
- Proxies
- Forward Proxy
- Reverse Proxy
- Load Balancers
- Load Balancing Algorithms
- Layer 4 vs. Layer 7 Load Balancers
- Load Balancer Availability
- In Practice
- Summary
1. Proxies
A proxy is a server that sits between a client and a destination server, intercepting and forwarding traffic on behalf of one side or the other. There are two types, and the distinction comes down to which side the proxy is acting for.
- Forward proxy — acts on behalf of the client. The destination server sees the proxy, not the client.
- Reverse proxy — acts on behalf of the server. The client sees the proxy, not the destination servers behind it.
The rest of this chapter unpacks what that means in practice.
2. Forward Proxy
A forward proxy is what most people mean when they say "proxy." It is a middle server that sits between the client and the destination, forwarding the client's requests outward.
The key property is that the forward proxy hides the client from the destination server. The destination sees the proxy's IP address, not the originating client's. This has a few practical uses:
- Anonymity and identity protection. The client's real IP address is not exposed to the destination.
- Bypassing restrictions. Because the proxy appears as the origin of traffic, clients can use proxies located in different regions to bypass geo-blocking or route around network-level firewalls.
- Content filtering. The proxy can inspect and block outbound requests based on rules. Corporate networks and parental controls commonly use forward proxies to prevent access to certain content before the request ever reaches the destination (a minimal sketch of this follows the list).
- Centralised traffic visibility. All outbound traffic from a network flows through the proxy, which means it can be logged and audited centrally.
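To make the forwarding and filtering behaviour concrete, here is a minimal sketch of an HTTP forward proxy in Python. The blocklist and port are hypothetical, and error handling is omitted; a real proxy would also need to handle the CONNECT method for HTTPS traffic.

```python
# Minimal forward proxy sketch: forwards plain-HTTP GET requests and
# applies a hypothetical blocklist. Error handling omitted for brevity.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlparse
from urllib.request import urlopen

BLOCKED_HOSTS = {"blocked.example.com"}  # hypothetical filtering rule

class ForwardProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A client configured to use a proxy puts the absolute URL in the
        # request line, so self.path holds the full destination URL.
        if urlparse(self.path).hostname in BLOCKED_HOSTS:
            self.send_error(403, "Blocked by proxy policy")
            return
        # Forward the request: the destination sees this proxy's IP,
        # not the client's.
        with urlopen(self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), ForwardProxy).serve_forever()
```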
A VPN (Virtual Private Network) is a familiar example of a forward proxy. When you connect to a VPN, your traffic is routed through a server in another location. The destination sees that server's IP, not yours — which is how VPNs let you appear to be browsing from a different country.
📝VPNs and Forward Proxies Are Not Identical
A VPN and a forward proxy are similar in effect but different in implementation. A VPN encrypts all traffic at the network level and routes it through a tunnel. A forward proxy typically operates at the application layer and may not encrypt traffic by default. Both hide your IP from the destination; how they do it differs.
3. Reverse Proxy
A reverse proxy sits in front of one or more backend servers, accepting requests from clients and forwarding them inward. Where a forward proxy hides the client, a reverse proxy hides the servers.
The client has no visibility into what is behind the reverse proxy. It makes a request to a single address and receives a response. Whether that response came from one server or one of fifty is invisible to it. This abstraction makes interacting with complex backend infrastructure feel like talking to a single server.
Reverse proxies serve several purposes beyond just forwarding traffic:
- Load distribution. Traffic is spread across multiple backend servers to prevent any one from being overwhelmed.
- SSL termination. The reverse proxy handles HTTPS encryption and decryption, so backend servers can communicate over plain HTTP internally.
- Caching. Responses can be cached at the proxy layer so that repeated requests for the same content do not need to reach backend servers at all (see the sketch after this list).
- Security. Backend server addresses and internal network structure are hidden from clients.
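Here is a minimal sketch of a caching reverse proxy, assuming a single hypothetical backend on localhost:9000. It illustrates the two core behaviours: the client only ever sees the proxy's address, and repeated requests for the same path are served from the proxy's cache without touching the backend.

```python
# Caching reverse proxy sketch. Single hypothetical backend, no cache
# expiry, GET only: an illustration, not a production design.
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import urlopen

BACKEND = "http://localhost:9000"  # hypothetical backend address
cache: dict[str, bytes] = {}

class ReverseProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path not in cache:
            # Cache miss: fetch from the backend, which the client never sees.
            with urlopen(BACKEND + self.path) as upstream:
                cache[self.path] = upstream.read()
        body = cache[self.path]  # cache hit: the backend is not contacted
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8000), ReverseProxy).serve_forever()
```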
Two common examples of reverse proxies that are likely already familiar:
A CDN (Content Delivery Network) is a reverse proxy that caches and serves static assets from servers geographically close to the requesting client, reducing latency. The client never talks directly to the origin server for cached content.
A load balancer is also a reverse proxy, and is important enough for system design that it gets its own section.
4. Load Balancers
A load balancer is a reverse proxy whose primary job is distributing incoming traffic across a pool of backend servers. All of those backend servers run the same code and can handle the same types of requests — the load balancer simply decides which one handles any given request.
The benefits of distributing traffic this way are:
- Performance. A single server has a ceiling on how many concurrent requests it can handle. Spreading traffic across multiple servers raises that ceiling significantly.
- Reliability. If one server crashes or becomes unresponsive, the load balancer routes traffic to the remaining healthy servers. The client sees no interruption.
- Scalability. Adding capacity is as simple as adding more servers to the pool and registering them with the load balancer.
Load balancers also perform health checks: periodic requests to each backend server, often a simple HTTP call to a dedicated endpoint, to confirm it is alive and able to handle requests. A server that fails health checks is temporarily removed from the pool until it recovers.
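A sketch of that health-check loop, assuming each backend exposes a hypothetical /health endpoint that returns 200 while the server can take traffic:

```python
# Health-check loop sketch: probe each backend's hypothetical /health
# endpoint and keep only responsive servers in the active pool.
import time
import urllib.request

SERVERS = ["http://10.0.0.1:8000", "http://10.0.0.2:8000"]  # hypothetical pool
healthy: set[str] = set(SERVERS)

def run_health_checks(interval: float = 5.0) -> None:
    while True:
        for server in SERVERS:
            try:
                with urllib.request.urlopen(server + "/health", timeout=2) as resp:
                    ok = resp.status == 200
            except OSError:  # connection refused, timeout, HTTP error, ...
                ok = False
            if ok:
                healthy.add(server)      # recovered servers rejoin the pool
            else:
                healthy.discard(server)  # failing servers are taken out
        time.sleep(interval)
```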
💡Statelessness Makes Load Balancing Possible
Load balancing depends on every backend server being interchangeable. If a server stores session state in memory, then subsequent requests from the same client must always land on the same server — which defeats the purpose of having multiple servers. This is why stateless design (covered in the REST and serverless chapters) is not just a good practice but a prerequisite for horizontal scaling.
5. Load Balancing Algorithms
The load balancer needs a strategy to decide which server handles each request. Several common algorithms exist, each with different trade-offs.
Round robin
The simplest approach. Requests are distributed across servers in a fixed cycle: server 1, server 2, server 3, server 1, server 2, server 3, and so on.
The problem with plain round robin is that it treats all servers as equal. If one server has less memory or a slower CPU than the others, it will still receive the same number of requests — and may start falling behind.
Weighted round robin addresses this by assigning each server a weight proportional to its capacity. A server with weight 3 receives three times as many requests as a server with weight 1. This allows heterogeneous pools (servers with different specs) to be balanced more fairly.
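A sketch of both variants, with made-up server names and weights:

```python
# Round robin and weighted round robin sketches.
import itertools

SERVERS = ["s1", "s2", "s3"]           # hypothetical pool
WEIGHTS = {"s1": 3, "s2": 1, "s3": 1}  # s1 has roughly 3x the capacity

# Plain round robin: cycle through the pool in a fixed order.
round_robin = itertools.cycle(SERVERS)

# Weighted round robin: repeat each server in the cycle according to its
# weight, so s1 receives three requests for every one that s2 or s3 gets.
weighted = itertools.cycle([s for s in SERVERS for _ in range(WEIGHTS[s])])

server = next(weighted)  # each incoming request pulls the next server
```

Note that this naive expansion sends weighted traffic in bursts (s1, s1, s1, s2, s3); real implementations typically interleave the picks, but the long-run proportions are the same.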
Least connections
Instead of cycling mechanically, the load balancer routes each incoming request to whichever server currently has the fewest active connections.
This adapts naturally to situations where requests take different amounts of time to process. If one server is handling several long-running requests while another has finished all of its work, the next request goes to the idle server. Round robin would not account for this — least connections does.
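A sketch of the bookkeeping involved, again with a hypothetical pool:

```python
# Least-connections sketch: track active connections per server and
# route each new request to the least-loaded one.
active = {"s1": 0, "s2": 0, "s3": 0}  # hypothetical pool

def pick_server() -> str:
    return min(active, key=active.get)

def handle_request() -> None:
    server = pick_server()
    active[server] += 1      # connection opened
    try:
        ...                  # proxy the request to `server`
    finally:
        active[server] -= 1  # connection closed, even on error
```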
Location-based
Routes requests to the backend server geographically closest to the client. This is particularly relevant for global applications where network latency between continents is noticeable. A client in Europe is directed to a European server; a client in Asia to an Asian one.
CDNs use a form of this: they serve cached content from the edge node nearest the client, rather than always hitting a central origin server.
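A sketch of the routing decision, with the location lookup left as a stub; real deployments resolve client location via a GeoIP database or anycast DNS:

```python
# Location-based routing sketch. Region codes and addresses are hypothetical.
REGION_SERVERS = {
    "eu": "http://eu.backend.example.com",
    "asia": "http://asia.backend.example.com",
    "us": "http://us.backend.example.com",
}

def region_of(client_ip: str) -> str:
    """Hypothetical lookup: map a client IP to a region code,
    e.g. via a GeoIP database in a real deployment."""
    ...

def pick_server(client_ip: str) -> str:
    # Fall back to a default region if the lookup fails.
    return REGION_SERVERS.get(region_of(client_ip), REGION_SERVERS["us"])
```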
IP hash
A hash of the client's IP address determines which server handles the request. Because the same IP always produces the same hash, the same client is consistently routed to the same server.
This is useful in cases where some server-side affinity is needed: for example, if a server maintains a local cache that is warm for a particular client's data. The hashing mechanism behind this is worth understanding in more depth, particularly what happens when servers are added or removed from the pool and how to minimise disruption when that occurs. This is covered in Consistent Hashing.
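A sketch of the basic mechanism, using a stable hash so the mapping survives process restarts:

```python
# IP-hash sketch: hash the client IP and reduce it modulo the pool size,
# so the same client consistently lands on the same server.
import hashlib

SERVERS = ["s1", "s2", "s3"]  # hypothetical pool

def pick_server(client_ip: str) -> str:
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

# The fragility: adding or removing a server changes len(SERVERS) and
# remaps almost every client. That is the problem consistent hashing solves.
```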
6. Layer 4 vs. Layer 7 Load Balancers
Load balancers operate at different layers of the network stack, and the layer they use determines how much they can see about the traffic they are handling.
Layer 4 (transport layer)
A layer 4 load balancer operates at the TCP/UDP level. It sees the source and destination IP addresses and ports, and nothing more. It cannot inspect the content of the request.
Because it only needs to look at IP and port information, a layer 4 load balancer is fast and efficient. The trade-off is that it has no awareness of what the request is actually asking for. It cannot route /api/users to one server and /api/notes to another — it has no idea which path was requested.
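To make that concrete, here is a sketch of a layer-4-style TCP forwarder; the backend address is hypothetical and error handling is minimal. It copies raw bytes without ever parsing them:

```python
# Layer 4 sketch: shuttle raw TCP bytes between client and backend.
# The payload is never parsed, so HTTP paths and headers are invisible.
import socket
import threading

BACKEND = ("10.0.0.1", 8000)  # hypothetical backend address

def pipe(src: socket.socket, dst: socket.socket) -> None:
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()

def serve(port: int = 9000) -> None:
    listener = socket.create_server(("0.0.0.0", port))
    while True:
        client, _ = listener.accept()
        backend = socket.create_connection(BACKEND)
        # Copy bytes in both directions; the content stays opaque.
        threading.Thread(target=pipe, args=(client, backend)).start()
        threading.Thread(target=pipe, args=(backend, client)).start()
```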
Layer 7 (application layer)
A layer 7 load balancer operates at the HTTP level. It can read the full request: the URL path, headers, cookies, and body.
This content-awareness enables much more sophisticated routing. Requests for /api/notes can go to the notes service, requests for /api/auth to the auth service, and so on. This is how most modern API gateways and reverse proxies work.
The cost is complexity and overhead. A layer 7 load balancer must decrypt HTTPS traffic, parse the HTTP request, make a routing decision, and establish a new connection to the backend server — rather than simply rewriting an IP address and forwarding a packet. This makes it slower than a layer 4 load balancer, though in practice the difference is rarely the bottleneck in a well-designed system.
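A sketch of the routing decision a layer 7 load balancer can make and a layer 4 one cannot; the service addresses are hypothetical:

```python
# Layer 7 sketch: route by URL path prefix.
ROUTES = {
    "/api/notes": "http://notes-service:8000",  # hypothetical services
    "/api/auth": "http://auth-service:8000",
}
DEFAULT = "http://web-service:8000"

def pick_backend(path: str) -> str:
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT

assert pick_backend("/api/notes/42") == "http://notes-service:8000"
assert pick_backend("/index.html") == DEFAULT
```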
| Dimension | Layer 4 | Layer 7 |
|---|---|---|
| Operates at | Transport layer (TCP/UDP) | Application layer (HTTP/HTTPS) |
| Sees | IP address and port only | Full request: path, headers, body |
| Routing basis | IP and port | URL path, headers, content |
| Speed | Faster; minimal inspection | Slower; full request parsing |
| Flexibility | Low; cannot route by content | High; full content-aware routing |
| SSL termination | No | Yes |
7. Load Balancer Availability
A load balancer improves the availability of the servers behind it. But what happens if the load balancer itself goes down? It becomes the single point of failure for the entire system.
The straightforward solution is to run multiple load balancers in parallel. If one fails, traffic continues to flow through the others.
There are two common approaches to distributing traffic across multiple load balancers:
DNS round robin. The domain name resolves to multiple IP addresses, each belonging to a different load balancer. The DNS server rotates the order of the addresses across responses, so different clients end up connecting to different load balancers. This is simple to set up but has a known limitation: DNS responses are cached, so if a load balancer goes down, some clients may continue sending traffic to its IP until the cache expires.
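From the client's side, DNS round robin is just a hostname that resolves to several addresses. A sketch of how a client observes this, assuming a hypothetical domain with multiple A records:

```python
# Resolve all addresses for a name; a DNS-round-robin domain returns
# one IP per load balancer.
import socket

def resolve_all(hostname: str) -> list[str]:
    infos = socket.getaddrinfo(hostname, 443, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})

print(resolve_all("app.example.com"))  # hypothetical multi-record domain
```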
Shared virtual IP. Multiple load balancers share a single virtual IP address. Only one is active at a time; if it fails, another takes over and claims the same IP. Clients see no change in address, so there is no DNS caching problem. This approach is harder to implement and typically requires specialised networking configuration.
📝Load Balancers Rarely Fail in Practice
In production systems, load balancers are purpose-built for high throughput and are rarely the component that fails. Most cloud-managed load balancers (AWS ALB, Google Cloud Load Balancing) handle their own redundancy internally and are effectively treated as a reliable primitive. The single-point-of-failure concern is most relevant when running your own load balancer infrastructure.
8. In Practice
Implementing a load balancer from scratch is something almost no development team does. The complexity involved in correctly handling health checks, session persistence, SSL termination, and failover is significant. The common choices are:
- Nginx — a widely used open-source web server that can also function as a reverse proxy and load balancer. Highly configurable, runs on your own infrastructure.
- HAProxy — a dedicated open-source load balancer, known for performance and reliability in high-traffic environments.
- Cloud-managed options — AWS Elastic Load Balancing, Google Cloud Load Balancing, and Azure Load Balancer all offer managed load balancers that handle redundancy, health checks, and scaling without operational overhead. For most production systems, these are the default choice.
The right approach for most projects is to pick a cloud-managed load balancer and treat it as a reliable building block, rather than trying to manage the infrastructure yourself.
Summary
| Concept | Key Takeaway |
|---|---|
| Forward proxy | Sits between client and destination; hides the client. Used for anonymity, geo-bypassing, and content filtering. VPNs are a common example. |
| Reverse proxy | Sits in front of backend servers; hides the servers. The client sees one address regardless of how many servers exist behind it. |
| CDN | A reverse proxy that caches and serves content from servers geographically close to the client. |
| Load balancer | A reverse proxy that distributes traffic across multiple backend servers to improve performance, reliability, and scalability. |
| Round robin | Cycles through servers in order. Weighted variant accounts for differing server capacities. |
| Least connections | Routes to whichever server currently has the fewest active connections; adapts to variable request duration. |
| IP hash | Consistently routes the same client IP to the same server. The underlying mechanism is covered in Consistent Hashing. |
| Layer 4 | Routes by IP and port only; fast but cannot inspect request content. |
| Layer 7 | Routes by full HTTP request content; flexible and content-aware but carries more overhead. |
| Single point of failure | A single load balancer is itself a failure risk; mitigated with multiple load balancers via DNS round robin or a shared virtual IP. |