A load balancer is a subsystem that acts as a layer between your cluster of servers and the clients trying to access them. As its name indicates, it is used to balance the load across our application servers, as shown in the diagram below:
As soon as a request reaches the load balancer, it routes that request to one of the servers behind it. For example, in the diagram shown below, one of the requests initiated by the client is mapped to application server A behind the load balancer, which then processes the request and sends a response back to the client.
The application server to which the load balancer routes a particular request is determined by the routing algorithm the load balancer uses, such as round robin or least connections.
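To make this concrete, here is a minimal sketch of round robin, one of the simplest routing algorithms: the load balancer cycles through the server pool, sending each new request to the next server in turn. The server names are illustrative placeholders, not part of the original text.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Routes each incoming request to the next server in rotation."""

    def __init__(self, servers):
        # cycle() yields servers in order and wraps around endlessly.
        self._pool = cycle(servers)

    def route(self, request):
        # Each call picks the next server, regardless of request content.
        return next(self._pool)

# Hypothetical server names for illustration.
lb = RoundRobinBalancer(["app-server-a", "app-server-b", "app-server-c"])
for i in range(4):
    print(lb.route(f"request-{i}"))
# The fourth request wraps back around to app-server-a.
```

Round robin assumes all servers are roughly equal in capacity; weighted or least-connections variants account for servers that differ in load or power.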
When we horizontally scale our service and add more application servers, we need a mechanism to determine which application server should handle a particular request. The load balancer is the subsystem that makes this decision.
Scalability: Adding a load balancer increases the scalability of the system. A single load balancer instance can handle many requests concurrently and spread them across servers, so no single server is overwhelmed.
Availability: Even if one of the application servers goes down, your application keeps working, because the load balancer starts routing those requests to another application server.
Increased Speed: We can enable caching in the load balancer, which lets us serve static resources and resources that rarely change directly from the cache, without hitting the application servers at all.
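The availability point above can be sketched in code: a balancer that routes only to servers currently marked healthy. This is a simplified illustration; in a real load balancer, health would be determined by periodic health-check probes rather than a manual flag, and the server names here are hypothetical.

```python
class FailoverBalancer:
    """Round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        # Track each server's health; assume all start healthy.
        self.healthy = {s: True for s in servers}
        self._counter = 0

    def mark_down(self, server):
        # In practice a failed health check would trigger this.
        self.healthy[server] = False

    def route(self, request):
        alive = [s for s, ok in self.healthy.items() if ok]
        if not alive:
            raise RuntimeError("no healthy servers available")
        server = alive[self._counter % len(alive)]
        self._counter += 1
        return server

lb = FailoverBalancer(["app-server-a", "app-server-b"])
lb.mark_down("app-server-a")   # simulate server A going down
print(lb.route("request-1"))   # traffic now flows only to app-server-b
```

Because the routing decision consults the health table on every request, clients never see the failed server; they only observe that responses keep arriving.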