Difference between round-robin DNS and Route53 multivalue answers
Until now, if you wanted to load-balance traffic between multiple servers, you could either:
- use DNS round-robin to distribute traffic across a pool of IPs (but if one server is down, some traffic may still be sent to it, unless clients are clever enough to try every IP returned by the DNS server)
- use a load-balancer appliance (ELB, HAProxy, etc.) to manage the server pool and automatically remove unhealthy servers based on health checks (but it can be expensive to run, and/or it's one more thing to monitor).
But a few days ago Amazon announced a new type of routing policy for DNS records: multivalue answers, with support for health checks.
Basically, this means that you can create multiple records for the same domain name, associating each record with a different server IP and a health check that reports whether that server is up. Route53 then answers DNS queries for the domain name with only the healthy IPs, so, unlike classic round-robin DNS, client programs will not see unhealthy IPs in the DNS response.
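To make the difference concrete, here is a simplified Python model of what a resolver gets back under each policy. It is only an illustration, not Route53's implementation: it assumes the behavior described in the Route53 documentation for multivalue answers (records without a health check count as healthy, at most eight records per response, and if every record is unhealthy Route53 falls back to returning up to eight of them anyway). The `Record` type and both function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Assumption (per the Route53 docs): a multivalue response carries at most 8 records.
MAX_ANSWERS = 8


@dataclass
class Record:
    """Hypothetical model of one record in a record set."""
    set_identifier: str
    ip: str
    healthy: Optional[bool] = None  # None means no health check is attached


def round_robin_answer(records: List[Record]) -> List[str]:
    """Classic round-robin DNS: every IP is returned, healthy or not."""
    ips = [r.ip for r in records]
    random.shuffle(ips)  # this model just shuffles to vary the order per query
    return ips


def multivalue_answer(records: List[Record], rng=random) -> List[str]:
    """Simplified multivalue answer: only healthy IPs, at most MAX_ANSWERS."""
    # Records without a health check are treated as healthy.
    healthy = [r.ip for r in records if r.healthy is not False]
    # If nothing is healthy, fall back to the unhealthy records.
    pool = healthy if healthy else [r.ip for r in records]
    if len(pool) <= MAX_ANSWERS:
        return pool
    # With more than 8 candidates, pick a random subset of 8.
    return rng.sample(pool, MAX_ANSWERS)
```

With two servers and one of them down, `round_robin_answer` keeps returning both IPs while `multivalue_answer` returns only the healthy one.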
First, let's create two EC2 instances, each running a simple nginx server that returns the hostname of the server it runs on. For this test you can use t2.nano instances, and set up the user-data to automatically install nginx and generate the default index.html page with this script:
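#!/bin/bash
set -e
# Install nginx (user-data scripts run as root on first boot)
apt-get update -qq && apt-get install -y nginx
# Replace the default nginx page with this instance's hostname
hostname > /var/www/html/index.nginx-debian.html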
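If you would rather script the launch than click through the console, the sketch below builds the keyword arguments for boto3's EC2 `run_instances` call using the user-data script above. The AMI ID and security group IDs are placeholders you must supply (the script assumes an Ubuntu or Debian image, given `apt-get` and the `index.nginx-debian.html` path), the security group is assumed to allow inbound HTTP so the Route53 health checkers can reach port 80, and the function name is hypothetical.

```python
from typing import Dict, List

# The user-data script from above, passed verbatim to each instance.
USER_DATA = """#!/bin/bash
set -e
apt-get update -qq && apt-get install -y nginx
hostname > /var/www/html/index.nginx-debian.html
"""


def run_instances_kwargs(ami_id: str, security_group_ids: List[str], count: int = 2) -> Dict:
    """Build keyword arguments for boto3's ec2 client run_instances().

    ami_id: placeholder for an Ubuntu/Debian AMI in your region.
    security_group_ids: assumed to allow inbound HTTP (port 80).
    """
    return {
        "ImageId": ami_id,
        "InstanceType": "t2.nano",
        "MinCount": count,  # launch exactly `count` instances
        "MaxCount": count,
        "SecurityGroupIds": security_group_ids,
        "UserData": USER_DATA,
    }
```

Usage would look like `boto3.client("ec2").run_instances(**run_instances_kwargs("ami-xxxxxxxx", ["sg-xxxxxxxx"]))`, with both IDs replaced by real ones.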
Next, you will need to create 2 Route53 health checks (one for each server):
For now, the health check status is Unknown, but a few seconds later they should both switch to Healthy.
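The health checks can also be created from code. This sketch builds the arguments for boto3's Route53 `create_health_check` call: an HTTP check on port 80 against `/` for one server IP. The 30-second interval and failure threshold of 3 are assumptions (Route53's standard values), and the function name is hypothetical.

```python
import uuid
from typing import Dict


def health_check_kwargs(ip: str, port: int = 80, path: str = "/",
                        interval: int = 30, failure_threshold: int = 3) -> Dict:
    """Build keyword arguments for boto3's route53 create_health_check()."""
    # Route53 only accepts 10 s (fast) or 30 s (standard) request intervals.
    if interval not in (10, 30):
        raise ValueError("RequestInterval must be 10 or 30 seconds")
    if not 1 <= failure_threshold <= 10:
        raise ValueError("FailureThreshold must be between 1 and 10")
    return {
        # CallerReference must be unique for every create_health_check call.
        "CallerReference": f"poor-man-lb-{uuid.uuid4()}",
        "HealthCheckConfig": {
            "IPAddress": ip,
            "Port": port,
            "Type": "HTTP",
            "ResourcePath": path,
            "RequestInterval": interval,
            "FailureThreshold": failure_threshold,
        },
    }
```

Call it once per server, e.g. `boto3.client("route53").create_health_check(**health_check_kwargs(ip))["HealthCheck"]["Id"]`, and keep the returned IDs for the DNS records.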
You now have 2 servers with their corresponding health checks. Let's create the final piece, the 2 DNS records, using the new multivalue answer policy:
After a little while, both IPs should show up in the DNS response:
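The equivalent API call is boto3's Route53 `change_resource_record_sets`. Below is a sketch that builds its `ChangeBatch` argument: one `A` record per server, all with the same name, each with its own `SetIdentifier`, `MultiValueAnswer` set to true and the matching health check ID. The function name is hypothetical, and the TTL defaults to the 60 s seen in the `dig` output.

```python
from typing import Dict, List, Tuple


def multivalue_change_batch(name: str, backends: List[Tuple[str, str, str]],
                            ttl: int = 60) -> Dict:
    """Build a ChangeBatch for change_resource_record_sets().

    backends: one (set_identifier, ipv4_address, health_check_id) per server.
    """
    changes = []
    for set_identifier, ip, health_check_id in backends:
        changes.append({
            "Action": "UPSERT",  # create the record, or update it if it exists
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": set_identifier,  # distinguishes same-name records
                "MultiValueAnswer": True,
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
                "HealthCheckId": health_check_id,
            },
        })
    return {"Comment": "multivalue answer records", "Changes": changes}
```

Usage would be `boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=multivalue_change_batch("poor-man-lb.barebuild.com.", backends))`, where `zone_id` and the `backends` tuples (for example `("web-1", "192.0.2.10", hc_id)`) are hypothetical values from your own account.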
$ dig poor-man-lb.barebuild.com
poor-man-lb.barebuild.com. 60 IN A 18.104.22.168
poor-man-lb.barebuild.com. 60 IN A 22.214.171.124
Now, if you stop one of the servers, after about a minute (or less if you set a TTL below 60 s, which is probably a good idea with this policy) the unhealthy host should disappear:
$ dig poor-man-lb.barebuild.com
poor-man-lb.barebuild.com. 60 IN A 126.96.36.199
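To watch the answer change from a client's point of view without `dig`, a small Python helper can list the IPv4 addresses returned for the name. Unlike `dig`, it goes through the operating system's resolver (so `/etc/hosts` and any local cache apply) and it does not show the TTL; the function name is hypothetical.

```python
import socket
from typing import List


def resolve_ipv4(name: str) -> List[str]:
    """Return the distinct IPv4 addresses the system resolver gives for name."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    ips: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in ips:  # keep the resolver's order, drop duplicates
            ips.append(ip)
    return ips
```

Calling `resolve_ipv4("poor-man-lb.barebuild.com")` every few seconds while you stop one server should show the list shrinking from two addresses to one.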
Is it useful?
This is definitely no replacement for a proper load-balancer, but if you have a large number of backend servers and/or dumb clients that only try the first (or first N) IPs returned, it can prove useful, since you no longer risk exposing unhealthy servers in your DNS replies.
Nowadays, in a round-robin DNS setup, browsers, cURL, etc. are able to try the other IPs until one connects, but you never know what old stack or program your clients may be using to reach your service, so using this new Route53 policy puts you on the safe side. Also, keeping unhealthy IPs in DNS replies is probably not great performance-wise, as the client may have to try connecting to several IPs before it can send a request.
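For reference, the "clever client" behavior boils down to a loop like the one below: try each address from the DNS answer in turn and keep the first connection that succeeds. Every failed attempt (a refused connection, or worse, a timeout) is paid before the request can even be sent. This is a minimal Python sketch; the function name and the 2-second timeout are assumptions. Python's own `socket.create_connection` runs a similar loop internally when given a hostname.

```python
import socket
from typing import List, Tuple

Address = Tuple[str, int]


def connect_first_reachable(addresses: List[Address], timeout: float = 2.0):
    """Try each (ip, port) in order; return (socket, address) for the first that connects."""
    last_error = None
    for address in addresses:
        try:
            # Each attempt may cost up to `timeout` seconds if the host is down.
            sock = socket.create_connection(address, timeout=timeout)
            return sock, address
        except OSError as exc:
            last_error = exc  # remember why this address failed, then move on
    raise ConnectionError("no address accepted a connection") from last_error
```

With multivalue answers the list passed in should only contain healthy servers, so the first attempt normally succeeds.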