BIND, dynamic DNS, failover A records
Introduction
If you have a multi-homed network with IP addresses from several different ISPs, but you aren't a big enough organization to get those ISPs to route a single block of your addresses to you over BGP, you will probably find it very handy to have a single DNS record that automatically points the outside world at the best available path into your network.
The problem
In this example, "BSDcompany" runs a small office network (office.bsdcompany.com) and a server in a colocated network facility (coloserver.bsdcompany.com). Frequently, they need to access network resources inside the office from the internet. Since neither of the two ISPs available at BSDcompany's office is particularly reliable, BSDcompany has a cable modem from one of them, a DSL modem from the other, and a dual-WAN router. Both the cable and the DSL connections use dynamic IP addresses, and the company already has a server in the office doing dynamic DNS updates to cable-ip.office.bsdcompany.com and dsl-ip.office.bsdcompany.com.
BSDcompany's dual-WAN router provides load balancing and automatic failover redundancy for internet access from within the office. But BSDcompany wants similar redundancy and balancing from the outside coming in as well. So instead of randomly trying cable-ip.office.bsdcompany.com and dsl-ip.office.bsdcompany.com to see which (if either) is working at any particular time, they just want to be able to use a single name all the time and have it automatically take them to whichever ISP is up and/or faster at the moment.
The solution: ddns-failover.pl (another freebsdwiki.net original)
BSDcompany decides to set up a cron job on their colo server to check the status and latency of each of their office WAN IPs. That script will then automatically update a third A record, office.bsdcompany.com, with whichever is currently the quicker of the two office WANs to respond - and if both WANs are down, it will delete the record entirely until one or the other of them comes back up.
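The selection rule just described is small enough to isolate. Below is a minimal sketch of only that decision, written as a pure function so it can be checked without pinging anything. `choose_wan()` is a hypothetical name, not part of ddns-failover.pl; it assumes a WAN counts as "up" when its single ping got a reply and, like the script, gives ties to WAN1.

```perl
use strict;
use warnings;

# Hypothetical helper (not in ddns-failover.pl): given whether each WAN
# answered and its round-trip time in ms, return 'WAN1', 'WAN2', or
# 'DELETE' when both are down. Ties go to WAN1, as in the script.
sub choose_wan {
    my ($up1, $ms1, $up2, $ms2) = @_;
    return 'DELETE' if !$up1 && !$up2;
    return 'WAN2'   if !$up1;
    return 'WAN1'   if !$up2;
    return $ms1 <= $ms2 ? 'WAN1' : 'WAN2';
}

# The three scenarios tested later in this article:
print choose_wan(1, 94.302, 1, 85.341), "\n";   # WAN2
print choose_wan(1, 98.213, 0, 0), "\n";        # WAN1
print choose_wan(0, 0, 0, 0), "\n";             # DELETE
```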
(As with the set-ddns.pl script in the previous dynamic DNS article, the variables in ddns-failover.pl written in UPPERCASE are the ones you should set to match your own situation, while the ones in lower or mixed case generally shouldn't need changing.)
    #!/usr/bin/perl
    # ddns-failover.pl
    #
    # Copyright (c) 05-20-2006, JRS System Solutions
    # All rights reserved under standard BSD license
    # details: http://www.opensource.org/licenses/bsd-license.php
    #
    # Check each of two public IPs for the same multi-homed host,
    # and set a dynamic DNS A record to point to the lower latency
    # of the two. If both routes are down, delete the hostname
    # entirely until one or both IPs come back up.

    use strict;
    use warnings;

    my $WANDNS1    = 'cable-ip.office.bsdcompany.com';
    my $WANDNS2    = 'dsl-ip.office.bsdcompany.com';
    my $HOSTNAME   = 'office.bsdcompany.com';
    my $NAMESERVER = 'coloserver.bsdcompany.com';
    my $KEYFILE    = 'Koffice.bsdcompany.com.+157+15661.private';
    my $KEYDIR     = '/usr/home/ddns';
    my $TTL        = '10';

    # Ping a WAN name once with a 1-second timeout and read the quiet (-q)
    # output: line 0 holds the resolved IP, line 3 the packets received and
    # line 4 the min/avg/max/stddev round-trip times. Each capture is guarded,
    # so a failed match can't pick up a stale $1 from an earlier match.
    sub check_wan {
        my ($name) = @_;
        my @out  = split(/\n/, `/sbin/ping -qc 1 -t 1 $name`);
        my $ip   = 'NO HOST FOUND';
        my $rcvd = 0;
        my $time = 0;
        $ip   = $1 if defined $out[0] && $out[0] =~ /\((\d+\.\d+\.\d+\.\d+)\)/;
        $rcvd = $1 if defined $out[3] && $out[3] =~ /(\d+) packets received/;
        $time = $1 if defined $out[4] && $out[4] =~ /\/(\d+\.\d+)\//;   # avg
        return ($ip, $rcvd, $time);
    }

    my ($wan1_ip, $wan1_rcvd, $wan1_time) = check_wan($WANDNS1);
    my ($wan2_ip, $wan2_rcvd, $wan2_time) = check_wan($WANDNS2);

    # Pick the responding WAN with the lower latency, or 'NO' if neither answered.
    my $dnsip;
    if ($wan1_rcvd != 1 && $wan2_rcvd == 1) {
        print "WAN1 [$wan1_ip]: NO RESPONSE\nWAN2 [$wan2_ip]: ${wan2_time}ms\nSET $HOSTNAME: WAN2\n";
        $dnsip = $wan2_ip;
    } elsif ($wan1_rcvd == 1 && $wan2_rcvd != 1) {
        print "WAN1 [$wan1_ip]: ${wan1_time}ms\nWAN2 [$wan2_ip]: NO RESPONSE\nSET $HOSTNAME: WAN1\n";
        $dnsip = $wan1_ip;
    } elsif ($wan1_rcvd != 1 && $wan2_rcvd != 1) {
        print "WAN1 [$wan1_ip]: NO RESPONSE\nWAN2 [$wan2_ip]: NO RESPONSE\nDELETE $HOSTNAME\n";
        $dnsip = 'NO';
    } elsif ($wan1_time <= $wan2_time) {
        print "WAN1 [$wan1_ip]: ${wan1_time}ms\nWAN2 [$wan2_ip]: ${wan2_time}ms\nSET $HOSTNAME: WAN1\n";
        $dnsip = $wan1_ip;
    } else {
        print "WAN1 [$wan1_ip]: ${wan1_time}ms\nWAN2 [$wan2_ip]: ${wan2_time}ms\nSET $HOSTNAME: WAN2\n";
        $dnsip = $wan2_ip;
    }

    # Always delete the old A record; add the new one unless both WANs are down.
    chdir($KEYDIR) or die "cannot chdir to $KEYDIR: $!\n";
    open(my $nsupdate, '|-', "/usr/sbin/nsupdate -k $KEYFILE")
        or die "cannot run nsupdate: $!\n";
    print $nsupdate "server $NAMESERVER\n";
    print $nsupdate "update delete $HOSTNAME A\n";
    print $nsupdate "update add $HOSTNAME $TTL A $dnsip\n" if $dnsip ne 'NO';
    # print $nsupdate "show\n";
    print $nsupdate "send\n";
    close($nsupdate);
Setting up permissions
To minimize security risks, the gurus at BSDcompany create a new user named "ddns", put this script and copies of the zone's key files (which they already have from setting up dynamic DNS earlier) in the ddns user's home directory, and set the permissions on everything as restrictively as possible before setting up the cron job that will actually run it.
    coloserver# pw useradd ddns -s /sbin/nologin -d /usr/home/ddns
    coloserver# mkdir /home/ddns
    coloserver# cd /home/ddns
    coloserver# cp /etc/namedb/zones/keys/Koffice.bsdcompany.com.+157+15661.private .
    coloserver# cp /etc/namedb/zones/keys/Koffice.bsdcompany.com.+157+15661.key .
    coloserver# chown ddns Koffice.bsdcompany.com.+157+15661.* ddns-failover.pl
    coloserver# chmod 400 Koffice.bsdcompany.com.+157+15661.*
    coloserver# chmod 500 ddns-failover.pl
    coloserver# ls -l
    -r--------  1 ddns  wheel   130 May 20 12:22 Koffice.bsdcompany.com.+157+15661.key
    -r--------  1 ddns  wheel   145 May 20 13:17 Koffice.bsdcompany.com.+157+15661.private
    -r-x------  1 ddns  wheel  3108 May 23 01:27 ddns-failover.pl
    coloserver# su ddns
    This account is currently not available.
Excellent: the ddns account exists but cannot be logged into interactively, the key files are readable (but not writable or executable) only by it, and the script is readable and executable (but not writable) only by it. Now that the permissions are correct, it's time for a test run: we'll run the script manually as the user ddns (using sudo), just as the cron job will run it, before setting it up to run automatically.
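If you'd rather have a repeatable check than eyeball `ls -l`, a short script can compare each file's permission bits against what the chmod commands set. This is a hypothetical helper, not part of the article's setup: `mode_problems()` is an illustrative name, it assumes it is run from /usr/home/ddns, and it checks only mode bits, not ownership.

```perl
use strict;
use warnings;

# Hypothetical checker: given path => expected octal mode pairs, return a
# list of human-readable problems (missing file or wrong permission bits).
sub mode_problems {
    my (%expected) = @_;
    my @problems;
    for my $path (sort keys %expected) {
        my @st = stat $path;
        if (!@st) {
            push @problems, "$path: missing";
            next;
        }
        my $mode = $st[2] & 07777;    # keep only the permission bits
        push @problems, sprintf('%s: mode %04o, expected %04o',
                                $path, $mode, $expected{$path})
            if $mode != $expected{$path};
    }
    return @problems;
}

# Example run from /usr/home/ddns, using the file names from this article:
my @bad = mode_problems(
    'Koffice.bsdcompany.com.+157+15661.key'     => 0400,
    'Koffice.bsdcompany.com.+157+15661.private' => 0400,
    'ddns-failover.pl'                          => 0500,
);
print @bad ? join("\n", @bad) . "\n" : "permissions OK\n";
```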
Testing the script manually
    coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
    WAN1 [128.32.64.5]: 94.302ms
    WAN2 [144.69.42.18]: 85.341ms
    SET office.bsdcompany.com: WAN2
    coloserver# ping -qc 1 office.bsdcompany.com
    PING office.bsdcompany.com (144.69.42.18): 56 data bytes
    --- office.bsdcompany.com ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 85.038/85.038/85.038/0.000 ms
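The `ping -qc 1` summary in that transcript is the same text ddns-failover.pl parses for each WAN, reading fields from fixed line positions (lines 0, 3 and 4). If you'd rather not depend on line numbering, one alternative is to match each field anywhere in the output. This is a sketch assuming the FreeBSD-style summary shown here (other ping implementations word it differently), and `parse_ping_summary()` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: parse FreeBSD "ping -q -c 1" output and return
# (ip, packets_received, avg_ms). Any field that is absent comes back as
# undef instead of a stale $1 from an earlier match.
sub parse_ping_summary {
    my ($text) = @_;
    my ($ip)   = $text =~ /^PING \S+ \((\d+\.\d+\.\d+\.\d+)\)/m;
    my ($rcvd) = $text =~ /(\d+) packets received/;
    # "round-trip min/avg/max/stddev = a/b/c/d ms": capture b, the average.
    my ($avg)  = $text =~ m{round-trip .*? = [\d.]+/([\d.]+)/};
    return ($ip, $rcvd, $avg);
}

my $sample = join "\n",
    'PING office.bsdcompany.com (144.69.42.18): 56 data bytes',
    '',
    '--- office.bsdcompany.com ping statistics ---',
    '1 packets transmitted, 1 packets received, 0% packet loss',
    'round-trip min/avg/max/stddev = 85.038/85.038/85.038/0.000 ms';

my ($ip, $rcvd, $avg) = parse_ping_summary($sample);
print "$ip $rcvd $avg\n";   # 144.69.42.18 1 85.038
```

An unresolvable name produces no output on stdout at all, so all three values come back undef, which a caller can treat the same way the script treats "NO HOST FOUND".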
Perfect! Next, the BSDcompany folks simulate a failure on WAN2, by deleting its dsl-ip record, to make sure the script fails over properly:
    coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
    > update delete dsl-ip.office.bsdcompany.com
    > send
    > quit
    coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
    WAN1 [128.32.64.5]: 98.213ms
    WAN2 [NO HOST FOUND]: NO RESPONSE
    SET office.bsdcompany.com: WAN1
    coloserver# ping -qc 1 office.bsdcompany.com
    PING office.bsdcompany.com (128.32.64.5): 56 data bytes
    --- office.bsdcompany.com ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 97.188/97.188/97.188/0.000 ms
Good! Now they make WAN1 appear to fail at the same time, to ensure the record is deleted entirely when both WANs are down (so that anyone trying to reach the office gets an immediate lookup failure, instead of waiting through a pointless and possibly very lengthy network timeout first):
    coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
    > update delete cable-ip.office.bsdcompany.com
    > update delete dsl-ip.office.bsdcompany.com
    > send
    > quit
    coloserver# sudo -u ddns /usr/bin/perl /usr/home/ddns/ddns-failover.pl
    WAN1 [NO HOST FOUND]: NO RESPONSE
    WAN2 [NO HOST FOUND]: NO RESPONSE
    DELETE office.bsdcompany.com
    coloserver# ping -qc 1 office.bsdcompany.com
    ping: cannot resolve office.bsdcompany.com: Unknown host
Outstanding.
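In every one of these runs the script has the same short conversation with nsupdate: delete the A record, add the new one if there is a WAN to point at, then `send`. The sketch below builds that batch of commands as text so each case can be inspected; `nsupdate_batch()` is a hypothetical helper, and it uses `undef` where the script uses its 'NO' marker. Since nsupdate submits everything queued before `send` as a single update message, and RFC 2136 requires servers to apply an update atomically, outside resolvers should not see the name missing between the delete and the add.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring what ddns-failover.pl pipes into nsupdate:
# always delete the existing A record, add the new one unless $ip is undef
# (both WANs down), and finish with "send" to submit the whole batch.
sub nsupdate_batch {
    my ($server, $host, $ttl, $ip) = @_;
    my @lines = ("server $server", "update delete $host A");
    push @lines, "update add $host $ttl A $ip" if defined $ip;
    push @lines, 'send';
    return join('', map { "$_\n" } @lines);
}

print nsupdate_batch('coloserver.bsdcompany.com', 'office.bsdcompany.com', 10, '128.32.64.5');
print nsupdate_batch('coloserver.bsdcompany.com', 'office.bsdcompany.com', 10, undef);
```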
Installing and running the crontab
Now that we've thoroughly tested our script, we can install it as a cron job for the ddns user that runs once per minute, just like the one that runs set-ddns.pl on the office server to update cable-ip and dsl-ip.
coloserver# crontab -u ddns -e
* * * * * /usr/bin/perl /usr/home/ddns/ddns-failover.pl > /dev/null
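How long can outside clients keep using a dead WAN? A rough upper bound, as a worked sketch under stated assumptions: the WAN dies just after a cron run, so the record keeps pointing at it until the next run (up to 60 seconds with `* * * * *`); that run spends up to 1 second on each of its two pings (`-t 1`); and resolvers that already cached the old answer keep it for at most `$TTL` (10 seconds). The function and parameter names are hypothetical, and the estimate ignores nsupdate time and assumes resolvers honor the TTL.

```perl
use strict;
use warnings;

# Hypothetical estimate of the worst-case delay (seconds) before clients
# stop being sent to a dead WAN: wait for the next cron run, let the script
# time out on every ping, then let cached answers expire after the TTL.
sub worst_case_failover_secs {
    my (%p) = @_;
    return $p{cron_interval} + $p{pings} * $p{ping_timeout} + $p{ttl};
}

print worst_case_failover_secs(
    cron_interval => 60,   # "* * * * *" runs once per minute
    pings         => 2,    # one ping per WAN name
    ping_timeout  => 1,    # /sbin/ping -t 1
    ttl           => 10,   # $TTL in ddns-failover.pl
), "\n";   # 72
```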
To be completely thorough, now we'll break the record one last time and let the cron job fix it behind us:
    coloserver# nsupdate -k Koffice.bsdcompany.com.+157+15661.private
    > update delete office.bsdcompany.com A
    > send
    > quit
    coloserver# date
    Tue May 23 04:17:48 EDT 2006
    coloserver# ping -qc 1 office.bsdcompany.com
    ping: cannot resolve office.bsdcompany.com: Unknown host
    coloserver# date
    Tue May 23 04:18:03 EDT 2006
    coloserver# ping -qc 1 office.bsdcompany.com
    PING office.bsdcompany.com (144.69.42.18): 56 data bytes
    --- office.bsdcompany.com ping statistics ---
    1 packets transmitted, 1 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 86.338/86.338/86.338/0.000 ms
As soon as the minute ticked over, the cron job fired and restored the record, just as it's supposed to. Now that we've thoroughly tested both the script and the crontab that runs it, we can stop thinking about it and simply use our failover A record.