GoDaddy DNS Outage

Via Wired:

Following a day-long Domain Name Service server outage, web hosting provider GoDaddy is letting its competitor, VeriSign, host its DNS servers.

Part of me wants to point out that GoDaddy’s relationship with VeriSign’s is not quite that of a competitor. GoDaddy’s primary business is domain registration. VeriSign sold Network Solutions back in 2003. VeriSign used to sell SSL certs, that’s now owned by Symantec. They however still sell hosting and DNS services which competes with GoDaddy, however I don’t think they are really competing as they seem to be targeting different markets. VeriSign is the authoritative registry for .com and .net, making them essential to the entire operation of domains. GoDaddy being the largest registrar suggests they’ve had a relationship for a long time.

What this demonstrates is that GoDaddy totally dropped the ball and realized they weren’t prepared for today’s events This was a very long outage, even with moving to VeriSign.

Improving DNS CDN Performance With edns-client-subnet

Several months ago I wrote about how third party DNS services often slow you down since a DNS query is only one part of the equation and many websites use DNS to help their CDN figure out what servers are closest (and fastest). A few proposals to fix this have floated around, one is finally making headway.

Google, Bitgravity, CDNetworks, DNS.com and Edgecast have deployed support for edns-client-subnet. The idea is pretty simple. It passes part of your IP address (only part as to keep it semi-anonymous) in the request. A server that supports this extension can use it to geotarget and find a CDN node closest to you. Previously the best that could be done was using the location of the DNS server, which in many cases could be far away.

Still missing is support from some heavyweights like Akamai, who is the largest CDN, Limelight Networks and Level3. This is a pretty solid proposal with minimal negative implications. They are only passing part of the origin IP address, so it wouldn’t be a privacy invasion. In theory any website you browse could already harvest the IP you are using, this is just making part of it accessible to a partner who is already serving data on their behalf.

DNS And CDN Performance Implications

I’ve seen various people complain about performance problems when using services like Google’s DNS or OpenDNS. The reason why people generally see these problems is because many large websites live behind Content Distribution Networks (known as a CDN) to serve at least part of their content, or even their entire site. You’re getting a sub-optimal response and your connection is slower than needed.

I’ve worked on large websites and setup some websites from DNS to HTML. As a result I’ve got some experience in this realm.

How DNS Works

To understand why this is, you first need to know how DNS works. When you connect to any site, your computer first makes a DNS query to get an IP address for the server(s) that will give the content you requested. For example, to connect to this blog, you’re computer asks your ISP’s DNS servers for robert.accettura.com and it gets an IP back. Your ISP’s DNS either has this information cached from a previous request, or it asks the websites DNS what IP to use, then relays the information back to you.

This looks something like this schematically:

[You] --DNS query--> [ISP DNS] --DNS query--> [Website DNS] --response--> [ISP DNS] --response--> [You]

Next your computer contacts that IP and requests the web page you wanted. The server then gives your computer the requested content. That looks something like this:

[You] --http request--> [Web Server] --response--> [You]

That’s how DNS works, and how you get a basic web page.

How a CDN Works

Now when you’re website gets large enough, you may have servers in multiple data centers around the world, or contract with a service provider who has these servers for you (most contract). This is called a content distribution network (CDN). Parts of, or your entire website may be hosted with a CDN. The idea is that if you put servers close to your users, they will get content faster.

Say the user is in New York, and the server is in Los Angeles. You’re connection may look something like this:

New York : 12.565 ms  10.199 ms
San Jose: 98.288 ms  96.759 ms  90.799 ms
Los Angeles: 88.498 ms  92.070 ms  90.940 ms

Now if the user is in New York and the server is in New York:

New York: 21.094 ms  20.573 ms  19.779 ms
New York: 19.294 ms  16.810 ms  24.608 ms

In both cases I’m paraphrasing a real traceroute for simplicity. As you can see, keeping the traffic in New York vs going across the country is faster since it reduces latency. That’s just in the US. Imagine someone in Europe or Asia. The difference can be large.

The way this happens is a company using a CDN generally sets up a CNAME entry in their DNS records to point to their CDN. Think of a CNAME as an alias that points to another DNS record. For example Facebook hosts their images and other static content on static.ak.facebook.com. static.ak.facebook.com is a CNAME to static.ak.facebook.com.edgesuite.net. (the period at the end is normal). We’ll use this as an example from here on out…

This makes your computer do an extra DNS query, which ironically slows things down! However in theory we make up the time and then some as illustrated earlier by using a closer server. When your computer sees the record is a CNAME it does another query to get an IP for the CNAME’s value. The end result is something like this:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

That last query is going to the CDN’s DNS instead of the website. The CDN gives an IP (sometimes multiple) that it feels is closest to whomever is requesting it (the DNS server). That’s the important takeaway from this crash course in DNS. The CDN only sees the DNS server of the requester, not the requester itself. It therefore gives an IP that it thinks is closest based on the DNS server making the query.

The use of a CNAME is why many large websites will 301 you to from foo.com to www.foo.com. foo.com must be an A record. To keep you behind the CDN they 301.

Now lets see it in action!

Here’s what a request from NJ for an IP for static.ak.facebook.com looks like:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

Now lets trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute: Warning: static.ak.facebook.com has multiple addresses; using 64.208.248.243
traceroute to a749.g.akamai.net (64.208.248.243), 64 hops max, 52 byte packets
 1  192.168.x.x (192.168.x.x)  1.339 ms  1.103 ms  0.975 ms
 2  c-xxx-xxx-xxx-xxx.hsd1.nj.comcast.net (xxx.xxx.xxx.xxx)  25.431 ms  19.178 ms  22.067 ms
 3  xe-2-1-0-0-sur01.ebrunswick.nj.panjde.comcast.net (68.87.214.185)  9.962 ms  8.674 ms  10.060 ms
 4  xe-3-1-2-0-ar03.plainfield.nj.panjde.comcast.net (68.85.62.49)  10.208 ms  8.809 ms  10.566 ms
 5  68.86.95.177 (68.86.95.177)  13.796 ms
    68.86.95.173 (68.86.95.173)  12.361 ms  10.774 ms
 6  tengigabitethernet1-4.ar5.nyc1.gblx.net (64.208.222.57)  18.711 ms  18.620 ms  17.337 ms
 7  64.208.248.243 (64.208.248.243)  55.652 ms  24.835 ms  17.277 ms

That’s only about 50 miles away and as low as 17ms latency. Not bad!

Now here’s the same query done from Texas:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 72.247.246.16
a749.g.akamai.net has address 72.247.246.19

Now lets trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute to static.ak.facebook.com (63.97.123.59), 30 hops max, 40 byte packets
 1  xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx)  2.737 ms  2.944 ms  3.188 ms
 2  98.129.84.172 (98.129.84.172)  0.423 ms  0.446 ms  0.489 ms
 3  98.129.84.177 (98.129.84.177)  0.429 ms  0.453 ms  0.461 ms
 4  dal-edge-16.inet.qwest.net (205.171.62.41)  1.350 ms  1.346 ms  1.378 ms
 5  * * *
 6  63.146.27.126 (63.146.27.126)  47.582 ms  47.557 ms  47.504 ms
 7  0.ae1.XL4.DFW7.ALTER.NET (152.63.96.86)  1.640 ms  1.730 ms  1.725 ms
 8  TenGigE0-5-0-0.GW4.DFW13.ALTER.NET (152.63.97.197)  2.129 ms  1.976 ms TenGigE0-5-1-0.GW4.DFW13.ALTER.NET (152.63.101.62)  1.783 ms
 9   (63.97.123.59)  1.450 ms  1.414 ms  1.615 ms

The response this time is from the same city and a mere 1.6 ms away!

For comparison www.facebook.com does not appear to be on a CDN, Facebook serves this content directly off of their servers (which are in a few data centers). From NJ the ping time averages 101.576 ms, and from Texas 47.884 ms. That’s a huge difference.

Since www.facebook.com hosts pages specifically outputted for the user, putting them through a CDN would be pointless since the CDN would have to go to Facebooks servers for every request. For things like images and stylesheets a CDN can cache them at each node.

Wrapping It Up

Now the reason why using a DNS service like Google’s DNS or OpenDNS will slow you down is that while a DNS query may be quick, you may no longer be using the closest servers a CDN can give you. You generally only make a few DNS queries per pageview, but may make a dozen or so requests for different assets that compose a page. In cases where a website is behind a CDN, I’m not sure that using even a faster DNS service will ever payoff. For smaller sites, it obviously would since this variable is removed from the equation.

There are a few proposals floating around out there to resolve this limitation in DNS, but at this point there’s nothing in place.

DNSSEC Root Key

Bruce Schneier pointed out that DNSSEC root key has been divided among seven people for security:

Part of ICANN’s security scheme is the Domain Name System Security, a security protocol that ensures Web sites are registered and “signed” (this is the security measure built into the Web that ensures when you go to a URL you arrive at a real site and not an identical pirate site). Most major servers are a part of DNSSEC, as it’s known, and during a major international attack, the system might sever connections between important servers to contain the damage.

A minimum of five of the seven keyholders – one each from Britain, the U.S., Burkina Faso, Trinidad and Tobago, Canada, China, and the Czech Republic – would have to converge at a U.S. base with their keys to restart the system and connect eveything once again.

Based on this key signing video it looks like they are using smart cards and an AEP Keyper HSM for this critical task. Schneier suspects it implements the Shamir’s Secret Sharing algorithm.

Considering how much our economy and our lives rely on the Internet these days, DNS is becoming a more and more critical part of our society. This is a very big event. No precaution is too great to ensure security of such critical infrastructure.

Google DNS Privacy Policy

John Gruber among others note that Google DNS service is not tied to Google Accounts. That’s not just wording in their privacy statement, it’s technically impossible for them to do otherwise, at least with reasonable accuracy.

Your computer is associated with a Google account via a cookie given to you when you login. Cookies are sent back to Google’s servers as HTTP headers whenever you fetch something from the host that set the cookie (every request, even images). They can only be sent to that domain, nobody else.

DNS doesn’t operate over HTTP, and therefore can’t tell what Google Account you’re using.

Google could however use your IP address you used to login to your Google Account and associate it with your DNS activity, but that would make the statisticians at Google cringe. So many homes and businesses have multiple computers behind a NAT router. Google DNS is unable to distinguish between them. Even one computer can have multiple users.

Before someone jumps up and says “MAC address”, the answer is: NO. To keep it simple a MAC address is part of the “Data Link Layer” of the OSI model (Layer 2) and is used to address adjacent devices. Your MAC address is only transmitted until the first hop which would be the first router on your way to Google. Each time your data makes it to the next device on its way to Google the previous MAC header is stripped off and a new one is added. By the time your bits get to Google that packet of data has only the last hop’s MAC address on it. Many people confuse Layers 2 and 3.

Google Public DNS Analysis

Google’s new Public DNS is interesting. They want to lower DNS latency in hopes of speeding up the web.

Awesome IP Address

This is the most interesting thing to me. I view IP addresses similar to the way Steve Wozniak views phone numbers, though I don’t collect them like he does phone numbers.

[Querying whois.arin.net]
[whois.arin.net]
Level 3 Communications, Inc. LVLT-ORG-8-8 (NET-8-0-0-0-1) 
                                  8.0.0.0 - 8.255.255.255
Google Incorporated LVLT-GOOGL-1-8-8-4 (NET-8-8-4-0-1) 
                                  8.8.4.0 - 8.8.4.255

# ARIN WHOIS database, last updated 2009-12-02 20:00
# Enter ? for additional hints on searching ARIN's WHOIS database.

Looks like Google is working with Level 3 (also their partner for Google Voice I hear) for the purpose of having an easy to remember IP. From what I can tell it’s anycasted to a Google data center.

For what it’s worth, 6.6.6.6 is owned by the US Army. Make of that what you will.

NXDOMAIN

First thought is Google would hijack NXDOMAIN for the purpose of showing ads, like many ISP’s and third party DNS providers. Instead they explicitly state:

If you issue a query for a domain name that does not exist, Google Public DNS always returns an NXDOMAIN record, as per the DNS protocol standards. The browser should show this response as a DNS error. If, instead, you receive any response other than an error message (for example, you are redirected to another page), this could be the result of the following:

  • A client-side application such as a browser plug-in is displaying an alternate page for a non-existent domain.
  • Some ISPs may intercept and replace all NXDOMAIN responses with responses that lead to their own servers. If you are concerned that your ISP is intercepting Google Public DNS requests or responses, you should contact your ISP.

Good. Nobody should ever hijack NXDOMAIN. DNS should be handled per spec.

Performance Benefits

Google documented what they did to speed things up. Some of it anyway. Good news is they will still be obeying TTL it seems. My paraphrasing:

  • Infrastructure – Tons of hardware/network capacity. No shocker.
  • Shared caching in the cluster – Pretty self explanatory.
  • Prefetching name resolutions – Google is using their web search index and DNS server logs to figure out who to prefetch.
  • Anycast routing – Again obvious. They do note however that this can have negative consequences:

    Note, however, that because nameservers geolocate according to the resolver’s IP address rather than the user’s, Google Public DNS has the same limitations as other open DNS services: that is, the server to which a user is referred might be farther away than one to which a local DNS provider would have referred. This could cause a slower browsing experience for certain sites.

Google also discusses the security practices to mitigate some common security issues.

Privacy

Google says after 24-48 hours they erase any IP information in their privacy policy. Assuming you trust Google that may be better than what your ISP is doing though your ISP could still log by monitoring DNS traffic over their network. As far as I’m aware there are no US laws governing data retention, though proposed several times.

I am curious how this will be treated in Europe who does have some data retention laws for ISP’s. Does providing DNS, traditionally an ISP activity make you an ISP? Or do you need to handle transit as well? Does an ISP need to track DNS queries of someone using a 3rd party DNS? Remember recording IP’s alone is not the same thanks to virtual hosting. Many websites can exist on one IP.

OpenDNS and others may have flown under the radar being smaller companies, but Google will attract more attention. I suspect it’s only a matter of time before someone raises this question.

Would I use it?

I haven’t seen any DNS related problems personally. I’ve seen degraded routing from time to time from my ISP. Especially in those cases, my nearby ISP provided DNS would be quicker than Google. I don’t really like how nameservers may geolocate me further away, but that’s not a deal killer. I don’t plan on switching since I don’t see much of a benefit at this time.

BitTorrent For HTTP Failover

There is a proposal circulating around the web to create a X-Torrent HTTP header for the purpose of pointing to a torrent file as an alternative way to download a file from an overloaded server. I’ve been an advocate of implementing BitTorrent in browsers in particular Firefox since at least 2004 according to this blog, and like the idea in principal but don’t like the implementation proposed.

The way the proposal would work is a server would send the X-Torrent HTTP header and if the browser chose to use the BitTorrent it would do that rather than leach the servers bandwidth. This however fails if the server is already overloaded.

Unnecessary Header

This is also a little unnecessary since browsers automatically send an Accept-Encoding Requests header which could contain support for torrents removing the need for a server to send this by default. Regardless the system still fails if the server is overloaded.

Doesn’t Failover

A nicer way would be to also utilize DNS which is surprisingly good at scaling for these types of tasks. It’s already used for similar things like DNSBL and SPF.

Example

Assume my browser supports the BitTorrent protocol and I visit the following url for a download:

http://dl.robert.accettura.com/pub/myfile.tar.gz

My request would look something like this:

Get: /pub/myfile.tar.gz
Host: dl.robert.accettura.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5
Accept: */*
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate,torrent
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://robert.accettura.com/download

The servers response would look something like this:

Date: Sun, 18 Jan 2009 00:25:54 GMT
Server: Apache
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-bittorrent

The content would be the actual torrent. The browser would handle as appropriate by opening a helper application or handling it internally. If I didn’t have torrent in my Accept-Encoding header, I would have been served via HTTP like we are all accustomed.

Now what happens if the server is not responding? A fallback to the DNS level could be done.

First take the GET and generate a SHA1 checksum for the GET, in my example that would be:

438296e855494825557824b691a09d06a86a21f1

Now to generate a DNS Query in the format [hash]._torrent.[server]:

438296e855494825557824b691a09d06a86a21f1._torrent.dl.robert.accettura.com

The response would look something like a Base64 encoded .torrent file broken up and served as TOR or TXT records. Should the string not fit in one record (I think the limit is 512 bytes) the response could be broken up into multiple records and concatenated by the client to reassemble.

Odds of a collision with existing DNS space is limited due to the use of a SHA1 hash and the _torrent subdomain. It coexists peacefully.

Downside

The downside here is that if your server fails your DNS is going to take an extra query from any client capable of doing this. There is slight latency in this process.

Upside/Conclusion

The upside is that DNS scaling has come a long way and is rarely an issue for popular sites and web hosts. DNS can (and often is) cached by ISP’s resulting in an automatic edge CDN thanks to ISP’s. ISP’s can also mitigate traffic on their networks by caching on their side (something I also suggested in 2004).

BitTorrent may be used for illegal content, but so is HTTP. I think costs for ISP’s and websites could be significantly cut by making BitTorrent more transparent as a data transfer protocol.

DNS Strangeness Followup

A few days ago I mentioned I was having some DNS issues. I’m pretty sure they are resolved as the last few days I haven’t seen anything odd.

It seems the primary nameserver did not bump the SOA when it updated. As a result one of the other DNS servers was out of sync. Why only one? I doubt I’ll ever discover why.

Anyway, it seems to be fixed. If anyone notices an issue, let me know.

Me.com DNS Queries

I decided to take another look at me.com, the eve before we all expect a launch at WWDC2008. It’s still using marksmonitor.com for DNS.

TTL

The TTL is still set to 28800 (8 hours). I think this will drop before 2:00 AM PST (5:00 AM EST) if they are planning to offer email service to users so that it can propegate. Also note last update was June 4 2008.

me.com.                 85673   IN      SOA     ns1.markmonitor.com. eddingsk.apple.com. 2008060411 28800 7200 604800 86400

Akamaized

The www CNAME record is already pointing to Akamai (which Apple uses quite a bit). There is currently no A record for me.com (The me.com A record can’t point to Akamai since their service is delivered via a CNAME).

www.me.com.             591     IN      CNAME   www.me.com.edgesuite.net.
www.me.com.edgesuite.net. 18591 IN      CNAME   a1917.g.akamai.net.
a1917.g.akamai.net.     20      IN      A       8.15.32.42
a1917.g.akamai.net.     20      IN      A       8.15.32.9

No mx records yet

There’s still no mx records yet. This is interesting since it’s already pointing at Akamai’s EdgeSuite, yet it doesn’t have mx records. Why only halfway setup DNS?

mobileme.com

There is no DNS info to be found for mobileme.com. This makes me think that Apple may not intend to launch this product tomorrow. Perhaps they just don’t want it to land in the hands of squatters? Backup domain (in case they couldn’t have me.com)?

mac.com

The TTL for mac.com is set to 1880 (30 minutes). That was originally very surprising until I noticed the serial was May 16 2008. I suspect Apple always keeps it low so that they can make changes on the fly (given their past email problems, makes sense).

mac.com.                4106    IN      SOA     nserver.apple.com. hostmaster.apple.com. 2008051601 1800 600 1209600 7200

Conclusions

There’s nothing very damning here. If anything I’d conclude mobileme.com is possibly/likely not a product, just trying to keep the squatters at bay. I think more may be revealed early tomorrow morning as Apple starts to prep for launch. Unless the security team at Apple decides to wait until post-announcement to start up the site. I guess we’ll see soon enough.