Improving DNS CDN Performance With edns-client-subnet

Several months ago I wrote about how third-party DNS services often slow you down, since a DNS query is only one part of the equation and many websites use DNS to help their CDN figure out what servers are closest (and fastest). A few proposals to fix this have floated around, and one is finally making headway.

Google, Bitgravity, CDNetworks, DNS.com and Edgecast have deployed support for edns-client-subnet. The idea is pretty simple. It passes part of your IP address (only part, to keep it semi-anonymous) in the request. A server that supports this extension can use it to geotarget and find a CDN node closest to you. Previously the best that could be done was to use the location of the DNS server, which in many cases could be far away.
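To make "part of your IP address" concrete, here is a minimal sketch of the truncation step using Python's standard ipaddress module. The function name and the prefix lengths (24 bits for IPv4, 56 for IPv6) are my assumptions for illustration; how many bits actually get sent is up to the resolver.

```python
import ipaddress

def client_subnet(ip: str, v4_prefix: int = 24, v6_prefix: int = 56) -> str:
    """Return only the network part of an address, dropping the host bits.

    The prefix lengths are illustrative assumptions, not values taken
    from any particular resolver.
    """
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and have the host bits zeroed.
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network)

# Documentation-range addresses, not real users.
print(client_subnet("203.0.113.77"))  # 203.0.113.0/24
print(client_subnet("2001:db8::1"))   # 2001:db8::/56
```

The CDN learns roughly where you are on the network, but not which host you are.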

Still missing is support from some heavyweights like Akamai, which is the largest CDN, Limelight Networks and Level3. This is a pretty solid proposal with minimal negative implications. They are only passing part of the origin IP address, so it wouldn’t be a privacy invasion. In theory any website you browse could already harvest the IP you are using; this just makes part of it accessible to a partner who is already serving data on their behalf.

DNS And CDN Performance Implications

I’ve seen various people complain about performance problems when using services like Google’s DNS or OpenDNS. People generally see these problems because many large websites live behind Content Distribution Networks (CDNs) to serve at least part of their content, or even their entire site. With a third-party DNS service you can get a sub-optimal response from the CDN, and your connection is slower than it needs to be.

I’ve worked on large websites and set up some websites from DNS to HTML. As a result I’ve got some experience in this realm.

How DNS Works

To understand why this is, you first need to know how DNS works. When you connect to any site, your computer first makes a DNS query to get an IP address for the server(s) that will give the content you requested. For example, to connect to this blog, your computer asks your ISP’s DNS servers for robert.accettura.com and it gets an IP back. Your ISP’s DNS either has this information cached from a previous request, or it asks the website’s DNS what IP to use, then relays the information back to you.

This looks something like this schematically:

[You] --DNS query--> [ISP DNS] --DNS query--> [Website DNS] --response--> [ISP DNS] --response--> [You]

Next your computer contacts that IP and requests the web page you wanted. The server then gives your computer the requested content. That looks something like this:

[You] --http request--> [Web Server] --response--> [You]

That’s how DNS works, and how you get a basic web page.
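The caching behavior of the ISP’s DNS described above can be sketched as a toy resolver. This is not how a real resolver is implemented; the class name, the 300 second TTL, and the 192.0.2.10 address are all made up for illustration.

```python
import time
from typing import Callable, Dict, Tuple

class CachingResolver:
    """Toy stand-in for an ISP resolver: answer from cache, else ask upstream."""

    def __init__(self, upstream: Callable[[str], str], ttl_seconds: float = 300.0):
        self.upstream = upstream          # plays the role of the website's DNS
        self.ttl_seconds = ttl_seconds    # assumed TTL, purely illustrative
        self.cache: Dict[str, Tuple[str, float]] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = time.monotonic()
        cached = self.cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]              # still fresh: no upstream query needed
        self.upstream_queries += 1
        ip = self.upstream(name)
        self.cache[name] = (ip, now + self.ttl_seconds)
        return ip

def website_dns(name: str) -> str:
    # Hypothetical record; the address is from a documentation range.
    records = {"robert.accettura.com": "192.0.2.10"}
    return records[name]

isp = CachingResolver(website_dns)
isp.resolve("robert.accettura.com")   # first lookup goes to the website's DNS
isp.resolve("robert.accettura.com")   # second lookup is answered from cache
print(isp.upstream_queries)  # 1
```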

How a CDN Works

Now when your website gets large enough, you may have servers in multiple data centers around the world, or contract with a service provider who has these servers for you (most contract). This is called a content distribution network (CDN). Parts of your website, or all of it, may be hosted with a CDN. The idea is that if you put servers close to your users, they will get content faster.

Say the user is in New York, and the server is in Los Angeles. Your connection may look something like this:

New York: 12.565 ms  10.199 ms
San Jose: 98.288 ms  96.759 ms  90.799 ms
Los Angeles: 88.498 ms  92.070 ms  90.940 ms

Now if the user is in New York and the server is in New York:

New York: 21.094 ms  20.573 ms  19.779 ms
New York: 19.294 ms  16.810 ms  24.608 ms

In both cases I’m paraphrasing a real traceroute for simplicity. As you can see, keeping the traffic in New York vs going across the country is faster since it reduces latency. That’s just in the US. Imagine someone in Europe or Asia. The difference can be large.
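To put a number on it, here is a quick sketch that averages the final-hop times from the two paraphrased traces above. The variable names are mine; the samples are copied from the Los Angeles line of the first trace and the last New York line of the second.

```python
from statistics import mean

# Final-hop round-trip times in milliseconds, copied from the traces above.
los_angeles_ms = [88.498, 92.070, 90.940]
new_york_ms = [19.294, 16.810, 24.608]

la_mean = mean(los_angeles_ms)
ny_mean = mean(new_york_ms)

print(f"Los Angeles: {la_mean:.1f} ms")           # Los Angeles: 90.5 ms
print(f"New York:    {ny_mean:.1f} ms")           # New York:    20.2 ms
print(f"Difference:  {la_mean - ny_mean:.1f} ms") # Difference:  70.3 ms
```

That is roughly 70 ms saved on every round trip, and a single page usually needs many of them.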

The way this happens is a company using a CDN generally sets up a CNAME entry in their DNS records to point to their CDN. Think of a CNAME as an alias that points to another DNS record. For example Facebook hosts their images and other static content on static.ak.facebook.com. static.ak.facebook.com is a CNAME to static.ak.facebook.com.edgesuite.net. (the period at the end is normal). We’ll use this as an example from here on out…

This makes your computer do an extra DNS query, which ironically slows things down! However in theory we make up the time and then some as illustrated earlier by using a closer server. When your computer sees the record is a CNAME it does another query to get an IP for the CNAME’s value. The end result is something like this:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

That last query is going to the CDN’s DNS instead of the website’s. The CDN gives an IP (sometimes multiple) that it feels is closest to whoever is requesting it (the DNS server). That’s the important takeaway from this crash course in DNS. The CDN only sees the DNS server of the requester, not the requester itself. It therefore gives an IP that it thinks is closest based on the DNS server making the query.
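Here is a rough sketch of that takeaway. It assumes the CDN simply picks the geographically nearest node to the resolver; real CDNs use their own network measurements, so treat this as an illustration only. The node list, coordinates, and function names are all hypothetical.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

def distance_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Hypothetical CDN nodes with approximate city coordinates.
CDN_NODES: Dict[str, Coord] = {
    "new-york": (40.71, -74.01),
    "dallas": (32.78, -96.80),
    "los-angeles": (34.05, -118.24),
}

def pick_node(resolver_location: Coord) -> str:
    """The CDN only knows where the DNS resolver is, not where the user is."""
    return min(CDN_NODES, key=lambda n: distance_km(CDN_NODES[n], resolver_location))

# The same NJ user, asking through two different resolvers.
isp_resolver_in_nj = (40.49, -74.45)
distant_resolver = (32.78, -96.80)   # e.g. a third-party resolver in Texas

print(pick_node(isp_resolver_in_nj))  # new-york
print(pick_node(distant_resolver))    # dallas
```

The user never moved, but the answer changed because the resolver did.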

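The use of a CNAME is why many large websites will 301 you from foo.com to www.foo.com. A bare domain like foo.com can’t be a CNAME (the root of a zone has to carry other records, such as NS and SOA, and a CNAME can’t coexist with them), so it must be an A record. To keep you behind the CDN they 301 you to www.foo.com, which can be a CNAME.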

Now let’s see it in action!

Here’s what a request from NJ for an IP for static.ak.facebook.com looks like:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

Now let’s trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute: Warning: static.ak.facebook.com has multiple addresses; using 64.208.248.243
traceroute to a749.g.akamai.net (64.208.248.243), 64 hops max, 52 byte packets
 1  192.168.x.x (192.168.x.x)  1.339 ms  1.103 ms  0.975 ms
 2  c-xxx-xxx-xxx-xxx.hsd1.nj.comcast.net (xxx.xxx.xxx.xxx)  25.431 ms  19.178 ms  22.067 ms
 3  xe-2-1-0-0-sur01.ebrunswick.nj.panjde.comcast.net (68.87.214.185)  9.962 ms  8.674 ms  10.060 ms
 4  xe-3-1-2-0-ar03.plainfield.nj.panjde.comcast.net (68.85.62.49)  10.208 ms  8.809 ms  10.566 ms
 5  68.86.95.177 (68.86.95.177)  13.796 ms
    68.86.95.173 (68.86.95.173)  12.361 ms  10.774 ms
 6  tengigabitethernet1-4.ar5.nyc1.gblx.net (64.208.222.57)  18.711 ms  18.620 ms  17.337 ms
 7  64.208.248.243 (64.208.248.243)  55.652 ms  24.835 ms  17.277 ms

That’s only about 50 miles away and as low as 17ms latency. Not bad!

Now here’s the same query done from Texas:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 72.247.246.16
a749.g.akamai.net has address 72.247.246.19

Now let’s trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute to static.ak.facebook.com (63.97.123.59), 30 hops max, 40 byte packets
 1  xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx)  2.737 ms  2.944 ms  3.188 ms
 2  98.129.84.172 (98.129.84.172)  0.423 ms  0.446 ms  0.489 ms
 3  98.129.84.177 (98.129.84.177)  0.429 ms  0.453 ms  0.461 ms
 4  dal-edge-16.inet.qwest.net (205.171.62.41)  1.350 ms  1.346 ms  1.378 ms
 5  * * *
 6  63.146.27.126 (63.146.27.126)  47.582 ms  47.557 ms  47.504 ms
 7  0.ae1.XL4.DFW7.ALTER.NET (152.63.96.86)  1.640 ms  1.730 ms  1.725 ms
 8  TenGigE0-5-0-0.GW4.DFW13.ALTER.NET (152.63.97.197)  2.129 ms  1.976 ms TenGigE0-5-1-0.GW4.DFW13.ALTER.NET (152.63.101.62)  1.783 ms
 9   (63.97.123.59)  1.450 ms  1.414 ms  1.615 ms

The response this time is from the same city and a mere 1.6 ms away!

For comparison, www.facebook.com does not appear to be on a CDN. Facebook serves this content directly off of their servers (which are in a few data centers). From NJ the ping time averages 101.576 ms, and from Texas 47.884 ms. That’s a huge difference.

Since www.facebook.com hosts pages generated specifically for each user, putting them through a CDN would be pointless since the CDN would have to go to Facebook’s servers for every request. For things like images and stylesheets a CDN can cache them at each node.

Wrapping It Up

Now the reason why using a DNS service like Google’s DNS or OpenDNS will slow you down is that while a DNS query may be quick, you may no longer be using the closest servers a CDN can give you. You generally only make a few DNS queries per pageview, but may make a dozen or so requests for different assets that compose a page. In cases where a website is behind a CDN, I’m not sure that using even a faster DNS service will ever pay off. For smaller sites, it obviously would since this variable is removed from the equation.
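A back-of-the-envelope model shows why. Every number below is hypothetical, and the model is deliberately crude: it assumes requests happen one after another and each costs exactly one round trip, which real browsers don’t do. It is only meant to show how a few fast DNS queries get outweighed by a dozen slower asset requests.

```python
def page_load_cost_ms(dns_queries: int, dns_ms: float,
                      asset_requests: int, rtt_ms: float) -> float:
    """Crude serial model: total time spent on DNS plus time spent fetching assets."""
    return dns_queries * dns_ms + asset_requests * rtt_ms

# Hypothetical numbers: 3 DNS queries and 12 asset requests per pageview.
# ISP DNS: slower lookups (30 ms) but the CDN hands back a nearby node (20 ms).
isp_dns = page_load_cost_ms(3, 30.0, 12, 20.0)
# Third-party DNS: faster lookups (10 ms) but a farther CDN node (50 ms).
third_party_dns = page_load_cost_ms(3, 10.0, 12, 50.0)

print(isp_dns)          # 330.0
print(third_party_dns)  # 630.0
```

Under these assumptions the faster DNS saves 60 ms on lookups and loses 360 ms on assets.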

There are a few proposals floating around out there to resolve this limitation in DNS, but at this point there’s nothing in place.

Whitehouse.gov Analysis

A few notes on the new whitehouse.gov website as I did for the campaign sites after about 5 minutes of sniffing around:

  • Running Microsoft-IIS 6.0 and ASP.NET 2.0.50727. The Bush administration ran Apache on what I think was some sort of Unix. Data is gzip’d.
  • Whitehouse.gov is using Akamai as a CDN and for DNS service.
  • Using jQuery 1.2.6 (someone should let them know 1.3 is out). Also using several plugins including jQuery UI, jcarousel, Thickbox. Also using swfobject.
  • Pages tentatively validate as XHTML 1.0 Transitional! I’m shocked by this. I’ve checked several pages all with the same result.
  • Using WebTrends for analytics. Bush Administration also did.
  • IE Conditional Stylesheets and a print stylesheet.
  • RSS feeds are actually Atom feeds.
  • The website is setting two cookies that I can see, WT_FPC and ASP.NET_SessionId, both of which expire at the end of the session. That is not prohibited in the federal government as per OMB Guidance for Implementing the Privacy Provisions of the E-Government Act of 2002 (using Google Cache for that link since I can’t find it anywhere else; our government should really keep those in a more permanent location).

I should note that this is quite different in architecture than the Obama campaign site which ran PWS/PHP, no notable JS library, feed, and Google Analytics.

Update [1/20/2009 @ 9:00 PM EST]:

Redefining Broadband

The FCC for years has been considering any connection greater than 200kbps to be broadband. For the past several years that’s been pretty misleading. In addition, they only collect downstream data, not upstream. They also consider an entire zip code to have broadband if only 1 home can get it. That’s not very accurate. This makes the broadband situation in the US look better than it really is.

Broadband in the US is now being redefined as 768kbps. They will now collect upstream data, and use census-tract data. This is a major win since it will more accurately show how many people really do have broadband, and more importantly how many do not.

I personally disagree on the number and think it should be at least 2Mbps, but it’s a win regardless.

The Pacific Rim annihilates the United States when it comes to broadband. According to Akamai’s State Of The Internet for Q1 2008, high broadband (greater than 5Mbps) is where we really start to show our deficiencies. Here’s a look at broadband, which they define as simply greater than 2Mbps:

Rank  Country        % >2Mbps  Q4 07 Change
-     Global         55%       -2.0%
1     South Korea    93%       -1.5%
2     Belgium        90%       +1.5%
3     Switzerland    89%       +0.5%
4     Hong Kong      87%       -1.5%
5     Japan          87%       +1.0%
6     Norway         83%       -2.3%
7     Tunisia        82%       +29%
8     Slovakia       81%       +0.5%
9     Netherlands    78%       -2.6%
10    Bahamas        74%       -3.0%
24    United States  62%       -2.8%

Pretty pathetic considering our last Vice President invented the Internet 😉. We are the largest in terms of sq miles, but when you consider the US population density, the bulk of our land is very sparsely populated. 80.8% of the US population lives in an urban setting [Warning: PDF].

US Population Density

Japan by comparison has 66.0% of its population in an urban setting. Belgium has a surprising 91.5%, which may account for its #2 position. Switzerland has 44.4% yet makes 3rd place, threatening Belgium’s position.

I’m far from the first one to complain about the poor state of broadband. BusinessWeek and CNet both have relatively good discussions about the topic.

The future of media is clearly moving online as people demand to consume it on their schedule as they desire. Take a look at some of the statistics and it’s clearly a large industry. I suspect the lack of broadband infrastructure will be a real problem in the next several years as the rest of the world becomes very easy to distribute media to, and the US still faces challenges.

Solution? Highly debatable, but if so many other countries can do something about it, I suspect it’s achievable here in the US as well. I suspect that the taxes collected from companies that do business on the internet, from ecommerce to advertising, would make this a decent investment for the US government to at least partially back. The more places companies make money, the more places the government does. That may be necessary as not all markets are profitable enough for telcos to bother with. There have been various attempts to jumpstart this effort, but none to date have been successful.

It’s not only about just having access, it’s also the cost. As BusinessWeek points out in the article above, broadband in the US is not cheap.

Perhaps wireless will finally allow for competition and lower prices, at least that’s what everyone is hoping for. The question is if it will happen, if the technology will be there (wireless is generally high latency), and if it will be affordable for the common man.

I suspect in the next 4 years this will become an even bigger topic of discussion as some of the top ranking countries start to reach the point of saturation.

Secrets In Websites II

This post is a follow up to the first Secrets In Websites. For those who don’t remember the first time, I point out odd, interesting, funny things in other websites’ code. Yes it takes some time to put a post like this together, that’s why it’s just about a year since the last time. Enough with the intro, read on for the code.


FoxTorrent

I said back in 2004 that Firefox needs built in support for BitTorrent. My idea was it would be integrated into the download manager so that it was “just another protocol” and would be transparent to a typical user. I still stand by that.

Fast forward to 2007: FoxTorrent is by RedSwoosh (now owned by Akamai).

I’d personally love to see something like this ship built in. It’s a great feature. BitTorrent is a great protocol for distributing large downloads without having to buy expensive infrastructure. Akamai’s interest is proof of that.

FoxTorrent has a blog if you want to keep an eye on it. FoxTorrent is MIT licensed as well. It seems like a very interesting product. I’ll have to dig into this and look at it a bit closer.

[Hat tip: TechCrunch]

Akamai taken out by bot network

Doesn’t this creep you out? Akamai, an extremely robust network, designed for those who need intensive server-side power, taken out by a bot network.

14,000 servers in 1,100 networks in 65+ countries.

Just makes you wonder how vulnerable the internet really is. Yeah, it’s a web, and not based on a central hub. But it obviously still has problems.

On another note, what a wonderful NOCC.