Netflix Open Connect

Netflix is trying to reduce it’s dependency on CDN’s by peering directly with ISP’s and with a new hardware appliance ISP’s can host on their own network to offload traffic. The peering option is pretty strait forward. The appliance however is interesting. Netflix is actually quite transparent about what they are doing, so I thought I’d dig in and take a little look since they are sharing:

Hardware

Netflix says right up front they were influenced by Backblaze, and their appliance is actually quite similar in many respects. The difference is that Netflix does need a bit more CPU and Network IO and a little less storage. That balance is pretty achievable. The appliance must be a tad on the heavy side as this is a pretty heavily packed server.

Essentially the hardware is a Supermicro mATX board and a bunch of SATA hard drives in a custom 4U enclosure. There are 2 16 port LSI SAS controllers for 32 drives and 4 drives presumably running directly off the motherboard. Hitachi Deskstar or Seagate Barracuda drives. Nothing fancy here. An interesting tidbit is there are 2 x 512 GB “flash storage” (presumably SSD) for logs, OS, popular content. I’d assume those two are running in RAID 0 as one volume. They are managing the spinning disks in software RAID so they can handle failures.

Software

FreeBSD is the OS of choice. Not sure if this software RAID they are doing is something they cooked up or something already out there. Another interesting note is they are using nginx for a web server and are using http for moving content. Huge win for nginx and says a lot for it’s abilities as a web server. It also sounds like Netflix is a customer of NGINX, Inc.

The idea of an appliance on the ISP end isn’t new. CDN’s generally live close, not in the ISP’s network. On the TV side Weather Channel has done this for ages via the little known WeatherSTAR appliances (pic). They sit at the headend and get weather from TWC. They then output local weather reports as a video for the cable provider to insert. The WeatherSTAR appliance like the Netflix appliance is essentially 0 maintenance. It just lives locally and serves it’s master remotely.

It’s nice that they’ve been as open as they have about what they are building. They also have an engineering blog worth keeping an eye on.

Improving DNS CDN Performance With edns-client-subnet

Several months ago I wrote about how third party DNS services often slow you down since a DNS query is only one part of the equation and many websites use DNS to help their CDN figure out what servers are closest (and fastest). A few proposals to fix this have floated around, one is finally making headway.

Google, Bitgravity, CDNetworks, DNS.com and Edgecast have deployed support for edns-client-subnet. The idea is pretty simple. It passes part of your IP address (only part as to keep it semi-anonymous) in the request. A server that supports this extension can use it to geotarget and find a CDN node closest to you. Previously the best that could be done was using the location of the DNS server, which in many cases could be far away.

Still missing is support from some heavyweights like Akamai, who is the largest CDN, Limelight Networks and Level3. This is a pretty solid proposal with minimal negative implications. They are only passing part of the origin IP address, so it wouldn’t be a privacy invasion. In theory any website you browse could already harvest the IP you are using, this is just making part of it accessible to a partner who is already serving data on their behalf.

DNS And CDN Performance Implications

I’ve seen various people complain about performance problems when using services like Google’s DNS or OpenDNS. The reason why people generally see these problems is because many large websites live behind Content Distribution Networks (known as a CDN) to serve at least part of their content, or even their entire site. You’re getting a sub-optimal response and your connection is slower than needed.

I’ve worked on large websites and setup some websites from DNS to HTML. As a result I’ve got some experience in this realm.

How DNS Works

To understand why this is, you first need to know how DNS works. When you connect to any site, your computer first makes a DNS query to get an IP address for the server(s) that will give the content you requested. For example, to connect to this blog, you’re computer asks your ISP’s DNS servers for robert.accettura.com and it gets an IP back. Your ISP’s DNS either has this information cached from a previous request, or it asks the websites DNS what IP to use, then relays the information back to you.

This looks something like this schematically:

[You] --DNS query--> [ISP DNS] --DNS query--> [Website DNS] --response--> [ISP DNS] --response--> [You]

Next your computer contacts that IP and requests the web page you wanted. The server then gives your computer the requested content. That looks something like this:

[You] --http request--> [Web Server] --response--> [You]

That’s how DNS works, and how you get a basic web page.

How a CDN Works

Now when you’re website gets large enough, you may have servers in multiple data centers around the world, or contract with a service provider who has these servers for you (most contract). This is called a content distribution network (CDN). Parts of, or your entire website may be hosted with a CDN. The idea is that if you put servers close to your users, they will get content faster.

Say the user is in New York, and the server is in Los Angeles. You’re connection may look something like this:

New York : 12.565 ms  10.199 ms
San Jose: 98.288 ms  96.759 ms  90.799 ms
Los Angeles: 88.498 ms  92.070 ms  90.940 ms

Now if the user is in New York and the server is in New York:

New York: 21.094 ms  20.573 ms  19.779 ms
New York: 19.294 ms  16.810 ms  24.608 ms

In both cases I’m paraphrasing a real traceroute for simplicity. As you can see, keeping the traffic in New York vs going across the country is faster since it reduces latency. That’s just in the US. Imagine someone in Europe or Asia. The difference can be large.

The way this happens is a company using a CDN generally sets up a CNAME entry in their DNS records to point to their CDN. Think of a CNAME as an alias that points to another DNS record. For example Facebook hosts their images and other static content on static.ak.facebook.com. static.ak.facebook.com is a CNAME to static.ak.facebook.com.edgesuite.net. (the period at the end is normal). We’ll use this as an example from here on out…

This makes your computer do an extra DNS query, which ironically slows things down! However in theory we make up the time and then some as illustrated earlier by using a closer server. When your computer sees the record is a CNAME it does another query to get an IP for the CNAME’s value. The end result is something like this:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

That last query is going to the CDN’s DNS instead of the website. The CDN gives an IP (sometimes multiple) that it feels is closest to whomever is requesting it (the DNS server). That’s the important takeaway from this crash course in DNS. The CDN only sees the DNS server of the requester, not the requester itself. It therefore gives an IP that it thinks is closest based on the DNS server making the query.

The use of a CNAME is why many large websites will 301 you to from foo.com to www.foo.com. foo.com must be an A record. To keep you behind the CDN they 301.

Now lets see it in action!

Here’s what a request from NJ for an IP for static.ak.facebook.com looks like:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 64.208.248.243
a749.g.akamai.net has address 64.208.248.208

Now lets trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute: Warning: static.ak.facebook.com has multiple addresses; using 64.208.248.243
traceroute to a749.g.akamai.net (64.208.248.243), 64 hops max, 52 byte packets
 1  192.168.x.x (192.168.x.x)  1.339 ms  1.103 ms  0.975 ms
 2  c-xxx-xxx-xxx-xxx.hsd1.nj.comcast.net (xxx.xxx.xxx.xxx)  25.431 ms  19.178 ms  22.067 ms
 3  xe-2-1-0-0-sur01.ebrunswick.nj.panjde.comcast.net (68.87.214.185)  9.962 ms  8.674 ms  10.060 ms
 4  xe-3-1-2-0-ar03.plainfield.nj.panjde.comcast.net (68.85.62.49)  10.208 ms  8.809 ms  10.566 ms
 5  68.86.95.177 (68.86.95.177)  13.796 ms
    68.86.95.173 (68.86.95.173)  12.361 ms  10.774 ms
 6  tengigabitethernet1-4.ar5.nyc1.gblx.net (64.208.222.57)  18.711 ms  18.620 ms  17.337 ms
 7  64.208.248.243 (64.208.248.243)  55.652 ms  24.835 ms  17.277 ms

That’s only about 50 miles away and as low as 17ms latency. Not bad!

Now here’s the same query done from Texas:

$ host static.ak.facebook.com
static.ak.facebook.com is an alias for static.ak.facebook.com.edgesuite.net.
static.ak.facebook.com.edgesuite.net is an alias for a749.g.akamai.net.
a749.g.akamai.net has address 72.247.246.16
a749.g.akamai.net has address 72.247.246.19

Now lets trace the connection to one of these responses:

$ traceroute static.ak.facebook.com
traceroute to static.ak.facebook.com (63.97.123.59), 30 hops max, 40 byte packets
 1  xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx)  2.737 ms  2.944 ms  3.188 ms
 2  98.129.84.172 (98.129.84.172)  0.423 ms  0.446 ms  0.489 ms
 3  98.129.84.177 (98.129.84.177)  0.429 ms  0.453 ms  0.461 ms
 4  dal-edge-16.inet.qwest.net (205.171.62.41)  1.350 ms  1.346 ms  1.378 ms
 5  * * *
 6  63.146.27.126 (63.146.27.126)  47.582 ms  47.557 ms  47.504 ms
 7  0.ae1.XL4.DFW7.ALTER.NET (152.63.96.86)  1.640 ms  1.730 ms  1.725 ms
 8  TenGigE0-5-0-0.GW4.DFW13.ALTER.NET (152.63.97.197)  2.129 ms  1.976 ms TenGigE0-5-1-0.GW4.DFW13.ALTER.NET (152.63.101.62)  1.783 ms
 9   (63.97.123.59)  1.450 ms  1.414 ms  1.615 ms

The response this time is from the same city and a mere 1.6 ms away!

For comparison www.facebook.com does not appear to be on a CDN, Facebook serves this content directly off of their servers (which are in a few data centers). From NJ the ping time averages 101.576 ms, and from Texas 47.884 ms. That’s a huge difference.

Since www.facebook.com hosts pages specifically outputted for the user, putting them through a CDN would be pointless since the CDN would have to go to Facebooks servers for every request. For things like images and stylesheets a CDN can cache them at each node.

Wrapping It Up

Now the reason why using a DNS service like Google’s DNS or OpenDNS will slow you down is that while a DNS query may be quick, you may no longer be using the closest servers a CDN can give you. You generally only make a few DNS queries per pageview, but may make a dozen or so requests for different assets that compose a page. In cases where a website is behind a CDN, I’m not sure that using even a faster DNS service will ever payoff. For smaller sites, it obviously would since this variable is removed from the equation.

There are a few proposals floating around out there to resolve this limitation in DNS, but at this point there’s nothing in place.

Amazon S3 Outage

The buzz around the web today was the outage of Amazon’s S3. It shows what websites are “doing it right”, and who fails. This is a great follow up to my “Reliability On The Grid” post the other day.

Amazon S3 is cloud based computing. Essentially when you send them a file using their REST or SOAP interface Amazon stores it on multiple nodes in their infrastructure. This provides redundancy and security (in case a data center catches fire for example). Because of this design it’s often though that cloud based computing is invincible to problems. This is hardly the fact. Just like any large system, it’s complicated and full of hazards. It takes only a small software glitch, or an unaccounted for issue to cause the entire thing to grind to a halt. More complexity = more things that can fail.

Amazon S3 is popular because it’s cheap and easy to scale. It’s pay-per-use based on bandwidth, disk storage, and requests. Because that allows for websites to grow without having to make a large infrastructure investment, it’s popular for “Web 2.0” companies trying to keep their budgets tight. Notably sites like Twitter, WordPress.com, SmugMug and Amazon.com themselves all use Amazon S3 to host things like images.

Many sites, notably Twitter, and SmugMug didn’t have a good day today. WordPress.com and Amazon.com operated like normal. The obvious reason for this is WordPress.com and Amazon.com are much better in terms of infrastructure and design.

WordPress.com uses S3, but proxies that with Varnish. There’s a brief description here, and a more detailed breakdown here. According to Barry Abrahamson, WordPress.com does 1500 image requests per second across and 80-100 are served through S3. They have (slower) back up’s in house for when S3 is down and can failover if S3 has a problem. This means they can leverage S3 to their advantage, but aren’t down because of S3. Using Varnish allows them to keep the S3 bill down by using their own bandwidth (likely cheaper since they are a large site and can get better rates on bandwidth). This also and lets them have this have a good level of redundancy. Awesome job.

Amazon.com uses S3 themselves. If you look at images on the site, they are actually served from g-ecx.images-amazon.com. Which is actually:

g-ecx.images-amazon.com. 38     IN      CNAME   ant.mii.instacontent.net.

instacontent.net is actually part of Mirror Image, a CDN. This is essentially outsourcing what WordPress.com is doing in terms of caching. It’s similar to Akamai’s services. A CDN’s biggest advantage is lowering latency by using servers closer to the customer, which are generally going to feel faster. The other benefit is that they cache content for when the origin is having problems. Because Amazon has a layer on top of S3, they have an added level of protection and remained up and images loaded.

Twitter serves most images such as avatars right off of S3. This means when S3 went down, there were thousands of dead images on their pages. No caching, not even a CNAME in place. Image hosting is the least of their concerns. Keeping the service up and running is their #1 concern right now. The service was still usable, just ugly. Many users take advantage of third party clients anyway.

Using a CDN or having the infrastructure in house is obviously more expensive (it makes S3 more of a luxury than a cost savings measure), but it means your not depending on one third party for your uptime.