Archive for the ‘Internet’ Category

Amazon S3 Outage

The buzz around the web today was the outage of Amazon’s S3. It shows which websites are “doing it right”, and which fail. This is a great follow-up to my “Reliability On The Grid” post the other day.

Amazon S3 is cloud-based storage. Essentially, when you send them a file using their REST or SOAP interface, Amazon stores it on multiple nodes in their infrastructure. This provides redundancy and security (in case a data center catches fire, for example). Because of this design it’s often thought that cloud-based computing is immune to problems. This is hardly the case. Just like any large system, it’s complicated and full of hazards. It takes only a small software glitch or an unaccounted-for issue to cause the entire thing to grind to a halt. More complexity = more things that can fail.

Amazon S3 is popular because it’s cheap and easy to scale. It’s pay-per-use based on bandwidth, disk storage, and requests. Because that allows websites to grow without having to make a large infrastructure investment, it’s popular with “Web 2.0” companies trying to keep their budgets tight. Notably, sites like Twitter, WordPress.com, SmugMug and Amazon.com itself all use Amazon S3 to host things like images.

Many sites, notably Twitter and SmugMug, didn’t have a good day today. WordPress.com and Amazon.com operated like normal. The obvious reason for this is that WordPress.com and Amazon.com are much better in terms of infrastructure and design.

WordPress.com uses S3, but proxies it with Varnish. There’s a brief description here, and a more detailed breakdown here. According to Barry Abrahamson, WordPress.com does 1500 image requests per second, and 80-100 of those are served through S3. They have (slower) backups in house and can fail over to them if S3 has a problem. This means they can leverage S3 to their advantage, but aren’t down because of S3. Using Varnish allows them to keep the S3 bill down by using their own bandwidth (likely cheaper since they are a large site and can get better rates on bandwidth). It also gives them a good level of redundancy. Awesome job.
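The cache-in-front-with-a-fallback arrangement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not WordPress.com’s actual Varnish setup: the `CachingProxy` class, the `primary` and `backup` callables and the TTL value are all hypothetical names and numbers made up for the example.

```python
import time
from typing import Callable, Dict, Tuple

Fetcher = Callable[[str], bytes]


class CachingProxy:
    """Read-through cache in front of a primary origin, with a slower backup.

    Hypothetical sketch: `primary` stands in for S3 and `backup` for an
    in-house copy. Neither is a real client library.
    """

    def __init__(self, primary: Fetcher, backup: Fetcher,
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._primary = primary
        self._backup = backup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # key -> (stored_at, body)

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no origin traffic at all

        try:
            body = self._primary(key)
        except Exception:
            # Primary origin is down or erroring: fall back to the slower copy.
            try:
                body = self._backup(key)
            except Exception:
                if entry is not None:
                    return entry[1]  # serve stale content rather than fail
                raise

        self._cache[key] = (now, body)
        return body
```

Most requests never reach either origin, which is what keeps the S3 bill down, and an S3 outage only costs a slower fetch on a cache miss instead of a broken image.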

Amazon.com uses S3 itself. If you look at images on the site, they are actually served from g-ecx.images-amazon.com, which is really:

g-ecx.images-amazon.com. 38     IN      CNAME   ant.mii.instacontent.net.

instacontent.net is actually part of Mirror Image, a CDN. This is essentially outsourcing what WordPress.com is doing in terms of caching. It’s similar to Akamai’s services. A CDN’s biggest advantage is lowering latency by using servers closer to the customer, which generally makes a site feel faster. The other benefit is that they cache content for when the origin is having problems. Because Amazon has a layer on top of S3, they had an added level of protection: the site remained up and images loaded.

Twitter serves most images such as avatars right off of S3. This means when S3 went down, there were thousands of dead images on their pages. No caching, not even a CNAME in place. Image hosting is the least of their concerns. Keeping the service up and running is their #1 concern right now. The service was still usable, just ugly. Many users take advantage of third party clients anyway.

Using a CDN or having the infrastructure in house is obviously more expensive (it makes S3 more of a luxury than a cost-saving measure), but it means you’re not depending on one third party for your uptime.

Reliability On The Grid

There’s been a lot of discussion lately (in particular NYTimes, Data Center Knowledge) regarding both reliability of web applications which users are becoming more and more reliant on, as well as the security of such applications. It’s a pretty interesting topic considering there are so many things that ultimately have an impact on these two metrics. I call them metrics since that’s what they really are.


8 Million Downloads In 24 Hours

It was a ton of fun to watch, absolutely addictive. 83 terabytes of data served just for downloads over 24 hours. There’s still a ton of people to update as the auto-update functionality has yet to be triggered. You can now see the scale of what’s involved. John Lilly’s got some great statistics on what just happened.
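To put those figures in perspective, here is a rough back-of-the-envelope calculation. It assumes decimal terabytes (10^12 bytes), traffic spread evenly across the 24 hours, and the 8 million download count from the title. Real traffic was certainly burstier, so treat the results as averages only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_gbps(total_bytes: float, seconds: float) -> float:
    """Average throughput in gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


def average_mb_per_download(total_bytes: float, downloads: int) -> float:
    """Average payload per download in (decimal) megabytes."""
    return total_bytes / downloads / 1e6


total_bytes = 83e12    # 83 TB, assuming 1 TB = 10**12 bytes
downloads = 8_000_000  # from the post title

print(f"{average_gbps(total_bytes, SECONDS_PER_DAY):.1f} Gbps sustained on average")
print(f"{average_mb_per_download(total_bytes, downloads):.1f} MB per download on average")
```

That works out to roughly 7.7 Gbps sustained for a full day, at a bit over 10 MB per download.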

According to Arbor Networks, yesterday’s U.S. Open played at Torrey Pines (featuring Tiger Woods and a bunch of guys pretty much nobody cares about) generated so much traffic some ISPs thought it was a DDoS attack. There was a huge spike on TCP/1935. Ironically this was about the same time Firefox 3 was unleashed. I wonder if that had any effect. Maybe next time, rather than a “world record” it should simply be “wreak havoc on your ISP”.

Redefining Broadband

The FCC for years has been considering any connection greater than 200kbps to be broadband. For the past several years that’s been pretty misleading. In addition, they only collect downstream, not upstream. They also consider an entire zip code to have broadband if only 1 home can get it. That’s not very accurate. This makes the broadband situation in the US look better than it really is.

Broadband in the US is now being redefined as 768kbps. The FCC will now collect upstream data, and use census-tract data. This is a major win since it will more accurately show how many people really do have broadband, and more importantly how many do not.

I personally disagree on the number and think it should be at least 2Mbps, but it’s a win regardless.

The Pacific Rim annihilates the United States when it comes to broadband. According to Akamai’s State Of The Internet for Q1 2008, high broadband (greater than 5Mbps) is where we really start to show our deficiencies. Here’s a look at broadband, which they define as simply greater than 2Mbps:

Rank  Country        % >2Mbps  Q4 07 Change
      Global         55%       -2.0%
1     South Korea    93%       -1.5%
2     Belgium        90%       +1.5%
3     Switzerland    89%       +0.5%
4     Hong Kong      87%       -1.5%
5     Japan          87%       +1.0%
6     Norway         83%       -2.3%
7     Tunisia        82%       +29%
8     Slovakia       81%       +0.5%
9     Netherlands    78%       -2.6%
10    Bahamas        74%       -3.0%
24    United States  62%       -2.8%

Pretty pathetic considering our last Vice President invented the Internet ;-) . We are the largest in terms of sq miles, but when you consider the US population density, the bulk of our land is very sparsely populated. 80.8% of the US population lives in an urban setting [Warning: PDF].

US Population Density

Japan by comparison has 66.0% of its population in an urban setting. Belgium has a surprising 91.5%, which may account for its #2 position. Switzerland has 44.4% yet makes 3rd place, threatening Belgium’s position.

I’m far from the first one to complain about the poor state of broadband. BusinessWeek and CNet both have relatively good discussions about the topic.

The future of media is clearly moving online as people demand to consume it on their schedule as they desire. Take a look at some of the statistics and it’s clearly a large industry. I suspect the lack of broadband infrastructure will be a real problem in the next several years as the rest of the world becomes very easy to distribute media to, and the US still faces challenges.

Solution? Highly debatable, but if so many other countries can do something about it, I suspect it’s achievable here in the US as well. I suspect that the taxes collected from companies that do business on the internet, from ecommerce to advertising, would make this a decent investment for the US government to at least partially back. The more places companies make money, the more places the government does. That may be necessary as not all markets are profitable enough for telcos to bother with. There have been various attempts to jumpstart this effort, but none to date have been successful.

It’s not only about just having access, it’s also the cost. As BusinessWeek points out in the article above, broadband in the US is not cheap.

Perhaps wireless will finally allow for competition and lower prices, at least that’s what everyone is hoping for. The question is if it will happen, if the technology will be there (wireless is generally high latency), and if it will be affordable for the common man.

I suspect in the next 4 years this will become an even bigger topic of discussion as some of the top ranking countries start to reach the point of saturation.

Slow Site

Last Friday (May 2), the data center where this site lives suffered a power fluctuation due to some tornado activity in the area. The actual outage (if there even was one) seems to have been in the 5 minute ballpark based on various monitors. Apparently this somehow caused a routing problem, resulting in some lag and packet loss for some (including myself). Possibly a router that didn’t persist as well as one would hope. This is being investigated.

As a result, if this site (and its feed) seems slower than normal, that’s the reason.

Over Logging

Linksys On South Park

South Park last week featured an internet outage as a plot. Pretty clever, though I was disappointed not to see a single reference to the series of tubes. I’m not sure if the reference to Linksys (Cisco) being responsible for the Internet being down was a compliment or an insult. Those Linksys boxes are infamous for just dying like that until you power cycle them. Every other brand seems to have figured out how to not have that issue. Linux firmware on a Linksys also seems to remedy it. References to “Independence Day” and “Close Encounters of the Third Kind” were clever.

You can watch it online by clicking on the screen grab.

RSS Feed Count

I decided to count how many RSS feeds I subscribe to. Scoble better watch out.

RSS Feed Count

To be fair, I monitor a fair number just to see that they update, or to search. I don’t actually “read” them, or even look at them regularly. Others I quickly skim. Then the last group I actually look at pretty closely.

Phone 2.0: DNS Dialing Anyone?

I’m going to make a giant proposal to the web. Identifiers suck. Email, IM, Phone, etc. Most people have more than one of each. Let’s fix that. Step by step.


W3C On DTD Perversions

According to the W3C Systeam’s blog, there’s a lot of poorly designed software out there. It’s pretty rare that something has a legitimate need to pull down a DTD in order to work. Software should never be requesting one on a very frequent basis. It’s a very cacheable asset. The post includes some pretty impressive stats too:

..up to 130 million requests per day, with periods of sustained bandwidth usage of 350Mbps, for resources that haven’t changed in years.

They also make a few requests which really all developers should follow. Here’s my summary:

  • Cache as much as possible, to minimize your impact on others (not to mention improve your performance).
  • Respect caching headers
  • Don’t fetch what you don’t need
  • Identify yourself. Don’t use a generic UA.
  • Try not to suck.
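As a rough illustration of the first four points, here is a minimal sketch of a fetcher that caches what it downloads, honors `Cache-Control: max-age`, revalidates with `ETag` / `Last-Modified` instead of re-downloading, and sends an identifying User-Agent. The `PoliteFetcher` class and the User-Agent string are hypothetical, and the cache is in-memory only. A real client would persist the cache and handle more of the HTTP caching rules (`Expires`, `no-store`, and so on).

```python
import re
import time
import urllib.error
import urllib.request

# Hypothetical identifier: use your own product name and contact address.
USER_AGENT = "example-dtd-fetcher/0.1 (admin@example.org)"


def _max_age(headers) -> int:
    """Seconds a response may be reused, per Cache-Control max-age (0 if absent)."""
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control") or "")
    return int(match.group(1)) if match else 0


class PoliteFetcher:
    """In-memory HTTP cache with conditional revalidation (illustrative only)."""

    def __init__(self, opener=urllib.request.urlopen, clock=time.time):
        self._open = opener
        self._clock = clock
        self._cache = {}  # url -> {"body", "etag", "last_modified", "fresh_until"}

    def _request_headers(self, entry):
        headers = {"User-Agent": USER_AGENT}  # identify yourself
        if entry:
            if entry["etag"]:
                headers["If-None-Match"] = entry["etag"]
            if entry["last_modified"]:
                headers["If-Modified-Since"] = entry["last_modified"]
        return headers

    def fetch(self, url: str) -> bytes:
        entry = self._cache.get(url)
        if entry and self._clock() < entry["fresh_until"]:
            return entry["body"]  # still fresh: don't touch the network

        request = urllib.request.Request(url, headers=self._request_headers(entry))
        try:
            with self._open(request) as response:
                body = response.read()
                headers = response.headers
        except urllib.error.HTTPError as err:
            # urllib reports "304 Not Modified" as an HTTPError.
            if err.code == 304 and entry:
                entry["fresh_until"] = self._clock() + _max_age(err.headers)
                return entry["body"]
            raise

        self._cache[url] = {
            "body": body,
            "etag": headers.get("ETag"),
            "last_modified": headers.get("Last-Modified"),
            "fresh_until": self._clock() + _max_age(headers),
        }
        return body
```

For a resource that hasn’t changed in years, a client like this makes one full download and then, at worst, an occasional tiny revalidation request.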

AOL and OpenID

So AOL uses OpenID. What’s pretty cool is that it adds 63 million OpenIDs thanks to AOL’s large user base (according to AOL). They also said:

We don’t yet accept OpenID identities within our products as a relying party, but we’re actively working on it. That roll-out is likely to be gradual.

OpenID is designed so that you can use a provider to store your data, and authenticate to any OpenID-enabled service using your own provider. The beauty of this is that unlike other unified login schemes, this one doesn’t form some sort of monopoly. I decided to take a look and see how far they’ve come. AOL’s rather long-standing login page (which really hasn’t changed much since the AOL/Netscape authentication merge happened years ago) has finally been updated. The biggest change is the presence of prefs to allow you to choose what method of login you wish to use. I decided to try OpenID, and used mine. The results I guess aren’t so unexpected:

AOL OpenID

Interestingly, Netscape.com does support OpenID just fine.

OpenID is a really sweet system. Hopefully it will take off and do well. Hopefully there won’t be bias as to who accepts who as a provider.