Apple Data Centers To Be Green By 2013

Just the other day Microsoft announced it was going carbon neutral. Apple is now aiming for 100% renewable energy for its data centers by 2013. It's a very different goal from Microsoft's, but still quite interesting.

Apple is a much more focused company than Microsoft. I could be wrong, but I doubt they are dogfooding a future product the way Microsoft likely is. My guess is Tim Cook is looking at the financials today and at where Apple wants to be in 2013. He's a supply chain guy, and when it comes to IT operations (the cloud), electricity is a huge part of that supply chain. Renewable energy has a high upfront cost, but it's very predictable. The sun doesn't increase in cost depending on politics or hurricanes, nor does wind. If Apple is going to become the cloud provider for its growing tablet and phone market, it's going to need to scale its cloud even further. That means controlling prices from its suppliers, energy included. Apple can afford the high upfront costs of renewable energy, and it can benefit from the longer-term predictability and eventual drop in costs as it scales.

That is why I think they are doing this.

Data Center Power Consumption

It's hardly a secret that there is serious demand for saving power in data centers. As a recent Times Magazine article put it:

Data centers worldwide now consume more energy annually than Sweden. And the amount of energy required is growing, says Jonathan Koomey, a scientist at Lawrence Berkeley National Laboratory. From 2000 to 2005, the aggregate electricity use by data centers doubled. The cloud, he calculates, consumes 1 to 2 percent of the world’s electricity.

To put that in a little more perspective, the 2009 census for Sweden puts the population at 9,263,872. Sweden's population is just slightly higher than New York City's (8,274,527 in 2007) or New Jersey's (an estimated 8,682,661 in 2008). Granted, Sweden's population density is 20.6/km2 compared to New York City's 10,482/km2 or New Jersey's 438/km2. Population density matters because it says a lot about energy consumption: dense populations require less energy thanks to communal resources. Even so, I suspect the average Swede uses less electricity than the average American. All these numbers were pulled from Wikipedia.

The US Department of Energy does have data on power consumption and capacity as well as forecasts on consumption and production. The obvious downside in the data is the reliance on coal, oil and gas which have environmental impacts as well as political impacts and costs (we know about the instabilities of the oil market). This is why companies with lots of servers like Google are looking very carefully at power generation alternatives such as hydroelectric and solar.

We all benefit from data center efficiency. Lower-cost computing is a big advantage to startups and encourages more innovation by removing price barriers. It's also an advantage to the general public, since the technology and tricks learned eventually trickle down to consumers. We're already seeing more efficient power supplies, some even beating the original 80 PLUS certification.

Perhaps if we started tracking “performance per watt” in addition to “watts per square foot” we’d be looking at things from a more sustainable perspective.
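To make the distinction between the two metrics concrete, here's a toy Python sketch. All the numbers are invented for illustration; the point is that two racks can look identical by "watts per square foot" while doing very different amounts of useful work per watt.

```python
def watts_per_sq_ft(total_watts, sq_ft):
    """Facility-centric metric: how much power a footprint draws."""
    return total_watts / sq_ft

def perf_per_watt(requests_per_sec, total_watts):
    """Work-centric metric: how much useful work each watt buys."""
    return requests_per_sec / total_watts

# Two hypothetical racks with identical size and power draw,
# but different throughput (numbers are made up):
rack_a = {"watts": 8000, "sq_ft": 20, "rps": 40000}
rack_b = {"watts": 8000, "sq_ft": 20, "rps": 90000}

for name, r in (("A", rack_a), ("B", rack_b)):
    print(name,
          watts_per_sq_ft(r["watts"], r["sq_ft"]),  # both racks: 400.0 W/sq ft
          perf_per_watt(r["rps"], r["watts"]))      # A: 5.0, B: 11.25 req/s per watt
```

By the facility metric the racks are indistinguishable; only the performance-per-watt view shows that rack B is more than twice as sustainable per unit of work.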

Data center capacity and consumption is pretty interesting when you look at all the variables involved. Growth, power costs, facility size, technology available, even foreign politics play a role in what it costs to operate.

Engineering Efficiency

Internet companies have the unique ability to scale faster than any other industry on earth. Never before has a company been able to go from nothing more than an idea to the living rooms of millions around the globe in a matter of hours. While this introduces seemingly unlimited opportunities to grow, it also allows for exponential waste if a company isn't careful. It's interesting to see how they scale. Scaling a business in many ways isn't very different from scaling servers and software stacks.

The Classic Example: UPS

Started in 1907, and adopting the name United Parcel Service in 1919, UPS has no real "high tech" background unless you include the Ford Model T. That doesn't mean it couldn't become more efficient. UPS has made a science of the delivery business. For example, it's famous for its "no left turn" policy. Simply put, they found that avoiding left turns means less time waiting at lights, which means less fuel wasted. This routing, formerly done by humans and now computerized, saved them 3 million gallons of fuel in 2007 alone. Let's do the math:

Assuming they run 100% diesel at an average cost of $2.87/gallon in 2007 [doe], multiplied by 3 million gallons, that's $8.61 million saved by avoiding left turns.
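The arithmetic above is trivial, but here it is as a quick Python sanity check:

```python
# Back-of-the-envelope version of the UPS savings figure above.
gallons_saved = 3_000_000   # gallons of fuel saved in 2007
diesel_price = 2.87         # average $/gallon of diesel in 2007 [doe]

savings = gallons_saved * diesel_price
print(f"${savings:,.2f}")   # $8,610,000.00
```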

Not bad for a souped up mapping application.

By having their drivers do things like turning off the ignition while unbuckling their seat belt, and scanning for the doorbell while walking toward the door (it's easier to spot from a distance than up close), they can shave time off their routes.

Then of course there's package tracking. While customers might like to know what city their weight loss tapes are sitting in, tracking systems help reduce loss and monitor package routing for optimal efficiency.

Cutting Utility Bills: Google

Being the largest search engine, a large ad network, an email provider, an analytics firm, a mapping service, a video site, and whatever else they do means Google needs a ton of servers. Cramming servers into data centers and keeping them cool enough to prevent hardware failures is a complicated and expensive task. Keeping the whole thing powered is also really expensive. Google has scrutinized its server designs to eliminate all possible waste, resulting in more horsepower at a lower cost than its competitors have. Having more capacity at a lower cost means Google can do more at a lower cost than its competitors. I won't discuss Google in too much detail since they did a great job of that themselves recently, and I mentioned it the other day in another blog post: Google's Data Center Secrets.

Shipping Efficiency: Amazon

Amazon has long been improving efficiency by using data collection and analysis to encourage their customers to spend more. Their algorithms for recommending related products you might be interested in are among the best out there. Their ordering system is streamlined to prevent customers from bailing before completion. Their products are SEO'd to appear at the top of Google searches. That doesn't mean Amazon can't improve other parts of their business.

Amazon several months ago started a Frustration-Free Packaging program. Here’s how they describe it:

The Frustration-Free Package (on the left) is recyclable and comes without excess packaging materials such as hard plastic clamshell casings, plastic bindings, and wire ties. It’s designed to be opened without the use of a box cutter or knife and will protect your product just as well as traditional packaging (on the right). Products with Frustration-Free Packaging can frequently be shipped in their own boxes, without an additional shipping box.

The key here is "can frequently be shipped in their own boxes". By shipping the product's own box rather than repackaging it, they can skip a step in their warehouses (and the packaging materials that go with preparing something for delivery). This also lowers the weight, since those extra boxes don't weigh nothing. The frustration-free packaging is also the right shape for efficiently filling trucks and strong enough not to crush easily, lowering returns due to damage.

Amazon now even has a feedback form [login required] for users to share what they think of their package. This has the added bonus of helping further reduce the inefficient shipping practices so common right now.

Amazon has also done a significant amount of work on their infrastructure to make their servers scale well, using tech such as EC2 and S3. By selling capacity to other companies, they are able to take advantage of economies of scale as well as diversify their business beyond retail. Of course, they plan their data centers to have access to cheap power.

These aren't haphazard attempts at increasing efficiency; these are well-calculated, engineered approaches to removing even the smallest inefficiencies, with the knowledge of how they compound as operations scale. Aren't they clever?

Rackspace Acquisitions

Despite the bad economy, Rackspace is acquiring the startups JungleDisk and Slicehost. This is a very interesting step on their part.

Buying JungleDisk makes sense since Rackspace wants to get into the cloud storage business. JungleDisk is one of the bigger Amazon S3 products out there. By adding Rackspace support to the product, they can quickly attempt to get into that market. Whether they succeed depends on their offering's cost. Their press release suggests $0.15/GB, but that doesn't say if they will also bill based on requests and bandwidth (which is where Amazon S3 gets expensive). Also interesting is this little nugget:

Also later this year, Limelight Networks will team with Rackspace to allow developers to easily distribute content to millions of end users around the world and bring scalable content delivery and application acceleration services to the masses.

This competes with Amazon's attempt at starting a CDN later this year. It's worth noting that these are both pretty primitive CDNs, since they require you to register objects before the CDN hosts them. Modern CDNs like Limelight and Akamai let you set up a CNAME so that the CDN essentially acts as a middle layer between your origin servers and your users. This requires no preregistering, since the CDN can just check the origin for any asset requested. Caching is controlled via configuration files and standard HTTP headers. I'm not sure how useful these CDNs will be to most. Registering objects and uploading to another platform is a giant pain compared to just setting up a transparent CNAME. The difference is that one requires development time and the other doesn't.
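To see why origin pull needs no registration step, here's a toy Python sketch of an edge cache: on a miss it simply fetches from the origin and remembers the result. This is my own simplification (everything is simulated in memory, and real CDNs also honor TTLs from HTTP headers), not any vendor's actual code.

```python
# Stand-in for your origin servers: path -> response body.
ORIGIN = {
    "/img/logo.png": b"<png bytes>",
    "/css/site.css": b"body { margin: 0 }",
}

class EdgeCache:
    """A CDN edge node doing 'origin pull': no object registration."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path in self.cache:            # cache hit: serve locally
            return self.cache[path]
        body = self.origin.get(path)      # miss: pull from the origin
        if body is not None:
            self.cache[path] = body       # TTL/header handling omitted
        return body

edge = EdgeCache(ORIGIN)
edge.get("/img/logo.png")   # first request: pulled from origin
edge.get("/img/logo.png")   # second request: served from the edge
```

The "register objects first" model would instead require a separate upload step before `edge.get()` could ever succeed, which is exactly the development-time cost mentioned above.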

Acquiring Slicehost makes sense since they apparently have technology that will be useful to Rackspace. They are betting that startups in need of hosting on virtual machines (which are much more complicated to manage than typical shared hosting) will make for a decent market in the future. With the economic downturn, this may not look like the most useful purchase in the short term; in the long run it may pay off handsomely. They have decent competition in that space, and it's quickly growing. Rackspace's size may help it weather a downturn better than others, though.

They closed at 5.18, up 0.22 (4.44%) today, despite the Dow being down 514.45, so I guess I'm not alone in my assessment.

Amazon S3 Outage

The buzz around the web today was the outage of Amazon's S3. It shows which websites are "doing it right" and which fail. This is a great follow-up to my "Reliability On The Grid" post the other day.

Amazon S3 is cloud-based storage. Essentially, when you send Amazon a file using their REST or SOAP interface, they store it on multiple nodes in their infrastructure. This provides redundancy and security (in case a data center catches fire, for example). Because of this design, it's often thought that cloud-based computing is invincible. This is hardly the case. Like any large system, it's complicated and full of hazards. It takes only a small software glitch, or an unaccounted-for issue, to grind the entire thing to a halt. More complexity = more things that can fail.
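To illustrate the multi-node redundancy idea, and how it helps without making anything invincible, here's a toy Python sketch. The placement scheme, class names, and replica count are my own invention for illustration; S3's actual internals are not public.

```python
import hashlib

class NodeDownError(Exception):
    pass

class Node:
    """One storage node that can be up or down."""
    def __init__(self):
        self.store = {}
        self.up = True

    def put(self, key, blob):
        if not self.up:
            raise NodeDownError
        self.store[key] = blob

    def get(self, key):
        if not self.up:
            raise NodeDownError
        return self.store[key]

class Cluster:
    """Writes each object to several nodes; reads survive node failures."""
    def __init__(self, nodes, replicas=3):
        self.nodes = nodes
        self.replicas = replicas

    def _placement(self, key):
        # Deterministic choice of replica nodes for a key.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return [self.nodes[(h + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, blob):
        for node in self._placement(key):
            node.put(key, blob)

    def get(self, key):
        for node in self._placement(key):
            try:
                return node.get(key)
            except NodeDownError:
                continue               # try the next replica
        raise NodeDownError("all replicas unavailable")

cluster = Cluster([Node() for _ in range(5)])
cluster.put("photo.jpg", b"...")
cluster.nodes[0].up = False            # simulate a node failure
cluster.get("photo.jpg")               # still readable from a replica
```

Note the limit of the design: losing one node is fine, but a correlated failure (a bad software push hitting every node, say) still takes everything down, which is the kind of glitch that bites large systems.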

Amazon S3 is popular because it's cheap and easy to scale. It's pay-per-use, based on bandwidth, disk storage, and requests. Because that allows websites to grow without making a large infrastructure investment, it's popular with "Web 2.0" companies trying to keep their budgets tight. Notably, sites like Twitter, WordPress.com, SmugMug, and Amazon.com itself all use Amazon S3 to host things like images.

Many sites, notably Twitter and SmugMug, didn't have a good day today. WordPress.com and Amazon.com operated like normal. The obvious reason is that WordPress.com and Amazon.com are much better in terms of infrastructure and design.

WordPress.com uses S3, but proxies it with Varnish. There's a brief description here, and a more detailed breakdown here. According to Barry Abrahamson, WordPress.com does 1500 image requests per second, and 80-100 of those are served through S3. They have (slower) backups in house for when S3 is down and can fail over if S3 has a problem. This means they can leverage S3 to their advantage without being down because of S3. Using Varnish also keeps the S3 bill down by using their own bandwidth (likely cheaper, since they are a large site and can get better rates on bandwidth), and it gives them a good level of redundancy. Awesome job.
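The pattern described here is roughly cache first, then primary, then backup. Here's a minimal Python sketch of that logic; the function names are hypothetical and this is not WordPress.com's actual setup, just the shape of the failover.

```python
def make_image_server(cache, fetch_s3, fetch_in_house):
    """Serve from cache; on a miss try S3, falling back to the
    slower in-house copy if S3 errors out."""
    def serve(path):
        if path in cache:
            return cache[path]             # Varnish-style cache hit
        try:
            body = fetch_s3(path)          # primary: S3
        except IOError:
            body = fetch_in_house(path)    # fallback: in-house backup
        cache[path] = body                 # future requests skip backends
        return body
    return serve

# Simulate today's outage: S3 raises on every request.
def s3_down(path):
    raise IOError("S3 unavailable")

serve = make_image_server({}, s3_down, lambda p: b"backup copy")
serve("/avatars/1.png")   # S3 fails -> served from the in-house backup
serve("/avatars/1.png")   # now a cache hit; no backend is touched
```

The key property is that the dependency on S3 is soft: an S3 outage costs some speed, not availability.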

Amazon.com uses S3 itself. If you look at images on the site, they are actually served from g-ecx.images-amazon.com, which resolves to:

g-ecx.images-amazon.com. 38     IN      CNAME   ant.mii.instacontent.net.

instacontent.net is part of Mirror Image, a CDN. This essentially outsources what WordPress.com is doing in terms of caching, and it's similar to Akamai's services. A CDN's biggest advantage is lowering latency by using servers closer to the customer, which generally feels faster. The other benefit is that a CDN caches content for when the origin is having problems. Because Amazon has a layer on top of S3, they had an added level of protection: the site remained up and images loaded.

Twitter serves most images, such as avatars, right off of S3. This means that when S3 went down, there were thousands of dead images on their pages. No caching, not even a CNAME in place. Image hosting is the least of their concerns; keeping the service up and running is their #1 priority right now. The service was still usable, just ugly, and many users use third-party clients anyway.

Using a CDN or having the infrastructure in house is obviously more expensive (it makes S3 more of a luxury than a cost-saving measure), but it means you're not depending on one third party for your uptime.

Reliability On The Grid

There's been a lot of discussion lately (in particular in the NYTimes and Data Center Knowledge) regarding both the reliability of the web applications users are becoming more and more reliant on, and the security of such applications. It's a pretty interesting topic considering how many things ultimately have an impact on these two metrics. I call them metrics since that's what they really are.


FoxTorrent

I said back in 2004 that Firefox needs built-in support for BitTorrent. My idea was that it would be integrated into the download manager so it was "just another protocol", transparent to a typical user. I still stand by that.

Fast forward to 2007: FoxTorrent is by RedSwoosh (now owned by Akamai).

I’d personally love to see something like this ship built in. It’s a great feature. BitTorrent is a great protocol for distributing large downloads without having to buy expensive infrastructure. Akamai’s interest is proof of that.

FoxTorrent has a blog if you want to keep an eye on it. FoxTorrent is MIT licensed as well. It seems like a very interesting product. I’ll have to dig into this and look at it a bit closer.

[Hat tip: TechCrunch]