Engineering Efficiency

Internet companies have a unique ability to scale faster than companies in any other industry. Never before has a company been able to go from nothing more than an idea to the living rooms of millions around the globe in a matter of hours. While this introduces seemingly unlimited opportunities to grow, it also allows for exponential waste if a company isn’t careful. It’s interesting to see how they scale. In many ways, scaling a business isn’t very different from scaling servers and software stacks.

The Classic Example: UPS

Started in 1907 and adopting the name United Parcel Service in 1919, UPS has no real “high tech” background, unless you count the Ford Model T. That doesn’t mean it couldn’t become more efficient. UPS has made a science of the delivery business. For example, it’s famous for its “no left” policy. Simply put, they found that avoiding left turns means less time waiting at lights, which means less fuel is wasted. The more efficient routing, formerly done by humans and now computerized, saved them 3 million gallons of fuel in 2007 alone. Let’s do the math:

Assuming they run 100% diesel at an average cost of $2.87/gallon in 2007 [doe], multiplied by 3 million gallons, that’s $8.61 million saved by trying to avoid left turns.
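For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The two figures come straight from the paragraph above; the function name is just something I made up for illustration, and the price is kept in cents so the result doesn't pick up floating point noise.

```python
# Figures from the text: gallons saved in 2007 and the average diesel price.
GALLONS_SAVED = 3_000_000
PRICE_PER_GALLON_CENTS = 287  # $2.87, stored in cents to avoid float rounding


def fuel_savings_dollars(gallons: int, price_cents: int) -> float:
    """Return the dollar value of the fuel saved (hypothetical helper)."""
    return gallons * price_cents / 100


savings = fuel_savings_dollars(GALLONS_SAVED, PRICE_PER_GALLON_CENTS)
print(f"${savings:,.0f}")  # prints $8,610,000
```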

Not bad for a souped-up mapping application.

By having their drivers do things like turning off the ignition while unbuckling their seat belt, and scanning for the doorbell while walking towards the door (it’s easier to see from a distance than up close), they can shave time off their routes.

Then of course there’s package tracking. While customers might like to know what city their weight loss tapes are sitting in, tracking systems also help reduce loss and monitor package routing for optimal efficiency.

Cutting Utility Bills: Google

Being the largest search engine, a large ad network, email provider, analytics firm, mapping service, video site, and whatever else they do means Google needs a ton of servers. Cramming servers into data centers and keeping them cool to prevent hardware failures is a complicated and expensive task. Keeping the whole thing powered is also really expensive. Google has scrutinized server designs to eliminate all the waste it can. This has resulted in Google having more horsepower at a lower cost than their competitors, which means Google can do more for less. I won’t discuss Google in too much detail since they did a great job themselves recently and I mentioned it the other day in another blog post: Google’s Data Center Secrets.

Shipping Efficiency: Amazon

Amazon has long been improving efficiency by using data collection and analysis to encourage their customers to spend more. Their algorithms for recommending related products you might be interested in are among the best out there. Their ordering system is streamlined to prevent customers from bailing before completion. Their products are SEO’d to appear at the top of Google searches. That doesn’t mean Amazon can’t improve other parts of their business.

Several months ago, Amazon started a Frustration-Free Packaging program. Here’s how they describe it:

The Frustration-Free Package (on the left) is recyclable and comes without excess packaging materials such as hard plastic clamshell casings, plastic bindings, and wire ties. It’s designed to be opened without the use of a box cutter or knife and will protect your product just as well as traditional packaging (on the right). Products with Frustration-Free Packaging can frequently be shipped in their own boxes, without an additional shipping box.

The key here is “can frequently be shipped in their own boxes”. By shipping the product’s own box rather than repackaging it, they can skip a step in their warehouses (and save the packaging materials that go with preparing something for delivery). This also lowers the weight, as those extra boxes don’t weigh 0 oz. The Frustration-Free Packaging is also the perfect shape for efficiently filling trucks and strong enough to not crush easily, thus lowering returns due to damage.

Amazon now even has a feedback form [login required] for users to share what they think of their package. This has the added bonus of helping further reduce the inefficient shipping practices so common right now.

Amazon has also done a significant amount of work on their infrastructure to make their servers scale well, using tech such as EC2 and S3. By selling capacity to other companies, they are able to take advantage of economies of scale as well as diversify their business beyond just retail. Of course, they are planning their data centers to have access to cheap power.

These aren’t haphazard attempts at increasing efficiency. They are well-calculated, engineered approaches to removing even the smallest inefficiencies, made with the knowledge of how those inefficiencies compound as operations scale. Aren’t they clever?

Site Backups And Bandwidth Fun

I keep regular backups of everything on this server just in case something happens. Recently I switched to a more automated and secure (PGP encrypted) solution for this blog due to its fast-paced nature. It covers just the critical stuff (database, media, templates). I chose PGP (implemented using GPG) since it’s easy and I only have to store the public key on the server, making it safer than most alternatives.
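To illustrate the public-key-only idea, here is a rough sketch of how an encryption step could be driven from Python. This is not my actual backup script: the function names, the file paths, and the recipient address are all hypothetical, and it assumes the recipient’s public key has already been imported into the server’s GPG keyring and marked as trusted. Only standard gpg options are used (--batch, --yes, --encrypt, --recipient, --output).

```python
import subprocess


def gpg_encrypt_command(source_path: str, recipient: str, output_path: str) -> list:
    """Build a gpg command line that encrypts source_path to recipient's public key.

    Only the public key is needed for this, so the private key never
    has to be stored on the server.
    """
    return [
        "gpg",
        "--batch",            # never prompt; suitable for cron jobs
        "--yes",              # overwrite an existing output file
        "--encrypt",
        "--recipient", recipient,
        "--output", output_path,
        source_path,
    ]


def encrypt_backup(source_path: str, recipient: str, output_path: str) -> None:
    """Run gpg and raise CalledProcessError if it exits with an error."""
    subprocess.run(gpg_encrypt_command(source_path, recipient, output_path), check=True)


# Hypothetical usage (paths and key ID are placeholders):
# encrypt_backup("blog-db.sql.gz", "backup@example.com", "blog-db.sql.gz.gpg")
```

Decryption happens elsewhere, on a machine that holds the private key, which is the whole point of the setup.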

I’m strongly considering moving it all eventually over to Amazon’s S3 storage. At $0.15 per GB-Month of storage used and $0.20 per GB of data transferred it would be very affordable to keep backups in an even more secure fashion. I’d still use my own encryption on top of theirs for extra security. For things like media, I could even see myself hosting it solely at Amazon. It just seems like that may be a more practical and scalable approach.
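A quick way to get a feel for those prices is a small estimate like the one below. The two rates are the ones quoted above; the 10 GB stored and 2 GB transferred are made-up numbers for illustration, not my actual backup size, and the sketch ignores any per-request fees.

```python
# Rates quoted in the text, stored in cents to keep the arithmetic exact.
STORAGE_CENTS_PER_GB_MONTH = 15   # $0.15 per GB-month stored
TRANSFER_CENTS_PER_GB = 20        # $0.20 per GB transferred


def monthly_cost_dollars(stored_gb: int, transferred_gb: int) -> float:
    """Estimate one month's bill for the given storage and transfer volumes."""
    cents = (stored_gb * STORAGE_CENTS_PER_GB_MONTH
             + transferred_gb * TRANSFER_CENTS_PER_GB)
    return cents / 100


# Hypothetical example: 10 GB kept in storage, 2 GB uploaded this month.
print(f"${monthly_cost_dollars(10, 2):.2f}")  # prints $1.90
```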

Unfortunately, until either FTTH or DOCSIS 3.0 comes to town, it doesn’t look like Amazon’s S3 will be practical for home backup purposes. This server has a beefy connection to a few large pipes to the internet (Level3, Global Crossing, and Cogent last I checked). They provide high speed connectivity, so a backup would take only a few seconds. At home with a cable modem on a DOCSIS 1.1 network (such as Comcast), the bandwidth is just too slim to allow enough upload capacity. Comcast still only allows 384kbps up. Even the top plans in select areas don’t top 1Mbps. Of course, these are Comcast’s numbers (the actual performance is often less). In the areas it currently serves, Verizon FiOS (FTTH) is available at 15 Mbps down/2 Mbps up, which is much better suited for such purposes (though more would be welcome). As strange as it may seem, pricing is quite competitive, giving cable a run for its money. Perhaps one day DOCSIS 3.0 will appear, though that seems to be a while away. Perhaps one day all homes will have 100Mbps full duplex connections with low latency.
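To put those upload speeds in perspective, here is a rough sketch that estimates how long a backup would take at each rate. The 1 GB backup size is a hypothetical figure, and the calculation assumes the full advertised rate with no protocol overhead, so real transfers would be slower.

```python
def upload_seconds(size_bytes: int, uplink_bits_per_second: int) -> float:
    """Best-case transfer time: total bits divided by the uplink rate."""
    return size_bytes * 8 / uplink_bits_per_second


ONE_GB = 10**9  # hypothetical backup size (decimal gigabyte)

# Upload rates mentioned in the text.
uplinks = [
    ("Comcast, 384 kbps up", 384_000),
    ("FiOS, 2 Mbps up", 2_000_000),
    ("100 Mbps full duplex", 100_000_000),
]

for label, rate in uplinks:
    hours = upload_seconds(ONE_GB, rate) / 3600
    print(f"{label}: {hours:.2f} hours")
```

Under these assumptions the cable connection needs almost six hours for a single gigabyte, FiOS a little over an hour, and a 100Mbps line under a minute and a half.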

The only real way around this limitation is probably to use rsync to perform backups. Initial backups would still suck, but after that it wouldn’t be too bad. That wouldn’t work with services such as Amazon’s S3, though, which are token based. There is an rsync-like clone, but it’s still not the real thing. Perhaps Google’s upcoming GDrive will be cool enough to allow the use of rsync over SSH (I could dream) in addition to WebDAV (which is what I expect to see). Last I checked, rsync doesn’t support WebDAV because WebDAV is done over HTTP. If I understand it right, RFC 3229 would add delta encoding support to HTTP, making something like rsync over WebDAV possible, since rsync itself relies on delta encoding.
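For anyone wondering why delta encoding helps so much after the initial backup, here is a toy sketch of the general idea. It is a simplification and not rsync’s actual algorithm: real rsync pairs a cheap rolling checksum with a stronger hash, while this version just hashes a window at every offset, which is slow but easy to follow. All names and the tiny block size are my own choices for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size so the example is easy to trace


def block_index(old: bytes, block: int = BLOCK) -> dict:
    """Map the digest of each full fixed-size block of old data to its block number."""
    index = {}
    for offset in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[offset:offset + block]).digest()
        index.setdefault(digest, offset // block)
    return index


def make_delta(old: bytes, new: bytes, block: int = BLOCK) -> list:
    """Describe new as ("copy", block_number) and ("data", literal_bytes) steps."""
    index = block_index(old, block)
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        match = None
        if len(chunk) == block:
            match = index.get(hashlib.sha256(chunk).digest())
        if match is not None:
            if literal:  # flush pending literal bytes before the copy step
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", match))
            pos += block
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Rebuild the new data from the old data plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * block:(value + 1) * block]
        else:
            out += value
    return bytes(out)


old = b"aaaabbbbccccdddd"
new = b"aaaaXbbbbccccdddd"  # one byte inserted
delta = make_delta(old, new)
assert apply_delta(old, delta) == new
```

In this example only the single inserted byte has to travel as literal data; everything else is sent as references to blocks the receiver already has, which is what would make incremental backups bearable on a slow uplink.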