<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Robert Accettura&#039;s Fun With Wordage &#187; amazon-s3</title>
	<atom:link href="http://robert.accettura.com/blog/tag/amazon-s3/feed/" rel="self" type="application/rss+xml" />
	<link>http://robert.accettura.com</link>
	<description>Robert Accettura&#039;s Personal Blog on Web Development and Tech</description>
	<lastBuildDate>Thu, 09 Feb 2012 01:43:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel='hub' href='http://robert.accettura.com/?pushpress=hub'/>
<cloud domain='robert.accettura.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Amazon S3 Outage</title>
		<link>http://robert.accettura.com/blog/2008/07/20/amazon-s3-outage/</link>
		<comments>http://robert.accettura.com/blog/2008/07/20/amazon-s3-outage/#comments</comments>
		<pubDate>Mon, 21 Jul 2008 02:13:55 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[amazon-s3]]></category>
		<category><![CDATA[cdn]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[mirror image]]></category>
		<category><![CDATA[reliability]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[uptime]]></category>
		<category><![CDATA[varnish]]></category>
		<category><![CDATA[wordpress.com]]></category>

		<guid isPermaLink="false">http://robert.accettura.com/?p=1845</guid>
		<description><![CDATA[The buzz around the web today was the outage of Amazon&#8217;s S3. It shows what websites are &#8220;doing it right&#8221;, and who fails. This is a great follow up to my &#8220;Reliability On The Grid&#8221; post the other day. Amazon &#8230; <a href="http://robert.accettura.com/blog/2008/07/20/amazon-s3-outage/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The buzz around the web today was the <a href="http://www.readwriteweb.com/archives/more_amazon_s3_downtime.php">outage of Amazon&#8217;s S3</a>.  It shows what websites are &#8220;doing it right&#8221;, and who fails.  This is a great follow up to my &#8220;<a href="http://robert.accettura.com/blog/2008/07/13/reliability-on-the-grid/">Reliability On The Grid</a>&#8221; post the other day.</p>
<p>Amazon S3 is cloud based computing.  Essentially when you send them a file using their REST or SOAP interface Amazon stores it on multiple nodes in their infrastructure.  This provides redundancy and security (in case a data center catches fire for example).  Because of this design it&#8217;s often thought that cloud based computing is immune to problems.  This is hardly the case.  Just like any large system, it&#8217;s complicated and full of hazards.  It takes only a small software glitch, or an unaccounted for issue to cause the entire thing to grind to a halt.  More complexity = more things that can fail.</p>
<p>Amazon S3 is popular because it&#8217;s cheap and easy to scale.  It&#8217;s pay-per-use based on bandwidth, disk storage, and requests.  Because that allows for websites to grow without having to make a large infrastructure investment, it&#8217;s popular for &#8220;Web 2.0&#8221; companies trying to keep their budgets tight.  Notably sites like Twitter, WordPress.com, SmugMug and Amazon.com themselves all use Amazon S3 to host things like images.</p>
<p>Many sites, notably Twitter and SmugMug, didn&#8217;t have a good day today.  WordPress.com and Amazon.com operated like normal.  The obvious reason for this is WordPress.com and Amazon.com are much better in terms of infrastructure and design.</p>
<p>WordPress.com uses S3, but proxies that with <a href="http://varnish.projects.linpro.no/">Varnish</a>.  There&#8217;s a <a href="http://ma.tt/2007/10/s3-news/">brief description here</a>, and a <a href="http://blog.apokalyptik.com/2007/10/10/so-you-wanna-see-an-image/">more detailed breakdown here</a>.  According to <a href="http://barry.wordpress.com/2008/02/15/amazon-aws-outage/">Barry Abrahamson</a>, WordPress.com does 1500 image requests per second, and 80-100 of those are served through S3.  They have (slower) backups in house for when S3 is down and can fail over if S3 has a problem.  This means they can leverage S3 to their advantage, but aren&#8217;t down because of S3.  Using Varnish allows them to keep the S3 bill down by using their own bandwidth (likely cheaper since they are a large site and can get better rates on bandwidth).  It also gives them a good level of redundancy.  Awesome job.</p>
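<p>A rough sketch of that read path in Python, assuming nothing about how WordPress.com actually implements it: check a local cache first, then try S3, then fall back to the slower in-house copy.  The names <code>fetch_image</code>, <code>fetch_from_s3</code> and <code>fetch_from_backup</code> are hypothetical stand-ins, not real Varnish or S3 APIs.</p>

```python
from typing import Callable, Dict

# A fetcher takes an object key and returns the raw bytes (hypothetical type).
Fetcher = Callable[[str], bytes]


def fetch_image(key: str, cache: Dict[str, bytes],
                fetch_from_s3: Fetcher, fetch_from_backup: Fetcher) -> bytes:
    """Return image bytes, preferring the cache, then S3, then the backup."""
    if key in cache:
        # Cache hit: no request to S3, so no S3 bandwidth or request charge.
        return cache[key]
    try:
        data = fetch_from_s3(key)
    except Exception:
        # S3 is unreachable or erroring: fail over to the in-house copy.
        # A real proxy would catch narrower errors; this is only a sketch.
        data = fetch_from_backup(key)
    cache[key] = data
    return data
```

<p>The point of the ordering is that S3 is only one of three places an image can come from, so an S3 outage slows things down instead of leaving dead images on the page.</p>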
<p>Amazon.com uses S3 themselves.  If you look at images on the site, they are actually served from <code>g-ecx.images-amazon.com</code>, which is actually a <code>CNAME</code>:</p>
<pre>
g-ecx.images-amazon.com. 38     IN      CNAME   ant.mii.instacontent.net.
</pre>
<p><code>instacontent.net</code> is actually part of <a href="http://www.mirror-image.com">Mirror Image</a>, a CDN.  This is essentially outsourcing what WordPress.com is doing in terms of caching.  It&#8217;s similar to Akamai&#8217;s services.  A CDN&#8217;s biggest advantage is lowering latency by using servers closer to the customer, which are generally going to feel faster.  The other benefit is that they cache content for when the origin is having problems.  Because Amazon has a layer on top of S3, they have an added level of protection and remained up and images loaded.</p>
<p>Twitter serves most images such as avatars right off of S3.  This means when S3 went down, there were thousands of dead images on their pages.  No caching, not even a <code>CNAME</code> in place.  Image hosting is the least of their concerns.  Keeping the service up and running is their #1 concern right now.  The service was still usable, just ugly.  Many users take advantage of third party clients anyway.</p>
<p>Using a CDN or having the infrastructure in house is obviously more expensive (it makes S3 more of a luxury than a cost savings measure), but it means you&#8217;re not depending on one third party for your uptime.
<div id="rja_commentCountImage"><a href="http://robert.accettura.com/?p=1845#comments"><img src="http://robert.accettura.com/wp-content/commentCount/2008/07/59bcda7.gif" alt="Comment Count" style="border:0;" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://robert.accettura.com/blog/2008/07/20/amazon-s3-outage/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Reliability On The Grid</title>
		<link>http://robert.accettura.com/blog/2008/07/13/reliability-on-the-grid/</link>
		<comments>http://robert.accettura.com/blog/2008/07/13/reliability-on-the-grid/#comments</comments>
		<pubDate>Mon, 14 Jul 2008 00:45:46 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Internet]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Web Development]]></category>
		<category><![CDATA[amazon]]></category>
		<category><![CDATA[amazon-s3]]></category>
		<category><![CDATA[data portability]]></category>
		<category><![CDATA[datacenter]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[reliability]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[twitter]]></category>
		<category><![CDATA[uptime]]></category>

		<guid isPermaLink="false">http://robert.accettura.com/?p=1815</guid>
		<description><![CDATA[There’s been a lot of discussion lately (in particular NYTimes, Data Center Knowledge) regarding both reliability of web applications which users are becoming more and more reliant on, as well as the security of such applications. It’s a pretty interesting &#8230; <a href="http://robert.accettura.com/blog/2008/07/13/reliability-on-the-grid/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>There’s been a lot of discussion lately (in particular <a href="http://www.nytimes.com/2008/07/06/technology/06outage.html?_r=3&#038;partner=rss&#038;oref=slogin&#038;oref=slogin">NYTimes</a>, <a href="http://www.datacenterknowledge.com/archives/2008/Jul/06/is_downtime_more_frequent_or_more_visible.html">Data Center Knowledge</a>) regarding both reliability of web applications which users are becoming more and more reliant on, as well as the security of such applications.  It’s a pretty interesting topic considering there are so many things that ultimately have an impact on these two metrics.  I call them metrics since that’s what they really are.</p>
<p><span id="more-1815"></span></p>
<h4>Defining uptime, security, and privacy</h4>
<p>For the purposes of the discussion at hand, &#8220;uptime&#8221; is defined as the application being accessible and functional to the user.  Note that putting a &#8220;fail whale&#8221; image up so that the page loads doesn’t qualify as functional.  For all intents and purposes the service is down.  One should also note that traffic goes through different routes to get to different users, hence a site can be down for one person, but up for millions of others.  The vast majority (95%+) should be able to use the service for it to really be considered &#8220;up&#8221;.</p>
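<p>As a minimal sketch of that definition in Python: the function name and the idea of counting &#8220;users served&#8221; are assumptions for illustration, and the 95% default is just the threshold from the paragraph above.</p>

```python
def is_up(users_served: int, users_total: int, threshold: float = 0.95) -> bool:
    """True when at least `threshold` of users can actually use the service.

    A user who only sees an error page (fail whale) does not count as served.
    """
    if users_total == 0:
        return False  # nothing was measured, so we cannot claim to be up
    return users_served / users_total >= threshold
```

<p>So a site that is working for 96 out of 100 users counts as up, while one working for 94 out of 100 does not.</p>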
<p>&#8220;Security&#8221; is defined as the assurance that privacy, data integrity, and account access are restricted in accordance with typical site functionality and users’ understanding.  &#8220;Privacy&#8221; is defined as not allowing any unauthorized person or entity to manipulate, view, copy, handle, destroy, or know about the existence of data without explicit approval from the user.</p>
<h4>Why applications fail</h4>
<p>Applications can fail for many reasons, but most can be lumped into a handful of categories.  At the highest level you have in-house and upstream reasons.  In-house can be defined as something you can control, for example software or servers you control, while upstream is typically a vendor or partner, for example ISP, colo facility, etc, which there is less control over (other than submitting a ticket).  Generally startups have more upstream services and bring more things in-house as time goes on.  For example, Facebook relies on colo facilities for their servers.  They now plan to build their own (more control, and hopefully will ensure lower costs as well).</p>
<p>On a lower level you break things down to hardware and software.  Hardware failures are inevitable.  Computers suck in a 24&#215;7 environment.  We deal with that since they are better than people, who still insist on sleep (lazy bastards).  Hard drives fry, motherboards fail, fans die resulting in &#8220;thermal events&#8221;.  Generally it’s pretty easy to deal with this.  You can use RAID so one hard drive isn’t critical; after all, moving parts are the most prone to failure.  You can also have more than one server powering a successful application.  If one dies, the load goes to other boxes running on the grid.  You can put them in different data centers so if there was a problem at one, you’re still up and running.  This obviously comes at a cost.  Services like Google App Engine, Amazon’s S3 and Amazon’s EC2 help lower the cost, but also result in hardware being handled by an upstream provider.  Amazon and Google are very redundant, but they too can and have failed.</p>
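<p>To put a rough number on the redundancy argument, here is a small Python sketch.  It assumes every server fails independently of the others, which is a big assumption (shared power, a shared network, or the same software bug all break it), and the function name and the 99% figure are made up for illustration.  If each box is up a fraction <code>a</code> of the time, the chance that at least one of <code>n</code> boxes is up is <code>1 - (1 - a)^n</code>.</p>

```python
def combined_availability(per_server: float, servers: int) -> float:
    """Chance that at least one of `servers` independent boxes is up.

    `per_server` is the fraction of time a single box is up (0.0 to 1.0).
    Independence between boxes is an assumption, not a given.
    """
    # All boxes must be down at once for the service to be down.
    all_down = (1.0 - per_server) ** servers
    return 1.0 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "server(s):", combined_availability(0.99, n))
```

<p>Under those assumptions two boxes at 99% each work out to roughly 99.99%.  The catch is the independence part: one bad deploy takes out every box at once, which is exactly how very redundant systems still go down.</p>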
<p>Software generally fails because it either wasn’t designed to scale, or it was hastily put together to meet a deadline.  Startups are infamous for this as the business guys just want things done quick and cheap and don’t care about reliability until it’s too late (they will also deny this until the end of time).  All major software platforms can scale when done correctly.  Many people say Perl can’t scale, but it has for a decade: look at IMDB, Amazon and Slashdot, among many others.  Even more claim PHP can’t scale, but Facebook and Yahoo seem to run fine.  Python (YouTube), Ruby-On-Rails (YellowPages.com, Hulu, 43things), and ASP.NET (MySpace and Microsoft) all seem workable in high traffic situations.  It’s not what you use, but how you use it.  These run on Apache, IIS, Oracle, MySQL, among others.  The platform is rarely (if ever) the problem.  The implementation almost always is.</p>
<p>There’s also the possibility that everything is fine and dandy, but somewhere along the internet from the servers to some of your users there’s a problem.  ISPs encounter tons of problems, from people snagging their fiber and tearing a line to DoS attacks and viruses wreaking havoc.  When this happens close to the user, no sites are accessible; when this happens further away, several sites may be inaccessible or slow.  Users often wrongly attribute this to a site or application being slow or down when that’s hardly the case.  Using a data center with good connectivity reduces these cases.  Having data centers distributed around the globe is even better, but often not economical.  The best a business can do is submit a ticket and wait.  If it’s frequent enough they can move somewhere else.</p>
<h4>Why security fails</h4>
<p>Security failure is almost too complex of a topic to discuss without holding a complete college course.  The most obvious answer is that someone is cleverer than the person in charge of security, and outwitted or outsmarted them.  It could be in the physical form (stealing a server or hard drive with data), or in the electronic form (Phishing, XSS, DoS).  It could be a &#8220;hacker&#8221;, or it could be an application failure that results in a security glitch.</p>
<p>Many websites take several measures to protect your privacy.  They require &#8220;strong&#8221; passwords, maybe even require you to change them.  For things like banking you may have &#8220;security questions&#8221; to answer.  Perhaps even a key fob to provide two factor authentication.  </p>
<p>Most security failure can be traced to stupidity.  For example using &#8220;password&#8221; for your password, or replying to an email asking for your password.  A poorly configured server can also be a vulnerability.  Then all you need is someone who wants to exploit that.  If the data is of any value, that person exists.  </p>
<h4>When businesses fail</h4>
<p>Hackers want your data; businesses want to keep it secure, but don’t want to spend too much time/effort on it since the formula is <code>time = money</code>.  There’s really not much more to explain here.</p>
<p>Sometimes it’s not even the business you know you’re dealing with.  You may be working with company X, but they may use company Y, Z, A1, A2, A3, and A4 to actually provide their services.  Your data may be accessible by any or all of them.</p>
<p>Then there’s the possibility of a business going out of business.  They may give you a chance to download your data and move it elsewhere or they may even do it for you.  They may also just shut down abruptly and disappear off the face of the earth.  Goodbye data.</p>
<h4>Take control of your data</h4>
<p>I may sound cynical for effectively saying applications fail, many people could potentially see your data, and there’s nothing you can do about it.  I&#8217;m not, I am a realist, and I know what goes on behind the scenes.  There actually is something you can do about it:  <em>Take control of your data.  Keep control of your data</em>.</p>
<ol>
<li>Know who has your data, what they might do to it, who they might share it with, and what they will do to protect it.  Companies (at least reputable ones) post privacy policies for a reason.  Check them out or Google for some info on that company.  The results may surprise you.  For example if you delete something from Google Docs it may take 3 weeks for it to actually be deleted on their servers.  This isn’t uncommon, but many people assume once you delete it, the company deletes it.  That’s not the case.</li>
<li>Think about accessibility.  What happens if that application has an outage?  What happens if your ISP has a problem?  Or your cable line got cut?  Using an online office suite is a great way to keep documents accessible from home or work, but not so great when you can’t access them.  Storing them on a USB drive may prove useful, at least as a backup.  If you&#8217;ve got a business, this is especially true.  You may also want to consider a 2nd way to get online should your ISP have problems (giving a wireless card and a laptop to certain employees may also have the perk of  allowing employees to be more mobile).</li>
<li>Decide the fate of your data.  I personally prefer to keep a copy of everything so if a company goes under, I still have my data.  I host my own blog, and my own photos.  I keep backups of all that too, in multiple locations.  I know I’ll be around as long as I care about keeping that data online.  I’m not going out of business.  If I am, I don’t care about that data anymore <img src='http://robert.accettura.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  .  I always have my data.  You should too.</li>
</ol>
<h4>Keep control of your data</h4>
<p>Just because you’ve figured out how to protect your data, doesn’t mean you’re done.  You need to reevaluate yourself every time you start using something else, or change your usage patterns.  You don’t have to keep your data offline, just understand what putting it online really means.  Offline backups aren&#8217;t a bad idea.  Having backups on another service is also an option, but may be even more complicated.</p>
<p>This is somewhat more complicated in the case of things like social networks, but things like <a href="http://www.dataportability.org/">Data Portability</a> are slowly becoming a reality.</p>
<p>Google in general has been pretty good with leaving the options to take your data back.  Gmail lets you use IMAP to download all your mail, Google Reader lets you export an OPML feed, Google Docs lets you save all your docs to your computer.  It&#8217;s important to know what the services you rely on let you do with your data.  Don&#8217;t just assume you can easily get it out.  </p>
<h4>You’re responsible for your fate</h4>
<p>It’s easy to blame Google, Microsoft, Yahoo, or Twitter for your problems, but that’s really a poor excuse.  You’re responsible for the choices you make, and what you rely on.  If what you’re relying on isn’t giving you what you need, you need to find something else, or reevaluate if you’re putting your priorities in the right place.</p>
<p>I now present to you&#8230;</p>
<h4>Accettura&#8217;s Law Of Business Computing</h4>
<pre>
where
people = prone to frequent failures
technology = expensive, complex, frequent failure

business computing = people + technology = complex frequent failures that are costly in nature.
</pre>
<p>You can see how this works, right?  Best way to avoid that cost?  Make sure your technology is redundant, and your people&#8217;s interaction is controlled to prevent failure from leaking into the technology.</p>
<p>This should be in Wikipedia and every Business and CompSci textbook.  That way everything that a student touches or thinks about in this industry is done with this in mind.  Build with the knowledge in mind that the <a href="http://www.flickr.com/photos/scriptingnews/2537265280/">fail whale</a> will just make you a relic before you even hit your prime.</p>
<p>That said, <em>get over Twitter being down and stop complaining.</em>
<div id="rja_commentCountImage"><a href="http://robert.accettura.com/?p=1815#comments"><img src="http://robert.accettura.com/wp-content/commentCount/2008/07/f0bbac6.gif" alt="Comment Count" style="border:0;" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://robert.accettura.com/blog/2008/07/13/reliability-on-the-grid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drobo for network storage?</title>
		<link>http://robert.accettura.com/blog/2007/06/07/drobo-for-network-storage/</link>
		<comments>http://robert.accettura.com/blog/2007/06/07/drobo-for-network-storage/#comments</comments>
		<pubDate>Fri, 08 Jun 2007 01:35:11 +0000</pubDate>
		<dc:creator>Robert</dc:creator>
				<category><![CDATA[Hardware]]></category>
		<category><![CDATA[Networking]]></category>
		<category><![CDATA[amazon-s3]]></category>
		<category><![CDATA[Drobo]]></category>
		<category><![CDATA[Network]]></category>
		<category><![CDATA[RAID]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://robert.accettura.com/archives/2007/06/07/drobo-for-network-storage/</guid>
		<description><![CDATA[Drobo initially didn&#8217;t impress me too much, but after watching a demo I&#8217;m somewhat impressed. The positives: The hotswapping, RAID-like (but not RAID) redundancy is awesome. That&#8217;s perfect for backup/bulk storage purposes. Transfer isn&#8217;t bad (Up to read 22MB/s write &#8230; <a href="http://robert.accettura.com/blog/2007/06/07/drobo-for-network-storage/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.drobo.com">Drobo</a> initially didn&#8217;t impress me too much, but after watching <a href="http://scobleizer.com/2007/06/07/drobo-storage-device-demoed/">a demo</a> I&#8217;m somewhat impressed.  The positives:</p>
<ul>
<li>The hotswapping, RAID-like (but not RAID) redundancy is awesome.  That&#8217;s perfect for backup/bulk storage purposes.</li>
<li>Transfer isn&#8217;t bad (up to 22MB/s read, 20MB/s write).</li>
<li>Power consumption idles at about <a href="http://www.drobo.com/products_drobo_specifications.aspx#products_nav">12 watts</a> which isn&#8217;t bad.</li>
<li>Adding storage capacity is really easy.</li>
</ul>
<p>There are some downsides:</p>
<ul>
<li>No Linux support.  Which stinks if you were to hook it up to an old PC running Linux and use Samba.  You could of course use a Mac.</li>
<li>Pretty expensive.  $499 isn&#8217;t cheap for a glorified drive enclosure.  You still need a host, and drives.</li>
</ul>
<p>Of course for true backup you need to offsite your data, but you can do that through standard means, and using Amazon&#8217;s S3.  So you&#8217;re covered there.</p>
<p>The downfall of this product is the lack of a 10/100 Ethernet port.  It would likely have been pretty cheap (let&#8217;s face it, network devices are pretty cheap these days) and would have removed the need for a PC.  You could of course hook it up to an access point such as the <a href="http://robert.accettura.com/blog/2007/02/03/airport-extremes-shortcomings/">Airport Extreme</a>&#8230; but you don&#8217;t get the greatest level of control with these.</p>
<p>Ideally a real cheapo Linux machine (Intel Celeron, 1GB RAM, 80GB HD) with a Drobo would be an awesome backup solution.  You could then use MRTG to graph network/data storage usage, manage usage, quotas or whatever else you wanted to do.  Even a media server.  Backup some data with S3?  No problem.  You could even set up something like <a href="http://backuppc.sourceforge.net">BackupPC</a> to backup entire PCs.
<div id="rja_commentCountImage"><a href="http://robert.accettura.com/archives/2007/06/07/drobo-for-network-storage/#comments"><img src="http://robert.accettura.com/wp-content/commentCount/2007/06/7895fc1.gif" alt="Comment Count" style="border:0;" /></a></div>
]]></content:encoded>
			<wfw:commentRss>http://robert.accettura.com/blog/2007/06/07/drobo-for-network-storage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

