Web App Stores Via Twitter/Facebook

It seems likely to me that Facebook and Twitter will eventually be competing with Apple in terms of app stores. Facebook sort of already is with its extensive apps platform, but for now that’s just competing for developer attention. Twitter doesn’t really have an equivalent today (developers mainly build clients and interact with data), but don’t underestimate its clout.

The reason I say this is that Facebook and Twitter have become identity gatekeepers on the net. You can already log in to many sites with an account from one of the two. Creating APIs to handle purchases and subscriptions, and transparently handling the billing so that an HTML5 site effectively becomes an “app”, is the next logical step. They could undercut Apple and still walk away with a handsome profit for not doing terribly much more than leveraging their size and reach. These apps would work on any device with a web browser, desktop or mobile.

Given that both sites need to diversify their revenue streams (something Google never figured out), it seems only logical for them to take this step. $0.99 for Angry Birds seems more than plausible.

And yes, browsers do have offline capabilities.

Who Indexes Tweets

I was curious who is indexing the links that people post on Twitter. Someone obviously is, since links get ‘clicks’ almost immediately after they’re tweeted. Presumably they do this by tapping into the XMPP firehose.

Let’s take a look:

66.xxx.xxx.xxx - - [06/Dec/2009:20:17:43 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

I guess Google has a deal with Twitter. Googlebot fetched the link just a few seconds after the tweet was sent. As far as I know, nothing has actually been announced, so this is the first evidence I’ve seen of a deal of some sort. I’d be shocked if Google were scraping the site this quickly.

Edit: Stephen Duncan pointed out in the comments that this was announced in October. Totally forgot about that.

208.xxx.xxx.xxx - - [06/Dec/2009:20:17:47 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; Butterfly/1.0; +http://labs.topsy.com/butterfly.html) Gecko/2009032608 Firefox/3.0.8"

This is Topsy, a Twitter search engine. I’d never seen this site before. After a few tests, I actually kind of like the output.

89.xxx.xxx.xxx - - [06/Dec/2009:20:17:58 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; MSIE 6.0b; Windows NT 5.0) Gecko/2009011913 Firefox/3.0.6 TweetmemeBot"

Tweetmeme mines Twitter links and attempts to build a Digg-like index based on retweets rather than Diggs.

75.xxx.xxx.xxx - - [06/Dec/2009:20:18:05 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
72.xxx.xxx.xxx - - [06/Dec/2009:20:20:25 +0000] "GET /test HTTP/1.1" 301 - "-" "Python-urllib/2.5"

I can’t identify these AWS-hosted services.

70.xxx.xxx.xxx - - [06/Dec/2009:20:20:53 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"
70.xxx.xxx.xxx - - [06/Dec/2009:20:24:23 +0000] "GET /test HTTP/1.1" 301 20 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)"

This is actually Microsoft: Bing indexes Twitter. I’m not sure why it fetched the link twice within three and a half minutes; that seems odd for this day and age.

Mining the logs a little deeper, it looks like other bots spider tweets once they meet certain criteria (such as being retweeted). It also looks like other search engines may be indexing at a slower rate (Baidu, for example).

There are several others from AWS and a few other dedicated providers. These servers are obviously trying to keep a low profile; they don’t even have reverse DNS.
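Checking whether a crawler’s IP has reverse DNS is easy to script. The minimal Python sketch below uses the standard library’s socket.gethostbyaddr, which raises an OSError subclass (socket.herror or socket.gaierror) when there is no usable PTR record. It also adds a forward-confirmation step with socket.gethostbyname_ex, requiring the original IP to appear among the hostname’s addresses, because whoever controls an IP block can point its PTR record at any name. The function names are illustrative, and the resolver parameters exist only so the lookups can be swapped out for testing.

```python
import socket


def reverse_dns(ip, reverse=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if it has no usable reverse DNS."""
    try:
        hostname, _aliases, _addresses = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None
    return hostname


def forward_confirmed(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return the hostname only if it resolves back to the same IP, else None."""
    hostname = reverse_dns(ip, reverse)
    if hostname is None:
        return None
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return None
    # A PTR record alone can claim any name; the forward lookup must agree.
    return hostname if ip in addresses else None
```

On a live server you would call forward_confirmed on each unmasked address from the log; hosts like the low-profile ones above, with no PTR record, come back as None.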

So there you go: all of this happens within seconds of a link hitting Twitter.
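To measure the timing on your own server, the log lines above can be parsed mechanically. A minimal Python sketch, assuming Apache’s “combined” log format (which these lines appear to follow), extracts each hit’s timestamp and user agent and reports how many seconds after the first fetch each one arrived. The log doesn’t record when the tweet itself was sent, so offsets are relative to the earliest hit; the function names are illustrative.

```python
import re
from datetime import datetime

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 06/Dec/2009:20:17:43 +0000
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


def seconds_after_first(lines):
    """Return (seconds since the earliest hit, user agent) pairs in time order."""
    hits = sorted((h for h in map(parse_line, lines) if h is not None),
                  key=lambda h: h["time"])
    if not hits:
        return []
    start = hits[0]["time"]
    return [((h["time"] - start).total_seconds(), h["agent"]) for h in hits]
```

Fed the Googlebot and Topsy lines above, it reports Topsy arriving 4 seconds after Googlebot; the two Bing hits come out 210 seconds apart.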

Edit: More! Here are a few from another tweet that weren’t in the first set:

75.xxx.xxx.xxx - - [06/Dec/2009:20:49:42 +0000] "GET /test HTTP/1.1" 301 - "-" "Mozilla/5.0 (compatible; Feedtrace-bot/0.2; bot@feedtrace.com)"

Feedtrace is some sort of Twitter-mining service, currently in beta.

67.xxx.xxx.xxx - - [06/Dec/2009:20:49:45 +0000] "GET /test HTTP/1.0" 301 - "-" "Mozilla/5.0 (compatible; mxbot/1.0; +http://www.chainn.com/mxbot.html)"

Chainn is a social data mining service with a few apps that make use of the data it collects.
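The user-agent strings are the main clue to who is crawling. A small Python lookup, built only from the tokens and service names identified in this post (the table is an illustration, not an exhaustive bot list), also shows the limits of that approach: the plain “MSIE 7.0” agents used by Bing and the unidentified AWS hosts, and the bare Python-urllib agent, all return None, so those hits have to be identified some other way, such as by who owns the IP.

```python
# Illustrative table: user-agent substrings seen in the logs above,
# mapped to the services this post identifies them with.
KNOWN_BOTS = {
    "Googlebot": "Google",
    "Butterfly": "Topsy",
    "TweetmemeBot": "Tweetmeme",
    "Feedtrace-bot": "Feedtrace",
    "mxbot": "Chainn",
}


def identify(user_agent):
    """Return the service whose token appears in user_agent, or None if none match."""
    for token, service in KNOWN_BOTS.items():
        if token in user_agent:  # simple substring match
            return service
    return None
```

A substring match is crude but matches how these bots label themselves: each appends its own token to an otherwise browser-like string.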

Capturing User Innovation

Building a new product is always fun. You draft ideas, generate wireframes, mockups, prototypes, you build your app, you tweak it, you release it. In the case of software and web applications you also get to update it and make it better. If it’s hardware, you work on a 2nd revision to be sold a year later to people who didn’t adopt early (jab at early adopters).

One of the most interesting things is how users actually use the product you make, if they use it at all. Do they use it a little or a lot? Do they use it as intended? Do they find things missing? Too robust for their taste? Or do they find uses and modifications that none of the engineers involved would have contemplated in a million years?

Loren Brichter On Tweetie

Loren Brichter is the author of the popular Twitter application Tweetie, an iPhone-only application until the Mac version was released on Monday. Macworld has a great little interview with Loren. One thing I really admire is that Loren understands how to build a good application: performance, ease of use, and simplicity are all taken into account, not just features and toys.

I thought this particular nugget was the highlight though:

..AIR apps are like modern day Java applets… sure, they run on every platform. But they also suck on every platform.

I’ve yet to find an Adobe AIR application I like, even though several have great ideas behind them. Even on Windows, where I presume AIR has the biggest market share, they all look strange, the UI is garbage, and the performance is abysmal. On the Mac it’s even worse. Creating a Mac theme won’t help, since my expectations for a Mac UI are different from those on Windows or Linux. Java apps have the same issues.

I think this is why more and more “applications” are becoming web-based. If you’re going to feel awkward and unnatural to the user anyway, why even bother with the installation barrier? Why not just be web-based so users don’t have to download and install anything? As awkward as web apps may be, those that add Adobe Flash tend to make the problem worse by adding more strange-feeling UI. Flash does video well (it’s a big reason YouTube became popular), but it’s really no replacement for a proper user interface. Hopefully by 2017, when HTML5 is wrapping up, we’ll have this problem solved.

User Generated Content Ownership

Since the creation of the <form> element, people have been wondering about the ownership and copyright of content created online. From email and message boards in Web 1.0 to blogs and Twitter in Web 2.0, the same fundamental questions remain.

Lately, Twitter has been the focus. Twitter is actually pretty clear about its claims to user-generated content:

  1. We claim no intellectual property rights over the material you provide to the Twitter service. Your profile and materials uploaded remain yours. You can remove your profile at any time by deleting your account. This will also remove any text and images you have stored in the system.
  2. We encourage users to contribute their creations to the public domain or consider progressive licensing terms.

It’s pretty clear that Twitter is taking a hands-off approach, but it doesn’t let users decide what they want. I’m personally a fan of Creative Commons, so my suggestion would be to let users decide in their account settings how they wish to license their tweets, choosing between CC licenses. That, of course, makes retweeting complicated, to put it nicely (it’s more like a minefield), which is likely the reason they avoid the licensing issue. Sure, you can put some sort of icon next to the tweet to indicate the licensing, but what if someone retweets it? Or modifies it ever so slightly? Is it a new tweet? How many characters must change for it to be a new one? This is where it gets murky.
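To make the retweet problem concrete, here is a hypothetical Python sketch of the settings-based approach suggested above: each tweet carries the license its author picked, and a plain retweet copies it unchanged. The hard case is the slightly modified tweet. The sketch uses difflib.SequenceMatcher’s similarity ratio with an arbitrary 0.8 cutoff to decide whether an edit is still a copy, and the arbitrariness of that number is exactly the murkiness described. The class and function names are invented for illustration and reflect no actual Twitter API.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    author: str
    text: str
    license: str                        # chosen in account settings, e.g. "CC BY-SA"
    retweeted_by: Optional[str] = None


def retweet(original: Tweet, user: str) -> Tweet:
    """A plain retweet keeps the original author, text, and license; only the sharer changes."""
    return replace(original, retweeted_by=user)


# Hypothetical cutoff. Any number here is arbitrary, which is exactly the problem.
COPY_THRESHOLD = 0.8


def still_a_copy(original_text: str, edited_text: str,
                 threshold: float = COPY_THRESHOLD) -> bool:
    """Treat an edited tweet as a copy (original license applies) if it is similar enough."""
    ratio = SequenceMatcher(None, original_text, edited_text).ratio()
    return ratio >= threshold
```

Adding a single “!” to an 11-character tweet still counts as a copy under this rule, while a rewrite does not; everything in between is a judgment call.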

Yahoo-owned Flickr chose to solve this problem by letting users choose what copyright they want to impose, including a Creative Commons option. It’s a very graceful solution, though admittedly Flickr’s situation is much simpler than Twitter’s, since it doesn’t have to deal with complexities like retweeting.

WordPress.com isn’t as clear regarding its claims (or lack thereof) to copyright. Still, it’s far from locking people in, considering you can delete content at any time and download your entire blog to move it elsewhere. Matt‘s been pretty open about giving users choice, including the ability to leave WordPress.com. There is, of course, room for improvement in clarifying its stance on copyright ownership.

Even Google has been criticized for copyright concerns regarding services like Google Docs.

Sites could adopt Richard Stallman’s stance on “intellectual property” (his air quotes), though that would alienate at least as many people as it attracts.

While Twitter might be the hot topic today it’s hardly a problem exclusive to Twitter. It’s an issue for virtually any site out there that accepts third party content. It gets more complicated when content can be remixed and redistributed.

The reality is that people should know what rights they are giving up by putting content on these or any other services, but they rarely do. Perhaps a great Creative Commons project would be to create the same simplified icon/license system, but for websites that allow users to submit content. The labels would explain in plain English what the Terms of Service jargon means for users. It’s essentially the inverse of what Creative Commons does now: label the service as well as the content.

So what’s the best solution?

Amazon S3 Outage

The buzz around the web today was the outage of Amazon’s S3. It showed which websites are “doing it right” and which fail. This is a great follow-up to my “Reliability On The Grid” post the other day.

Amazon S3 is cloud-based storage. Essentially, when you send Amazon a file using the REST or SOAP interface, it stores the file on multiple nodes in its infrastructure. This provides redundancy and security (in case a data center catches fire, for example). Because of this design, it’s often thought that cloud-based computing is invulnerable to problems. That’s hardly the case. Just like any large system, it’s complicated and full of hazards. It takes only a small software glitch or an unaccounted-for issue to bring the entire thing to a grinding halt. More complexity = more things that can fail.

Amazon S3 is popular because it’s cheap and easy to scale. It’s pay-per-use, billed on bandwidth, disk storage, and requests. Because that lets websites grow without a large infrastructure investment, it’s popular with “Web 2.0” companies trying to keep their budgets tight. Notably, sites like Twitter, WordPress.com, SmugMug, and Amazon.com itself all use Amazon S3 to host things like images.

Many sites, notably Twitter and SmugMug, didn’t have a good day today. WordPress.com and Amazon.com operated as normal. The obvious reason is that WordPress.com and Amazon.com are much better in terms of infrastructure and design.

WordPress.com uses S3 but proxies it with Varnish. There’s a brief description here, and a more detailed breakdown here. According to Barry Abrahamson, WordPress.com handles 1,500 image requests per second, and only 80-100 of those are served through S3. They keep (slower) backups in house for when S3 is down and can fail over if S3 has a problem. This means they can leverage S3 to their advantage without going down because of S3. Using Varnish also keeps the S3 bill down by using their own bandwidth (likely cheaper, since as a large site they can get better rates), and it gives them a good level of redundancy. Awesome job.
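Note how small S3’s share is: 80 to 100 of 1,500 requests per second is only about 5 to 7 percent of image traffic. The failover itself boils down to “try S3, and if that fails, try the in-house copy.” Here is a minimal Python sketch of that idea, assuming hypothetical fetcher functions that each take an image key and return its bytes; it is an illustration of the pattern, not WordPress.com’s actual Varnish setup.

```python
from typing import Callable, List, Sequence

# A fetcher takes an object key (e.g. "avatars/123.png") and returns its bytes,
# raising OSError (ConnectionError, TimeoutError, urllib's URLError, ...) on failure.
Fetcher = Callable[[str], bytes]


def fetch_with_failover(key: str, origins: Sequence[Fetcher]) -> bytes:
    """Try each origin in order (S3 first, in-house backup second); return the first success."""
    errors: List[OSError] = []
    for fetch in origins:
        try:
            return fetch(key)
        except OSError as exc:
            errors.append(exc)  # remember why this origin failed, then try the next one
    raise OSError(f"all {len(origins)} origins failed for {key!r}: {errors}")
```

Usage would look like fetch_with_failover("avatars/123.png", [fetch_from_s3, fetch_from_backup]), where both fetchers are hypothetical. Ordering the list puts the cheap, fast origin first and the slower backup last.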

Amazon.com uses S3 itself. If you look at images on the site, they are actually served from g-ecx.images-amazon.com, which is actually:

g-ecx.images-amazon.com. 38     IN      CNAME   ant.mii.instacontent.net.

instacontent.net is part of Mirror Image, a CDN. This is essentially outsourcing what WordPress.com does in terms of caching, similar to Akamai’s services. A CDN’s biggest advantage is lower latency: it serves content from servers closer to the customer, which generally feels faster. The other benefit is that it caches content, which helps when the origin is having problems. Because Amazon has this layer on top of S3, it had an added level of protection: the site stayed up and images kept loading.
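The “cache content for when the origin is having problems” behavior can be sketched as a cache that serves a stale copy if refreshing it fails, similar in spirit to HTTP’s stale-if-error Cache-Control extension. This minimal Python version is an assumption-laden illustration, not how Mirror Image or any particular CDN is implemented: the 300-second TTL is a hypothetical default, and the origin is any function that returns bytes or raises OSError.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Serve fresh entries from memory; if the origin fails after an entry
    has expired, serve the stale copy instead of an error."""

    def __init__(self, origin: Callable[[str], bytes], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.origin = origin
        self.ttl = ttl                # seconds an entry counts as fresh (hypothetical default)
        self.clock = clock            # injectable so expiry can be tested
        self.entries: Dict[str, Tuple[float, bytes]] = {}  # key -> (fetched_at, body)

    def get(self, key: str) -> bytes:
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]          # fresh hit: the origin is not contacted
        try:
            body = self.origin(key)
        except OSError:
            if cached is not None:
                return cached[1]      # origin down: fall back to the stale copy
            raise                     # never cached: nothing to serve
        self.entries[key] = (now, body)
        return body
```

The catch is visible in the last branch: anything the cache has never seen still fails during an outage, so the protection is only as good as the hit rate.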

Twitter serves most images, such as avatars, right off of S3. This means that when S3 went down, there were thousands of dead images on their pages: no caching, not even a CNAME in place. To be fair, image hosting is the least of their concerns; keeping the service up and running is their #1 concern right now. The service was still usable, just ugly, and many users use third-party clients anyway.

Using a CDN or having the infrastructure in house is obviously more expensive (it makes S3 more of a luxury than a cost-saving measure), but it means you’re not depending on a single third party for your uptime.

Reliability On The Grid

There’s been a lot of discussion lately (in particular from the NYTimes and Data Center Knowledge) about both the reliability of web applications, which users are becoming more and more reliant on, and the security of those applications. It’s a pretty interesting topic considering how many things ultimately have an impact on these two metrics. I call them metrics since that’s what they really are.

Secrets In Websites II

This post is a follow-up to the first Secrets In Websites. For those who don’t remember the first one, I point out odd, interesting, and funny things in other websites’ code. Yes, it takes some time to put a post like this together; that’s why it’s been about a year since the last one. Enough with the intro; read on for the code.
