The increasingly popular FriendFeed is proposing a new protocol known as Simple Update Protocol (SUP). The problem FriendFeed is encountering is nothing new: for each user they monitor RSS feeds across a variety of services, and that can really add up. To keep things timely they poll those feeds frequently, which is generally a very wasteful process, since the majority of the feeds likely didn't change between polls. That's wasted resources. SUP, in a nutshell, is a changelog for feeds, so that a service like FriendFeed can check only the ones that changed. This allows for quicker updates with less polling. Here's my analysis of the proposal.
- I'm not sure I agree with the author's decision to use JSON as the format. Considering this will be used mostly (if not only) to keep tabs on XML documents (mostly RSS and Atom), it seems more fitting to use XML. Presumably the reason for JSON is that it's computationally easier to parse. The NYTimes created a database abstraction layer called DBSlayer that uses JSON rather than a binary protocol to avoid the need for a special client; the advantage of JSON over XML is that JSON is easy to parse, and I agree. JSON also tends to be lighter, since you don't have opening and closing tags surrounding all your data. Still, why introduce JSON into an XML world?
- SUP needs index files. Google made a great move with its Sitemaps protocol by allowing for Sitemap indexes: essentially a bootstrap file that lists several sitemaps. Google further restricted each Sitemap to no more than 50,000 items and no larger than 10MB. For a very popular site such as Facebook, the SUP feed could become painful to parse as a single file; a SUP index would be much better. I'd essentially copy Google's design and its rules regarding size, since I think they work rather well.
- SUP should allow for using either SUP-IDs or RSS URLs. In some cases SUP would be more useful as an index of all of a site's RSS feeds. Having that option in the protocol would make sense and help future-proof it. I'm sure Google wouldn't mind it.
- HTTP is a good choice. Unlike XMPP, which can be tough for a startup to implement (look at Twitter), HTTP is "native" and obvious. It also allows for things like gzip encoding to cut down on bandwidth, along with all the other nice things HTTP provides. It's also firewall friendly, something that raw TCP-based solutions often aren't. XMPP can also be very resource intensive; a supplemental feed really isn't.
- SUP would cut down on unnecessary feed polling. As FriendFeed notes, consumers would still need to poll individual feeds, but the interval could be significantly longer, resulting in fewer requests.
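For concreteness, here's roughly what consuming a SUP document looks like. The field names and IDs below are illustrative, based on my reading of the draft, so treat this as a sketch rather than the canonical wire format: a JSON object with a covered time window and a list of (SUP-ID, update token) pairs for feeds that changed.

```python
import json

# A hypothetical SUP document (field names illustrative, not authoritative).
# "updates" pairs each changed feed's SUP-ID with an opaque update token;
# "period" is the window in seconds that this document covers.
sup_doc = """
{
  "period": 60,
  "since_time": "2008-08-22T19:30:00Z",
  "updates": [
    ["1cf85milk", "1219433170"],
    ["a7c9b2x",   "1219433175"]
  ]
}
"""

data = json.loads(sup_doc)
changed_ids = {sup_id for sup_id, _token in data["updates"]}
print(changed_ids)
```

Whatever the exact schema ends up being, the parsing really is a couple of lines in any language with a JSON library, which is presumably the point of choosing JSON.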
It's important to note the difference between the computational resources involved: requests, bandwidth, and CPU. A lot of people commenting on the proposal seem to have confused them. This proposal would reduce requests, bandwidth, and CPU, provided enough consumers supported SUP.
Every time a consumer reads an RSS feed on a dynamic site (assuming no cache exists), the database is hit to get the latest items. Even if an If-Modified-Since header is sent with the request, the site still needs to check whether there's anything newer than that date. For this reason, If-Modified-Since conserves bandwidth but doesn't help much to reduce the number of requests or the CPU required. SUP works around this by hitting one feed, and then retrieving only the feeds known to have updated. Clearly If-Modified-Since and SUP aren't doing the same thing; they are, however, somewhat complementary.
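The consumer side of that workaround can be sketched as follows, assuming the consumer has already learned each subscribed feed's SUP-ID (the draft suggests feeds advertise it, e.g. in the feed itself or a header; the function and variable names here are my own):

```python
import json

def feeds_to_refetch(sup_json: str, subscriptions: dict) -> list:
    """Given one SUP document and a mapping of SUP-ID -> feed URL,
    return only the feed URLs that actually changed in the window."""
    updates = json.loads(sup_json)["updates"]
    changed = {sup_id for sup_id, _token in updates}
    return [url for sup_id, url in subscriptions.items() if sup_id in changed]

# Two subscriptions, but only one SUP-ID appears in the SUP document,
# so only that one feed gets re-fetched.
subs = {"1cf85milk": "http://example.com/alice.rss",
        "deadbeef":  "http://example.com/bob.rss"}
sup = '{"period": 60, "updates": [["1cf85milk", "1219433170"]]}'
print(feeds_to_refetch(sup, subs))
```

One poll of the SUP feed replaces N polls of individual feeds; the individual feeds are only hit when they are known to have changed.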
To generate this SUP feed, one must essentially do one of two things:
- On a cron job or some other interval, query the database for updates and generate a feed. That includes feeds nobody may ever request (think of all those accounts you created on a website and never visited again). Sharded databases may also add more complexity. This can get pretty ugly.
- When a change takes place, for example when a user adds a photo on Flickr, update a table in the database containing the SUP-IDs of changed feeds. Then, on an interval, generate the feed using that one table, periodically flushing records no longer needed. Generating the feed off of that table is substantially easier. Of course, one can argue whether that's "correct" from a data modeling point of view. It's not really normalized, but then again, how many production databases really are, at least when performance matters?
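The second approach can be sketched like this (table, column, and function names are made up for illustration; I'm also reusing the hypothetical JSON field names from earlier):

```python
import json
import sqlite3
import time

# A single denormalized table on the producer side: one row per change event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sup_updates (sup_id TEXT, updated_at INTEGER)")

def record_change(sup_id: str) -> None:
    # Called from the write path, e.g. when a user adds a photo.
    conn.execute("INSERT INTO sup_updates VALUES (?, ?)",
                 (sup_id, int(time.time())))

def generate_sup(period: int = 60) -> str:
    """Run on an interval: emit the SUP document for the last `period`
    seconds and flush rows that have aged out of the window."""
    cutoff = int(time.time()) - period
    rows = conn.execute(
        "SELECT sup_id, MAX(updated_at) FROM sup_updates "
        "WHERE updated_at >= ? GROUP BY sup_id", (cutoff,)).fetchall()
    conn.execute("DELETE FROM sup_updates WHERE updated_at < ?", (cutoff,))
    return json.dumps({"period": period,
                       "updates": [[sid, str(ts)] for sid, ts in rows]})

record_change("1cf85milk")
print(generate_sup())
```

The main database is never scanned for changes; the SUP feed is built entirely from the small changelog table, which is the whole appeal of this approach.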
This isn't exactly a brand new idea. Six Apart has an Atom-based update service, Twitter uses XMPP (though it looks like they may phase that out), and LiveJournal at one point had a TCP-based system that did essentially the same thing.
Overall, SUP isn't really a bad way of doing things. In a sense, it's Google Sitemaps for feeds. It solves a problem that today doesn't have a great solution.