Favorite Blogs – Edge Cases Of Intelligence

I decided I would periodically share a few blogs that I’ve found particularly interesting for various reasons. Some educational, some just comical and amusing. I’m not sure how often I’ll do one of these, but it’s unlikely more than quarterly. The particular theme for this edition is “edge cases of intelligence”. Read on and you’ll understand why.

Less Wrong

Less Wrong’s title is a surprisingly good description of its contents. Its more verbose tagline, “a community blog devoted to refining the art of human rationality”, is even more so. As you might expect given that it’s operated by the Future of Humanity Institute at Oxford University, it’s slightly more intellectual. Anyone with an interest in cognitive science, decision-making, or just human psychology in general will likely find at least half of the posts mesmerizing.

Hot Chicks with Stormtroopers

Hot Chicks with Stormtroopers is also a blog with a very descriptive name. This is one of those websites that you didn’t know there was a need or market for until you found it. Now I’m not sure what I would do without it. The mere fact that there is enough content that fits this specific niche to fill a regularly updated blog is in itself just amazing to me. The internet truly has everything for everyone.

Yahoo! Answer Fail

Yahoo! Answer Fail falls in the same family as the more popular FAIL blog but with a specific focus on Yahoo! Answers. I can’t help but read and fear for the future of humanity. I suspect at least half of Yahoo! Answers posters are just jokers, but I can’t help but think that some of these people are really out there. Be warned: if you visit this site you will spend no less than an hour, and likely more, reading through some of the saddest examples of humanity out there.

Yahoo Traffic Server Open Sourced

Way back in 2002 Yahoo acquired Inktomi, which was largely known for its search products. Its software powered some early search engines like HotBot in the pre-Google days. One of its lesser-known products was something called Traffic Server. Even if it was lesser known, it was still used by ISPs including AOL, which in those days was big. Inktomi’s business disappeared with the great bubble and it was acquired by Yahoo, which has been using Traffic Server itself ever since.

Fast forward to 2009. Yahoo is now in the process of opening up Traffic Server as an Apache project. It’s already in the Apache Incubator. Yahoo says it’s capable of 30,000 requests per second per server. Noteworthy is that this runs on generic hardware.

These days most websites use either Squid, Nginx, Pound or Varnish on the open source side. On the proprietary side there’s Citrix NetScaler, Foundry (now Brocade) ServerIron, Zeus ZXTM or F5’s Big-IP. The proprietary side can be either expensive software running on generic hardware or an appliance (which is generally an Intel-based server with a custom modified Linux install for low maintenance and top performance).

At this point Traffic Server is apparently not 64-bit and doesn’t have native IPv6 support. However, it appears to be usable and likely competitive with some of the other stuff out there already. Yahoo has been using it all along, and I hear they are pretty popular (problems aside).

It should be noted that commercial CDNs aren’t really an alternative to a reverse proxy or load balancer, since they still require a robust and redundant origin. If anything they will reduce your requirements, not eliminate them.

Given everyone’s interest in scaling computing quickly and cheaply, this is a pretty noteworthy open source event. Caching proxies tend to be an afterthought, but these applications can be critical. Squid handles 78% of Wikipedia’s requests. Given all their traffic, you can see how it matters.
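To put the Wikipedia figure in perspective, here is a small back-of-the-envelope sketch in JavaScript. The function name and the total of 100,000 requests are made up for illustration; only the 78% hit ratio comes from the text above.

```javascript
// Rough estimate of how many requests still reach the origin servers
// when a cache in front of them answers a given fraction of the traffic.
// `totalRequests` is a hypothetical number; `hitRatio` is the share
// served by the cache (0.78 for the Wikipedia/Squid figure).
function originRequests(totalRequests, hitRatio) {
  return Math.round(totalRequests * (1 - hitRatio));
}

console.log(originRequests(100000, 0.78)); // 22000
```

In other words, under these assumptions the backend only has to be sized for roughly a fifth of the incoming traffic.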

It will be interesting to see if a community builds around Traffic Server and if it sees adoption.

Phorm’s UserAgent

There’s a fair amount of controversy regarding Phorm, a company that plans to target advertising by harvesting information via deep packet inspection. They are already in talks with several ISPs. I’ll leave the debate over Phorm from a user perspective for someplace else.

They claim to offer ways to let websites opt out of their tracking but it’s a true double-edged sword as they don’t play nice with a standard robots.txt file. Take a look at what they are doing here:

The Webwise system observes the rules that a website sets for the Googlebot, Slurp (Yahoo! agent) and “*” (any robot) user agents. Where a website’s robots.txt file disallows any of these user agents, Webwise will not profile the relevant URL. As an example, the following robots.txt text will prevent profiling of all pages on a site:

Rather than use a unique user agent they are copying that of Google and Yahoo. The only way to block them via a robots.txt file is to tell one of the two largest search engines in the western world not to index your site. This seems fundamentally wrong.
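To make the problem concrete, here is a rough sketch in JavaScript. It is a deliberately simplified robots.txt reader (only User-agent and Disallow lines, plain prefix matching), and `webwiseWillProfile` is my own hypothetical model of the policy quoted above, not Phorm’s actual code. The “Webwise” user-agent name in the second example is also invented to show what is missing.

```javascript
// Simplified robots.txt parser: groups consecutive User-agent lines with
// the Disallow rules that follow them. Real parsers also handle Allow,
// wildcards, and other extensions, which are ignored here.
function parseRobots(text) {
  var groups = [];
  var current = null;
  var lastWasAgent = false;
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/#.*$/, "").trim(); // drop comments
    var sep = line.indexOf(":");
    if (sep === -1) continue;
    var field = line.slice(0, sep).trim().toLowerCase();
    var value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// True if a group addressed explicitly to `agent` disallows `path`.
function isDisallowedFor(groups, agent, path) {
  var name = agent.toLowerCase();
  return groups.some(function (g) {
    return g.agents.indexOf(name) !== -1 &&
      g.disallow.some(function (prefix) { return path.indexOf(prefix) === 0; });
  });
}

// Hypothetical model of the quoted policy: a URL is skipped only if it is
// disallowed for Googlebot, Slurp, or "*".
function webwiseWillProfile(groups, path) {
  return !["googlebot", "slurp", "*"].some(function (agent) {
    return isDisallowedFor(groups, agent, path);
  });
}

// Blocking Yahoo's crawler is what it takes to opt out...
var blockSlurp = parseRobots("User-agent: Slurp\nDisallow: /");
console.log(webwiseWillProfile(blockSlurp, "/page.html")); // false

// ...while a rule aimed at a (made-up) Webwise agent changes nothing.
var blockWebwise = parseRobots("User-agent: Webwise\nDisallow: /");
console.log(webwiseWillProfile(blockWebwise, "/page.html")); // true
```

Under this model there is simply no rule a site owner can write that targets Phorm alone.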

There is an email address where you can provide a list of domains to exclude, but that requires intervention and updating a list of domains when you create a site. This obviously doesn’t scale.

Now I’m curious. Is piggybacking off of another company’s user agent considered a trademark violation? From what I understand they aren’t broadcasting it, just honoring it. If I were Google or Yahoo I’d be pretty annoyed. Particularly Yahoo, since there are websites that will just block Slurp given Google’s dominance in search. Yes, there are many products that can spoof a user agent (including wget and curl), but nobody to my knowledge is crawling web pages for a commercial purpose while hiding behind another company’s name.

robots.txt is a somewhat flawed system, as not all user agents even obey it (sadly), though it’s one of the only defenses that exist short of actual blocks.

Yahoo! Web Analytics

It went somewhat unnoticed, but Yahoo! today announced its Yahoo! Web Analytics package, which is intended to compete with the wildly popular Google Analytics. I’ve spent quite a few hours in analytics packages over the years, ranging from very amateurish to enterprise grade. Google Analytics is a very good product but it does have limitations. The biggest limitation is the lack of real-time reporting. Google Analytics takes a few hours, making it for most people a next-day service. This isn’t a big deal for some, but if you’re in an environment where you need feedback on your content ASAP (a must for media sites), this is a huge deal. Yahoo is promising to deliver “within minutes”:

Get detailed reporting within minutes after an action occurs on your website. Quickly identify dips in key site metrics or monitor the performance of new content. Seeing the impact of website and marketing changes immediately makes it much easier to optimize them. Yahoo! Web Analytics also maintains historical data so you can go back at any time to review old data for new insight, or compare the present to the past without any changes to your page tags.

Interesting. I wonder if this will light a fire under Google’s butt to deliver real-time analytics as well. Urchin wasn’t really designed for real-time data. Google’s obviously done a lot of work with it to build Google Analytics. I wonder if that’s the next step for them.

Zimbra Desktop

Yahoo-owned Zimbra released the latest Zimbra Desktop today. At a glance it seems pretty nice. Essentially it’s Yahoo Mail running on Mozilla Prism. It does seem like a somewhat large download for what it is, but maybe they still have some fat to trim. What is now Firefox was pretty hefty when it first split from Mozilla App Suite. It takes time. The installer is also very slow. I see it has Jetty, so it looks like there’s a Java backend.

It supports any POP3 or IMAP account similar to Thunderbird, with options for Gmail and Yahoo Plus in the wizard (for those who don’t know what type of email account those are).

My general impression is pretty neat, but the UI needs work. It often has scroll bars to view the contents of a window (just like a webpage). This is normal in a browser, but just feels strange in what is designed to be like a client side application. Even setup has this problem.

So far I still think Thunderbird and Apple Mail provide a better desktop experience. But Zimbra’s the new kid on the block, so I wouldn’t underestimate it. It is Open Source. It will be interesting to see who contributes to it.

If anyone else tried it, I’m curious to know what you thought of it.

Rebreaking The Web

It’s happening again. Once upon a time, browser vendors started adding their own features without consulting with each other and agreeing upon standards. What they created was a giant mess of inconsistencies across browsers and platforms that is still in effect today. Ask any web developer and they can tell you of the pains that they have suffered trying to make seemingly trivial things work everywhere consistently. It’s no easy task. Before IE 7, even an Ajax request required something along the lines of:

var httpRequest;
if (window.XMLHttpRequest) { // Mozilla, Safari, …
    httpRequest = new XMLHttpRequest();
} else if (window.ActiveXObject) { // IE
    httpRequest = new ActiveXObject("Microsoft.XMLHTTP");
}

That’s right, IE 6 didn’t support the native XMLHttpRequest object (more here). This is just one of many examples in JavaScript and CSS. document.all anyone?
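In practice most of us buried that branching inside a helper. Here is a minimal sketch of such a helper; the name `createRequest` and the `win` parameter (a stand-in for `window`, so it can be exercised outside a browser) are my own choices, and the list of ActiveX ProgIDs is the commonly used pair rather than an exhaustive one.

```javascript
// Returns a request object using whatever the environment offers,
// or null if nothing is available. `win` is normally `window`.
function createRequest(win) {
  if (win.XMLHttpRequest) { // Mozilla, Safari, IE 7+
    return new win.XMLHttpRequest();
  }
  if (win.ActiveXObject) { // older IE
    var progIds = ["Msxml2.XMLHTTP", "Microsoft.XMLHTTP"];
    for (var i = 0; i < progIds.length; i++) {
      try {
        return new win.ActiveXObject(progIds[i]);
      } catch (e) {
        // This ProgID isn't registered; try the next one.
      }
    }
  }
  return null;
}
```

In a page you would call it as `var httpRequest = createRequest(window);` and check for `null` before using it.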

The end result of this problem came to be known as the “Web Standards” movement. Simply put, it’s the idea that code should follow a standard that results in consistent output across all browsers on various platforms. Write once, run anywhere. While it’s taken years for this to manifest, it’s slowly become a reality. Firefox, Safari and Opera have fairly consistent rendering (at least in comparison to the mess of just a few years ago on the browser scene). IE 6 was fairly poor in terms of modern web development, but IE 7 made progress, and IE 8 is Microsoft’s greatest effort to date to bring their browser up to speed.

Geek Reading: High Performance Web Sites

So I decided to do a little book shopping a few weeks ago and one thing I purchased was High Performance Web Sites: Essential Knowledge for Front-End Engineers (affiliate link). At its core it is essentially a 14-step guide to making faster websites. I don’t think any of the steps are new or innovative, so anyone looking for something groundbreaking will be sorely disappointed. I don’t think the target audience has that expectation though. It’s still a rather practical book for any developer who spends a lot of time on the front-end of things.

It gives many great examples of how to implement each step, as well as suggestions based on what some of the biggest sites on the web are doing (including Yahoo, the author’s employer). I found it pretty helpful because it saves hours’ worth of research on what other sites are doing to improve their performance. For that reason alone it’s a worthwhile book to check out. For each rule there’s enough discussion to help you decide if you can implement an improvement on your own site or not. Most sites are limited in what they can actually do by their legacy systems (such as a CMS), processes (including human ones) and audience. Unless you’ve got a serious budget, you likely fail rule #2 (use a CDN) right off the bat. Regardless, there are likely a few tips you can take advantage of. It’s also a very fast book to get through.

Most steps are pretty quick to implement provided they are feasible in your situation. Overall it’s one of the best “make it better” tech books I’ve seen regarding web development, and one of the few that actually appeared worth purchasing (and I did). The majority of the tips require a somewhat tech-savvy approach to web development; the book isn’t oriented much towards web designers (with the notable exception of reducing the number of requests by using CSS and better use of images) or casual webmasters. It’s aimed at those who understand the importance of HTTP headers but could use some help deciding on best practices, and those who want to know how the small things can add up.

Interestingly enough, I learned about the book by trying the YSlow extension, which essentially evaluates a page against the rules suggested in the book. Interesting from a marketing perspective, I guess. Overall this blog evaluates OK (about as well as it ever will, considering I’m not putting it on a CDN anytime soon). Though I guess I could add some Expires headers in a few places.
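For anyone wondering what adding an Expires header amounts to, here is a minimal sketch in JavaScript. The function name and the one-year lifetime are my own assumptions for illustration; pick whatever lifetime suits your static files, and wire the result into whatever server or framework you actually use.

```javascript
// Builds far-future caching headers for a static asset.
// `now` is a Date; `days` is how long browsers may keep the file
// (a hypothetical parameter, not something prescribed by the book).
function farFutureHeaders(now, days) {
  var seconds = days * 24 * 60 * 60;
  var expires = new Date(now.getTime() + seconds * 1000);
  return {
    "Expires": expires.toUTCString(), // HTTP date format, always GMT
    "Cache-Control": "public, max-age=" + seconds
  };
}

var headers = farFutureHeaders(new Date(Date.UTC(2009, 0, 1)), 365);
console.log(headers["Expires"]);       // Fri, 01 Jan 2010 00:00:00 GMT
console.log(headers["Cache-Control"]); // public, max-age=31536000
```

The catch with far-future dates is that the file name has to change whenever the file does, otherwise returning visitors keep the stale copy.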

Yahoo Goes Green

Yahoo is going carbon neutral. I’m curious how much is offset, and how much is reduction. Yahoo has a fairly large infrastructure. I wonder if they are using alternative power sources, or if they are going to plant a million trees. They do mention:

These projects could include a wind farm in India or a small-scale run of the river hydroelectric project in Brazil. We’re also looking to invest in emerging clean technologies.

Interesting. I wonder if we will see things like carbon neutral VoIP, carbon neutral bandwidth, carbon neutral data centers / colocation / hosting?

Google Zeitgeist 2006

Google Zeitgeist 2006 is out, along with an explanation on the Google Blog of how the data is compiled:

…we looked for those searches that were very popular in 2006 but were not as popular in 2005 — the explosive queries, the topics that everyone obsessed over….

It always proves to be an interesting bit of year-end data to look at. Yahoo, on the other hand, keeps things a little more up to date with the Buzz Index, also a very good read. For example, you can see the impact of President Ford’s death on searches. Very cool data. Hopefully one day Google will do something similar. I’d love to see how their audiences compare on current events as they happen.

Yahoo TV Redesign

I noticed this the other day. The Yahoo! TV redesign is awful (my personal opinion, maybe some like/love it). And yes, we all know I’m a TV addict. Once upon a time I was a TVGuide.com user, but switched because it was too slow and cumbersome. Now once again I may be hunting for something better. Here are some of the problems I feel are pushing me away:

  • Need To Sign In To Get Correct Listings – Before, I didn’t need to be signed in; it just saved a cookie with my prefs. That kept my account more secure (separate sign-in) and still let me quickly glance at my listings. I noticed for the past month or two even that wasn’t working very well. It seemed to forget my settings. Now I need to log in or stay logged in (or get generic listings).
  • URLs Should Be Forever – If they live short of that, use a redirect. The listings used to be at http://tv.yahoo.com/grid; they’re now at http://tv.yahoo.com/listing. Breaking bookmarks is taboo on the web, especially for big pages like that which are very bookmarkable.
  • Abuse Of AJAX – It feels as if Ajax was used only because it looks cool and trendy. It’s unnecessary. They should load the whole grid at once. Each section of the grid seems to be 8-10 channels long, with one request for each section. The Ajax response isn’t slim either; it’s raw HTML (likely inserted with innerHTML for performance reasons, as DOM is typically slower). Now to scroll to channel 63, I need several of these requests. It slows things down despite being Ajax and technically asynchronous.
  • Hard To Tune Navigation – The time navigation is hard to accurately pinpoint, making the old pulldown list of times in :30 intervals much easier to use. This is driving me nuts. I want 8:30!
  • More Clicking, Less TV Watching – Clicking on a link does this drop down effect. Only then can you get full show info (unless you open in new tab/window). 2 clicks where it used to be only 1.
  • “Info” Page Is Like A Splash Page – That info page is also very slow, unlike the old one which was very lean and fast. Rather than have upcoming episodes of the show listed as well as credits (good for seeing guest stars on The Simpsons), it’s filled with a lot of useless info (show ratings, reviews, link to buy DVD, promo photos, news related to the show). To get the good stuff (detailed info on show, credits, upcoming episodes)… yeah, all separated onto individual pages now. Lots of clicking.
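On the “URLs Should Be Forever” point, keeping old bookmarks alive takes very little code. Here is a minimal sketch in JavaScript, assuming a Node-style request handler; the `MOVED` table and function names are hypothetical, and only the /grid to /listing pair comes from the URLs mentioned above.

```javascript
// Map of retired paths to their replacements.
var MOVED = { "/grid": "/listing" };

// Returns the new URL for a retired path (keeping any query string),
// or null if the path hasn't moved.
function redirectTarget(url) {
  var q = url.indexOf("?");
  var path = q === -1 ? url : url.slice(0, q);
  var query = q === -1 ? "" : url.slice(q);
  if (Object.prototype.hasOwnProperty.call(MOVED, path)) {
    return MOVED[path] + query;
  }
  return null;
}

// Sends a permanent redirect when the path has moved.
// Returns true if the request was handled, false otherwise.
function handleRequest(req, res) {
  var target = redirectTarget(req.url);
  if (target === null) {
    return false;
  }
  res.writeHead(301, { "Location": target });
  res.end();
  return true;
}
```

A 301 tells browsers and search engines the move is permanent, so the old bookmark keeps working and eventually gets replaced.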

I also hear of browser issues (Safari), but haven’t tested myself so I won’t go into that.

Now this doesn’t make much sense. They presumably went to Ajax to make common tasks require fewer page loads and to increase the usability and fluency of the site (click show title for more info), but instead I think it causes users to load more pages to view info they are accustomed to. The side effect of all this is that it’s slower for users who want some info on what’s on. I understand the need to monetize the pages, but at least make it worth clicking on.
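To put a rough number on the grid complaint from the list above, here is a quick calculation in JavaScript. It assumes channel 63 is the 63rd row of the grid and that every section, including the first, arrives in its own request; both are my assumptions, as I only observed that sections seem to be 8-10 channels long.

```javascript
// Number of section requests needed before a given row is on screen,
// assuming one request per section of `channelsPerSection` rows.
function requestsToReach(row, channelsPerSection) {
  return Math.ceil(row / channelsPerSection);
}

console.log(requestsToReach(63, 8));  // 8
console.log(requestsToReach(63, 10)); // 7
```

So that’s seven or eight round trips, each returning a chunk of raw HTML, for what used to be a single page load.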

Apparently they are working on things, and even addressing feedback and they deserve serious credit for that. It would be wrong to not credit them for this effort.

By the way: I wouldn’t mind turning off the “Record To TiVo” buttons. I don’t own a TiVo. Perhaps ask for the user’s preference when getting their TV listing info? Maybe that’s just me.

There do seem to be alternatives: TV.com and Zap2it.com, as well as my employer’s site ShowBuzz (see, I do disclose relationships).

Needless to say, as a web developer and a TV addict, this redesign was very interesting from my perspective. I’ll be keeping an eye on it.

Disclaimer: Obviously, this post is my opinion only and does not in any way reflect the opinions of my employer.