My One Hope For Yahoo

Now that Marissa Mayer is at the helm of Yahoo, I have just one hope: that they not waste their time and resources trying to build a mobile phone or a social network. Yahoo is huge and diverse already, to the point where nobody knows if it's a tech company or a media company. It has an almost 0% chance of winning in either market, and it has a ton of important work to do if it wants to turn itself around. Either effort would be nothing more than a distraction.

It will be interesting to see what happens. Her first mission should be to fix the culture that killed Flickr. Unless she does that, Yahoo is a lost cause.

How Yahoo Killed Flickr

Gizmodo has a great essay on how Yahoo killed Flickr. I think this excerpt is a pretty good summary:

It was a stunning failure in vision, and more or less the same thing happened at Flickr. All Yahoo cared about was the database its users had built and tagged. It didn’t care about the community that had created it or (more importantly) continuing to grow that community by introducing new features.

It's worth a read. It's a textbook example of how not to build and manage a product. Yahoo isn't in its current situation by accident or chance. The upside is that there are a bunch of good lessons here for everyone, inside and outside of Yahoo.

Love or hate what Facebook is doing, it is essentially the antithesis. Every decision Facebook makes is seemingly about growing the community and adding features that increase engagement. There are, of course, many other ways to fail.

The Internet is a network/community of cooperation. If you forget about the network/community factor, you’ve lost.

Favorite Blogs – Edge Cases Of Intelligence

I decided I would periodically share a few blogs that I've found particularly interesting for various reasons: some educational, some just comical and amusing. I'm not sure how often I'll do one of these, but it's unlikely to be more than quarterly. The particular theme for this edition is "edge cases of intelligence". Read on and you'll understand why.

Less Wrong

Less Wrong's title is a surprisingly good description of its contents. Its more verbose tagline, "a community blog devoted to refining the art of human rationality," is even more so. Given it's operated by the Future of Humanity Institute at Oxford University, you can expect it to be slightly more intellectual. Anyone with an interest in cognitive science, decision-making, or human psychology in general will likely find at least half of the posts mesmerizing.

Hot Chicks with Stormtroopers

Hot Chicks with Stormtroopers is also a blog with a very descriptive name. This is one of those websites you didn't know there was a need or market for until you found it; now I'm not sure what I would do without it. The mere fact that there is enough content fitting this specific niche to fill a regularly updated blog is in itself just amazing to me. The internet truly has something for everyone.

Yahoo! Answer Fail

Yahoo! Answer Fail falls into the same family as the more popular FAIL Blog, but with a specific focus on Yahoo! Answers. I can't help but read it and fear for the future of humanity. I suspect at least half of Yahoo! Answers posters are just jokers, but I can't help thinking that some of these people are really out there. Be warned: if you visit this site you will spend no less than an hour, and likely more, reading through some of the saddest examples of humanity out there.

Yahoo Traffic Server Open Sourced

Way back in 2002 Yahoo acquired Inktomi, which was largely known for its search products. Its software powered some early search engines like HotBot in the pre-Google days. One of its lesser-known products was something called Traffic Server. Even if it was lesser known, it was still used by ISPs including AOL, which in those days was big. Inktomi's business disappeared with the great bubble and it was acquired by Yahoo, which has been using Traffic Server itself ever since.

Fast forward to 2009. Yahoo is now in the process of opening up Traffic Server as an Apache project; it's already in the Incubator. Yahoo says it's capable of 30,000 requests per second per server. Noteworthy is that this runs on generic hardware.

These days most websites use Squid, Nginx, Pound, or Varnish on the open source side. On the proprietary side there's Citrix NetScaler, Foundry (now Brocade) ServerIron, Zeus ZXTM, or F5's BIG-IP. The proprietary options can be either expensive software running on generic hardware or an appliance (which is generally an Intel-based server with a custom-modified Linux install for low maintenance and top performance).
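For anyone who hasn't worked with these, the reverse-proxy role all of these products fill is conceptually simple; a minimal sketch in Nginx (the hostname and ports here are placeholders, not a recommendation) would be something like:

```nginx
# Minimal reverse proxy: accept requests on port 80 and
# forward them to a backend application server.
server {
    listen 80;
    server_name example.com;  # placeholder hostname

    location / {
        proxy_pass http://127.0.0.1:8080;          # backend origin
        proxy_set_header Host $host;               # preserve the requested host
        proxy_set_header X-Forwarded-For $remote_addr;  # pass the client IP along
    }
}
```

The real products layer caching, connection pooling, health checks, and load balancing on top of this basic idea.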

At this point it's apparently not 64-bit and doesn't have native IPv6 support. However, it appears to be usable and likely competitive with some of the other options already out there. Yahoo has been using it all along, and their sites are pretty popular (problems aside), so it has clearly run at scale.

It should be noted that commercial CDNs aren't really an alternative to a reverse proxy or load balancer, since they still require a robust and redundant origin. If anything they will reduce your requirements, not eliminate them.

Given everyone's interest in scaling computing quickly and cheaply, this is a pretty noteworthy open source event. These applications tend to be an afterthought, but they can be critical: Squid handles 78% of Wikipedia's requests. Given all of Wikipedia's traffic, you can see how much it matters.

It will be interesting to see if a community builds around Traffic Server and if it sees adoption.

Phorm’s UserAgent

There's a fair amount of controversy regarding Phorm, a company that plans to target advertising by harvesting information via deep packet inspection. It is already in talks with several ISPs. I'll leave the debate over Phorm from a user perspective for someplace else.

They claim to offer ways to let websites opt out of their tracking, but it's a true double-edged sword: they don't play nice with a standard robots.txt file. Take a look at what they are doing here:

The Webwise system observes the rules that a website sets for the Googlebot, Slurp (Yahoo! agent) and “*” (any robot) user agents. Where a website’s robots.txt file disallows any of these user agents, Webwise will not profile the relevant URL. As an example, the following robots.txt text will prevent profiling of all pages on a site:

Rather than use a unique user agent, they are copying those of Google and Yahoo. The only way to block them via robots.txt is to tell one of the two largest search engines in the Western world not to index your site. This seems fundamentally wrong.
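To make that concrete, here's my own illustration (not Phorm's published example) of the kind of robots.txt rule Webwise honors. Note that it is exactly the rule that also deindexes you:

```
# Stops Webwise from profiling the site, but also removes
# the entire site from Google's index
User-agent: Googlebot
Disallow: /
```

A dedicated Webwise user agent string would have let sites opt out of profiling without touching their search visibility.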

There is an email address where you can submit a list of domains to exclude, but that requires manual intervention and updating the list every time you create a site. This obviously doesn't scale.

Now I'm curious: is piggybacking off another company's user agent considered a trademark violation? From what I understand they aren't broadcasting it, just honoring it. If I were Google or Yahoo I'd be pretty annoyed. Particularly Yahoo, since there are websites that will block Slurp outright given Google's dominance in search. Yes, there are many user-agent-spoofing tools out there (including wget and curl), but to my knowledge nobody is crawling web pages for a commercial purpose while hiding behind another company's name.

robots.txt is a somewhat flawed system, as not all user agents even obey it (sadly), though it's one of the only defenses short of actually blocking traffic.

Yahoo! Web Analytics

It went somewhat unnoticed, but Yahoo! today announced its Yahoo! Web Analytics package, which is intended to compete with the wildly popular Google Analytics. I've spent quite a few hours in analytics packages over the years, ranging from the very amateurish to enterprise grade. Google Analytics is a very good product, but it does have limitations. The biggest is the lack of real-time reporting: Google Analytics takes a few hours to process data, making it for most people a next-day service. This isn't a big deal for some, but if you're in an environment where you need feedback on your content ASAP (a must for media sites), this is a huge deal. Yahoo is promising to deliver "within minutes":

Get detailed reporting within minutes after an action occurs on your website. Quickly identify dips in key site metrics or monitor the performance of new content. Seeing the impact of website and marketing changes immediately makes it much easier to optimize them. Yahoo! Web Analytics also maintains historical data so you can go back at any time to review old data for new insight, or compare the present to the past without any changes to your page tags.

Interesting. I wonder if this will light a fire under Google’s butt to deliver real-time analytics as well. Urchin wasn’t really designed for real-time data. Google’s obviously done a lot of work with it to build Google Analytics. I wonder if that’s the next step for them.

Zimbra Desktop

Yahoo-owned Zimbra released the latest Zimbra Desktop today. At a glance it seems pretty nice: essentially Yahoo Mail running on Mozilla Prism. It does seem a rather large download for what it is, but maybe they still have some fat to trim. What is now Firefox was pretty hefty when it first split from the Mozilla App Suite; it takes time. The installer is also very slow. I see it bundles Jetty, so it looks like there's a Java backend.

It supports any POP3 or IMAP account, similar to Thunderbird, with presets for Gmail and Yahoo Plus in the wizard (for those who don't know what type of email account those are).

My general impression is that it's pretty neat, but the UI needs work. It often shows scroll bars to view the contents of a window (just like a webpage). This is normal in a browser, but it feels strange in what is designed to be a client-side application. Even setup has this problem.

So far I still think Thunderbird and Apple Mail provide a better desktop experience. But Zimbra’s the new kid on the block, so I wouldn’t underestimate it. It is Open Source. It will be interesting to see who contributes to it.

If anyone else tried it, I’m curious to know what you thought of it.

Rebreaking The Web

It's happening again. Once upon a time, browser vendors started adding their own features without consulting each other or agreeing upon standards. What they created was a giant mess of inconsistencies across browsers and platforms that is still with us today. Ask any web developer and they can tell you of the pains they have suffered trying to make seemingly trivial things work consistently everywhere. It's no easy task. Before IE 7, even an Ajax request required something along the lines of:

var httpRequest;
if (window.XMLHttpRequest) { // Mozilla, Safari, …
    httpRequest = new XMLHttpRequest();
} else if (window.ActiveXObject) { // IE
    httpRequest = new ActiveXObject("Microsoft.XMLHTTP");
}

That's right, IE 6 didn't support the native XMLHttpRequest object (more here). This is just one of many examples in JavaScript and CSS. document.all, anyone?
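This mess is why feature detection, rather than sniffing for browser-specific objects like document.all, became the standard idiom: test for the capability, not the browser. A minimal sketch of the pattern (my own illustration, not from any particular library; taking the global object as a parameter just keeps it testable outside a real browser):

```javascript
// Feature detection: probe for the capability itself instead of
// guessing which browser we're in.
function createRequest(global) {
    if (global.XMLHttpRequest) {           // Mozilla, Safari, Opera, IE 7+
        return new global.XMLHttpRequest();
    }
    if (global.ActiveXObject) {            // IE 6 fallback
        return new global.ActiveXObject("Microsoft.XMLHTTP");
    }
    throw new Error("No Ajax transport available");
}
```

The payoff is that when a browser finally grows the native object, code like this starts using it automatically, with no version checks to update.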

The end result of this problem came to be known as the "Web Standards" movement. Simply put, it's the idea that code should follow a standard that results in consistent output across all browsers on various platforms. Write once, run anywhere. While it's taken years to manifest, it's slowly become a reality. Firefox, Safari, and Opera have fairly consistent rendering (at least in comparison to the mess of just a few years ago on the browser scene). IE 6 was fairly poor in terms of modern web development, but IE 7 made progress, and IE 8 is Microsoft's greatest effort to date to bring their browser up to speed.


Geek Reading: High Performance Web Sites

So I did a little book shopping a few weeks ago, and one thing I purchased was High Performance Web Sites: Essential Knowledge for Front-End Engineers (affiliate link). At its core it's essentially a 14-step guide to making faster websites. I don't think any of the steps are new or innovative, so anyone looking for something groundbreaking will be sorely disappointed; I don't think the target audience has that expectation, though. It's still a rather practical book for any developer who spends a lot of time on the front end of things.

It gives many great examples of how to implement each rule, as well as suggestions based on what some of the biggest sites on the web are doing (including Yahoo, the author's employer). I found it pretty helpful because it saves hours of research into what other sites are doing to improve their performance. For that reason alone it's a worthwhile book to check out. For each rule there's enough discussion to help you decide whether you can implement an improvement on your own site. Most sites are limited in what they can actually do by their legacy systems such as the CMS, their processes (including human ones), and their audience. Unless you've got a serious budget, you likely fail rule #2 (use a CDN) right off the bat. Regardless, there are likely a few tips you can take advantage of. It's also a very fast book to get through.

Most steps are pretty quick to implement, provided they are feasible in your situation. Overall it's one of the best "make it better" tech books I've seen regarding web development, and one of the few that actually appeared worth purchasing (and I did). The majority of the tips require a somewhat tech-savvy approach to web development; the book isn't oriented much towards web designers (with the notable exception of reducing the number of requests through better use of CSS and images) or casual webmasters. It's for those who understand the importance of HTTP headers but could use some help deciding on best practices, and those who want to know how the small things add up.

Interestingly enough, I learned about the book by trying the YSlow extension, which essentially evaluates a page against the rules suggested in the book. Interesting from a marketing perspective, I guess. Overall this blog evaluates OK (about as well as it ever will, considering I'm not putting it on a CDN anytime soon), though I guess I could add some expires headers in a few places.
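For what it's worth, the expires fix is a small config change. On Apache with mod_expires enabled, something along these lines would do it (the types and lifetimes here are just examples, not the book's recommendations):

```apache
# Requires mod_expires. Tells browsers to cache static assets
# so repeat visits skip those requests entirely.
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css   "access plus 1 week"
    ExpiresByType application/x-javascript "access plus 1 week"
</IfModule>
```

The usual caveat applies: once you set far-future expires headers, you need to rename or version files when they change, or returning visitors will keep the stale copy.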