Month: May 2005

Sarcasm Found

Post date 5/23/2005
1 Comment on Sarcasm Found

Considering I’m about the most sarcastic person you’ll ever meet in your entire life, I bet you won’t care about this.

I’m curious if that region in my brain is larger than normal.

Monkey Monkey on my window

Post date 5/23/2005
3 Comments on Monkey Monkey on my window

Did I mention I love a good monkey icon? Greasemonkey is pretty cool regardless of the icon or name. But with the name and icon, Greasemonkey just rocks!

Best thing I’ve seen since I spotted this little guy:

Monkeyfox

Be on the lookout for him in an Nvu polo shirt.

Apple

Podcasting in iTunes, Intel Inside

Post date 5/23/2005
1 Comment on Podcasting in iTunes, Intel Inside

iTunes is said to support podcasting in version 4.9, set for release within 60 days. Boy, is that an endorsement of a movement. There’s also yet another rumor that Apple may start using Intel processors in its products [Mac Observer link]. Apple is said to have had an x86 version of Mac OS X since 1.0, and it already has an x86 version of Darwin. Not to mention the Xserve RAID already uses an Intel chip. I have a feeling that if anything is moving to Intel anytime soon, it’s the Xserve line of servers, not the consumer side of things. Perhaps dual Xeons for those craving them.

Mozilla

Reporter Tool Changes

Post date 5/22/2005
1 Comment on Reporter Tool Changes

In the week since we initially landed Reporter, we’ve fixed a few regressions and nits and made an enhancement (it now captures the Gecko ID). Here are the bugs for those interested:

281714 Not capturing gecko ID
294543 Reporter Doesn’t get the Correct Product Name/Version
294972 Reporter’s back/next buttons behaving badly
295156 Reporter should use rel="nofollow" to deter spammers
293254 Reporter fails to open privacy policy window on suite
294262 wizard automatically sends the report upon ‘enter’ key is…
294808 Uncheck privacy policy agreement by default
293253 Reporter’s privacy statement checkbox labeled very badly
293251 Reporter should use brand entities rather than hardcode product name

I’d appreciate some testing of new nightly builds (tomorrow). Please file any bugs you find in Bugzilla.

It is accessible from the Help menu and launches a simple wizard that lets you send us feedback about broken websites. It’s not yet a part of the default installer, but you can enable it by doing a custom installation and checking the box in the list of available additions (with DOM Inspector and Talkback.) [Ripped from Asa’s blog]

Google Internet

Is PageRank Dead

Post date 5/21/2005
No Comments on Is PageRank Dead

Google essentially started in 1996, when Larry Page and Sergey Brin began work on “BackRub”, which quickly morphed into the Google we know today. The premise of their search engine has been an almost mythological technique known as “PageRank”, which Google briefly discusses:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”
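To make the “votes weighted by importance” idea concrete, here is a minimal sketch of the textbook iteration. It is not Google’s actual implementation. The toy link graph, the damping factor of 0.85, and the iteration count are all assumptions picked for illustration, and the sketch assumes every link target also appears as a key in the graph.

```javascript
// Minimal PageRank sketch (textbook power iteration, not Google's real code).
// `links` maps each page to the pages it links to. Assumes every link target
// is also a key of `links`. Damping and iteration count are illustrative.
function pageRank(links, damping = 0.85, iterations = 50) {
  const pages = Object.keys(links);
  const n = pages.length;
  let rank = {};
  for (const p of pages) rank[p] = 1 / n;

  for (let i = 0; i < iterations; i++) {
    const next = {};
    for (const p of pages) next[p] = (1 - damping) / n;
    for (const p of pages) {
      const out = links[p];
      if (out.length === 0) {
        // A page with no outgoing links spreads its vote over all pages.
        for (const q of pages) next[q] += (damping * rank[p]) / n;
      } else {
        // Each link is a vote, weighted by the rank of the page casting it.
        for (const q of out) next[q] += (damping * rank[p]) / out.length;
      }
    }
    rank = next;
  }
  return rank;
}

// Hypothetical four-page web: nobody links to D, so it ends up ranked last.
const toyWeb = { A: ['B'], B: ['C'], C: ['A', 'B'], D: ['C'] };
const ranks = pageRank(toyWeb);
```

Page D casts a vote for C but receives none, so it keeps only the small baseline share every page gets.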

The Original Internet
Now, this was designed for a very different Internet. The majority of sites were run by businesses and educational institutions. Free information was everywhere, subscription services were somewhat rare, and rich media, blogs, personal sites, and parody websites were virtually non-existent (at least compared to today). People could mainly give input by going to Usenet. There was a separation of the real and the unreal on the Internet. There were also mailing lists and guest books. The occasional web-based forum started to pop up in the later years.

The Internet Rebirth

Free web hosts became extremely prevalent during 1998 and 1999 and continued through this new age (though some of the less business-savvy hosts died). This provided an easy outlet for virtually anyone to create a web presence. Before that, you had to either be a large company or be at an academic institution.

Around late 2000 and early 2001 there was a rebirth on the Internet. It wasn’t quite as dramatic as the dot-com bubble that had recently burst, but it was rather revolutionary. It was a sign the Internet was maturing. So many things were evolving. Blogging started to become prevalent: journal-like websites that could be written by anyone, with little or no fact checking, unlike the mammoth websites of before. They could look professional yet contain little fact. They also featured comments, the ability for anyone to leave text and links on the website, typically below the main article.

Also rising in popularity (though they existed before) were Internet forums. Thanks to the recent rise of free forum software, it became possible for more than just a few larger organizations to have forums. Combined with dropping prices for bandwidth and disk space on servers, this meant virtually anyone could have their own forum. A forum is another web-based method (unlike Usenet): essentially a user-editable website where anyone can post comments and/or links.

The last major component of the new Internet was the wiki. While not new, this technology rose to prominence in 2001, when Wikipedia opened its doors and shocked the web: a community of average people taking care of a website. No longer confined to the corners of the web, Wikipedia as of this posting is #48 on Alexa’s list of top websites. Just about all the others are not user editable.

Corporate Changes
While the people were gaining power, the corporations that were kings of the original web also had some changes. Many decided to move some of their content to subscription models, so only paying customers can access it. The NY Times, Time Magazine, and the WSJ, among many others, made some content exclusive. The more professional content of the Internet was disappearing behind logins, while the less professional content was becoming more accessible.

Google
During this time Google made a massive shift into the mainstream. Google’s indexing technique had worked rather well up to this point. It found very relevant information very quickly. While many search sites were selling ranks, Google didn’t. Its algorithm made the ranks. Google was unbiased in a very biased Internet. Google could do no wrong.

Spam
Spammers soon realized how to abuse Google and other search engines. By creating bots that planted their links all over forums, blogs, and fictional web pages created for the purpose, they could boost their PageRank and achieve higher status on Google. This low-cost method was rather effective for some time. Unlike email spam, this technique targeted search engines, not so much end users directly.

Fight back
Google and other search engines decided to fight back by adopting a new technique that prevents websites from gaining PageRank by spamming. Using an HTML attribute (rel="nofollow") on a link, a webmaster can tell a search engine that the linked site wasn’t legitimately linked and doesn’t deserve the bump in PageRank.
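As a rough illustration of what honoring rel="nofollow" could look like on the crawler side, here is a small sketch. The function name, the sample comment markup, and the URLs are all made up, and the regex scan is deliberately naive. A real crawler would use a proper HTML parser, and I have no idea how any search engine actually implements this.

```javascript
// Returns the href values that would still count as "votes".
// Naive regex scan for illustration only; it assumes double-quoted attributes.
function countedLinks(html) {
  const counted = [];
  const anchorPattern = /<a\s([^>]*)>/gi;
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    const attrs = match[1];
    const href = /href\s*=\s*"([^"]*)"/i.exec(attrs);
    const rel = /rel\s*=\s*"([^"]*)"/i.exec(attrs);
    // rel can hold several space-separated values, so split before checking.
    const relValues = rel ? rel[1].toLowerCase().split(/\s+/) : [];
    if (href && !relValues.includes('nofollow')) counted.push(href[1]);
  }
  return counted;
}

// Hypothetical blog comment: one ordinary link, one marked as nofollow.
const comment =
  '<a href="http://example.com/article">good read</a> ' +
  '<a href="http://example.org/pills" rel="nofollow">cheap pills</a>';
const votes = countedLinks(comment);
```

Only the first link survives, so the spammer’s link adds nothing to the target’s rank.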

Tomorrow
Now, with a really concise rundown of the history, it’s possible to look at the issues that lie ahead for techniques such as “PageRank”. How does it compensate for the massive shift in content? It’s a design that was based on a much more honest Internet of organizations that linked to content because it was relevant, not to boost PageRank. It was an Internet of the privileged who researched, rather than the million+1 blogs of useless text. Now, with the advent of rich media (audio/video), how does one even begin to analyze the data?

Does Bayesian filtering play a role? Is it possible to use this anti-spam technique not only to fight off PageRank abusers, but to rank based on legitimacy of data?
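For anyone who hasn’t seen Bayesian filtering up close, this is roughly the idea, sketched as a toy naive Bayes classifier with add-one smoothing. The training snippets and the “spam”/“ham” labels are invented for the example. Whether anything like this could really judge the legitimacy of a page, rather than just spamminess, is exactly the open question.

```javascript
// Toy naive Bayes text classifier with add-one smoothing.
// The training snippets and the two labels are made up for illustration.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

function train(examples) {
  const model = {
    counts: { spam: Object.create(null), ham: Object.create(null) },
    totals: { spam: 0, ham: 0 }, // total words seen per label
    docs: { spam: 0, ham: 0 },   // documents seen per label
    vocab: new Set(),
  };
  for (const { text, label } of examples) {
    model.docs[label]++;
    for (const word of tokenize(text)) {
      model.counts[label][word] = (model.counts[label][word] || 0) + 1;
      model.totals[label]++;
      model.vocab.add(word);
    }
  }
  return model;
}

// Log-odds that `text` is spam; positive means "more likely spam".
function spamLogOdds(model, text) {
  const v = model.vocab.size;
  let score = Math.log(model.docs.spam / model.docs.ham);
  for (const word of tokenize(text)) {
    const pSpam = ((model.counts.spam[word] || 0) + 1) / (model.totals.spam + v);
    const pHam = ((model.counts.ham[word] || 0) + 1) / (model.totals.ham + v);
    score += Math.log(pSpam / pHam);
  }
  return score;
}

const model = train([
  { text: 'cheap pills buy now', label: 'spam' },
  { text: 'buy cheap watches online now', label: 'spam' },
  { text: 'peer reviewed journal article on search', label: 'ham' },
  { text: 'notes on the reporter tool and search quality', label: 'ham' },
]);
```

Calling spamLogOdds(model, 'buy cheap pills') gives a positive score, while text that resembles the “ham” examples comes out negative.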

How does one define legitimacy by mathematical means (the only language a search engine truly knows)? And who defines it? In an age of media corruption and countless biased sources, who defines what’s real or not? How does one prevent a particular ideology from gaining the upper hand in search results? Do a quick Google search for Jew to see how PageRank can be abused. #2 on that list as of this posting is what most consider a hate site, “Jew Watch” as it calls itself. Is it accurate information? Is it the best of the web? This page gets a Google rank of about #189. Is this accurate either? Is it a biased source? Are the 188 before it better in terms of research? I noticed several parody sites, a few that may be considered hate sites, several blogs, mailing list archives, and other user-editable pages. But what did this very basic search give me?

I can of course pick a source and do a query, for example Time. This is slightly improved, but I only have access to some of their content. At least an editor has looked at the contents before they were published. Time has decades’ worth of reputation. But there is still a problem. I lost the key advantage of the Internet. I’m no longer searching the interoperable network of computers that is the Internet. I’m searching Time Magazine. I’m not getting the most relevant thing from many sources, but the best that Time has to offer.

I chose this query as an example because it was somewhat recently (in the last year or so) in the news because of #2. But it’s far from the only query that works. Anything regarding politics, social issues, or religious ideology is tainted data.

Is the Internet broken?
I wouldn’t say it’s broken, but I’d suggest it’s time we go back to our roots: way back to when each academic discipline published academic journals, with works written by academics and peer reviewed. In fact they are still published; it’s just that few ever look at or notice them. This concept is very relevant on the Internet and can be applied in many ways.

Peer Reviewed Internet
This would simply be the option to rate sites you visit when you search, and have that data funneled back into the search engine for future use. Sites people find relevant get a higher rank; sites that stink get punished. This obviously has some risk of abuse from things like bots, or simply from a biased world (just because people disagree with something, such as another political standpoint, doesn’t mean it’s a bad source).
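One way the feedback loop might work, purely as a sketch: blend the engine’s own score with visitor ratings, and only let the ratings take over once there are a lot of them. The 1 to 5 rating scale, the 0 to 1 score range, and the priorWeight value are all assumptions of mine, and none of this solves the bias problem.

```javascript
// Blend an algorithmic score (0..1) with visitor ratings (1..5).
// With few ratings the result stays close to the algorithmic score, which
// blunts a handful of bot votes. `priorWeight` is an illustrative assumption.
function blendedScore(baseScore, ratings, priorWeight = 20) {
  const n = ratings.length;
  if (n === 0) return baseScore;
  const mean = ratings.reduce((sum, r) => sum + r, 0) / n;
  const ratingScore = (mean - 1) / 4; // map 1..5 onto 0..1
  return (priorWeight * baseScore + n * ratingScore) / (priorWeight + n);
}

// Four glowing ratings nudge the score up a little; a thousand move it a lot.
const fewVotes = blendedScore(0.5, [5, 5, 5, 5]);
const manyVotes = blendedScore(0.5, new Array(1000).fill(5));
```

The design choice here is that a site has to earn many ratings before the crowd outweighs the algorithm, which raises the cost of gaming it without removing the risk.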

Peer Reviewed Search
This, in my mind, would be the most promising technique: putting together a peer organization to rank and approve websites based on their integrity, and allowing queries on approved websites. This puts organizations with established standards of journalism above blogs and allows web users to do queries based on those. The key to this would be to establish a standard benchmark (such as how information is collected and presented, and ownership of information). This would be similar to my example of a query against time.com, but open to all sites that are professional organizations with peer approval.

Blog Search
Technorati is currently my favorite for this. I really wish they would partner with Google, as they have very complementary technology. It would be nice to do a Google query of the blogosphere and see how people react to certain things. A query of the blogosphere on “cloning”, for example. Perhaps even an algorithm that, based on links and text, can give me a summary of the content (such as how Weighted Categories says a lot about the blog itself): a nutshell overview of the blog’s ideology, slant, and focus, based on content, not some meta tag (keywords, description) which can be forged. Nobody is “fair and balanced”; that’s a marketing term, not a business practice.

This isn’t a new concept. Google Groups has done this for years already; it’s just that Usenet is not mainstream anymore. We’re querying history rather than the present.

Conclusion
In a world of exponential data growth and a user-editable Internet, it’s not possible to give relevant results with search techniques designed for the Internet of several years ago. The quality of search has dropped significantly, and will likely continue to drop. What is necessary is to break down the information based on its source and its quality, and allow users to easily search groups based on this. Ideally a search engine would be smart enough to automatically distinguish reputable content from less reputable content. Ideally a search engine would know enough to realize which authors are well respected and which aren’t. Ideally a search engine could decide the best approach to searching and use that technique, rather than use one technique to index and search.

The search engine wars are on. It will be interesting to revisit this post and see how things have changed in time.

Around The Web Funny

Crying While Eating

Post date 5/20/2005
No Comments on Crying While Eating

Yup, this is officially the end of the internet. Crying While Eating. So wacky and far out there, I really don’t get it. It’s somewhat funny, yet sobering at the same time. What will they think of next?

Mozilla

Wanted: ProductName/Version

Post date 5/20/2005
3 Comments on Wanted: ProductName/Version

We’re down to one Reporter bug that needs to be fixed ASAP. We’re not always getting the correct product. By product I mean productName/productVersion, for example “Firefox/1.0+” or “SeaMonkey/1.8b2”. The code below is what’s currently being used. general.useragent.extra.firefox is not exactly reliable, even less so than I originally thought, leaving me with not many options. So my challenge is this: build an unbreakable function (or as close to it as possible) that will return the correct product/version for us to use. It should work with SeaMonkey > 1.7 and Firefox 1.0 and later. nsIXULAppInfo can’t be relied on 100% as it’s not frozen.

function getProduct() {
  // SeaMonkey: nsIChromeRegistrySea only exists in the suite.
  // Reading misc only works on Gecko 1.8 and higher.
  if ('nsIChromeRegistrySea' in Components.interfaces) {
    // misc looks like "rv:1.8b2"; substring(3) drops the "rv:" prefix
    return 'SeaMonkey/' +
      Components.classes['@mozilla.org/network/io-service;1']
                .getService(Components.interfaces.nsIIOService)
                .getProtocolHandler('http')
                .QueryInterface(Components.interfaces.nsIHttpProtocolHandler)
                .misc.substring(3);
  }
  // Firefox < 1.0+
  if (window.navigator.vendor != '') {
    return window.navigator.vendor + '/' + window.navigator.vendorSub;
  }
  // Firefox 1.0+
  try {
    // getCharPref is defined on nsIPrefBranch, so ask for that interface
    var prefs = Components.classes['@mozilla.org/preferences-service;1']
                          .getService(Components.interfaces.nsIPrefBranch);
    return prefs.getCharPref('general.useragent.extra.firefox');
  }
  catch (ex) {
    return 'Unknown';
  }
}
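One more fallback that might be worth trying is pulling the product token straight out of the User-Agent string. The sketch below is standalone modern JavaScript rather than chrome code, and parseProduct, the list of known products, and the sample string are all assumptions for illustration, not part of Reporter. It also inherits whatever is wrong with the User-Agent string itself, so treat it as a last resort.

```javascript
// Hypothetical helper: pulls "Name/Version" for a known product out of a
// User-Agent string. The product list is an assumption, not Reporter's code.
function parseProduct(userAgent) {
  const known = ['Firefox', 'SeaMonkey'];
  for (const name of known) {
    const match = new RegExp(name + '/(\\S+)').exec(userAgent);
    if (match) return name + '/' + match[1];
  }
  return 'Unknown';
}

// Sample string in the shape a Firefox 1.0.x build typically sends.
const sampleUA =
  'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) ' +
  'Gecko/20050511 Firefox/1.0.4';
const product = parseProduct(sampleUA);
```

In the browser the input would be navigator.userAgent, and anything that doesn’t carry one of the known tokens falls through to "Unknown", same as the function above.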

Funny Mozilla

Master I have mail for you

Post date 5/19/2005
7 Comments on Master I have mail for you

This has been the notification sound on my computer for a few years now when I have incoming mail in Thunderbird (about every minute). Yeah, it’s silly and ridiculous, but you know something: I still find it funny after all this time. My computer is on mute 90% of the time anyway. Not sure where it’s from. Anyway, here it is:

[removed for now, until I fix the audio player]

Mozilla

Reporter is semi-live!

Post date 5/17/2005
3 Comments on Reporter is semi-live!

It’s great, about 10 months after the idea first came up, to see this project actually get somewhere.

Check out Asa’s blog for the scoop on getting a build. Give it a run, and let us know of any bugs (I’d prefer you enter them in Bugzilla rather than a comment).

You can go to the reporter development instance, and query for your report by report ID, or just do an empty search to get the last 25 reports.

The data for the next day or two is going to the development instance. Once things are moving smoothly, we will point it to the production database.

Personal

Rob Talks Food: HoJo’s shutdown, Pizza

Post date 5/16/2005
No Comments on Rob Talks Food: HoJo’s shutdown, Pizza

Fark just reminded me that the famous Howard Johnson’s in Times Square (actually Broadway and 46th) is closing down. A good place for apple pie à la mode before a Yankee game. Or, well, pretty much anything. Guess I’ll have to go here instead.

Speaking of food: flipping channels during breakfast, I stumbled on Ray Romano appearing on The Tony Danza Show (oh how I hate him). Anyway, Danza brought a few pizzas for Ray and Phil Rosenthal to pick from for the finale, since they need a good pizza. Of course when they revealed the name of the winner, it was Lombardi’s. I’d say that’s a good call. Either that, or a Sicilian from Spumoni Gardens.