Thursday, June 21, 2007

Remembering Yahoo

Yahoo didn't start out as a search engine. It was a directory. I remember browsing it in 1994 while attending the master's program in Digital Libraries at the University of Michigan's School of Information and Library Studies. It was interesting, but barely useful. There weren't that many websites to catalog back then. Worse, few of them were experimenting with Mosaic's ability to show images. Sure, you could search the directory, but you were forever dependent on the catalog librarian to control how things were tagged and labeled. It was old school.

Meanwhile, others were attempting to overcome the limitations of the search software of the day. People had designed good systems for searching through thousands of documents, the kind found in many corporate databases, but researchers quickly realized that searching millions of documents was a completely different type of problem. Simply parsing text quickly to find the set of documents that match the query isn't enough when the result set itself has millions of documents.

With millions of documents in your index, a number of other factors come into play. One problem was keeping the index up to date. If thousands of people are creating web pages every day, how do you keep up? This problem was a primary focus in the mid-1990s. The general solution was to design semi-autonomous robots to scour the web for content. Some like to picture these agents as spiders or crawlers, jumping from link to link, connecting each section of their semantic web. Alta Vista was the leader in this for a while, but eventually everybody could do it well enough and the focus shifted.
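To make the link-to-link idea concrete, here is a minimal sketch of a crawler walking a tiny, made-up web held in memory. The page names and the TOY_WEB table are hypothetical and purely for illustration; a real robot would fetch pages over the network, parse out the links, and worry about politeness and revisits, none of which is shown here.

```python
from collections import deque

# Hypothetical toy web: each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story1", "story2", "home"],
    "story1": ["news"],
    "story2": ["story1"],
    "island": ["home"],  # links out, but nothing links to it
}


def crawl(web, start):
    """Follow links breadth-first from start and return pages in visit order."""
    seen = {start}          # pages already discovered, so we never loop forever
    queue = deque([start])  # pages discovered but not yet visited
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


if __name__ == "__main__":
    print(crawl(TOY_WEB, "home"))
```

Notice that "island" never gets visited. A crawler only finds what something else links to, which is part of why keeping an index complete was such a chore.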

People with a question don't like to wade through the weeds to get to an answer. It doesn't matter if the perfect web page exists. If it doesn't show up in the first 10 or 20 results, it might as well be invisible.

The folks at Excite were some of the first to really focus on this for the web. The problem was well known and they advanced the work of Gerard Salton, who developed the SMART information retrieval system at Cornell in the 1960s. SMART's use of vector space models to cluster documents led the way to improved ways to rank documents based on relevancy.

This is much harder than it sounds. Language is tricky. SMART would cluster documents that shared the same vocabulary and word frequency. If you found a document close to what you wanted, you could conduct a new search that returned all the documents that SMART thought were part of that document's cluster. It was pretty neat, and the way search is done today still benefits from this heritage.
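For the curious, the vector space idea can be sketched in a few lines. This is not SMART's actual code, just a bare-bones illustration under simple assumptions: each document becomes a vector of raw word counts, and two documents are "close" when the cosine of the angle between their vectors is near 1. The function names and the sample documents are made up for the example, and real systems weight terms far more carefully than raw counts.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn text into a bag of lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document names ordered from most to least similar to the query."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(text)), name) for name, text in docs.items()]
    return [name for _score, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = {
        "a": "search engines rank web pages",
        "b": "directory of web sites",
        "c": "recipes for apple pie",
    }
    print(rank("rank web pages", docs))
```

The same similarity measure serves both jobs described above: compare a query to documents and you get a ranking, compare documents to each other and you get clusters.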

Meanwhile, Yahoo dabbled in search, but its biggest emphasis was on building the ultimate directory. There were/are large bureaucratic layers of editors and organizers and such. There were fees to pay if you wanted faster service, and so forth. In fact, when Google started getting some traction in the late 1990s, Yahoo outsourced its search engine results to Google!

Whither Yahoo

Gord Hotchkiss shares his perspective on the recent return of Jerry Yang to the helm at Yahoo in "Yahoo Yang = Google (Page Brin)?"

He has some insightful things to say about the leadership styles of the major players. Sergey Brin and Larry Page come off as the consummate micro-managers. But, much like Bill Gates, their collective brilliance is able to give back to the organization far more than their meddling takes away. If it helps them understand what really makes things tick, a little code slinging or architecture tweaking by the founders is a small price to pay.

Jerry Yang and David Filo, on the other hand, did not stay intimately involved at Yahoo. You can't blame them for cashing out early. They never have to work again so it only follows that they would work on things that interest them, and I doubt that search engines are their first love. As Hotchkiss points out, just the opposite is true at Google, where the search experience is a sacred cow, the center of everything they do.

This is a fateful moment for Yahoo. As long as the search experience remains the dominant factor in where people go on the Internet, Yahoo is in trouble.