Beating the Algorithm

Twenty years ago, when I first got involved in the web development communities (wow, has it really been that long?), our goal was to game the algorithms of search engines like AltaVista, Excite, WebCrawler and Inktomi’s Hotbot. My own early success on the Internet was largely based on a very intimate knowledge of how Excite, and to a lesser extent AltaVista, ranked their search results.

Things were pretty simple in the Nineties, of course.

Assuming you could get your web pages indexed (and that was a lot harder back then), the only real criterion for ranking well was relevancy. The search engine compared someone’s search query to the content of all the web pages in its database and tried to find the page that was most relevant to the query. That meant web devs had to learn to use the right search terms on a web page, in the right places and with the right formatting. Excite, for example, liked finding search terms in the page footer, and all the search engines seemed to give a boost to words wrapped in bold tags.
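To make that concrete, here is a minimal sketch of the kind of relevancy scoring described above. It is an illustration only: the real engines never published their formulas, so the function names (`relevance`, `rank`) and both weights are assumptions made up for this example.

```python
import re

# Hypothetical weights, chosen only for illustration.
BODY_WEIGHT = 1.0   # credit for each occurrence of the phrase anywhere on the page
BOLD_WEIGHT = 2.0   # extra credit for each occurrence inside <b>...</b>


def relevance(query: str, html: str) -> float:
    """Score a page by counting how often the query phrase appears in it."""
    phrase = query.lower()
    page = html.lower()

    # Collect the text found inside bold tags.
    bold_text = " ".join(re.findall(r"<b>(.*?)</b>", page, flags=re.DOTALL))

    # Strip every tag. Note that nothing here checks whether the text
    # is actually visible to a human visitor.
    plain = re.sub(r"<[^>]+>", " ", page)
    plain = re.sub(r"\s+", " ", plain)

    return BODY_WEIGHT * plain.count(phrase) + BOLD_WEIGHT * bold_text.count(phrase)


def rank(query: str, pages: dict[str, str]) -> list[str]:
    """Return page names ordered from most to least relevant."""
    return sorted(pages, key=lambda name: relevance(query, pages[name]), reverse=True)
```

A scorer like this has no notion of trust or visibility. It only counts words, which is why a page that simply repeats a phrase can outscore a page that is genuinely about the topic.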

Essentially, we analyzed each search engine’s algorithm and then used that knowledge to manipulate the search results. And, frankly, it wasn’t all that hard to do.

I remember one of the earliest tactics was to steal traffic from popular searches that weren’t necessarily relevant to your web page’s content. If you could rank well for a popular search term like “Michael Jordan” it could easily result in a lot of visitors to your site. So, how do you get a web page trying to sell pink widgets to show up for “Michael Jordan” searches? Easy: you just put a lot of “Michael Jordan” text in white on a white background, so only the search engines’ spiders will see it. Instant relevance.

That particular tactic didn’t work for long, but it was just one of hundreds that web developers devised to game the search engines. Every time a search engine closed one door, we found keys to open five more.

It was a war, one the search engines of that era were destined to lose.

AltaVista is probably the best example of that. By the time Yahoo! acquired the search portal in 2003 and put it into mothballs, AltaVista’s search results were largely dominated by spam. Just about anything you searched for would return nothing but links to porn, pills and poker.

Things changed in 1998, although most of us didn’t realize it until a few years later.

The change, of course, was the official launch of Google. The introduction of Google’s PageRank algorithm meant that web pages would no longer be ranked solely by relevancy.

You still had to be relevant, of course, but that alone was no longer enough. Google pulled relevant pages from its database, but then ranked those pages according to citation analysis, a concept popular in academic circles. If someone cites a source, they usually think it’s important. On the web, links act as citations. In the PageRank algorithm, links essentially count as votes, with some votes counting more than others.

Your ability to rank depends on how many people link to you and how trustworthy those links are.
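The “links as votes” idea can be sketched in a few lines. What follows is a simplified, textbook-style PageRank iteration, not Google’s production code. The function name `pagerank`, the handling of pages with no outgoing links, and the iteration count are assumptions for this example. The damping value of 0.85 is the figure conventionally used with the algorithm.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Estimate page importance from a map of page -> pages it links to."""
    # Every page that links out or is linked to takes part in the vote.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Each page starts the round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a link from a high-ranking page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # vote evenly across every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank

    return rank
```

In a tiny graph such as `{"a": ["c"], "b": ["c"], "d": ["c"], "c": ["a"]}`, page `c` collects three votes and ends up on top, while `a` outranks `b` and `d` on the strength of its single link from `c`. That is the “some votes count more than others” effect in miniature.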

Google’s algorithm worked surprisingly well, at least for a time. People were actually able to FIND information on the Internet again. Other search engines languished. Even highly relevant, human-edited directories like Yahoo! and ODP passed into obscurity, unable to scale to an exponentially growing Web. Google, soon enough, became a verb as much as it was a noun. It became, and remains today, synonymous with search.

The web dev community, needless to say, didn’t just roll over and play dead. Google’s algorithm was still just an algorithm.

It could be analyzed. It could be gamed.

People started acquiring links to their web pages by hook or by crook. Blog networks, guest blogging, article syndication, press releases, paid site-wide links, paid content links, link exchanges or wheels, link bait, vertical directories, link farms, social bookmarking, forum or blog comments, profile spam, domain purchases and redirects: the list of ways we found to get links to our sites was almost endless.

That list has also, over time, become at best almost useless, and at worst, downright dangerous.

Google hires very smart people. They’ve looked at the history of other search engines, engines like AltaVista, and recognized that Google’s survival depends on continuing to deliver good results to searchers. When the company went public in 2004, its IPO filing had to include warnings to investors. Among other worries, Google specifically cited spam and link bombing as things that could harm its results.

“If our efforts to combat these and other types of index spamming are unsuccessful, our reputation for delivering relevant information could be diminished. This could result in a decline in user traffic, which would damage our business.”

The war between web developers and search engines is clearly still being waged. This time, however, the web devs are losing. Badly.

Google’s algorithm is not static. From the infamous Florida update of 2003 to the more recent series of Penguin updates, Google has repeatedly dealt the web spamming community blow after blow. For a while, that simply meant that what used to work stopped working and web devs had to come up with new strategies. New players, increased pressure, but still the same game we had been playing since the days of Excite and AltaVista.

That changed, however, in 2007 when Google announced they were going to penalize sites that ran afoul of their SE Guidelines. These days, being penalized or, worse, being dropped from the index entirely for an egregious infraction is pretty much the kiss of death for a web site.

Manipulating the algorithm is still very much possible. It’s not even all that hard. But such manipulation has become a temporary fix that ultimately leads to a permanent loss. The recipe is simple: Find something to sell or promote, create a link mill to game the search engines, then be prepared to throw away all that work when the penalties eventually surface. That might be days or months away, but it is indeed pretty much inevitable.

Long-term success on the Internet now depends on bowing to the will of the all-powerful Google.

Sometimes that can be damn frustrating. Google may be all-powerful, but it is certainly not all-knowing, and all too frequently what it wants us to do is, frankly, stupid. Most of the time, however, what Google requires is in the best interests of everyone involved.

Google’s survival depends on making its searchers happy. It does that, at least in large part, by sending those searchers to web sites that can provide what the searcher came to Google to find. Those searchers, of course, then become YOUR visitors. If you give them what they seek, and do it significantly better than your competitors do, you will naturally gain the citations necessary to rank well in the engines. It’s win-win-win, for Google, for the visitors, and ultimately for YOU.

Of course, that’s often easier said than done. There’s this chicken-and-egg thing where we need traffic to get citations and need citations to get traffic. It’s a building process and, yeah, it takes time.

But, today, it’s the only game in town.
