Blekko Identifies Over a Million Domains as Spam

In what appears to be a hot and nasty brawl brewing between David and Goliath, a tale of two search engines is getting significant press about their respective plans to fight spam by removing spam-laden sites from search results. It has all the trappings of a prize-fight: in this corner, the search engine behemoth Google, weighing in at several billion dollars; in that corner Blekko, the relatively unknown challenger, new to the scene but poised to take on its opponent with what Blekko calls “the first search algorithm ever created to find spam rather than rank results.”

Blekko, the nascent search engine that launched last November, announced last week that it has identified over a million web domains as spam and blocked them from its search results. The move, which relies on a technology that Blekko calls its AdSpam algorithm, could have tremendous implications, at least for the users of Blekko, which reports a million queries a day and about a half million users each month. Rather than adopting Google’s method of lowering the rank of suspicious sites in its search results, AdSpam takes a scorched-earth approach: it identifies sites that are laden with ads and light on content, and blocks them altogether.

Blocking 1.1 million domains has the direct effect of removing potentially hundreds of millions of spam pages, an achievement of which Blekko CEO Rich Skrenta is tremendously proud. “Domains with low quality content plus keyword ads are ‘machines that print money,’” Skrenta has been quoted as saying. “If you make a machine to print money, people will exploit it.”

According to Blekko, AdSpam is “a machine-learning algorithm that examines pages for a specific spam signals — the presence of multiple display ad positions on a single page and thin to zero content. Unlike algorithms used by other search engines, AdSpam is being used in conjunction with human curation to detect [Spam and] continue the War on Spam.”

What makes Blekko unique is its use of slashtags to narrow searches and minimize spam results. By targeting content farms that push spam (like eHow.com and answerbag.com), Blekko has managed to provide what it feels is the path to “better search results…by using an algorithm that was created to kill spam, not just crawl it.”

This latest development is just another foray in a war that both Google and Blekko have committed to fighting. “In the past, our efforts to clean-up search have included our partnership with the Stack Overflow community,” states the Blekko blog, “and our public banning of the top 20 sites most users marked as spam at Blekko.” What remains to be seen is what both engines have up their sleeves. According to The New York Times, Skrenta hasn’t been squeamish about calling out Google. “Google didn’t actually take anyone out, they just reshuffled the deck. Instead of demoting these sites to No. 5 or No. 7, we’re just throwing them out.”

It should be stressed that Googoliath hasn’t exactly been sitting on its hands. In the past several months there has been a public backlash against the deteriorating quality of Google’s search results. The company has responded with a series of remedies, including updated search algorithms and the penalizing of low quality sites like content farms. In fact, RCR Unplugged reports that the recent ‘Panda’ update to Google’s algorithm caused such a swing in page rankings that how-to site Mahalo had to lay off staff almost immediately.

Most recently, Google reports that it’s adding a ‘Block All Results’ option that will live right next to the ‘Cached’ and ‘Similar’ buttons, so that users can choose to weed out the spam that seems to have mastered the art of worming its way into the top ranks of Google. Even though Google’s blog talks about this functionality as if it’s already here, there’s no indication if or when it will become active – a search performed while this article was written revealed no ‘Block’ link – please feel free to leave a comment if you’ve seen it in the wild.

Admittedly, there are inherent problems with Google’s proposed solutions. First, it’s difficult to identify an entire site as spam simply from search results. Also, sites like Mahalo will suffer from algorithms whose rigid criteria may misidentify legitimate sites. For example, Google’s new algorithms are based on the consistency of content: a site that focuses on one topic, like healthcare, will probably not be flagged, whereas a generalized site with content on a variety of topics may suffer the wrath of the giant G. By its own admission, Google cites “generally low quality” of content as a reason to block something. For sites which rely on user-generated content by nonprofessional writers, this could end up being a troubling trend.

So who has the right formula? Bing and Yahoo haven’t really entered the fray as of yet, perhaps waiting to see what Google has to say on the whole matter of spam sites. Blekko, on the other hand, has chosen to lead and not to follow, a move that could greatly benefit the company as searchers seek alternatives to the millions of results being passed back to the end user. By being proactive, Blekko certainly seems to be taking the war to the content farms in the unending battle between search engines and spam.
