* Posts by uchu

4 publicly visible posts • joined 13 Apr 2010

Google Percolator – global search jolt sans MapReduce comedown

uchu

The rub is the resources!

The rub is the resources. I'd like to see the global performance hit of this new crawl method. As the article says "The rub is that Caffeine uses roughly twice the resources to keep up with the same crawl rate."

More instant, more distributed and more redundant crawling... all of which relies even more on the websites themselves to serve up the same data over and over again to the distributed, multi-headed Caffeine monster.

Facebook plugs email address indexing bug

uchu

robots.txt is not a security protocol

As doc spock says above, robots.txt is not a security protocol.

And it's not just miscreants: major search engines reserve the right to spider content that robots.txt tells them not to look at, even if they don't include it in their public index. Then there are all the silly parasitic bots appearing in the Amazon cloud, goober bots like 80legs, and all the corporate-sponsored bots that tend to ignore robots.txt entirely.

Google: botnet takedowns fail to stem spam tide

uchu

Google Provides a Free Botnet for Spammers

I believe Google provides a very accessible botnet that spammers regularly take advantage of. Excuse me, Google, but it doesn't look like spam levels have been going down... But, Stock Market, look how many people have signed up for Gmail, and look at all that ad revenue! Who cares if Google generates a bit of spam? As long as nobody actually quantifies how much of the spam they receive originates from Google, Google will be in the clear.

Google boss tells newspapers he feels their pain

uchu

Google is a leech

I'd say Google is pretty leech-like.

Not only are large publishers affected, but so are small publishers. They try as hard as they can to rank well in Google's search results, willingly bending over backwards to help Google generate more ad revenue for itself... while not sharing in that revenue.

Google's smart. They know they could create a clearinghouse for advertising revenue and base payouts on the unpaid search results that got clicked on or showed up prominently in the results. Murdoch knows this too.

Or maybe it's time publishers started quantifying Google's crawls and charging Google for every page of content its crawlers access. If Google wants users to pay for content, shouldn't Google pay when its crawlers fetch that content too?

New business model, you bet!