Facebook has updated its robots.txt file so that the site can only be crawled by a short list of search engines, including Google, Microsoft's Bing, China's Baidu, Russia's Yandex, and a few others. Previously, Facebook's robots.txt allowed anyone to crawl the site, although the company had threatened to sue at least one …
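
For readers unfamiliar with how such a whitelist works, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text is a hypothetical example written for illustration, not Facebook's actual file: each named crawler gets an empty `Disallow:` (meaning "allow everything"), while the catch-all `User-agent: *` group disallows the whole site. The crawler tokens and the example.com URL are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical allowlist-style robots.txt (NOT Facebook's real file):
# named crawlers may fetch everything, everyone else is shut out.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # robotparser compares the user-agent text before the first "/",
    # case-insensitively, against each group's User-agent line.
    for agent in ("Googlebot", "bingbot", "SomeScraper/1.0"):
        print(agent, rp.can_fetch(agent, "https://www.example.com/profile"))
```

Run as a script, this prints True for the two named crawlers and False for SomeScraper/1.0. Note that compliance is voluntary: robots.txt tells a well-behaved crawler what the site owner wants, but does nothing to stop one that ignores it.
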

COMMENTS



Thursday 1st July 2010 09:11 GMT Ian Ferguson

Search results

Any search engine that excludes Facebook pages from its search results, intentionally or not, gets the thumbs up from me.

Facebook pages are complete drivel, second only to MySpace pages.

Thursday 1st July 2010 09:14 GMT Anonymous Coward

Crawler beware

Could it not be argued that a search company should make a conscious decision before crawling certain sites, rather than rely on the presence of a robots.txt file?

Especially a site that has the wherewithal to hire lawyers (lots of lawyers).

1. Thursday 1st July 2010 13:24 GMT copsewood
  
  respect robots.txt or welcome to my infinite tarpit
  
  It's entirely possible to tarpit a crawler you really don't like by generating an infinite number of random pages and links for it extremely slowly, tying up its resources for months on end. The robots.txt protocol is so well known that there is no excuse for a site owner not to use it to express policy or for a crawler operator not to respect it.
  
  Given that a crawler which ignores a site owner's wishes can be tarpitted until it gives up, there is no point suing: if a crawler respects robots.txt there is nothing to sue over, and if it doesn't, a better punishment is available. After all, all a crawler is doing is making automated use of information you have chosen to publish.
  
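
The tarpit idea in the comment above can be sketched as a small WSGI app in Python. This is a minimal illustration under stated assumptions, not the commenter's code: it assumes the trap lives under a hypothetical `/trap/` prefix that robots.txt disallows, so only crawlers ignoring robots.txt ever wander in. Each page is generated from a hash of its own path (so it is stable on revisits), links only to further trap pages, and is dripped out in small chunks with a pause before each one. `LINKS_PER_PAGE`, `CHUNK_BYTES` and `DRIP_DELAY` are hypothetical tuning values.

```python
import hashlib
import random
import time

LINKS_PER_PAGE = 10   # hypothetical: links generated per trap page
CHUNK_BYTES = 64      # hypothetical: bytes sent per drip
DRIP_DELAY = 2.0      # hypothetical: seconds to wait before each chunk


def tarpit_page(path: str, links: int = LINKS_PER_PAGE) -> str:
    """Build a page of random links to further trap pages.

    The RNG is seeded from a hash of the path, so revisiting the same URL
    returns the same page, while every new link leads somewhere new.
    """
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    items = []
    for _ in range(links):
        token = "%016x" % rng.getrandbits(64)
        items.append(f'<li><a href="/trap/{token}">{token}</a></li>')
    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"


def tarpit_app(environ, start_response):
    """WSGI app that drips a generated trap page out a few bytes at a time."""
    body = tarpit_page(environ.get("PATH_INFO", "/")).encode()
    start_response("200 OK", [("Content-Type", "text/html")])

    def drip():
        for i in range(0, len(body), CHUNK_BYTES):
            time.sleep(DRIP_DELAY)  # module-level value, read on each chunk
            yield body[i:i + CHUNK_BYTES]

    return drip()
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, tarpit_app).serve_forever()` would serve it. A real deployment would also want to cap concurrent trap connections, so the tarpit ties up the crawler rather than your own server.
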
Thursday 1st July 2010 09:39 GMT Ralph B

Populating Google Me?

Wouldn't it be ironic if Facebook's irresponsible approach to user privacy allowed Google to auto-populate their rumoured rival service with user accounts?

Expect an email from Google soon, starting: "If you, like many others, are unhappy with Facebook, then we're pleased to tell you that we've already prepared an account for you at Google Me, with the same login details, same friends and groups lists ..."

Thursday 1st July 2010 10:49 GMT Anonymous Coward

Google already have...

their own social networking site; it's called Orkut, so it *could* happen

Thursday 1st July 2010 13:00 GMT Anonymous Coward

Don't blame them

I've run a long-tail site, and when I wasn't watching closely it was brought to a grinding halt by crawler traffic placing five times as much load on the server as the actual user traffic was generating. I ended up blocking the random crawlers as well: half of them were effectively just stealing the content, and most of the rest were sending next to no traffic to us anyway.

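
The kind of blocking described above can be sketched as WSGI middleware in Python. This is a hypothetical illustration, not the commenter's setup: it assumes self-identified bots can be spotted by substrings such as "bot", "crawler" or "spider" in the User-Agent header, lets a small allowlist of named crawlers through, and answers every other self-identified bot with a 403. The allowlist and marker strings are assumptions, and since User-Agent headers are trivially spoofed, this only catches crawlers that announce themselves.

```python
ALLOWED_CRAWLERS = ("googlebot", "bingbot")   # hypothetical allowlist
BOT_MARKERS = ("bot", "crawler", "spider")    # hypothetical heuristic


def should_block(user_agent: str) -> bool:
    """Block self-identified bots unless they are on the allowlist."""
    ua = (user_agent or "").lower()
    if any(name in ua for name in ALLOWED_CRAWLERS):
        return False
    return any(marker in ua for marker in BOT_MARKERS)


def block_crawlers(app):
    """Wrap a WSGI app so that blocked crawlers get a 403 instead."""
    def wrapped(environ, start_response):
        if should_block(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Crawling not permitted\n"]
        return app(environ, start_response)
    return wrapped
```

Wiring it in is a single line such as `application = block_crawlers(application)`. Ordinary browsers, whose User-Agent strings contain none of the marker words, pass straight through.
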
Thursday 1st July 2010 13:02 GMT John Ridley

It's a start

Now, can I specify a custom robots.txt so that NO search engines can crawl my Facebook stuff?

I suspect I'm OK already since I go in every few weeks and make sure that nobody I haven't accepted as a friend can even tell that I exist on FB, let alone see any of my stuff.

Thursday 1st July 2010 14:57 GMT Anonymous Coward

Lying scumsuckers

Yes, some of these crawlers are really disreputable. If Facebook doesn't do the same under the counter, I'm a goldfish, even before you consider what they've got acting as CEO...

Friday 2nd July 2010 12:09 GMT El Richard Thomas

Fixed the quote...

"Some sleazy crawlers simply aggregate user data en masse and then sell it, which we view as a threat to our own sleazy business model"
