Security researchers scrutinise search engine poisonings

The techniques used by unloveable rogues who automate search engine manipulation attacks themed around breaking news to sling scareware have been unpicked by new research from Sophos. A research paper published on Wednesday by Sophos researchers Fraser Howard and Onur Komili lifts the lid on the search engine optimisation …


This topic is closed for new posts.
  1. BristolBachelor Gold badge

    The solution

    The solution is for the search engines to harvest content from IP addresses that cannot be associated with them. That way, the content cannot be customised depending on whether it is Google or some poor user fetching it.

    It would also solve some of the problems with search engine poisoning for other things, like Google searches for products.

    I'm sure that someone like Google could easily arrange to "borrow" IP addresses from large ISPs on a random basis.

    1. A J Stiles

      ..... not quite.

      Search engines don't just use a predictable pool of IP addresses; they also use a predictable user-agent string.
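      That predictable user-agent is exactly what the poisoning kits key on. A minimal sketch of the kind of cloaking logic involved (names and strings here are illustrative assumptions, not taken from the Sophos paper): the same URL serves keyword-stuffed content to a crawler, a scareware redirect to visitors arriving from search results, and an innocuous page to everyone else.

```python
# Illustrative sketch of user-agent/referer cloaking as used by
# SEO-poisoning kits. The token list and payload names are hypothetical.

CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")

def pick_payload(user_agent: str, referer: str) -> str:
    ua = user_agent.lower()
    if any(tok in ua for tok in CRAWLER_TOKENS):
        return "keyword-stuffed page"   # what the spider indexes and ranks
    if "google." in referer.lower():
        return "scareware redirect"     # victims clicking through from search
    return "innocuous page"             # direct visitors and investigators
```

      Randomising the crawler's IP addresses alone does nothing against this check, since the kit never looks at the IP at all.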

    2. Daniel 1

      Do you really think it could be that easy?

      Hell, if that was the solution, I'd no doubt be using the 'BristolBachelorBot' search engine today, wouldn't I? The reason there's only one serious contender, and one wannabe, in this market is that it's hard.

      Even if the Googlebot did not explicitly identify itself as such, the spider can easily be recognised simply by the patterns of its behaviour. For instance, unlike regular web-scrapers, search engine spiders tend to pace their requests at regular intervals over a given period of time, and will avoid requesting certain content (like, for instance, JavaScript files whose functionality is not in some way triggered by the page request) to avoid consuming a site's bandwidth: a visit from the Googlebot can easily take half a day, if you have a lot of content. The regularity and nature of the requests can act as a signature.

      Even if those factors didn't alert you that a search engine was visiting, the very fact that it reads your robots.txt file is a bit of a giveaway. I'm sure you wouldn't advocate that search engines stop reading robots.txt?

      Google regularly and deliberately hazes the behaviour of its spiders to throw these people off, but it's a constantly moving battle. I really don't think people outside of search realise the scale of the problem of automatically gathering realistic data on the web these days. We only notice it when it fails.
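      The timing signature the comment above describes can be sketched very simply (this is my own illustration, not anything from the paper): near-constant gaps between requests suggest a scheduled spider, while human browsing is bursty.

```python
# Hypothetical behavioural check a cloaking kit might use: flag a visitor
# whose inter-request intervals are suspiciously regular.
from statistics import mean, pstdev

def looks_like_spider(timestamps: list[float]) -> bool:
    """Return True if request times (seconds) arrive at near-constant intervals."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 3:
        return False  # too few requests to judge
    # Coefficient of variation: metronomic polling has almost no spread.
    return pstdev(gaps) / mean(gaps) < 0.1
```

      A spider polling every 30 seconds trips the check; a human clicking around at irregular intervals does not. A real kit would combine this with the user-agent, robots.txt fetches, and which resources get requested.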

  2. Anonymous Coward

    here's one to add to the IP/hosts blocklist

    here's the IP of one nasty malware "virus checker" (worm) that seems to crash my browsers every time it gets referred to by Google.

    IP to block

    PeerBlock FTW

  3. Anonymous Coward

    the real answer

    Is to spoof your user agent as Googlebot and hide your referer.

    Or just use trusted news sites for your breaking news stories, maybe.
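    For what the spoofing trick looks like in practice, here's a minimal sketch using only the standard library (the approach is the commenter's suggestion; the code itself is my own assumption about how you'd do it): send Googlebot's user-agent string and simply attach no Referer header, so a cloaking page serves you the same benign content it shows the crawler.

```python
# Sketch: fetch a suspect URL posing as Googlebot, with no Referer sent.
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def build_request(url: str) -> urllib.request.Request:
    # Only the spoofed User-Agent is set; no Referer header is attached.
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})

def fetch(url: str) -> bytes:
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.read()
```

    Of course, as noted further up the thread, kits that fingerprint visitors by behaviour rather than headers won't be fooled by this alone.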

