Scraping websites for malware: Ethics of misreporting?

This topic was created by Anteaus.

  1. Anteaus

    Scraping websites for malware: Ethics of misreporting?

    Recently my hosting company received a takedown notice from a German firm named csyscon SIRT. It seems they reckoned they'd found malware on one of the sites we maintain. As a result, all of our sites on that host were taken offline for several hours.

    Turned out the file they had targeted was a zip download containing an executable compressed with UPX. Any coder familiar with UPX will know that it is somewhat prone to false malware alerts, which in fact is why we don't use it any more. The download was an old version, retained only in case anyone still needed it. The online file checked out as byte-identical to our archived version. Unsurprisingly, the alert was a complete false alarm.
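
    For what it's worth, confirming "byte-identical" takes about a minute. This is only a rough sketch (the file paths are made up for the example), comparing the copy pulled from the live site against the archived original by hash:

        import hashlib

        def sha256_of(path: str) -> str:
            """Return the SHA-256 hex digest of a file, read in chunks."""
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            return h.hexdigest()

        # Hypothetical paths: the file fetched from the live site vs. our archive copy.
        live = sha256_of("downloads/oldtool-upx.zip")
        archived = sha256_of("archive/oldtool-upx.zip")
        print("identical" if live == archived else "files differ")

    If the digests match, the hosted file is exactly what we published, and any 'malware' verdict is down to the scanner, not to tampering.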

    Since the file was an archived version linked only from an obscure page, I think we can assume that it was found by way of a deep scrape of the site.

    A while later, I got a notice from Google that my AdWords account had been suspended 'due to malicious content' on an advertised site. The report was very unspecific; I had to grill the Google staff to find out exactly what it was about. I had expected it to be the same UPX-compacted executable, but no. When I finally got the lowdown it referred to a different site altogether, and the alleged malware comprised one JavaScript file, one GIF image, and one stylesheet. The JavaScript file was exhaustively tested and found clean. As for the other two, the chances of those file types containing malware are, as far as I know, extremely slim. (If anyone knows of a way in which a stylesheet can be infected, that would be an interesting separate topic. I daresay it might be possible, but it seems extremely unlikely.)
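
    For the GIF at least there is a quick sanity check anyone can run. This is only a crude sketch, not a parser, but it catches the common trick of appending a payload to an otherwise valid image: a well-formed GIF starts with "GIF87a" or "GIF89a" and ends with the 0x3B trailer byte, so extra data tacked on after the image shows up after the last trailer.

        import sys

        def check_gif(path: str) -> None:
            # Crude heuristic: flag data sitting after the last 0x3B trailer byte.
            with open(path, "rb") as f:
                data = f.read()
            if not data.startswith((b"GIF87a", b"GIF89a")):
                print(f"{path}: does not even have a GIF header")
            elif data.endswith(b"\x3b"):
                print(f"{path}: ends with the GIF trailer, nothing obviously appended")
            else:
                tail = len(data) - data.rfind(b"\x3b") - 1
                print(f"{path}: {tail} byte(s) after the last 0x3B trailer byte")

        if __name__ == "__main__":
            check_gif(sys.argv[1])  # e.g. python check_gif.py suspect.gif

    A clean result doesn't prove anything on its own, but it is the kind of thirty-second check you'd hope a scanner operator would do before firing off a takedown.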

    I pointed out that all of the files checked out as clean, but a Google rep insisted that they have 'a specialist internal system' which is so good that it can find malware where no other detection system can. Hmmph.

    Judging by other reports, this kind of uninvited scraping of websites for malware seems to be spreading. Companies have sprung up which specialise in it. I think it may be assumed that the scanning is done by bots, and that few if any human checks are performed to catch false positives.

    We've had heated discussions on ElReg before about whether it is ethically OK to scrape third-party websites looking for security flaws such as harvesting risks. However, in that case the objective was purely to inform the site's owner of the vuln, a comparatively harmless response. Even so, some commenters did not think it was acceptable to scrape sites for any reason.

    Here, the objective is far more controversial: scraping sites with the aim of sending a takedown notice to the hosting company, or to search engines to have the site blocked or de-ranked. I would have thought that such activity would be considered malpractice even were the scraping and detection system one hundred percent reliable. What we have seen, though, is that quality control is nonexistent, that false positives abound, and that those blunders lead to loss of revenue and costs for the victims of the misreporting.

    Perhaps this could be a subject for a Reg article.
