
as long as they don't make a hash of it...
I'll get my coat now...
Facebook's security bods routinely trawl public "paste" sites for email addresses and passwords stolen from its users, as part of an effort to outfox wrongdoers trying to hack into personal data on the free content ad network. However, the Mark Zuckerberg-run company was at pains to point out that the data-slurping battle with …
Yeah, yeah, I impress easily (uh huh). Screen-scraping with a will, and some thought put into the user notification system (whether to bother the poor saps at all). It's the logical follow-up to owning the users' information, and thus the user, so good call.
Icon: occasionally even the Dark Side can have a good idea.
Warning! Speculative armchair analysis follows...
This sounds like a good idea in theory (especially as a nice little blurb for marketing to put in all their glossy bumf), but how well does it actually work "in the wild"? I am certainly no Donald Knuth, just a humble blue-collar programmer who works day in, day out parsing gibibytes of client data in all sorts of perverse, ad-hoc formats (Base64-encoded PCL in XML in a CSV field was one of my favourites), but I honestly cannot see this working against anything but the most hopelessly naive script-kiddies.
How does Facebook's programmer team expect to parse and extract user names/email addresses and passwords from the data? I can't imagine the data thieves storing their database dumps in well-formed XML with a schema to validate against. Do they (Facebook) naively expect the thieves to post the data in a consistent "<USERNAME>,<PASSWORD>" format?
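For illustration, here is roughly what the naive approach would look like: a regex that only recognises tidy, one-pair-per-line `email:password` (or comma/semicolon/tab-separated) dumps. This is a hypothetical sketch, not Facebook's code; the pattern and the `extract_pairs` name are my own assumptions.

```python
import re

# Hypothetical pattern: an email-like token, a ':' ',' ';' or tab separator,
# then a password with no whitespace. Real dumps are far messier than this.
PAIR_RE = re.compile(
    r"^\s*([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\s*[:,;\t]\s*(\S+)\s*$"
)


def extract_pairs(paste: str) -> list[tuple[str, str]]:
    """Return (email, password) pairs from lines that match PAIR_RE exactly."""
    pairs = []
    for line in paste.splitlines():
        m = PAIR_RE.match(line)
        if m:
            # Lower-case the address so later lookups are case-insensitive.
            pairs.append((m.group(1).lower(), m.group(2)))
    return pairs
```

Anything encoded, split across separate pastes or buried in free text sails straight past it.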
Does Facebook actually go through and check every email address and password combo it finds? What is stopping someone from flooding the paste sites with bucketloads of random email and password combos to slow Facebook's security spider to a crawl? If Facebook somehow filters out emails that are not part of its network, what would prevent someone from pasting millions of random passwords for each valid email they have stolen?
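A guess at the cheap first-pass filter the question implies, sketched under assumptions: `known_emails` stands in for whatever account lookup Facebook really uses, and the per-account cap (`max_per_email`) is a hypothetical defence against chaff, not anything Facebook has described.

```python
from collections import defaultdict


def triage(pairs, known_emails, max_per_email=5):
    """Keep pairs whose email belongs to a known account, dropping exact
    duplicates and queueing at most max_per_email passwords per account."""
    queued = defaultdict(list)
    for email, password in pairs:
        if email not in known_emails:
            continue  # random addresses outside the network are dropped cheaply
        if password in queued[email]:
            continue  # ignore exact duplicates
        if len(queued[email]) < max_per_email:
            queued[email].append(password)
    return dict(queued)
```

The cap stops the spider grinding to a halt, but it cuts both ways: pad each stolen address with enough random passwords and the genuine one may never be checked.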
What happens if the thieves gain a modicum of sense and decide to obfuscate the data? Post the emails and passwords as separate pastes, for starters. What about other encodings? Rot13? Base64? Ascii85? UUEncode? EBCDIC? What about when the professionals take over, slap the script-kiddies upside the head for being ninnies and pasting plain-text data, and start encrypting the goods with PGP/GPG and ASCII-armouring it?
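To show why each new encoding means more work per paste, here is a sketch of a scanner that tries a handful of cheap decodings before looking for an email address. The decoder list and the `candidate_decodings` name are assumptions for illustration; Python's standard library happens to ship ROT13, Base64 and Ascii85 decoders, but nothing in it will read a PGP/GPG-encrypted blob without the key.

```python
import base64
import codecs
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def candidate_decodings(blob: str) -> dict[str, str]:
    """Try a few cheap decodings of a paste and keep those that reveal an
    email-like string. Encrypted payloads will never match."""
    attempts = {
        "plain": lambda s: s,
        "rot13": lambda s: codecs.decode(s, "rot_13"),
        "base64": lambda s: base64.b64decode(s, validate=True).decode("utf-8"),
        "ascii85": lambda s: base64.a85decode(s).decode("utf-8"),
    }
    results = {}
    for name, decode in attempts.items():
        try:
            text = decode(blob.strip())
        except ValueError:
            # binascii.Error and UnicodeDecodeError both subclass ValueError.
            continue
        if EMAIL_RE.search(text):
            results[name] = text
    return results
```

Note that ROT13 leaves an address email-shaped, so the "plain" check fires too (on gibberish). Every extra decoder is another pass over every paste, and once the goods are encrypted none of them help.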
This, I'm pretty sure, is merely scratching the surface. Doubtless other clever commentards can think of more deviously creative ways to throw a spanner into the works. This whole plan may look good on paper, but I just can't see it being very effective beyond a token measure...
You're making the assumption that the people who post passwords on pastebin are making any effort to hide them at all. I don't think that's generally the case. If they wanted it to stay secret they wouldn't put it on pastebin in the first place.
Rather they probably fall into one of two categories:
People who stole passwords for the lulz, dubious glory, or to give a certain company a black eye and wish to publicly display their trophy.
More serious cybercriminals who steal a bunch of passwords and post a fraction to prove they've got the goods before trying to sell the rest.
> people who stole passwords for the lulz, dubious glory, or to give a certain company a black eye and wish to publicly display their trophy.
> More serious cybercriminals who steal a bunch of passwords and post a fraction to prove they've got the goods before trying to sell the rest.
Good point.
That was a bit of what I was driving towards: I just cannot see this as being more than token security theatre on Facebook's part to add a shiny bullet point to their stockholders' reports.
1. The real, serious criminals won't make the information so publicly available.
2. The lulzy glory-seekers can easily obfuscate the data to avoid quick, automatic detection (and easily create gibibytes of "chaff" for Facebook to sort through for additional lulz).
3. Facebook will notify you that your credentials have been compromised, but as infosec-oakton mentioned below, you will not find out until after you log in, and only after Facebook has identified and verified your credentials parsed from one of the many paste sites. Wouldn't a well-designed, in-depth IDS on the servers be a better fit for purpose?
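On the "verifies your credentials" step: assuming Facebook stores salted, deliberately slow password hashes (PBKDF2 here purely as a stand-in; I have no idea what they actually use), checking each candidate pulled from a paste means re-hashing it, roughly like this. `ITERATIONS`, `hash_password` and `leaked_password_matches` are all hypothetical.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # assumed work factor, not Facebook's real setting


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a PBKDF2-HMAC-SHA256 hash (stand-in for the real scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def leaked_password_matches(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash a password found in a paste and compare it, in constant time,
    with the account's stored hash."""
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)
```

Every candidate costs one full slow hash, which is why the chaff attack in point 2 is not as silly as it sounds.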
Maybe I am missing something?