In fairness, it's not exactly the most complex codebase in the universe. You've got a list of compromised credentials, A, and a list of people who have subscribed to be told when their email address appears in it, B. When new intersections of A and B appear, send an email saying 'you've been pwned'. If it's more complex than that, he's doing something majorly wrong.
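The A-and-B loop described above fits in a few lines. This is a minimal sketch, not HIBP's actual code. It assumes (hypothetically) that a new breach arrives as a batch of addresses, that subscribers sit in an in-memory set, and that "sending an email" just means collecting the addresses to notify. The helper names `normalise` and `new_notifications` are made up for illustration.

```python
def normalise(email: str) -> str:
    # Compare addresses case-insensitively and ignore stray whitespace.
    return email.strip().lower()


def new_notifications(breach_batch, subscribers, already_notified):
    """Return subscribers found in this breach who have not been emailed yet.

    breach_batch     -- addresses from a newly loaded breach (list A)
    subscribers      -- set of normalised subscriber addresses (list B)
    already_notified -- set of addresses already emailed about this breach
    """
    breached = {normalise(e) for e in breach_batch}
    hits = breached & subscribers            # the intersection of A and B
    return sorted(hits - already_notified)   # only the *new* intersections


subscribers = {"alice@example.com", "bob@example.org"}
batch = ["Alice@Example.com ", "carol@example.net"]
to_notify = new_notifications(batch, subscribers, already_notified=set())
for address in to_notify:
    print(f"{address}: you've been pwned")
```

The hard parts the replies below point to (cleaning the dumps, serving lookups cheaply, governance) are exactly what this toy version leaves out.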
Credential breach website Have I Been Pwned (HIBP) will be going open source, site creator and maintainer Troy Hunt has told the world. The site, at the time of writing, hosts details of roughly 10 billion hacked accounts from 473 separate websites. You input your email address and HIBP tells you whether or not the address …
Friday 7th August 2020 20:46 GMT EvilGardenGnome
At the core, sure, but he has other systems tied into it (the password checker and API, for instance). I imagine it's not too complex, but the amount of dust that collects on a personal project is high, as is the anxiety/embarrassment of showing off your dirty underwear. I personally understand wanting to have trusted people look first.
* cautiously eyes own private repos *
Friday 7th August 2020 20:50 GMT grizewald
Fair comment. I also don't see why the sale needs to include any stolen credentials in a useful form. Hash the mail addresses and only publish the hashes. It wouldn't change the ability for the site to tell you if you've been pwned or not.
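To make the suggestion concrete, here is a sketch of a hashes-only lookup. It assumes unsalted SHA-256 over lowercased addresses, which is an illustrative choice and not anything HIBP is known to do. The function names are hypothetical. The lookup still works, which is the point being made; KorndogDev's reply below explains why such a published hash list would not stay private for long.

```python
import hashlib


def email_hash(email: str) -> str:
    # Normalise first so "Alice@Example.com" and "alice@example.com" match.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# The published dataset would hold only digests, never plaintext addresses.
published_hashes = {email_hash(e) for e in ["alice@example.com", "dave@example.org"]}


def is_pwned(email: str) -> bool:
    # Hash the query the same way and test membership.
    return email_hash(email) in published_hashes


print(is_pwned("ALICE@example.com"))  # True
print(is_pwned("erin@example.com"))   # False
```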
Finding anyone trustworthy and dedicated enough to run, maintain and, most importantly, update the site that Troy created, so that it retains its reputation, is probably the hardest part of trying to pass it on.
I'd say that some things people create on the Internet are much like children: once you have given them life, you have an inherent responsibility for them. This responsibility may include giving them accommodation at a hotel (your house) for many more years than you may have expected!
The only hope I'd see for the site is if a truly independent non-profit organisation with the right competence and drive offered to take it over. EFF comes to mind as they already publish quite a few tools to help people avoid some of the more common dangers on the Internet. This kind of resource should be right up their street.
Saturday 8th August 2020 05:32 GMT KorndogDev
no no no
"Hash the mail addresses and only publish the hashes"
NO. Such hashes would be broken in hours. New video cards can generate billions of hash values per second. And email addresses are NOT built from completely random characters, which makes the whole process much easier. Simply brute forcing them with some not-so-clever rules (e.g. string must end with '@gmail.com') is a task for a high school student.
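Here is a toy, CPU-only version of that attack against an unsalted SHA-256 of an address. The alphabet (lowercase letters and digits), the maximum length and the fixed `@gmail.com` rule are illustrative assumptions, and the target address is made up. Real attackers would use GPUs and wordlists of names rather than this exhaustive loop.

```python
import hashlib
import itertools
import string


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Pretend this leaked digest is all the attacker has.
target = sha256_hex("bob1@gmail.com")


def candidates(max_len: int, domain: str = "@gmail.com"):
    """Yield every local part up to max_len chars from a small alphabet, plus the domain."""
    alphabet = string.ascii_lowercase + string.digits
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            yield "".join(chars) + domain


def crack(digest: str, max_len: int):
    """Hash each candidate and return the one that matches, or None."""
    for guess in candidates(max_len):
        if sha256_hex(guess) == digest:
            return guess
    return None


print(crack(target, max_len=4))  # bob1@gmail.com
```

Each extra character multiplies this search space by 36. Real local parts, however, cluster around names, initials and years, so rule-based guessing covers a large share of addresses much more cheaply than exhaustive search would suggest.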
Saturday 8th August 2020 10:16 GMT Charlie Clark
Saturday 8th August 2020 06:30 GMT Bitsminer
If it's more complex than that...
Troy has published several blog posts outlining how he has taken full advantage of numerous content-delivery-networks to reduce the cost of supporting millions of lookups per hour on this database. The intersection of A and B is conceptually easy, making it cheap is not.
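The CDN point is easiest to see on the password side of the service. The Pwned Passwords range API is queried with the first five hex characters of a SHA-1 hash, and the response lists matching suffixes. Because every caller asking for a given prefix gets the same body, and there are only 16^5 = 1,048,576 prefixes, the responses cache well. The sketch below is local and illustrative only: the function names and in-memory buckets are assumptions, and it makes no network calls.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 5  # 16**5 = 1,048,576 possible buckets


def build_buckets(passwords):
    """Server side: group SHA-1 digests by prefix, counting each suffix."""
    buckets = defaultdict(dict)
    for pw in passwords:
        digest = hashlib.sha1(pw.encode("utf-8")).hexdigest().upper()
        prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
        buckets[prefix][suffix] = buckets[prefix].get(suffix, 0) + 1
    return buckets


def range_response(buckets, prefix):
    """Response body for one prefix; identical for every caller, so cacheable."""
    return "\n".join(f"{s}:{n}" for s, n in sorted(buckets.get(prefix, {}).items()))


def client_check(buckets, password):
    """Client side: send only the prefix, then search the returned suffixes locally."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
    for line in range_response(buckets, prefix).splitlines():
        s, count = line.split(":")
        if s == suffix:
            return int(count)
    return 0


buckets = build_buckets(["password", "password", "letmein"])
print(client_check(buckets, "password"))                      # 2
print(client_check(buckets, "correct horse battery staple"))  # 0
```

A side benefit of this design is that the server never sees the full hash being checked, only a prefix shared by many other hashes.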
Also, data quality has been a big time consumer; the dumps of data provided (well, stolen) by hackers never do meet the expected standard. Pikers.
I expect the governance model to take some time to get right, and he and co-contributors should plan for at least one big failure event.
Saturday 8th August 2020 23:14 GMT Anonymous Coward
Sunday 9th August 2020 09:25 GMT chivo243
Sunday 9th August 2020 21:44 GMT 142
Re: Huh, does it log searches?
> I always wondered about entering your personal info on a site like this
It does require trust, but I think you can usually tell by how they talk about the potential issues. This guy's always been open about that worry, and it's always been clear he actually understands people's concerns in that respect.
> When you search for an email address
> Searching for an email address only ever retrieves the address from storage then returns it in the response, the searched address is never explicitly stored anywhere. See the Logging section below for situations in which it may be implicitly stored.
> Only the bare minimum logs required to keep the service operational and combat malicious activity are stored. This includes transient web server logs, logging of unhandled exceptions using Raygun, Google Analytics to assess usage patterns and Application Insights for performance metrics. These logs may include information entered into a form by the user, browser headers such as the user agent string and in some cases, the user's IP address.
Ok, you still have to trust him that's true, but I've met plenty of people who would gleefully hoard people's data, and they'd never in a million years phrase their lies like that.
It's when they talk vaguely, or dismiss concerns outright, that I'm wary. I don't even have to go looking into that Genderify outfit to know their privacy statements would have been meaningless waffle, exaggerated promises or doublespeak...
Monday 10th August 2020 13:59 GMT EnviableOne
Re: Huh, does it log searches?
Troy said he came to realise that a large percentage of the value in HIBP was the trust the community put in him specifically.
The detail he puts into his blog posts about every change he makes, explaining why he made each decision, and the fact that he still runs this in his spare time on a not-for-profit-like basis, combine to increase that level of trust.