
THIS is a Register-worthy sub-head!!
Python package pile prefers protecting programmer privacy
PyPI, the Python Package Index, began evaluating ways to reduce the amount of identifying information that it stores even before the US Justice Department came asking for data on suspect users. But now that the code repository has disclosed receiving three subpoenas for data on five users earlier this year, the Python …
No. Salt guards against pre-computed hash tables.
Take each IP address, add the salt, and hash it. Doing that about 3.7bn times, once for every actually usable IPv4 address, will not take long at all. It would certainly be done in well under a minute, and probably in less than a second on decent hardware.
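To make that concrete, here is a rough Python sketch of the brute force. It assumes the attacker already has the salt (for instance because it was handed over or leaked with the logs) and that the scheme is plain SHA-256 over salt plus the dotted address. Both are assumptions for illustration, not a description of what PyPI actually does. The names hash_ip and brute_force and the example salt are made up, and the search is limited to a single /24 so it finishes instantly; the loop is the same for the whole address space, just longer.

```python
import hashlib
import ipaddress


def hash_ip(ip: str, salt: bytes) -> str:
    """Hypothetical logging scheme: SHA-256 over salt + dotted-quad address."""
    return hashlib.sha256(salt + ip.encode("ascii")).hexdigest()


def brute_force(target_hash: str, salt: bytes, network: str):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate, salt) == target_hash:
            return candidate
    return None  # the address is not in the searched range


# Assumed values for the demonstration only.
salt = b"example-salt"
logged = hash_ip("203.0.113.77", salt)               # what the log would contain
recovered = brute_force(logged, salt, "203.0.113.0/24")
print(recovered)
```

The salt only stops someone from reusing a table they computed earlier. Once the salt is known, the search space is still just the list of possible addresses.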
Instead of always logging a hashed IP address, their system could log only extremely vague information until suspected abuse is detected, which it could do by counting suspicious encounters until a certain threshold is reached. Only once specific abuse thresholds are reached would the system begin logging progressively more specific hashed data. It could start with vague geolocation info for the ISP (not the user), then the vague ASN range, then the precise ASN, then loosely related subnet blocks, then more specific subnets, then gradually more octets, with the data becoming more precise only as the need for it increases in a quantifiable way. Associated data like timestamps should be kept as vague as possible too. For example, if the logs exist only to protect against DoS attacks, then the log should be an aggregate count of how many times a given hashed value was encountered over a relevant timeframe, not a precise readout of every encounter with millisecond-accurate times.
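A minimal sketch of that idea in Python, under some loudly stated assumptions: geolocation and ASN lookups need an external database, so this version swaps in plain prefix lengths (/8, /16, /24, /32) as the precision ladder, and it uses a single made-up threshold of three suspicious events per level. EscalatingLog, coarsen, PREFIX_BY_LEVEL and THRESHOLD are all hypothetical names. Timestamps are reduced to an hour bucket and only aggregate counts are kept.

```python
import hashlib
import ipaddress
from collections import Counter

PREFIX_BY_LEVEL = [8, 16, 24, 32]   # assumed precision ladder, vaguest first
THRESHOLD = 3                       # assumed suspicious events before escalating


def coarsen(ip: str, level: int) -> str:
    """Truncate an IPv4 address to the prefix length allowed at this level."""
    prefix = PREFIX_BY_LEVEL[level]
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def hashed(value: str, salt: bytes) -> str:
    return hashlib.sha256(salt + value.encode("ascii")).hexdigest()


class EscalatingLog:
    """Keeps only aggregate counts per (hashed bucket, hour)."""

    def __init__(self, salt: bytes):
        self.salt = salt
        self.counts = Counter()

    def _key(self, ip: str, level: int, hour: int):
        return (hashed(coarsen(ip, level), self.salt), hour)

    def record_suspicious(self, ip: str, timestamp: float) -> int:
        """Count one suspicious event; return the precision level it was logged at."""
        hour = int(timestamp // 3600)          # vague time bucket, no milliseconds
        level = 0
        # Move to a finer bucket only once the coarser one has hit the threshold.
        while (level < len(PREFIX_BY_LEVEL) - 1
               and self.counts[self._key(ip, level, hour)] >= THRESHOLD):
            level += 1
        self.counts[self._key(ip, level, hour)] += 1
        return level


log = EscalatingLog(salt=b"example-salt")
levels = [log.record_suspicious("203.0.113.77", float(t)) for t in range(10)]
print(levels)
```

In this sketch a source that trips the detector once or twice is only ever recorded as a hashed /8, which brute force can reverse but which says almost nothing. The full address is recorded only after repeated suspicious events within the same hour.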
With this approach, most subpoenaed data would be too vague to be abused, even if successfully brute-forced, and most legitimate users' privacy would be extremely well protected (regardless of legal disclosures) most of the time.