Now we know..
..why Google has a free public DNS. So much for blocking cookies, DNT headers and NoScript. The bar stewards are still watching you.
Domain-name lookups only reveal websites visited, not individual pages viewed, right? Wrong: the interaction between a user and the DNS is more revealing than previously believed, according to a paper from German postdoc researcher Dominik Herrmann. In work published at pre-print server Arxiv (in German – thank you, Google …
To be honest, I'm surprised that this is reported as news. That's been an obvious one for, umm, pretty much as long as the Internet has existed. In my (long gone) younger years when I worked for an ISP I even had zone transfers secured because in those days it would have given away who our customers were (these days that's a default, but we're talking long ago, when USENET was still usable).
But even if you didn't know that, you should have had an inkling that something was up if Google offered it for free.
And then prevent any internal clients from talking to Google's DNS.
Which breaks media boxes, televisions, all manner of IoT devices and software apps, and practically all media streaming apps that enforce DRM, e.g. Netflix.
Blocking these hosts effectively without causing failures (perhaps I should say, to prevent an impact) is not trivial even if you have the infrastructure in place to do this across your network.
And even then, they'll be watching your smartphone, which, if it has a third-party app installed or is an Android, might be behaving most promiscuously with many of the Google inquisition's global public (if not private) nodes.
Use your ISP's DNS
Oh yeah, the assholes that hijack DNS failures to serve ads, causing SAMBA and other scripts to crap out.
They also have pretty much given me an assigned IP. Neither bouncing my cable modem nor releasing my DHCP lease gets me a new IP. I've had the current one for a year now.
Sigh. So all US ISPs have been given carte blanche to be crap. That is, as I said, a separate problem.
Your ISP already and inevitably (because there is only one wire out of your house) gets to see all your traffic, so there is no new opportunity for a privacy breach and if DNS data depends on where you pick it up from then the DNS is technically broken, so ... The advice was sound. Your ISP is, technically, a good place to look for DNS services.
I still don't understand how analysis of DNS records for my IP address can reveal that I looked at en.wikipedia.org/wiki/Alcoholism (or whatever). Are they saying that Wikipedia's responses are different in such a way that the page can be distinguished from other pages?
Yes. Most of the images don't come from wikipedia itself, for example, but from the Wikimedia Commons (another domain, another lookup). How many pictures does that one page contain compared to other pages on the 'Pedia? What distribution of 'Pedia/Commons requests are made?
Put simply, because of all these side requests, just one page can create a fingerprint that can be combined with other pages to create a distinct trail. And unlike what the article says, many of us have longer-term IP allocations (otherwise, home servers don't work so well). Worst part is that this sniffing is all done via basic Internet protocols; trying to mask them will require changing the protocols which may not be efficient or even possible.
OK, I understand how the browser's DNS lookups might in theory help form some kind of clue.
Now my recollection of how DNS works in the real world is that there's potentially quite a lot of caching between me and (e.g.) my ISP's DNS server (assuming that's the one I use). So 'my' DNS lookups visible to the outside world at my ISP may or may not match the DNS lookups in an uncached world.
I'm ignoring the possibility that someone's been bodging about with DNS-related stuff I nominally control. If they can do that they likely have easier and more effective ways of snooping on me than this idea.
[And then for a different approach to killing this idea, e.g. by data poisoning, there's stuff like Trackmenot or logical successors].
What's wrong here? The 'research'? The article? My limited understanding of DNS?
[edit: big_D seemingly has a similar train of thought at the same time as me]
Now my recollection of how DNS works in the real world is that there's potentially quite a lot of caching between me and (e.g.) my ISP's DNS server
Actually, you'd be surprised how little caching there is between a user and the caching resolver they use - and many routers will default to handing out the ISP supplied DNS resolvers to internal clients.
From reading it, it's clear that this technique will instantly lose most of its potency once you are separated from the client by a decent cache - hence some suggestions to run your own internal DNS cache/resolver. If you do that, then unless you set your ISP's resolver as a forwarder for your local resolver, they would have to sniff traffic to get your DNS queries - and they will be vastly less useful due to the caching.
I'd guess the pix in particular may well be near unique to each Wiki page.
2 good rules of thumb are
a) If Google supplies it how does it allow them to extract more knowledge about you (because if Google supplies it it always will)?
b) Don't use Google.
I still don't understand how analysis of DNS records for my IP address can reveal that I looked at en.wikipedia.org/wiki/Alcoholism (or whatever). Are they saying that Wikipedia's responses are different in such a way that the page can be distinguished from other pages?
No, the issue is that a page is very rarely just local HTML. You will have scripts, fonts (again a Google hit, which is why we avoid Google fonts - and thus do not run Wordpress), images (which could have very meaningful titles in themselves), etc etc - each of which is likely to require a lookup that cannot be served out of cache.
In short, you're looking at another data source to feed Big Data based profiling.
Come on guys, this ain't the BBC, where're the technical details?
You rope us in with a headline about supposed privacy leaks in DNS, and then spend the entire article talking about old-hat browser fingerprinting & behavioural analysis. That was news 15 years ago!
“Many websites produce a so distinctive DNS retrieval pattern” that requests can be recognised “more or less unequivocally.”
How does the content on a *website* produce a distinctive enough pattern to identify specific pages?
"IT?" 'cause who the freud do you think your readership are?
It's explained fairly well near the end.
Each page has links, and any link to another domain will require a DNS lookup. If you know what links are on, say, 500 pages, then as someone reads those pages their DNS lookups will fire in predictable patterns and you can guess where they are likely to be.
Although host-based blacklists on the client would befuddle things somewhat. Maybe.
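For anyone who wants to see the pattern-matching idea above in concrete terms, here is a minimal Python sketch. It is not the method from the paper: the page names and hostnames are made up, and Jaccard overlap is just one simple scoring choice picked for illustration.

```python
# Hypothetical fingerprints: page -> set of third-party hostnames it triggers.
# Every page and hostname here is invented for illustration.
PAGE_FINGERPRINTS = {
    "site.example/news": {"cdn.example.net", "fonts.example.org", "ads.example.com"},
    "site.example/sport": {"cdn.example.net", "video.example.org"},
    "site.example/health": {"cdn.example.net", "fonts.example.org", "charts.example.io"},
}


def jaccard(a, b):
    """Overlap between two sets of hostnames, from 0.0 (disjoint) to 1.0 (equal)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def guess_page(observed_lookups, fingerprints=PAGE_FINGERPRINTS):
    """Return (page, score) for the fingerprint closest to a burst of lookups."""
    observed = set(observed_lookups)
    return max(
        ((page, jaccard(observed, hosts)) for page, hosts in fingerprints.items()),
        key=lambda pair: pair[1],
    )
```

A full set of lookups such as cdn.example.net plus video.example.org matches the sport page exactly. If a client-side blacklist suppresses the distinctive hosts, so the observer only sees cdn.example.net and fonts.example.org, the news and health pages score the same and the guess becomes ambiguous, which is the "befuddling" effect mentioned above.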
Reminds me of Rainbows End when the avatar is randomizing his response delays so that they can't "geolocate" via timing patterns ;-)
Another way would be to have a collection of DNS servers configured locally that get round robin'd for each request, since profiling requires combining the pattern of DNS lookups from specific pages.
That, or if you're feeling like a real crazy cat, use an ad blocker and VPN.
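A rough Python sketch of the round-robin idea above, purely as a model: it does not send any DNS queries, and the resolver labels are placeholders. It only shows which lookups each resolver would get to see if requests were dealt out in turn.

```python
from itertools import cycle


def split_lookups(hostnames, resolvers):
    """Deal lookups out to resolvers in turn; return what each resolver sees."""
    seen = {resolver: [] for resolver in resolvers}
    # zip stops when the hostnames run out; cycle repeats the resolver list.
    for host, resolver in zip(hostnames, cycle(resolvers)):
        seen[resolver].append(host)
    return seen
```

With five lookups and two resolvers, one resolver sees the first, third and fifth hostnames and the other sees the second and fourth, so neither holds the complete pattern for a page. It does nothing against anyone who can watch the wire itself, such as the ISP.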
"running a client on your computer that makes DNS queries and sending page lookups to random (legitimate) Web sites in the background will confuse the trail?"
You might possibly think that, I couldn't possibly comment.
See e.g. the Trackmenot browser extension. Only been around since 2006 or so. Surprisingly few people know about it.
1) install DD-WRT on your router
It has a caching DNS server built in and running by default, so you won't keep looking up DNS names you recently resolved downstream over and over again, where your ISP (or whoever) can get at the patterns to figure out where you've been.
Honestly, every router vendor should include a simple caching name server. Then not just the small minority who reflash their routers can benefit, and ISP DNS servers will see far less traffic.
As always, those stupid enough to reconfigure their PC to use Google's DNS servers deserve what they get.
"If your router can't do it, then set up an old PC on the inside to act as your DNS resolver, probably a better bet than using the router DNS cache, long term"
Isn't this one of the things that PiHole does, with less space and power than either a router or a PC?
https://pi-hole.net/
(edit: I see others have also suggested this particular solution already. Must trype faster.)
Since ISPs have to use dynamic IP addresses to cope with the IPv4 address shortage, a user's address changes, making it harder to track them over time.
Er, what? The days of dial-up modems are long gone, squire. Everybody is online 24x7 these days, so you need as many IP addresses as there are customers.
There is no logic to using dynamic IP addresses for most ISPs.
Ehh, enforced IP address changes by German ISPs are silly enough as is. (Every midnight, the ISP actually cuts your connection and waits for the modem to reestablish it!)
Forcing the change *hourly* would be mad. Imagine all your downloads failing, SSH connections freezing, VoIP calls cutting, and so on – as soon as the clock ticks to :00.
And besides that, most people use the DNS resolver hosted *by* their ISP by default – the same ISP which assigns them the IP addresses in the first place. So forcing IP addresses to change would be just theater, it wouldn't prevent the ISP from correlating the DNS log and the IP log whenever they wanted.
Guess it's time for yet another use for a Raspberry Pi:
Use your raspberry Pi as a DNS cache to speed up your internet
So: cache DNS, and add some sort of DNS-based ad blocker to the system too, which reduces the DNS lookups per page quite a lot? Then randomize DNS lookups across a range of root DNS servers.
Aside - My ISP gives me a "dynamic" IP which hasn't changed in 4 years! So why should I pay them more for a static one?
Can't use DNS blocking since some sites MUST be whitelisted (my credit union ended up on the blacklist because it's connected to the government, being a MILITARY credit union). Plus as noted, to deal with this problem, you need a more persistent DNS cache.
Run your own ISC BIND server internally and use Response Policy Zones to blacklist all known advertising/tracking sites so all the crud on the web page will never be resolved.
Even if you run your own DNS Server remember it still has to go out and perform the initial query which is then cached for the TTL of the DNS record. So it is visible upstream if the ISP is performing DNS traffic recording. What will be more difficult is to determine how many times you access an individual page on the web site.
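To make the TTL point above concrete, here is a small Python sketch of a caching resolver. It is a toy model under stated assumptions: the upstream is a stand-in function rather than a real DNS query, the address and the 300-second TTL are invented, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TtlCache:
    """Toy caching resolver that records which queries reach the upstream."""

    def __init__(self, upstream):
        self._upstream = upstream    # callable: hostname -> (address, ttl_seconds)
        self._entries = {}           # hostname -> (address, expiry_time)
        self.upstream_queries = []   # (time, hostname) pairs visible upstream

    def resolve(self, hostname, now):
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]          # served locally, nothing leaks upstream
        address, ttl = self._upstream(hostname)
        self.upstream_queries.append((now, hostname))
        self._entries[hostname] = (address, now + ttl)
        return address


def fake_upstream(hostname):
    """Hypothetical upstream: always answers with a documentation address."""
    return ("192.0.2.1", 300)
```

Resolving the same name at seconds 0, 10 and 299 produces a single upstream query at second 0. A fourth lookup at second 300 produces another one, because the record has expired. So the upstream still learns that you visited the site, but not how often you went back within the TTL.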
> Aside - My ISP gives me a "dynamic" IP which hasn't changed in 4 years! So why should I pay them more for a static one?
Having a dynamic (or better yet, NATed) IP address is a great privacy advantage for both residential and business users. This does not exclude the option of also having a fixed address used for inbound requests.
In my configuration, dynamic NATed IPv4 is used for browsing and most other activities, while fixed IPv6 is used for inbound and a limited number of outbound connections (mostly SSH).
"Use your raspberry Pi as a DNS cache to speed up your internet"
Careful with the instructions at that link, for they contain the lines:
server=8.8.8.8
server=8.8.4.4
which are Google's DNS servers.
P.S. In the past few years I've come across a lot of people who should know better recommending Google's DNS servers. Even in the workplace.
Okay, having run a DNS server in the distant past, I fail to see why it is necessary to log all the requests. Seems like a massive waste of disk space.
Or am I missing something? (My question is aimed at normal DNS providers, not people interested in deep analysis of your browsing habits to serve you better ads).
The problem is, today, that those hosting the DNS are also interested in deep analysis of browsing habits, generally speaking.
In the US, the ISPs have just received the right to sell any and all information gathered about their users, so DNS logging and patterning would make a nice little earner, to bolster profits.
Disk space is cheap and selling browsing habits is lucrative.
Set up your own DNS server and cut them off at the pass.
There seem to be several potential technical spoilers to this theory.
1) Going to the DNS for every request is inefficient. Caching for some period would be assumed to be standard. That period could be assumed to be more than a few minutes.
2) ISPs with IPv4 using NAT may have several external IP addresses on multiple load sharing boxes. The ISP users' browser connections can be multiplexed on any of those IP addresses by dynamic source port assignments. A different destination IP address will have to open a new connection. There is no guarantee it will be multiplexed onto the same external IP address as previous ones from that user - even in the same session.
Didn't we already know this though? How else would Google Analytics allow you to see which actual pages people had visited, with so much detail about them? And that's from the outside... wouldn't we expect to be able to get everything off the DNS from inside?
(Although I suppose this may explain why I was met with blank looks five years ago when I asked Infrastructure to pull a less detailed version of this info off the servers for marketing purposes...)
Didn't we already know this though? How else would Google Analytics allow you to see which actual pages people had visited, with so much detail about them?
GA doesn't use DNS analysis. The webmaster places a scrap of JavaScript on each of their pages they want usage figures for, so it's the client which directly tells GA what they're looking at and for how long.
... luckily you can turn that (s)crap off by blocking GA scripts in your browser, using browser extensions that block scripts if your browser does not do it natively, which they all should, unless they explicitly gained your conscious agreement to wholesale and ongoing activity theft.
But with no limits on script sources and pay-load, these protections are in no way fool-proof.
> The "spy" would have to know the content and links from each page to pattern match your request and identify if you visited it.
The premise is that the spy doesn't care about which page you visited, all it wants is to associate your IP to you via a DNS-based fingerprinting approach.
No, changing your IP won't deter the snoops for very long.
Quote: "However, Herrmann writes, someone with access to the infrastructure can easily watch a user's behaviour while they have one IP address, create a classifier for that user, and look for behaviour that matches that classifier when the IP address changes."
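As a sketch of what "create a classifier for that user" could look like in its very simplest form, the Python below compares the hostnames seen from a new IP address against stored per-user profiles. This is an assumption-laden illustration, not the classifier from the paper: the user labels and hostnames are hypothetical, and cosine similarity over lookup counts is just an easy stand-in.

```python
from collections import Counter
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two hostname-count profiles."""
    dot = sum(p[h] * q[h] for h in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def reidentify(new_session, known_profiles):
    """Return the label of the stored profile most similar to a new session.

    new_session: list of hostnames looked up from the new IP address.
    known_profiles: dict mapping a label to a Counter of hostnames.
    """
    new_profile = Counter(new_session)
    return max(known_profiles, key=lambda user: cosine(new_profile, known_profiles[user]))
```

If one stored profile is mostly a news site and a mail host, and another is mostly games and video hosts, a new session that looks up the news site and the mail host is attributed to the first profile regardless of which IP address it comes from.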
"The fix is simple: turn your modem on and off again to get a new IP address. Or ask your ISP to assign them more often"
How do you defend against your own ISP recording your browsing history?
'List of authorities allowed to access Internet connection records without a warrant'
Very simple: use a VPN provided from another country, ideally one without odious retention policies.
Don't use the PPTP protocol as it's pants on security; ideally use OpenVPN. Then check the VPN is doing its job by visiting one of the test sites (such as ipleak.net or check.ipredator.se etc.)
But as others have pointed out, using DD-WRT or similar on your router plus ad-blocking will go a long way for this particular attack. You can even buy routers pre-configured with DD-WRT and VPN in there so all of your home devices get privacy (not too cheap though).
But can you REALLY trust those VPN providers to actually have the servers located in the countries listed AND not talk to Five Eyes on the sly?
And some of us can't use ad-blocking domain lookups due to false positives.
"But can you REALLY trust those VPN providers to actually have the servers located in the countries listed AND not talk to Five Eyes on the sly?"
In any absolute sense - no
But the probability that they do honour the privacy guarantee is much higher than the probability of my ISP preserving my privacy.
Also I don't really have much to fear from the "five-eyes" style of secret service spying, but I do have much to consider if I end up in some dispute with some petty local bureaucrat who can access my web history and I can't access theirs. That is the whole point - to reset that asymmetry in power that the snooper's charter provides.
I use this service: - https://dns.watch/index
No logging, no filtering etc...
I have suspected Google DNS to be a security/privacy risk for a long time, especially if coupled with your search history.
Also the Pi will cache, but it still needs to perform lookups for sites you've not previously visited. Those would then default to your ISP or Google depending on your setup, so it's still bad.