"[A]s the authors note, the attacker has to be able to visit the same Web pages as the target, and has to be able to capture the victim's traffic."
So the NSA, basically.
HTTPS may be good at securing financial transactions, but it isn't much use as a privacy tool: US researchers have found that a traffic analysis of ten HTTPS-secured Web sites yielded “personal data such as medical conditions, legal or financial affairs or sexual orientation”. In I Know Why You Went to the Clinic: Risks and …
So this is where an attacker profiles a target website, crawling it and recording each page's document requests and the number and size of the responses returned. In other words, something like "going to page X triggers Y separate requests of particular sizes". The number and size of the resources requested are likely to differ between pages, and the pattern of page progression also helps identify where a user is on the site: users tend to follow predictable paths through a site because that is how the site is designed.
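The profiling idea above can be sketched in a few lines. This is a toy illustration, not the paper's classifier: the page names, response sizes, and the `tolerance` parameter are all invented, and the matching is a simple size comparison under some slack for TLS padding.

```python
# Toy sketch: identify which page a user visited by comparing observed
# (encrypted) response sizes against a pre-built profile of the site.
# All names and sizes here are invented for illustration.

def closest_page(observed_sizes, profiles, tolerance=64):
    """Match a sequence of response sizes to the best-fitting profiled page.

    observed_sizes: list of response sizes (bytes) seen on the wire.
    profiles: dict mapping page name -> list of expected response sizes.
    tolerance: slack (bytes per request) for padding and header jitter.
    """
    best_page, best_score = None, float("inf")
    for page, expected in profiles.items():
        if len(expected) != len(observed_sizes):
            continue  # different number of requests -> different page
        # Sum of absolute size differences, matching smallest to smallest
        score = sum(abs(o - e) for o, e in
                    zip(sorted(observed_sizes), sorted(expected)))
        if score <= tolerance * len(expected) and score < best_score:
            best_page, best_score = page, score
    return best_page

# Invented profile: "page X triggers Y requests of particular sizes"
profiles = {
    "/home":        [14200, 3100, 880],
    "/cancer-info": [22100, 5400, 910, 4100],
    "/contact":     [8000, 760],
}

print(closest_page([22110, 5390, 915, 4095], profiles))  # -> /cancer-info
```

The point of the toy: even though every byte is encrypted, the *shape* of the traffic (count and size of responses) is enough to pick out the page.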
Clever enough stuff, but it does require that the site has already been profiled, probably extensively and several times over... and no doubt re-profiled regularly in case the site changes. That does limit the applicability of this approach quite substantially.
The fix, of course, is either to make the page progression vary (pissing off users and making the website hard to use) or to normalise the number and size of requests for each page under a site-wide padding plan. If the website always produces, e.g., 25 requests of a consistent size for every page, then it becomes impossible to track the page progression from the traffic alone.
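That site-wide normalisation defence can be sketched as padding plus decoys. The bucket size and request count here are illustrative values, not taken from the paper:

```python
# Sketch of the padding defence: pad every response up to a fixed size
# bucket and emit dummy responses so every page looks like the same
# number of same-sized transfers. BUCKET and PAGE_REQUESTS are invented.

BUCKET = 4096          # pad every response to a multiple of this
PAGE_REQUESTS = 25     # every page produces exactly this many responses

def pad_response(body: bytes) -> bytes:
    """Pad to the next BUCKET boundary so sizes no longer identify resources."""
    shortfall = (-len(body)) % BUCKET
    return body + b"\x00" * shortfall

def uniform_page(responses):
    """Pad every real response, then add decoys up to PAGE_REQUESTS."""
    padded = [pad_response(r) for r in responses]
    while len(padded) < PAGE_REQUESTS:
        padded.append(b"\x00" * BUCKET)  # decoy response
    return padded

page = uniform_page([b"x" * 14200, b"y" * 3100, b"z" * 880])
print(len(page))  # always 25 responses, all bucket-aligned sizes
```

The obvious trade-off is bandwidth: every page costs as much as the heaviest page on the site, which is why sites don't already do this.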
> thats not really what i use HTTPS for, its protect the actual data i enter into forms
Suppose you indicate you have cancer and the page flow then goes on to ask you for a lot of details about that. The traffic pattern might give that away, as might any further details you choose to enter later, provided the traffic the app generates depends to some extent on your entries.
This is an important analysis, although I suspect that the main giveaway is that I visit the site at all (which HTTPS does nothing to hide). If I visit lend-me-money-at-extortionate-rates.com then I am probably having a financial crisis, and if I visit cancer-information.com then there is an increased likelihood I have a serious illness.
But it is important to know that this fairly obvious theoretical attack is actually quite feasible and gives quite high accuracies. It is a useful data-point to feed into the work on the next versions of the protocol to minimise what can be achieved with this approach.
"thats not really what i use HTTPS for, its protect the actual data i enter into forms"
That's because you're clever and you know what HTTPS is for; since you're a Register reader, it's a good guess you work in IT or are an IT enthusiast. The problem is that there are a lot of people out there who are not at all IT savvy, but perhaps think they are. These people KNOW that HTTPS is secure and secret, and they trust it implicitly. They see the little HTTPS icon in their browser that tells them the site is secure and they automatically believe it.
Oh yes, because that's the only vulnerability here.
From the paper's abstract:
We present a traffic analysis attack against over 6000 webpages spanning the HTTPS deployments of 10 widely used, industry-leading websites in areas such as healthcare, finance, legal services and streaming video. Our attack identifies individual pages in the same website with 89% accuracy, exposing personal details including medical conditions, financial and legal affairs and sexual orientation.
It's just possible there's a little scope there to cause some people some distress.
That said, I think what's really important here is:
- A much better result for HTTPS traffic analysis than any previously published ones (they claim; I haven't checked).
- Good suggestions for blinding HTTPS traffic to thwart this kind of traffic analysis (though sites could easily do this today simply by varying content more).
- A possibly interesting approach to clustering and normalization. I've only skimmed parts of the paper, and I only dabble in this area, so I have no idea how novel this application of Gaussian distributions actually is; but it looks good. (Like maximum-entropy Markov models, it's one of those feature-vector approaches to classification that makes it really easy to incorporate, or remove, heterogeneous features. So they do their Gaussian clustering of "burst pairs" to create their initial set of features, and then they expand the vector by throwing in packet sizes as well, because why not?)
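The general shape of that feature-vector approach can be illustrated with a toy, which is emphatically not the paper's actual pipeline: model clusters of "burst" sizes as Gaussians fitted offline, then turn an observed trace into a vector of per-cluster likelihoods. The cluster parameters below are invented.

```python
# Toy sketch of Gaussian-cluster features for traffic traces (not the
# paper's method): each cluster is a Gaussian over burst sizes, and a
# trace becomes one summed-likelihood feature per cluster. Extra
# heterogeneous features (e.g. raw packet sizes) could just be appended.
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return (math.exp(-((x - mean) ** 2) / (2 * std ** 2))
            / (std * math.sqrt(2 * math.pi)))

# Invented clusters "fitted" offline from profiled traffic: (mean, std) bytes
CLUSTERS = [(1500.0, 200.0), (8000.0, 1000.0), (30000.0, 4000.0)]

def features(burst_sizes):
    """One feature per cluster: summed likelihood of the observed bursts."""
    return [sum(gaussian_pdf(b, mean, std) for b in burst_sizes)
            for mean, std in CLUSTERS]

vec = features([1400, 1600, 7900])
dominant = max(range(len(vec)), key=lambda i: vec[i])
print(dominant)  # index of the cluster that best explains the trace
```

The appeal the comment alludes to is exactly this: once everything is a feature vector, adding or dropping a feature family is just concatenation, with no change to the downstream classifier.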