
"We can confirm that this is not a vulnerability"
OKAY!
A bug in Microsoft's Internet Explorer browser is causing more than 50 million files stored online to leak potentially sensitive information that could compromise user privacy, a security researcher said. The documents stored in Adobe's PDF format display the internal disk location where the file is stored, an oversight that …
"The potentially sensitive data is included in PDFs that have been printed using Internet Explorer."
The explanation in this article is so poor that I had to go to the linked site, because I couldn't see how printing a PDF could possibly add anything to it. The information is not included in PDFs that have been printed with IE; it is included in web pages that have been converted to PDFs by printing from IE to a PDF printer driver. That's very different to the explanation given in the article.
The number of PDFs created by printing a web page will be much lower than the number of PDFs that have been printed while viewing them in IE. The poor explanation will lead a lot more people to think they are at risk of exposing info than actually are. Is this CNet or El Reg?
It could also be pointed out that every MS Office document type (well, the main ones: doc, xls, ppt, and their 'xml' versions; I can't be bothered to find out about, say, MS Publisher's files...) tends to store far more than just the last edit path: the Windows user name, Active Directory information, printer info, and much more. Not that they're alone; OpenOffice does similar things, I just don't recall the details as clearly.
Some guys gave a pretty impressive talk about this sort of thing at DefCon 17, but of course I no longer remember their names or the name of their app, which automated the searching, downloading, and metadata extraction. I do remember that it was awesome, even though it only ran on Windows...
Moral of the story: Clean all metadata before releasing anything, be it doc, pdf, jpg, or otherwise.
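In the spirit of that moral, here is a minimal Python sketch for checking a PDF before releasing it. It lists the Title, Author, Creator and Producer fields and flags titles that look like local paths. It assumes the Info dictionary is stored uncompressed with simple literal strings (no escaped or nested parentheses, no hex strings), so it is a quick sanity check, not a substitute for a proper PDF library or metadata-cleaning tool.

```python
import re

# Info-dictionary keys that commonly carry identifying data
# (path-derived titles, user names, software versions).
LEAKY_KEYS = ("Title", "Author", "Creator", "Producer")

# Assumption: the Info dictionary is stored uncompressed and its values
# are plain literal strings with no nested or escaped parentheses.
_FIELD_RE = re.compile(
    rb"/(" + b"|".join(k.encode("ascii") for k in LEAKY_KEYS) + rb")\s*\(([^()]*)\)"
)


def pdf_info_fields(pdf_bytes: bytes) -> dict:
    """Return the first literal-string value found for each leaky key."""
    found = {}
    for match in _FIELD_RE.finditer(pdf_bytes):
        key = match.group(1).decode("ascii")
        # latin-1 maps every byte to a character, so decoding never fails
        found.setdefault(key, match.group(2).decode("latin-1"))
    return found


def looks_like_local_path(value: str) -> bool:
    """Heuristic: a drive letter, a UNC share or a file: URL."""
    return re.match(r"([A-Za-z]:[\\/]|\\\\|file:)", value, re.IGNORECASE) is not None
```

Reading the file with `open(path, "rb").read()` and passing the bytes to `pdf_info_fields` shows at a glance whether the Title is a `file:///C:/...` path or the Producer gives away the software version.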
Hi Dave,
I don't think the explanation is incomplete. It is obvious that if you are using IE's print function to generate the PDF, then it will be using a PDF printer driver. On my site I include more details because I have much more space to write. So I think Dan summarized it nicely.
Cheers,
Inferno
So many PDFs I've downloaded have a person's internal computer name stored in them, in the form blah.domain.com.
I always strip that stuff out, and in Office documents too.
Metadata truly sucks. If people can't name a file properly then they deserve never to find it again.
Apart from EXIF info. That's actually useful, mainly because people don't know to edit it, or don't bother or care to.
I'm with Dave Murray about the explanation given in the article. It took me a couple of extra thought processes to realise that "printing" meant converting a web page into a PDF document. For one thing, I didn't realise that IE could do this. Or is it just using a PDF printer driver, in which case isn't part of the problem with the driver? Also, "printing" does tend to mean "onto paper", whilst converting is easily understood as changing the representation of a document.
Contrary to a couple of comments above, I figured out straight away what the Reg article was talking about, and I don't consider myself the smartest kid in class. Perhaps the art of reading is on the wane? ;-) As for metadata, it can be a boon and equally an annoying security problem, the latter especially as it is mostly "invisible". The vast majority of programs that deal with file formats incorporating metadata don't make it easy (for the casual user at least) to view, edit and strip out metadata. We are still a long way off from possessing a proper security-related computer culture. If only this "exposé" would make a blind bit of difference. Sadly it won't.
But that's not as likely to generate lots of traffic.
Nothing in this article seems to actually pertain to using IE. It sounds much more like what you get when you send a web page to Adobe's PDF printer. My copy of Windows 7 with IE doesn't even have a PDF print option, so it seems odd to think that the bug is in IE.
It looks like Adobe takes the full path name for the default title. The more sensible approach, which is generally taken, is to assume that the file name is the same as the title and drop the directory structure.
Perhaps if there's something IE does differently that causes Adobe's crapware to fail even more spectacularly than usual, El Reg could enlighten us and show how it works on a number of other browsers. But the article doesn't explain anything; it just gives a vague overview of the problem and immediately blames IE.
The only relation I see is that some IE users were stupid enough to pay Adobe for Acrobat Professional.
So people can discover that other people who use Windows occasionally store files in the C:\User Documents\ subdirectory or whatever it is called. What a disaster!! Armed with this knowledge I could travel to the US and, merely by examining 100 million laptops, identify the one concerned. I would then have to guess the owner's password, gain unfettered access to the machine and then perpetrate my evil plan. It's so simple it's a wonder nobody thought of it before.
This isn't a bug in the slightest, and honestly I'm shock... oh forget it. Anyway, every browser does this - IE, Firefox, Chrome. Any "security researcher" who is actually posing this as a bug and security flaw seriously needs to go back to Xboxes and PlayStations, and stay the heck away from PCs.
@Sven, a better example without the footer is
http://www.oregon.gov/OMD/OEM/plans_train/grant_info/fy2009_hsgp_investment_justification.pdf
@Mat, this does not occur in other browsers such as Chrome, Firefox, etc. Please read the article at my site.
@Anonymous Coward - By default, IE does not have any PDF printer, so to make a PDF you need to install something like Acrobat Professional, CutePDF, etc. The problem occurs because IE passes the local path in the title field while other browsers don't. The bug is also open on the Adobe side, but it is much harder to fix there, since every PDF printer driver from multiple vendors would have to filter the title.
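Filtering the title as described, keeping only the file name and dropping the directory structure, might look something like the minimal Python sketch below. `title_from_location` is a hypothetical helper for illustration; it is not taken from IE, Adobe or any real PDF printer driver.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote, urlparse


def title_from_location(location: str, fallback: str = "Untitled") -> str:
    """Hypothetical driver-side filter: keep only the final file name.

    Accepts a web URL, a file: URL or a bare Windows path and drops the
    directory structure, so no local path ends up in the PDF Title field.
    """
    parsed = urlparse(location)
    # A bare "C:\..." path parses with scheme "c", so test the scheme explicitly.
    if parsed.scheme.lower() in ("file", "http", "https"):
        location = unquote(parsed.path)
    # PureWindowsPath treats both backslash and forward slash as separators.
    name = PureWindowsPath(location).name
    return name or fallback
```

With this, `file:///C:/Users/fred/My%20Report.htm` becomes just `My Report.htm`, which is the behaviour the commenters above expected from a sensible driver.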
Thank you, AC. Now, the rest of you "experts"...
The machine where the document was printed is almost certainly *not* the web server from which it was eventually served up. It is probably running a different OS, recognising a completely different set of users, owned by a different company and hidden behind a NAT.
That's like me telling you that there is someone in the world with a user name of jkl13 and they are running Windows XP, Microsoft Office and IE. Statistically, this is almost certainly true, and the software involved has a truly monumental list of vulnerabilities if it isn't fully patched. However, my telling you this is clearly not a security problem or privacy leak of any kind.
You have to (a) download a PDF printer driver (this isn't MacOS) and (b) print an HTML webpage to PDF using it.
How is this a bug in IE? Printing is designed to go to a printer. If you divert it to a file and then publish the file without even giving it a casual glance first, caveat emptor.
Or is the "bug" that loads of people just don't care and publish the PDF anyway, because after all who gives a toss that you are called "EddieEdwards" on your local machine, given that you're publishing it on "Eddie Edwards' Blog".
Must be a slow week in security.
(PS: if you do care, and think IE + PDF printer driver is a valid conversion tool, go to Page Setup in IE and set "footer" to something other than "URL". Job done.)
Read title ...
It's not Adobe's fault; it was handed the URLs as text in the printout.
It's not IE's fault; the user _ASKED_ it to print out the URLs (yeah, it's on by default, but you can turn it off).
The user created the PDF with the URLs, then after _VIEWING_ the PDF and noticing the URLs at the top, decided to _PUBLISH_ the damn thing.
...
Is it Friday yet?
If that sort of metadata is a major threat, then isn't there something very badly wrong with your system security? Let's face it, if the document was written by Fred Smith then it's a very good bet that their username will be fred, freds, fsmith or fredsmith, and the profile will be in c:\documents and settings\fsmith etc... Security by obscurity is no security at all; surely this is not new...
I am afraid I am with the others who say this isn't a threat.
This is just exposing a little too much skin on a Saturday night, but it isn't going to lead to an attack. Just because I know the username on a PC that I don't have access to doesn't mean I can do anything with it. Yes, if there are login details in the URI of the page being printed, then that presents a potential problem. But oddly, when I did a Google search for [filetype:pdf http php "&pass"], it turned up only security and programming documents, nothing actually interesting.
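The case that would actually matter is a printed URL that carries login details. A minimal Python sketch of checking a footer URL for credential-like query parameters before publishing; the `SUSPECT_PARAMS` list is an assumption for illustration and is far from exhaustive.

```python
from urllib.parse import parse_qsl, urlparse

# Hypothetical list of parameter names that often carry secrets;
# extend it to suit your own applications.
SUSPECT_PARAMS = {"pass", "password", "passwd", "pwd", "token", "sid", "session"}


def credential_params(url: str) -> list:
    """Return the query-string parameter names in url that look like credentials."""
    query = urlparse(url).query
    return [
        name
        for name, _value in parse_qsl(query, keep_blank_values=True)
        if name.lower() in SUSPECT_PARAMS
    ]
```

For example, `credential_params("http://intranet/login.php?user=fred&pass=hunter2")` returns `["pass"]`, which is the sort of footer you would not want in a published PDF.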
I also think the thought process in the article is jumbled, perhaps like so much documentation it seems sensible to someone who already understands it.
A feature not a flaw (however undesirable).
Hmm - you're not a security person, are you? Whilst this isn't a "massive, world-changing problem", it is something that shouldn't have happened.
ANY information leakage is "bad, m'kay?" and should be avoided. Yes, this particular leakage is a minor problem, but in some places (guess why I'm posting anonymously?) the user ID is one of the many layers of security.
E.g. if you walk up to my PC, you need to know my ID as well as my password to login - it's not displayed, and it's a randomly generated one so you can't just know it by knowing me.
No, I'm not claiming it's a major hurdle (it's not, really it's not) - but a layered security approach is better than the traditional armadillo.
This is just the user being sloppy about their META tags, and it is more a user education issue than a 'bug'. I've seen loads of PDF conversion tools populate file paths in the META info like this. The assertion that you can only change them with a text editor is also a load of bull - there's loads of software out there that allows you to set META tags.
Most users don't bother about these things (as you would expect) so I've always made sure that PDF uploads on my websites populate the META tags with info provided from the web form/database...
I guess you could use this as part of a phone- or email-based social engineering attack: if you appear to know the file structure of a user's system and can name a document in it, you gain extra credibility in a claim to be 'that nice guy from support' who needs a little 'maintenance' doing, and if the file path gives you a clue about the OS, that should help you choose a suitable vector...
T1000, because that's social engineering at its finest ; )
There is a pretty lackadaisical attitude to security in evidence in the comments above.
Several commentards seem to think the issue of metadata in files is not a security and privacy issue. "Oh, big deal," they sneer.
Of course it is a security issue. And, as has been pointed out, it is one that applies to the most common file types, including files generated by OO and MSO, and not a PDF-specific thing.
Regarding the point about revealing OS and app versions, I'd be equally concerned about browser headers. For users of Firefox, I suggest a look at the ModifyHeaders add-on here:
http://modifyheaders.mozdev.org/no_wrap/help/en/index.html
Quote:
"Apart from EXIF info, that's actually useful ... "
It can be useful, of course. But, equally, it can be very revealing. I tend to strip it if I send photos anywhere. For a start, it's easier to remove EXIF data from an image file than the metadata from a Word Doc or a PDF.
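For JPEGs the removal really is fairly simple. A minimal Python sketch that drops EXIF APP1 segments by walking the JPEG marker structure. It assumes a well-formed file with no fill bytes before the scan, and it leaves other metadata such as XMP or IPTC untouched, so treat it as an illustration rather than a complete cleaning tool.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker


def strip_exif(jpeg: bytes) -> bytes:
    """Drop APP1 segments carrying EXIF data from a JPEG.

    Assumption: a well-formed JPEG; everything from the first SOS
    (start-of-scan) marker onward is copied through unchanged.
    """
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("corrupt marker")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        is_exif = marker == 0xE1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)
```

The result keeps the image data and any non-EXIF headers, which is why EXIF is easier to deal with than metadata scattered through a Word document or a PDF.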
A good starting point is to ask "Does world+dog *need* this or that information?" If the answer is "no", then strip out the information. Simples.
As suggested in another comment, a pathname including directory structure might expose some of the thinking behind the creation of a file. But I think the potential issue of metadata leaking unintended information is much wider than this narrow example.
Consider how the origins of the claimed "weapons of mass destruction" dossier were exposed: through old edit data in a plagiarised Word document which hadn't been cleaned prior to publication. This exposed the claimed legal justification for going to war against Iraq as shoddy research. We once gained a negotiating advantage over a PC supplier when they sent us a quote document; running the strings program on Linux over it showed that a previous version of the Word document had quoted the same makes and models to another purchaser at a lower price. We got the price down by asking if they were willing to give us the same price they had offered this other customer, though we never told them how we found out.
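The strings trick is easy to reproduce. A minimal Python approximation of what `strings` does (runs of printable ASCII), plus a UTF-16LE variant because older Word files can store text as 16-bit characters. It is not the GNU tool and makes no attempt to parse the .doc format; it just surfaces leftover text.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Rough equivalent of `strings`: runs of printable ASCII of min_len or more."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def printable_strings_utf16le(data: bytes, min_len: int = 4) -> list:
    """Same idea for UTF-16LE text: printable ASCII bytes each followed by NUL."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

Running both over `open("quote.doc", "rb").read()` (the file name is just an example) and grepping the output for model numbers or prices is essentially what the commenter describes.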
People are now getting cameras with GPS included, which exposes not just the date, time, and make and model of camera used when a photo is taken, but also the geographical coordinates where the photo was taken. In most cases publishing metadata unwittingly is equivalent to "exposing a bit more than intended on a Saturday night". But in a few cases this will lead to significant and adverse consequences.
!!It's not a bug!! When you "print" ANYTHING from IE, it titles the job with the URL (so you can identify it on a print queue) and also dumps it into the footer. If it's a "local" file, the URL is naturally the file path. A PDF printer is still a printer, but will take all this information and use it to create the PDF - it'll appear as both metadata (as document title) and also within the doc itself as a footer, just like real paper.
Firefox "just" uses the <TITLE> tag where available for print queue identification. It still dumps the full URL in the header (instead of the footer). If there isn't a <TITLE> tag, it will use the URL/path for the document title. Or it does with my PDF printer; I just tried it.
This is known behaviour and always has been in IE. I assume Outlook would give too much information also (outbind:// etc). Not to mention, a PDF printer BY DEFAULT will use the local username as the document author.
If anything, this is a fault of PDF printers pretending to be hardware printers in order to work. Anything that's "not printed" should not form part of the document.
One final point, why would I ever want to turn some html doc that I have stored locally into a PDF?
What exactly has this got to do with IE? I think I must have missed something. That search works exactly the same on ALL of the browsers; I just tested it and they all return full local file paths, exactly as you'd expect them to really.
If anything, this is a bug/hole with Google. It's just a web search! This is no different from finding open Apache servers using the old "intitle index of" malarkey.
While being blasé about security is not the answer, security by obscurity is not the way forward. If exposing a single file path significantly compromises your security, then you need better security.
In my opinion programs should make it very clear what additional information will be included in an output file and give the user the option of not including it.
Even the version number of the program used to generate the file might be something you'd prefer not to include because, if you're using an unusual version of an unusual program, that might be enough to identify you: if you put files in different places on the web someone could easily discover that they were probably converted on the same machine. Also, an attacker might spot that you are using a version with a known vulnerability.
Having re-read the article several times now, I can't believe how it got past El Reg's editorial efforts. The first result noted in the Google search referenced is for 2006, aging backwards to the next result, 2005.
So at best, nothing for 3 years, and nothing for a year prior to that.
Let's face it, this IS a non-story.
So that's what &w&bPage &p of &P in the header block and &u&b&d in the footer block of the Page Setup menu do for me? And if I typed "KEEP SECRET" that would show also?
And when I use my PDF printer driver it will let everyone know that I got this series from "http://www.theregister.co.uk/2009/11/23/internet_explorer_file_disclosure_bug/comments/" and they would know *(**(U&U about me.
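Roughly, yes: the header and footer are templates filled in at print time. A minimal Python sketch, assuming the commonly documented meanings of IE's codes (&w window title, &u URL, &d date, &p page number, &P total pages). It is a simplified model, not IE's implementation: &b is treated only as a section separator, and other codes are left as typed.

```python
import re

# Simplified model of IE's Page Setup codes (assumed meanings):
# &w window title, &u URL, &d date, &p page number, &P total pages.
CODES = {"w": "title", "u": "url", "d": "date", "p": "page", "P": "pages"}
_CODE_RE = re.compile(r"&([wudpP])")


def expand_page_setup(template: str, fields: dict) -> list:
    """Expand a header/footer template into its printed sections.

    &b is treated only as a section separator; IE's real alignment rules
    (and other codes such as &&) are not modelled. Any other text,
    "KEEP SECRET" included, is printed literally.
    """
    def substitute(match):
        return fields.get(CODES[match.group(1)], "")

    return [_CODE_RE.sub(substitute, section) for section in template.split("&b")]
```

With the default footer `&u&b&d`, the URL, which for a locally saved page is its file path, ends up printed on every page, and a PDF printer faithfully keeps it.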
I go with @Pirate Dave and @Dave Murray. You have just found one of the 14,000,000 features on the desktop that ordinary users will never understand.
1 Open IE
2 File>Page Setup
3 Clear header and footer boxes
Oh and this 'bug' would only happen if you first saved the webpage to the local location and then printed it from that locally saved copy and then republished it without taking a look at what you were publishing. And if you're doing all that to publish stuff on the web then you probably got much bigger problems to worry about.
I'm a lot more worried about my users giving out their passwords to strangers on the street in exchange for candy bars than I am about my users publishing PDFs which expose information about the filesystems of their machines, which live behind NAT and a heavy firewall, and which serve nothing at all to the world because only an idiot provides services to the Internet from a Windows XP box anyway.
Yeah, sure, ten years ago when we all just plugged our modems straight into the wall, this might've been a vulnerability which actually mattered. These days, though, it's a pitiful joke, not even fit for a worthwhile example of metadata leakage -- you know, the sort of example that'd incline people to think "gee, there could be a real problem here", instead of "this is supposed to be a vuln? okay, guess only idiots worry about metadata leakage, then..."