There is a major difference between Dropbox and Mega.
I am a Dropbox customer.
Kim Dotcom's comeback cloud storage service, Mega, has responded to criticism about its approach to cryptography and password security after security researcher Steve Thomas (@Sc00bz) released his MegaCracker tool, which cracks hashes embedded in emailed password confirmation links. In a blog post designed to reassure users, …
"Mega was already catching up with Dropbox in daily usage."
That's... so stupid it's not even worth mentioning. That's like saying increasing a number from 0 to 1 brings it closer to infinity.
If I open a new shop and it sells a single spork in the first week of trade, then I'm 'catching up' with Amazon.
Well, if you're right about Psyx's point then the second example conflicts with the first.
Sales of 1 are effectively zero when compared to Amazon. But they're still a finite distance away, and if I keep increasing my sales by one I will eventually get on par. Adding 1 to 0 makes me absolutely no closer to infinity.
So Psyx's first example is about absolutes. The second about practicalities. They're very, very different things and make very different points. So either sloppiness or hyperbole.
Agreed. I might not have the same level of security as a Swiss bank, but then Mega is not intended for the storage of billions and billions and billions of Francs/Pounds/Euros/Dollars.
As long as the resources required to break in are several orders of magnitude more than the value of the encrypted data, it's good enough. And remember, even if there is some juicy stuff on Mega, it's still swamped by crap by a very large ratio (needle in a haystack, etc).
Well, I got burned on the last post for suggesting they simply chunked the encrypted files and looked for matches, but it seems to me that the other suggestion, that they hash the files before they're encrypted, isn't going to work either. Mega are saying they're not keeping any keys, and user B's key isn't going to decrypt user A's file even if the two files were identical before uploading.
Here's my optimistic thinking: the majority of files uploaded to a service like this are already tightly compressed, the encrypted versions of these files are going to be bigger and a sliver more compressible, and given the scale they expect to be working with, any dedupe, no matter how minor, is going to reduce costs.
Here's my realistic thinking: the announcement that they will be able to change your password and re-encrypt your files suggests bullshitting. If all files can be decrypted with a user's private key and Mega's master key, then everything they've said they're doing is possible, except the lie about not having access to users' files.
Hash before encryption is it. Nobody will know what is in your original and personally created data but the hash matches will allow for reverse lookup of known files. Very small files could be brute-force decoded. It's not great privacy.
Big hashes do sometimes produce false positives, so there can be data loss. Sure, the chance is 1 in a nearly infinitely big number, but the amount of data in the world is nearly infinite too. Math says that a smaller number of bits can't represent all the patterns of a larger number of bits.
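A minimal Python sketch of that pigeonhole point, assuming a deliberately tiny 16-bit hash (SHA-256 truncated; `tiny_hash` and `find_collision` are hypothetical helpers for illustration, not anything Mega is known to use). With only 2^16 possible outputs, two different inputs must eventually share a hash; a full 256-bit hash obeys the same principle, it just makes finding such a pair astronomically unlikely.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, bits: int = 16) -> int:
    """SHA-256 truncated to its top `bits` bits: a deliberately tiny hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash distinct inputs until two of them share the same truncated hash.

    Pigeonhole guarantees a collision within 2**bits + 1 distinct inputs;
    the birthday effect means it usually turns up after roughly 2**(bits/2).
    """
    seen = {}
    for i in count():
        msg = f"file-{i}".encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

a, b, tried = find_collision(16)
print(a, b, "collide after", tried, "inputs")
```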
You dedupe the encrypted data, so if you and I both upload a 1MB file, and the encrypted versions are exactly the same (significantly less likely than the cleartext versions being identical, I know), they get deduped. And you can raise the chances of successful deduping by doing it on chunks rather than whole files - now our files are different when encrypted, but the third page of (say) 4KB of mine matches the 100th page of yours, so those pages get deduped.
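The chunk-level idea above can be sketched as a content-addressed store: each distinct chunk is kept once under its SHA-256, and a file becomes a "recipe" of chunk digests. This is a minimal illustration assuming fixed 4 KiB chunks (the "(say) 4KB" from the comment); `ChunkStore` is hypothetical, and it says nothing about how often such matches actually occur once data is encrypted.

```python
import hashlib
import random

CHUNK_SIZE = 4096  # the "(say) 4KB" page size from the comment

class ChunkStore:
    """Content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store unseen ones, return the recipe."""
        recipe = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # only stored if not already present
            recipe.append(digest)
        return recipe

    def get(self, recipe: list) -> bytes:
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

rng = random.Random(42)
page = rng.randbytes(CHUNK_SIZE)
mine = rng.randbytes(2 * CHUNK_SIZE) + page     # page is my 3rd chunk
yours = rng.randbytes(99 * CHUNK_SIZE) + page   # page is your 100th chunk
store = ChunkStore()
r_mine, r_yours = store.put(mine), store.put(yours)
print(store.stored_bytes(), "bytes stored for", len(mine) + len(yours), "uploaded")
```

Fixed-size chunking only finds the shared page because it happens to sit on a chunk boundary in both files; shift it by one byte and the match disappears.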
With 4KiB chunks of what is essentially random (encrypted) data, the probability that two given chunks are identical is 1/2^(8*4096), i.e. 1/2^32768, in theory. This is vanishingly (and ridiculously) small. That can't be the technique that Mega uses.
(I realise that in practice, no chunk will be all zeros or all ones, etc; but it will still be a tiny probability.)
A chunk has just as much chance of being all zeroes or ones as it has being any other combination :-)
However, I'm not convinced dedupe works on such random data (which in effect it will be). The key for reconstituting the deduped data would end up being the same size as the original data to begin with.
I've recently deduped 3TB of storage down to 2 bits, just a 1 and a 0. Reindexing it all is going to be a bit of a chore though...
I think that's what the flatulent herbivore was getting at with "Or chunkify(TM) the data such that..." i.e. make the blocksize sufficiently small (a byte or word perhaps!?) and there'll be a good spread of matching blocks in even rather small datasets. As long as the chaining method isn't TOO clever you should be able to use simple heuristics to match the patterns even though the actual data forming those patterns wouldn't match. Bit of a crypto disaster under normal conditions but perhaps advantageous here? Trivial to defeat though... compression or, better, pre-encryption with password as filename leap to mind.
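A rough way to test the worry that the "key" for reconstituting deduped random data ends up as big as the data itself, even with byte- or word-sized blocks. This sketch assumes a simple cost model of my own choosing: every distinct chunk stored once, plus one fixed-width index entry per chunk position. Seeded random bytes stand in for ciphertext, and cleverer encodings (or the pattern-matching heuristics suggested above) are not modelled.

```python
import math
import random

def dedupe_cost_bits(data: bytes, chunk_size: int) -> tuple:
    """Return (original_bits, deduped_bits) for fixed-size chunking.

    Deduped cost = each distinct chunk stored once, plus an index entry per
    chunk position that is just wide enough to name one of the distinct chunks.
    """
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = set(chunks)
    index_bits = max(1, math.ceil(math.log2(len(unique))))
    deduped = len(unique) * chunk_size * 8 + len(chunks) * index_bits
    return len(data) * 8, deduped

# 120,000 random-looking bytes: divisible by 1, 2, 3 and 4, so every chunk is full size.
data = random.Random(0).randbytes(120_000)
for size in (1, 2, 3, 4):
    orig, dedup = dedupe_cost_bits(data, size)
    print(f"{size}-byte chunks: {dedup / orig:.2f}x the original size")
```

Small blocks do match constantly, but naming which block goes where costs about as many bits as the block itself, so nothing is saved.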
Personally I think it's all a Dotcom get-out-of-DMCA bluff though. In that I doubt he's done anything like that at all... I expect the statement is just there for the lawyers and means something like "if you can demonstrate a match, I'll happily delete offending files (but you never will, 'cos your keys will differ :P)"
"the third page of (say) 4KB of mine matches the 100th page of yours"
This doesn't work. The chances of getting any matches are astronomically small even for 16-byte chunks. For 4KiB chunks it's as close to zero as you will ever get in any practical measurement situation.
My money is on the impossibility of doing what he claims they're doing. If the encryption method obeys certain constraints it is possible, but those constraints *seem* to imply a trivial plaintext attack revealing the key. Strong crypto algorithms don't have trivial plaintext attacks revealing the key. I look forward to a real cryptanalysis of the claim, but most cryptographers appear to have the same gut feeling as I do. *If* he has cracked this problem, someone in his organization is a genius, and that seems less likely than that he's simply lying his ass off.
The way to de-dupe as suggested above is to run a hash of the original, say SHA-256, and store it "in the clear". This does allow an identical file to be matched to yours, and copyright owners could build a table of the hashes of all their stuff and detect copies, but it only takes one bit to be different, say in the metadata, and the hash will be completely different. This same property will cause any attempt at de-dupe to fail as well, since it needs bit-identical duplicate files.
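The "one bit different, completely different hash" property (the avalanche effect) is easy to check with Python's standard `hashlib`. The file contents below are made up for illustration; one bit is flipped in what plays the role of the metadata.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Title: Holiday video\n" + bytes(1000)  # hypothetical file contents
tweaked = bytearray(original)
tweaked[7] ^= 0x01  # flip a single bit in the "metadata" ('H' becomes 'I')

h1 = hashlib.sha256(original).digest()
h2 = hashlib.sha256(bytes(tweaked)).digest()
print(bit_difference(original, bytes(tweaked)), "input bit differs")
print(bit_difference(h1, h2), "of 256 hash bits differ")
```

For a good hash you should expect roughly half of the 256 output bits to flip, which is why whole-file hashing only catches bit-identical copies.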
Perhaps a method a little like Shazam's - taking a "fingerprint" of the file before encryption, so things that sound similar measure similarly. A corresponding process for video might look at the overall structure of the compressed video, its entropy versus time or something - again allowing similar fingerprints to be matched.
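A crude sketch of the "entropy versus time" idea: compute the Shannon entropy (bits per byte) of each consecutive window of a file. This is only an illustration of that one suggestion; it is not how Shazam works (that uses audio landmarks), and the window size and `entropy_profile` helper are hypothetical.

```python
import math
import random
from collections import Counter

def entropy_profile(data: bytes, window: int = 4096) -> list:
    """Shannon entropy (bits per byte) of each consecutive, full-size window."""
    profile = []
    for start in range(0, len(data) - window + 1, window):
        counts = Counter(data[start:start + window])
        h = sum(c / window * math.log2(window / c) for c in counts.values())
        profile.append(round(h, 2))
    return profile

# Three 4 KiB windows: all zeros, random bytes, then a repeating two-byte pattern.
sample = bytes(4096) + random.Random(1).randbytes(4096) + b"ab" * 2048
print(entropy_profile(sample))  # first value 0.0, last value 1.0, middle close to 8
```

Two encodes of the same video might produce similar curves even though their bytes, and therefore their hashes, differ completely; whether that is discriminating enough to be useful is another matter.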
Of course, these approaches allow the rights-holders to trawl through the hashes (that would be extracted under court order) and identify stuff that looks a bit like theirs. Proving it is another matter - for that you need to be given a key, so for it to work as a file-sharer service then Mega must never own the keys, you have to ask the folder owner each time. So what if big copyright set up a load of shill accounts?
You can't hash before encryption for dedupe. That will allow you to identify identical blocks, yes. But if your block is deleted as a dupe, you can't decrypt the other copy as it's encrypted with someone else's key.
Either there's no deduplication and it's in the license agreement just in case, or there's a per-block master key which is accessible to multiple users. And if that's the case, your data is no more secure than on Dropbox.
Either way Mr Schmitz, convicted fraudster, is talking shit. Which shouldn't come as much of a surprise.
@Fuzz The trick here is that the files are _not_ encrypted with different keys. Each file has a per-file symmetric key which is generated when the file is first uploaded. When the uploader wants to share the file, they share this key using PKI to protect it. Since the PKI transaction is all done client side, Mega have no way of intercepting the per-file key and decrypting the files - but do end up with two files on their system which have the same contents and the same key which can therefore be deduped.
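A toy sketch of why a shared per-file key makes dedupe possible while independently keyed uploads don't. The cipher here is a deliberately simple HMAC-SHA256 keystream standing in for whatever Mega actually uses, and I'm assuming it is deterministic for a given key (no random nonce); the PKI exchange of the key is left out entirely.

```python
import hashlib
import hmac
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """TOY stream cipher: XOR data with HMAC-SHA256(key, offset) blocks.

    Deterministic per key, so the same (key, file) pair always gives the same
    ciphertext. XOR makes it its own inverse. Not for real use.
    """
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

file_data = b"holiday-video-bytes" * 100
file_key = os.urandom(32)  # per-file key generated by the first uploader

upload_alice = toy_cipher(file_key, file_data)
upload_bob = toy_cipher(file_key, file_data)            # Bob was sent file_key
upload_carol = toy_cipher(os.urandom(32), file_data)    # Carol used her own key

print(upload_alice == upload_bob)    # server sees identical ciphertext: dedupable
print(upload_alice == upload_carol)  # different ciphertext: nothing to dedupe
```

Carol's case is the objection raised earlier in the thread: if her copy were discarded in favour of Alice's, her key could not decrypt what remains.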
As for the no password recovery - the whole point about this system is that Mega _never see_ the password to a user's master key because it is all generated client side. The fact that they can't do password recovery is actually a good sign here (modulo the entropy issues).
Whatever you might think of Kim Dotcom, I can't help thinking that he's got some smarter people working for him than many of the self-appointed security experts who seem incapable of understanding these basic points...
The only way there could be meaningful deduplication is to use a scheme broadly similar to Freenet and Entropy. You split the file into blocks of equal size, compute the hash of each block, and encrypt the contents of the block with that hash. You end up with a bunch of encrypted blocks, and an equal number of hashes of plaintext that can be used to decrypt those blocks. You take those hashes and you encrypt them with the user-provided symmetric key.
So each "file" consists of a number of encrypted blocks and a key chain to decrypt the blocks and glue them back into the original file, and the key chain is encrypted with the password that only the user knows.
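A minimal sketch of that block-hash-as-key scheme (usually called convergent encryption). Assumptions: a hypothetical 1 KiB block size, and a toy HMAC-SHA256 keystream cipher in place of a real one; the final step of encrypting the key chain with the user's own key is not shown.

```python
import hashlib
import hmac

BLOCK = 1024  # hypothetical block size

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """TOY deterministic stream cipher (HMAC-SHA256 keystream); XOR is its own inverse."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def convergent_upload(data: bytes):
    """Encrypt each block with the SHA-256 of its own plaintext.

    Returns (ciphertext blocks, key chain). The key chain would then be
    encrypted with the user's own key before upload (not shown).
    """
    blocks, keys = [], []
    for i in range(0, len(data), BLOCK):
        plain = data[i:i + BLOCK]
        key = hashlib.sha256(plain).digest()
        blocks.append(toy_encrypt(key, plain))
        keys.append(key)
    return blocks, keys

def download(blocks, keys) -> bytes:
    return b"".join(toy_encrypt(k, c) for k, c in zip(keys, blocks))

movie = bytes(range(256)) * 40
blocks_a, keys_a = convergent_upload(movie)  # user A
blocks_b, keys_b = convergent_upload(movie)  # user B, independently
print(blocks_a == blocks_b)  # same ciphertext, so the server can keep one copy
```

Because user B's key chain decrypts blocks that user A stored, the server can keep a single copy; the flip side is that anyone holding the same plaintext can compute the same ciphertext.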
The problem, of course, is that it is not beyond a powerful attacker to enumerate all of the files they believe they own copyright to, chunk the files in exactly the same way, compute the hashes, and encrypt each block with the hash. There is a possibility that they could then persuade a judge somewhere to produce a court order that demands that the following specific blocks of cyphertext and all files referencing those blocks be deleted and the owners of the accounts containing the files identified. In other words, a sufficiently well resourced entity could relatively easily identify the files and still issue takedown notices, it would just take more computing resources to do so compared to simply searching the metadata for file names.
Of course, if it were properly encrypted, this couldn't be done - but the data would also be impossible to dedupe and incompressible. If Mega really does use deduplication, I rather expect they might regret it.
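A quick check of the "incompressible" half of that claim with Python's `zlib`, assuming (as is normal for a good cipher) that properly encrypted data is indistinguishable from random bytes, so `os.urandom` output stands in for ciphertext.

```python
import os
import zlib

text = b"the quick brown fox jumps over the lazy dog. " * 2000  # redundant plaintext
ciphertext_like = os.urandom(len(text))  # stand-in for properly encrypted data

for label, data in (("plaintext", text), ("random", ciphertext_like)):
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The plaintext shrinks dramatically; the random bytes come out slightly larger, because the compressor can only add framing overhead.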
Of course, the reality is somewhere in between. Mega are unable to decrypt the content, and they definitely don't own the rights to the content, which means that they would have to engage in piracy in order to police the content - so in theory, they might be off the hook for not policing said content. OTOH, if the well-resourced rights owners check the contents and hashes of most of the pirated versions of their content, they can provide enough identifying information on the file blocks to issue takedown notices. It shifts the policing burden toward the copyright owners, which is probably all the goal was in the first place. From there on the copyright owners can go after the users as they traditionally could - business as usual.
Thinking about it, Mega would have probably done better if they just kept quiet about the deduplication features.
@Gordan That's one way. The other possibility-- perhaps mentioned in someone else's comment; quite a lot of chaff has been posted with the wheat-- is that deduplication is enabled but effectively applied on a per-user basis.
That is, if we accept that user data is being encrypted with the user's master key, and that only that single instance of the encrypted data is being stored by Mega (e.g. a second copy, encrypted with a Mega-owned key, is not also being stored), then the only *likely* instances of duplication the system will see will come from the user him/herself, either in the form of entire duplicate files or identical data chunks within those files (assuming the data chunks are encrypted independently of each other).
Data savings might be large enough to justify this, if we consider that there is a possibility for users to maintain multiple copies of the same music file (for example), either as identical tracks from different albums or as part of playlists. Yes, I know it is much more efficient to maintain playlists as text files pointing to member tracks, but it's often more convenient to copy the playlist tracks to their own directory. Of course, metadata for the tracks will probably be different-- different album names, publish dates, etc-- so deduplication is only likely if independent encryption of data chunks is performed.
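A sketch of deduplication scoped to one user, with chunks identified independently. Assumptions, all mine: the chunk ID is an HMAC of the chunk under a per-user key (standing in for deterministic per-user encryption), and the differing metadata is padded into its own 4 KiB chunk. In real files, a different-length tag block would shift every later chunk boundary and defeat fixed-size chunking.

```python
import hashlib
import hmac
import os

CHUNK = 4096  # hypothetical chunk size

def chunk_ids(user_key: bytes, data: bytes) -> list:
    """Per-user chunk identifiers: HMAC-SHA256 of each chunk under the user's key.

    Identical chunks in the same account get the same ID; the same chunk in
    another account gets an unrelated ID, so dedupe stays within one user.
    """
    return [hmac.new(user_key, data[i:i + CHUNK], hashlib.sha256).hexdigest()
            for i in range(0, len(data), CHUNK)]

audio = os.urandom(10 * CHUNK)  # the same track data in both copies
track_a = b"ALBUM=Greatest Hits".ljust(CHUNK, b"\0") + audio
track_b = b"ALBUM=Summer Playlist".ljust(CHUNK, b"\0") + audio

alice, bob = os.urandom(32), os.urandom(32)
ids_a, ids_b = chunk_ids(alice, track_a), chunk_ids(alice, track_b)
shared = set(ids_a) & set(ids_b)
print(len(shared), "of", len(ids_a), "chunks deduped within Alice's account")
print(set(ids_a) & set(chunk_ids(bob, track_a)))  # empty: no cross-user matches
```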
"Such deduplication ought to be impossible if Mega truly didn't know the contents of uploaded content, according to critics."
If A+B => X and C+D => X there seems no reason they cannot say X is the same and deduplicate without knowing anything about A, B, C or D.
"Knowing that two files are the same, even without knowing the content, nevertheless leaks information about the data".
Does it leak any useful or usable information though? I suspect not. If it does then surely the fact I have an encrypted file already means I can theoretically know every other file that could encrypt to the same end result.
Given that the main use case that got the predecessor service shut down was sharing big content's precious assets, it is reasonable to assume that is the main use case of the all-new service. If it isn't, why so much effort aimed at saying to the law "we don't know what's in the files"?
So you can guarantee a high level of de-dupe efficiency because everybody is uploading the same stuff and knowing what it is, or you can hope for some lesser degree of de-dupe based on a chunk/block level process and remaining ignorant. A smart person would go for the latter, but the thrust of the article is how incredibly naive/dumb/reckless these guys are (no password recovery process, really?). It wouldn't be much of a stretch to think they may be doing something far stupider that does allow one to de-dupe based on the unencrypted content in the interests of saving costs to pay the bail money.
Even if it is de-dupe after encryption, any decent forensic investigator would be able to join the dots by looking at patterns of usage of shared folders/keys stitched together with IP address logs to track the *really* popular stuff being uploaded, downloaded and re-uploaded again. I suspect it would not take too long to provide sufficient evidence for the big content lawyers to have a once-more-round the block with this guy.
Really, the whole thing is the cloudified equivalent of a two-year-old covering their eyes and thinking they are invisible because they can't see you. It'd be funny if it weren't so tragic. No wait, it is just funny.
"Does it leak any useful or usable information though? I suspect not. If it does then surely the fact I have an encrypted file already means I can theoretically know every other file that could encrypt to the same end result."
If I made a film and wanted to see everyone who has it, surely I'd just upload a popular torrent version of my own film and let the de-duping software flag up everyone else who uploaded the same file, giving the feds a basis to start on
or something like that anyway..
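A minimal sketch of that "confirmation of a file" attack against a hypothetical cross-user dedupe index. I'm assuming the fingerprint is SHA-256 of the uploaded bytes; under block-hash-as-key (convergent) encryption the ciphertext would play the same role. `DedupeIndex` is invented for illustration, not Mega's design.

```python
import hashlib
from collections import defaultdict

class DedupeIndex:
    """Hypothetical server-side index: content fingerprint -> accounts holding it."""

    def __init__(self):
        self.owners = defaultdict(set)

    def upload(self, account: str, data: bytes) -> bool:
        """Record the upload; return True if the content was already present."""
        fp = hashlib.sha256(data).hexdigest()
        already_present = bool(self.owners[fp])
        self.owners[fp].add(account)
        return already_present  # e.g. a "skip transfer, we already have it" signal

index = DedupeIndex()
film = b"...bytes of a popular torrent release..."
index.upload("user123", film)
index.upload("user456", film)

# The film-maker uploads the same release and learns that copies already exist...
print(index.upload("rights_holder", film))
# ...and anyone who can query or subpoena the index learns who holds them.
print(sorted(index.owners[hashlib.sha256(film).hexdigest()] - {"rights_holder"}))
```

Nothing here requires decrypting anything: equality of fingerprints alone leaks who has which file.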
You have sharing keys - keys which decrypt just whatever you've shared with whomever you give the key to.
However it doesn't seem to specify anywhere if you can give the same key out multiple times (print it on a website) or if it's on a per user basis (so someone writes a script to do it for you).
You can certainly dedupe encrypted data if it's a copy of the same file uploaded into the same account, but the recurrence of an encrypted block of data of any appreciable size is infinitesimally likely. So either Mega's using encryption that's somehow dedupe-friendly (i.e. insecure), their dedupe feature is just crap, or they know more about your data than they should.
It's little wonder people are deriding Mega's marketing as disingenuous, at best.
"It's probably not even important in the overall scheme."
Their business model relies on a third party uploading files that neither they, nor Mega, have the rights to and then selling Mega users access to those infringing files by the MB. Their previous business got raided. If they want to attract pirates to their new business they need to make them feel secure in the knowledge that they won't be caught should the new business get raided as well. They also need to convince the feds that this time they really don't know if a file infringes somebody's copyright. Hence the 'ZOMG, we have encryption!' spiel.
That's assuming they actually de-dupe the data right now, of course; it might just be in there to give them the opportunity to dedupe in the future (however they decide to do it) without getting everyone to re-agree to the T&Cs.
If I were going to be running a file hosting service of that size, I'd certainly want the opportunity to save space at some point in the future.
even after encryption you're going to hit some duplicates
Not any time soon you're not.
4KiB = 4*1024*8 = 32768 bits. Not 32768 possible values, 32768 bits. So basically you're flipping a coin 32768 times, repeatedly, and hoping you get the same pattern of heads and tails on multiple attempts.
"the recurrence of an encrypted block of data of any appreciable size is infinitesimally likely"
I was thinking that, but if you've got enough data in small enough blocks the odds get better. I guess someone better at maths than me can work out those odds. They might be able to dynamically apply an additional level of encoding to make a file/chunk more likely the same, carry that around as metadata, which could improve the chance of a match.
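Here are the odds worked out, assuming encrypted chunks behave like uniformly random bits. With n chunks of b bits there are about n²/2 pairs, each matching with probability 2^-b, so the expected number of matching pairs is roughly n²/2^(b+1). The sketch works in log2 to avoid underflow; the 1 EiB (2^60 bytes) total is a hypothetical scale, not a figure from Mega.

```python
import math

def log2_expected_matches(total_bytes: float, chunk_bytes: int) -> float:
    """log2 of the expected number of identical chunk pairs in random data.

    n chunks of b bits give about n**2 / 2 pairs, each identical with
    probability 2**-b, so E[matches] ~= n**2 / 2**(b + 1).
    """
    n = total_bytes / chunk_bytes
    b = chunk_bytes * 8
    return 2 * math.log2(n) - (b + 1)

exabyte = 2.0 ** 60
for chunk in (16, 4096):
    exp = log2_expected_matches(exabyte, chunk)
    print(f"{chunk}-byte chunks in 1 EiB: expected matches ~ 2^{exp:.0f}")
```

Even with 16-byte chunks across an exabyte, the expected number of matches is about 2^-17, i.e. effectively none; with 4 KiB chunks it is about 2^-32673.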
Dedupe or not; it doesn't make much difference to me as I really don't care how much disk space Mega are using or saving. Maybe they've got it and maybe it doesn't work very well in saving disk space. Not my problem.
The second line of concern arises from Mega's terms of service. These explain that the service "may automatically delete a piece of data you upload or give someone else access to where it determines that that data is an exact duplicate of original data already on our service". Such deduplication ought to be impossible if Mega truly didn't know the contents of uploaded content, according to critics.
This doesn't seem right to me. AFAICS the concern would only be legitimate if Mega is talking about different users uploading the same file. But the same user, using the same encryption key, would generate the same message digest on encryption, meaning Mega could compare message digests of files from the same user and delete one if it is a duplicate. A sensible rule, possibly.
It is most odd that there is all this fuss about possibly dodgy content being securely uploaded and stored in Mega vaults, yet there is no hassle at all, and no media or security attention, paid to the physical equivalent: possibly dodgy goods and ill-gotten riches stored in the secretive safe deposit facilities that banks offer to customers with no questions asked.
I just had a very productive day in work setting up ownCloud on a CentOS VM.
It is probably not anywhere near as secure as Mega or Dropbox, but it will only be accessible via VPN. It costs a lot less as well.
Once I'm happy with it I can't see me paying for anything else, even if it has crypto it is still off site and out of my control.
Personally, I'm done with this online storage crap. Back when MegaUpload was up, I used it to store my files (before "Cloud" storage) and it worked great. I even encrypted my files in case someone D/L'd them.
But then the guv comes along and states "Hey, this site is being used to distribute stuff illegally, shut it down!"
And just like that, I lost my files.
No big deal, I had the originals, but I was legal, I was the only one accessing my files, so what about me?
Now, someone expects me to do the same thing with "Cloud"? IMO, "Cloud" storage is worse. And what if someone deems it to be a hub for illegal content? Does everyone lose access to their legit files like I did?
I think there are a lot of people on here making a false assumption.
That assumption is that the de-duplication feature is designed to save Mega storage resources.
From my point of view it is obvious that the de-dupe is for user benefit, when dealing with 50GB or 500GB of data there is a good chance that you will upload a duplicate file, even more so if you are using it for offsite backups. The de-dupe is to save you the transfer and storage budget of using Mega so that you can script backups and only changed files will be re-uploaded or you can upload your photos directory again and again and not duplicate the data.
It is to drive ease of use for the customers NOT to save the Mega storage nodes on capacity.
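The "only changed files will be re-uploaded" workflow can be done on the client side by keeping a manifest of file hashes between runs. This is a minimal sketch under my own assumptions (a local JSON manifest, SHA-256 per file); `files_to_upload` is hypothetical and nothing here reflects how Mega's client actually decides what to send.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def files_to_upload(root: Path, manifest_path: Path) -> list:
    """Return files under root that are new or changed since the last run,
    then save the current digests to a local JSON manifest."""
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new, changed = {}, []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = str(path.relative_to(root))
        new[key] = file_digest(path)
        if old.get(key) != new[key]:
            changed.append(key)
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp, "photos")
    root.mkdir()
    manifest = Path(tmp, "manifest.json")
    (root / "a.jpg").write_bytes(b"aaa")
    (root / "b.jpg").write_bytes(b"bbb")
    first = files_to_upload(root, manifest)   # both files are new
    (root / "b.jpg").write_bytes(b"bbb-edited")
    second = files_to_upload(root, manifest)  # only the edited file
    print(first, second)
```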
1. random() is sketchy IF YOU CAN GUESS THE STARTING SEED. But how would you be able to guess a number on someone else's computer, years after the fact?
2. I fail to see how deduplication is difficult, even accepting Mega doesn't know what the unencrypted data is. And I also fail to see how deduplication "leaks" information about the data.
3. They'll fix this. Not a biggy.
4. That's kinda the point. If Mega COULD restore your password, the critics really WOULD be up in arms!!!! Store your password somewhere else - somewhere it can be recovered if you're worried. 1Password on your iPhone perhaps? The choices are endless.
I'll tell you why:
One day has 86400 seconds. PC timers increment in 10ms intervals. That means 8640000 different possibilities to seed random(). That is log2(8640000)==23 bits of entropy per day at max.
In one year that is log2(365*8640000) == 31 bits.
So, very little keyspace to iterate in the worst case. If you have a file timestamp, it will actually be much less than 23 bits !
Other people, such as Netscape, have burnt their fingers with that. You need to be a better $hill, Kim.
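The entropy arithmetic above, plus a small demonstration of why a time-seeded PRNG is brute-forceable. Python's `random.Random` (Mersenne Twister) stands in for whatever `random()` the client actually used, and the 10 ms tick values and 100-second search window are hypothetical; the point is only that an attacker can simply try every tick.

```python
import math
import random

# 10 ms ticks: 100 per second.
ticks_per_day = 86_400 * 100
print(f"{math.log2(ticks_per_day):.2f} bits per day")
print(f"{math.log2(365 * ticks_per_day):.2f} bits per year")

def make_key(seed: int) -> bytes:
    """Victim: derive a 128-bit 'key' from a PRNG seeded with a timestamp (bad idea)."""
    return random.Random(seed).getrandbits(128).to_bytes(16, "big")

def recover_seed(target: bytes, earliest_tick: int, window_ticks: int):
    """Attacker: try every tick in a window, e.g. around a known file timestamp."""
    for tick in range(earliest_tick, earliest_tick + window_ticks):
        if make_key(tick) == target:
            return tick
    return None

base = 170_000_000_000           # hypothetical tick count near a known timestamp
secret_tick = base + 4_321       # when the victim's key was generated
stolen = make_key(secret_tick)
found = recover_seed(stolen, base, 10_000)  # search a 100-second window
print(found == secret_tick)
```

A 128-bit key that came from a seed with only ~23 bits of uncertainty is only as strong as those ~23 bits; Python's `secrets` module exists for exactly this reason.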
Deduplication requires you, Mr Schmitz, to have the plaintext and/or the key of every message. Tell us how you would do it without retaining either plaintext or keys (which amounts to the same thing).
You WILL NOT fix this, as it is impossible. No amount of money will get that done, Kim. You are a sleazebag, a convicted criminal and you should not vent claims regarding crypto. Because your arguments will be shredded in no time.
Your password would be used to decrypt a volume key, which is used for your files. All changing your password does is decrypt your volume key with your old password, then generate a new encryption of the same volume key based on your new password. The bits of your encrypted files don't need to change. At least, that is how tools like TrueCrypt work.
If you are so incredibly stupid as to believe in ANY corporation's crypto-promises, you deserve what you get.
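A toy sketch of that re-wrapping flow: a password-derived key (PBKDF2-HMAC-SHA256 from Python's `hashlib`) protects a random volume key, and a password change only re-wraps that small blob. The XOR-plus-HMAC "wrap" is a deliberate simplification standing in for a proper key-wrap or authenticated cipher, the iteration count is arbitrary, and none of this is TrueCrypt's or Mega's actual format.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # arbitrary for the sketch

def _derive(password: str, salt: bytes) -> tuple:
    """PBKDF2 -> 64 bytes: first half masks the volume key, second half is a MAC key."""
    k = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS, dklen=64)
    return k[:32], k[32:]

def wrap(volume_key: bytes, password: str) -> dict:
    """TOY wrap of a 32-byte volume key: XOR mask plus HMAC tag. Shows data flow only."""
    salt = os.urandom(16)
    mask, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(volume_key, mask))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}

def unwrap(blob: dict, password: str) -> bytes:
    mask, mac_key = _derive(password, blob["salt"])
    expected = hmac.new(mac_key, blob["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password")
    return bytes(a ^ b for a, b in zip(blob["wrapped"], mask))

def change_password(blob: dict, old: str, new: str) -> dict:
    """Only the small wrapped-key blob changes; files encrypted under the volume key don't."""
    return wrap(unwrap(blob, old), new)

volume_key = os.urandom(32)
blob = wrap(volume_key, "correct horse")
blob = change_password(blob, "correct horse", "battery staple")
assert unwrap(blob, "battery staple") == volume_key
```

Note that this also explains the lack of password recovery: without the old password there is no way to unwrap the volume key.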
As others have pointed out, "de-duping" only works across different users if they can decrypt essentially every message. This is because if they eliminate your copy and link to another user's copy, they need the (supposedly) secret key of the other user to deliver anything useful to you when you access the file. Or they need the other guy's plaintext at the time you do the uploading. So THEY will always need access to plaintext if they want to do any de-duping across users. De-duping for a single user could work if less-than-perfect crypto modes (such as 3DES-ECB) were used. RC4 would ALWAYS be insecure for de-duping. Good designs normally use CBC mode for block ciphers such as AES, DES and Blowfish, and per-file keys for stream ciphers such as RC4.
So - SNAKE OIL.
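A toy illustration of the ECB point: in ECB mode each block is encrypted independently, so equal plaintext blocks (and re-uploads of the same file under the same key) give equal ciphertext, which is what makes single-user dedupe possible and also what leaks patterns. CBC with a random IV hides both. The block cipher is a 4-round Feistel network with an HMAC round function, standing in for 3DES or AES; it is not secure, padding is omitted (input must be a multiple of 16 bytes), and decryption is not shown.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:BLOCK // 2]

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """TOY 4-round Feistel permutation on 16-byte blocks (stand-in for AES/3DES)."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """Each block encrypted on its own: equal blocks give equal ciphertext."""
    return b"".join(toy_block_encrypt(key, data[i:i + BLOCK])
                    for i in range(0, len(data), BLOCK))

def cbc_encrypt(key: bytes, data: bytes, iv: bytes = None) -> bytes:
    """Each block is XORed with the previous ciphertext block (or a random IV) first."""
    iv = iv or os.urandom(BLOCK)
    out, prev = [iv], iv
    for i in range(0, len(data), BLOCK):
        prev = toy_block_encrypt(key, _xor(data[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

key = os.urandom(16)
data = b"SAME 16B BLOCK!!" * 2 + b"different block."
ecb = ecb_encrypt(key, data)
print(ecb[0:16] == ecb[16:32])     # ECB leaks the repeated block
print(ecb_encrypt(key, data) == ecb)  # same file + key -> same ciphertext (dedupable)
cbc1, cbc2 = cbc_encrypt(key, data), cbc_encrypt(key, data)
print(cbc1[16:32] == cbc1[32:48], cbc1 == cbc2)  # random IV hides both
```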
Here is how you do proper crypto, with very little effort.
1.) Get yourself a copy of GnuPG from www.gnupg.org/
2.) To encrypt, open a command line window: press the Windows key, type cmd.exe and hit RETURN
3.) Run c:\path\to\gnupg\gpg --symmetric c:\file\to\encrypt.xls
4.) Enter the key (twice). Use a silly phrase of at least 60 characters, such as "silly goats eat chocolate when it is cold in mongolia and the moon is painted red". DO NOT use phrases out of books. If your opponent is a military, either use a wholly random 128-bit key (create a file full of nonsense, perform an md5 on it and use that as a key), or a key phrase of at least 384 characters (yes, one character of plaintext carries only about 0.3 bits of entropy!)
5.) To decrypt, run
c:\path\to\gnupg\gpg --output c:\file\to\encrypt.xls --decrypt c:\file\to\encrypt.xls.gpg
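The passphrase-length advice in step 4 is simple arithmetic. This sketch takes the comment's estimate of 0.3 bits of entropy per character at face value (that figure is the commenter's, not something the code establishes) and computes what a given length buys and how long a phrase must be to reach 128 bits.

```python
import math

BITS_PER_CHAR = 0.3  # the comment's estimate for natural-language passphrases

def passphrase_bits(length: int, bits_per_char: float = BITS_PER_CHAR) -> float:
    """Estimated entropy of a passphrase of `length` characters."""
    return length * bits_per_char

def chars_needed(target_bits: int, bits_per_char: float = BITS_PER_CHAR) -> int:
    """Characters needed to reach `target_bits` at the given rate."""
    return math.ceil(target_bits / bits_per_char)

print(round(passphrase_bits(384), 1))  # 115.2
print(chars_needed(128))               # 427
```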
Governments don't trust sleazy businessmen and all make their crypto themselves, except for the bozos:
GnuPG is available in source code, for you and the expert you hired to inspect!
I guess for most people it will be best to
1.) open Notepad
2.) Hammer about 1000 random characters into Notepad. Do NOT repeat "asdf" 250 times! Be serious about randomness.
3.) Save file
4.) Run c:\path\to\md5\md5.exe c:\path\to\gibberish.txt
5.) Write down the md5 checksum displayed. That will be a high-quality, 128-bit symmetric key, unbreakable even to Yank Intel by means of a "brute force" attack. Do not confuse these 128 bits with the length of asymmetric keys, which need to be longer than 1024 bits these days. 128 bits for symmetric keys is still more than good enough. 256 bits is a waste.
6.) Put the md5 key into your purse, into the little thing you have around your neck or the like. Do NOT put it into the same thing as where you carry the USB stick with enciphered stuff.
Here is where you get md5: http://www.fourmilab.ch/md5/
Of course you can also get md5 and gpg from Ubuntu, cygwin and many more sources. Make sure the source is legit, though. Do NOT use the adware scammer sites.
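The same key-making recipe in Python, for anyone who would rather not download md5.exe. `md5_key_from_text` mirrors the steps above (hashing the bytes Notepad saves may differ from hashing the typed string, e.g. because of encoding or a byte-order mark). `random_key_hex` is a swapped-in alternative, not part of the original advice: it takes 128 bits straight from the operating system's CSPRNG via Python's `secrets` module, which avoids depending on how random your typing was.

```python
import hashlib
import secrets

def md5_key_from_text(typed: str) -> str:
    """The recipe above: MD5 of hand-typed gibberish -> 128-bit key as 32 hex digits.
    Its real strength depends on how random the typing was, not on MD5."""
    return hashlib.md5(typed.encode("utf-8")).hexdigest()

def random_key_hex(bits: int = 128) -> str:
    """Alternative: key bits directly from the OS CSPRNG."""
    return secrets.token_hex(bits // 8)

print(md5_key_from_text("qpwoeiruty zmxncbv ..."))  # hypothetical gibberish
print(random_key_hex())
```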