Deduplication, how?
If all data is encrypted with different encryption keys controlled by the Mega users and not Kim&Co. (as I understood how things work here), how can they possibly use file based deduplication - efficiently, that is?
Kim Dotcom's new cloud file locker, mega.co.nz, has all-but-failed to appear online, with its mastermind claiming global enthusiasm for the site has overwhelmed its resources. But The Reg can report the site has been flaky since shortly after its launch, when the press-only login we were sent did not work. Regular attempts to …
By hashing.
Compute the hash before bothering to encrypt and upload.
Of course this also provides a means by which "THEY" can establish whether your encrypted content matches known files (only if exact match).
Interesting as this provides a way that actual user generated content always remains private, but (exact) pirate copies can still be detected.
Of course rather easy to fuzz music and video content to defeat this, but probably plenty of people wouldn't.
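A minimal sketch of the hash-before-upload idea described above, assuming a hypothetical server index keyed by the SHA-256 of the plaintext. The names `server_index`, `upload`, `matches_known_file` and `fake_encrypt` are invented for illustration and are not Mega's API. It only shows how an exact duplicate (or a known file) can be detected; it does not address how two users with different keys could share one stored ciphertext.

```python
import hashlib
from typing import Callable

# Hypothetical server-side index: plaintext SHA-256 hex digest -> stored blob.
server_index: dict[str, bytes] = {}

def upload(plaintext: bytes, encrypt: Callable[[bytes], bytes]) -> bool:
    """Return True if data was actually sent, False if the hash was already known."""
    digest = hashlib.sha256(plaintext).hexdigest()
    if digest in server_index:
        return False  # exact duplicate: skip encrypting and uploading
    server_index[digest] = encrypt(plaintext)
    return True

def matches_known_file(known_plaintext: bytes) -> bool:
    """Anyone holding a known file can test whether an exact copy was uploaded."""
    return hashlib.sha256(known_plaintext).hexdigest() in server_index

# Placeholder transform, NOT real encryption; it only stands in for the user's cipher.
fake_encrypt = lambda data: bytes(b ^ 0x5A for b in data)

first = upload(b"holiday video", fake_encrypt)             # new content, uploaded
second = upload(b"holiday video", fake_encrypt)            # exact match, skipped
fuzzed = upload(b"holiday video" + b"\x00", fake_encrypt)  # one extra byte defeats the match
```

Changing a single byte gives a different hash, which is why fuzzing the content defeats exact-match detection.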
Haven't you heard?
If your product launch isn't FUBARed in some way then nobody notices it. Fuck things up and get column inches from everywhere. Google Nexus 4 and Apple Maps are recent examples, it just shows you Kim has his finger on the pulse.
Although I have no real opinion of the guy I'll be cooking up the popcorn for this one. Having the cash to rent a huge mansion and live the lifestyle he does might be enough for some people to quit once they get raided by gun toting law enforcement officers but I'm pleased to see Kim's balls are in proportion to the rest of his body when it comes to dealing with government agencies and mega corps.
For reference, my attempt to register via my Virgin connection is currently greeted with an egg timer set for hard boiled ostrich.
It doesn't seem likely, does it? There's one type of encryption (homomorphic encryption) that in theory could work, but in practice it won't. I won't bore you with details of that.
The solution I would use would be to set up the front-end of the storage system to use an all-or-nothing transform (AONT) on the files, break them up into blocks and then distribute those blocks in a random order, with a single encrypted "key" being the locations and order of those blocks. So long as nobody can break into the front-end computer (or instruct it to divulge how to reconstruct a given file) then the storage is secure. Since the AONT should produce the same blocks for the same input file, you can do block-level dedup on the actual storage servers. I'd then encrypt the access key, add some validation info and send it back to the user before deleting it.
Of course, in this scheme, you (as a user) can't trust the server not to keep the access key or to make a copy before it's encrypted, and so on.
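A rough sketch of how such a scheme might look, under several assumptions that are not in the post. The transform below is a simplified package-style AONT built from SHA-256, and its inner key is derived from the file contents so that identical files yield identical blocks (a textbook AONT would use a random inner key). The `store` dictionary stands in for the storage servers, and the returned list of block IDs plays the role of the "access key". Random placement of blocks and encryption of the access key are not modelled, and none of this is a vetted construction.

```python
import hashlib

BLOCK = 32  # bytes per block; equal to the SHA-256 digest size

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def mask(key: bytes, index: int) -> bytes:
    """Per-block mask derived from the inner key and the block index."""
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()

def package(data: bytes) -> list[bytes]:
    """Transform data so that every block is needed to recover any of it."""
    inner_key = hashlib.sha256(data).digest()        # assumption: deterministic inner key
    body = len(data).to_bytes(8, "big") + data       # length prefix so padding can be removed
    body += b"\0" * (-len(body) % BLOCK)
    blocks = [xor(body[i:i + BLOCK], mask(inner_key, i // BLOCK))
              for i in range(0, len(body), BLOCK)]
    # The last block hides the inner key behind a hash of all the other blocks.
    tail = xor(inner_key, hashlib.sha256(b"".join(blocks)).digest())
    return blocks + [tail]

def unpackage(blocks: list[bytes]) -> bytes:
    inner_key = xor(blocks[-1], hashlib.sha256(b"".join(blocks[:-1])).digest())
    body = b"".join(xor(b, mask(inner_key, i)) for i, b in enumerate(blocks[:-1]))
    size = int.from_bytes(body[:8], "big")
    return body[8:8 + size]

store: dict[str, bytes] = {}  # hypothetical storage back end, addressed by block hash

def put(data: bytes) -> list[str]:
    """Store the packaged blocks; return the ordered block IDs (the 'access key')."""
    ids = []
    for block in package(data):
        block_id = hashlib.sha256(block).hexdigest()
        store.setdefault(block_id, block)  # an already-present block is not stored again
        ids.append(block_id)
    return ids

def get(ids: list[str]) -> bytes:
    return unpackage([store[i] for i in ids])

film = b"same file uploaded by two users " * 100
access_key_a = put(film)
blocks_after_first = len(store)
access_key_b = put(film)            # second upload adds no new blocks
blocks_after_second = len(store)
```

As the post says, the user still has to trust the front end, since it sees both the plaintext and the access key.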
None of these solutions allow for the end user to be in control of the encryption key - which is the implication.
For de-dup to work, Mega has to be able to access the content of the file, otherwise it can't make it available to two different people who may use different encryption keys.
A little sense here please.
How are mega.co.nz going to de-duplicate encrypted content?
The very act of encrypting the data makes it unique, or should if done correctly. Only if you tried to upload the same content, encrypted with the same key, twice should this kick in. So you upload a folder with a heap of photos in a single operation then it will de-duplicate that data. However if two people upload the same media file (Non-copyright of course) then as far as the service is concerned that will be two unique chunks of binary data, they have to be if encrypted with different keys.
Just break the encrypted files down into small enough chunks and you'll find dupes, decrypting the chunk with user A's key will give you user A's data, decrypt it with user B's key will give you user B's data. At the end of the day a chunk from the middle of a file is pretty much random binary anyway.
Replying to myself is bad form I know, but no edit, just wanted to clarify, I'm not suggesting they could dedupe two copies of Avatar, I'm suggesting they could dedupe some chunks of an encrypted copy of Avatar against some chunks of an encrypted copy of Titanic, actually, there's no reason why the source files couldn't be the same either but it makes for a more confusing premise.
It's fairly obvious that this doesn't work. The block size has to be larger than the amount of data needed to store the pointer-to-duplicate, otherwise deduplication is more expensive in disc space. So it needs to be at least say 128 bits. The likelihood of two random 128-bit blocks matching is astronomically small. Even if you have 2^64 blocks to choose from (250 million TB of data) the chance of a match is still only 1 in 2^64. You'd have to match a 250 mega-TB corpus with another 250 mega-TB corpus to get an expected saving of 16 bytes total.
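The arithmetic can be checked with a few lines, assuming uniformly random 128-bit blocks and a corpus of 2^64 distinct blocks (the variable names are illustrative). In decimal units the corpus comes out nearer 295 million TB than 250 million, which is the same order of magnitude and does not change the conclusion.

```python
from fractions import Fraction

BLOCK_BITS = 128
BLOCK_BYTES = BLOCK_BITS // 8            # 16 bytes per block
corpus_blocks = 2 ** 64

corpus_bytes = corpus_blocks * BLOCK_BYTES        # 2**68 bytes
corpus_million_tb = corpus_bytes / 1e12 / 1e6     # size in millions of decimal terabytes

# Chance that one fresh random block equals any of the distinct blocks in the corpus.
p_one_block = Fraction(corpus_blocks, 2 ** BLOCK_BITS)

# Matching a second corpus of the same size: expected matches by linearity of expectation.
expected_matches = corpus_blocks * p_one_block
expected_saving_bytes = expected_matches * BLOCK_BYTES

print(p_one_block == Fraction(1, 2 ** 64), expected_matches, expected_saving_bytes)
```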
"Just break the encrypted files down into small enough chunks and you'll find dupes..."
[First note: I know little about how de-dupe works, but can take a reasonable stab at it.] That sounds perfectly reasonable, but to have the chunks at sizes where there are likely to be duplicates (assuming the encrypted files are distributed randomly across the parameter space) the number of files would be a similar order of magnitude as the size of the files, so repeating the file and referencing another file would be operations needing similar amounts of data. You'd shrink somewhat, but not much as far as I can tell.
In the extreme case, you can produce a look-up table of all possible blocks, then just reference them, and of course this saves nothing...
Just break the encrypted files down into small enough chunks and you'll find dupes
If it were that easy, you could just break it down into 1-bit chunks. But that obviously requires a bigger index than the original file collection. (Q.E.D. by Reductio ad Absurdum). Random data (such as the output of a good encryption algorithm) by definition are not compressible.
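A quick experiment along these lines, using `os.urandom` as a stand-in for the output of a good cipher (that substitution is an assumption) and zlib as the compressor:

```python
import os
import zlib

random_blob = os.urandom(1_000_000)     # stand-in for well-encrypted data
repetitive_blob = b"A" * 1_000_000      # highly redundant data, for contrast

random_packed = zlib.compress(random_blob, 9)
repetitive_packed = zlib.compress(repetitive_blob, 9)

print(len(random_blob), len(random_packed))          # essentially no shrinkage
print(len(repetitive_blob), len(repetitive_packed))  # shrinks dramatically
```

The redundant input collapses to a tiny fraction of its size, while the random input does not get meaningfully smaller.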
"A little sense here please. How are mega.co.nz going to de-duplicate encrypted content?"
With a little ingenuity. One way is to encrypt each block of data in the file with its own hash. Then you send the hash of the result to the site to see if the site already has it. Since everyone is encrypting the same way, this works. You then end up with a list of hashes/decryption keys, one for each block of plain text. If the list isn't large, you encrypt that list with your private key and upload that to accompany the encrypted data. If the list is large, you break it up into blocks and perform the same process on that file, and so on.
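A sketch of that per-block scheme, with loud caveats. Python's standard library has no block cipher, so `keystream_xor` below is a toy deterministic cipher (XOR with a SHA-256 counter keystream) used purely as a stand-in. `site`, `upload_blocks`, `download_blocks` and the 1024-byte block size are hypothetical. Encrypting the resulting list with the user's own key, and recursing when the list is large, are left out.

```python
import hashlib

BLOCK_SIZE = 1024  # hypothetical block size

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic cipher for illustration only; applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

site: dict[bytes, bytes] = {}  # hypothetical server: hash of ciphertext -> ciphertext

def upload_blocks(plaintext: bytes) -> list[tuple[bytes, bytes]]:
    """Return the list of (ciphertext hash, decryption key) pairs, one per block."""
    manifest = []
    for i in range(0, len(plaintext), BLOCK_SIZE):
        block = plaintext[i:i + BLOCK_SIZE]
        key = hashlib.sha256(block).digest()          # each block is keyed by its own hash
        ciphertext = keystream_xor(key, block)
        address = hashlib.sha256(ciphertext).digest()
        if address not in site:                       # ask the site whether it already has it
            site[address] = ciphertext
        manifest.append((address, key))
    return manifest

def download_blocks(manifest: list[tuple[bytes, bytes]]) -> bytes:
    return b"".join(keystream_xor(key, site[address]) for address, key in manifest)

data = b"".join(i.to_bytes(4, "big") for i in range(1024))  # 4096 bytes, four distinct blocks
alice_manifest = upload_blocks(data)
stored_after_alice = len(site)
bob_manifest = upload_blocks(data)    # same plaintext, so nothing new is stored
stored_after_bob = len(site)
```

Because every uploader derives the same key from the same block, the ciphertexts collide on purpose and the site can dedupe them without holding the keys.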
That's about all I can say. The rest is pretty much useless at the moment.
How they can claim to be 'The security company' is quite beyond me also... they are promoting a browser whose development was partly funded by NSA+CIA and is known to have serious privacy concerns for the hard-core security conscious among us.
No matter what you think about this man, he does put a spin on internet file storage that some other companies might want to consider. To throw conspiracy into the mix, I'm starting to think that a lot more people want him gone, even if copyright violations will never occur. At some point, big businesses are going to be concerned with this man.
No, I'm not a Kim.com fan or anything, but he is appearing to be more than just a pirate captain. Could we be witnessing a crook turning wholesome white collar?
if it makes you feel any better, My ISP is blocking mega.co.nz. When reached via proxy, I was able to sign up, but email authentications never arrived. What does that tell you? ….. jptech Posted Monday 21st January 2013 00:39 GMT
And your ISP is …, jptech? The more intelligence-led worlds and their dogs would like to know, for that sort of childish nonsense from supposed adults is not acceptable in a free and meritocratic society.
This guy is a crook of the highest order, and in his case leopards really do not change their spots.
I remember him from way back when he was just Kim Schmitz aka Kimble, running riot on UK Quake 2 servers with an aimbot and threatening anyone who dared to confront him with being DDoS'd. He actually did it too, though the word around the campfire back then was that he paid a load of real hackers to do it for him (this was a long time before that whole Osama nonsense he came up with).
In his former life as just a regular crook he defrauded numerous people out of money through various insider trading scams, most notably pump-and-dumping Letsbuyit. Despite all of this he's largely avoided serious incarceration and ultimately amassed a wealth off the back of straight up copyright infringement.
That all said he is a fascinating character simply because his wealth and the ability to amass more of it in spite of what would appear to be a child like persona defies logic.
Mega will I'm sure go down a similar road to Megaupload simply because Kim Schmitz, Kimble or Kim Dotcom or whatever you want to call him has the scruples of a serial killer. He does not respect anything other than his own wealth and ridiculous self-image.
(AC so my internet doesn't get DDoS'd)
@AC
As I thought. He did DDoS Barrysworld in a hissy-fit as 155&Rising remembers here:
"And beating l33t professional hax0r www.kimble.org and his aimbot in ra2, only to have him nuke b0rk.co.uk's irc bouncer afterwards and throw dos attacks at Barrysworld for the next week or so in revenge. ph34r the hacker dog ;)"
My old Barrysworld chums will be full of stories of his great prickery.
All cloud services should provide encryption. There's no reason why gmail could not provide an encrypted email option where the message content is decrypted by local browser plugins or javascript or similar. The only reason not to encrypt cloud data is to allow the provider to parse and possibly sell the content. Like if you want to post relevant ads next to your content.
In my mind there is little question that big business is concerned with largely free massive storage sites. Regardless of whether or not the content is legally shared, there will be the concern about control of distribution. Currently media is largely controlled by distribution companies. Companies with huge profits that largely steal from their own clients and guarantee the quantity of content is kept to a minimum in order to channel sales through a limited set of artists. Distribution companies are about to die, because after all the cost of distribution is basically $0, since we all know that the consumer pays for the majority of bandwidth costs directly to their ISP.
Finally, I just want to mention that in many countries sharing content you own with people you know is not illegal. The american legal system would have you believe that sharing what you own with those you care about is immoral. This ideal is not shared among most of the world. So although you may feel that Dotcom is/was obviously a criminal, many of us are far less convinced.
There's no reason why gmail could not provide an encrypted email option where the message content is decrypted by local browser plugins or javascript or similar.
Unless the email is being encrypted at the sender's end, it'd make no difference whatsoever to Gmail's ability to scan and index. They'd just process it when it first hits their SMTP server instead.
If it's being done at the sender's end, you can already do that - use PGP - whilst it'd be nice to have it happen 'in browser' there's no reason Gmail needs to provide this, pretty sure there are browser plugins that can do that for you.
It's a nice idea though, but I'm not sure I'd trust my email provider to provide the solution, especially if the aim is to keep said provider out of my emails!
I don't see anything worth reporting about the site "not working". Right off the bat they were (supposedly) handling 2K signups per minute, and my guess is many times that in requests. I got timeouts in the first hour, and then I was able to sign up. However, I was not able to upload files until some hours later. How a site works under its initial barrage of requests isn't really any indication of how it will be running after that.
Same here, I do appreciate that due to a heavy load, first day, it would be bad but let's face it, it's not exactly Glastonbury tickets he is selling and a user registration/automated email setup shouldn't put too much load on the system..
You could potentially excuse a newbie company for doing this but someone with K.Com's (supposed?) knowledge of the net etc.? Inexcusable if you ask me..
"You are strictly prohibited from using our services to infringe copyright. You may not upload, download, store, share, display, stream, distribute, e-mail, link to, transmit or otherwise make available any files, data, or content that infringes any copyright or other proprietary rights of any person or entity."
But why else would anyone use MegaUpload if not for storing and sharing bootleg content? Does anyone really think it used to be popular because of the generic service?
No business with any kind of sense would touch this site, and I don't know why any consumer wouldn't just use SkyDrive, Google Drive, or Dropbox.
Methinks that the NEW MegaUpload is going to be about as successful as the NEW Napster was.
He said himself that it's basically a competitor to dropbox - it's not a new megaupload. I can't see how this can be touched legally, because you could only share files if you shared the account login.
The terms and conditions are pretty much standard for any service like this (FWIW I'm also pretty sure that megaupload had similar T&C's)
Why use it?..... 50GB free storage, or up to 4TB for 30 euros a month.
Why's it slow right now?.... Massive publicity, 50GB free storage, or up to 4TB for 30 euros a month!
Good luck to him.
Yeah, and certain toys that're long, smooth or dimpled and go "bzzzmmbzzzzmmmm" are "adult novelties, not for internal use". Right, sure, we believe you. I'll just get that "back massager" and go home.
It's a legal fiction that, along with their supposed inability to decrypt the data they're storing, they think might let them get away with mass copyright infringement. Much like selling toys as "adult novelties" lets vendors get around the USA and some other countries' prudish laws (and some liability laws).
Avoiding specific words in case of spam filter / people's stupid work proxies, in case this sounds a little bit roundabout.
Clause 19 makes it plain Mega is a no-dodgy-files zone, stating "You are strictly prohibited from using our services to infringe copyright. You may not upload, download, store, share, display, stream, distribute, e-mail, link to, transmit or otherwise make available any files, data, or content that infringes any copyright or other proprietary rights of any person or entity."
-----
So the same statement every dodgy download site has then?
Very sexy and modern interface. It is indeed kinda slow but the number of people hammering the site must be insane. Went through the process of uploading and downloading. No problems, a lot smoother than the old version. Didn't need to signup and didn't need to fill out any captcha or wait for a countdown to download.
But, but, the password you enter is also your encryption key, and you can't change it - so WTF do you only enter it ONCE when registering? Guess what happens when you make a typo in the signup, or you forgot by the time the email turned up? You're screwed, that's what. Three goes and you're locked out and the email link responds with a naff error code.
I'll bet 99% of those accounts are folks signing up 2 or 3 times because they made a typo the first time round. Thankfully an endless supply of junk email addresses is at hand.
Don't understand the whole password thing. I would have thought that Mega hold the encryption keys, but they are encrypted themselves with your password. When you log in, they send your keys to you, which are then decrypted client-side using your password, and can be used for uploading new files. If you wanted to change your password, all that needs to be done is for the client-side script to re-encrypt your keys and send them back for storage. Encrypting using your password as the key is just plain stupid, as your password will hit Mega's servers when you log in.
At least, that's the way I'd hope it to be. Please feel free to pick holes in this.
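A sketch of that key-wrapping idea, as a guess at one way it could work rather than a description of what Mega does. The password is stretched with PBKDF2 (`hashlib.pbkdf2_hmac`) and the result is XORed with a random master key. The XOR is a toy stand-in for a proper key-wrap cipher, and `derive_key`, `wrap` and `unwrap` are names made up here.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key-encryption key, client-side."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def wrap(master_key: bytes, password: str, salt: bytes) -> bytes:
    """Toy key wrap (XOR with the derived key); a real client would use a vetted cipher."""
    return bytes(m ^ k for m, k in zip(master_key, derive_key(password, salt)))

unwrap = wrap  # XOR is its own inverse

# Sign-up: the client generates the master key and stores only the wrapped form remotely.
master_key = os.urandom(32)
salt = os.urandom(16)
stored = wrap(master_key, "old password", salt)

# Password change: unwrap with the old password, re-wrap with the new one.
recovered = unwrap(stored, "old password", salt)
new_salt = os.urandom(16)
stored_new = wrap(recovered, "new password", new_salt)
```

The files themselves stay encrypted under the unchanged master key, so only the small wrapped key needs to be re-uploaded after a password change.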
In addition, I'd hope that when you login your password never hits Mega's servers (if it were used for encrypting your keys), but is hashed and then sent to authenticate (this hash in turn is hashed and compared to the credentials database). That would mean that they could still authenticate you, but would never even be able to sniff what your password is. Of course, it would rely on you completely trusting their client-side scripts, so that will fail the tin foil hat test.
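One way this hash-then-hash-again login could be sketched, with every name hypothetical (`client_auth_token`, `server_record`, `server_check`). The extra `b"|auth"` label is an added assumption: it keeps the login token distinct from any key derived from the same password for encryption.

```python
import hashlib
import hmac
import os

def client_auth_token(password: str, salt: bytes) -> bytes:
    """Computed in the browser; the password itself never leaves the client."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt + b"|auth", 100_000)

def server_record(token: bytes) -> bytes:
    """What the server stores at sign-up: a hash of the token, not the token."""
    return hashlib.sha256(token).digest()

def server_check(token: bytes, record: bytes) -> bool:
    """Hash the presented token again and compare in constant time."""
    return hmac.compare_digest(hashlib.sha256(token).digest(), record)

salt = os.urandom(16)
record = server_record(client_auth_token("correct horse", salt))

good_login = server_check(client_auth_token("correct horse", salt), record)
bad_login = server_check(client_auth_token("wrong guess", salt), record)
```

As the post notes, this only helps if the client-side script can be trusted to do what it claims.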
As of 1am Pacific time, the site is allowing me to log in and view the cloud drive. I went to settings and switched upload speed to unlimited. I am in the process of uploading a test image.
Previously it was not working for me in any browser. I am using Chrome for this one, simply because that was suggested by the site.
The screenshots he was teasing before launch showed a 2048-bit RSA key being generated using keypresses & mouse movements as entropy. That didn't happen when I signed up ... wonder if this (taken with the dedupe stuff) isn't more smoke and mirrors than encryption and privacy.
I remember Kim from the kimble days too - had no idea he ran Megaupload until it got closed down.
"failed on a variety of platforms (Mac, Windows, iPad, over ADSL in two locations and 3G wireless)"
Are... are you saying that you think making a request to an overloaded server from a different client device / different connection will affect its ability to respond?
I really need to find another IT news site.
When you see propaganda it is so much sweeter when you can call it out for what it is and know beyond doubt that you are right. For some this is a massive story, for some just an opportunity to create a bit of traffic to their sites, as is the case for the author of this piece. I managed to register three separate accounts with MEGA over the time period he is talking about and before then too.
I have managed to upload a few files and although small it worked very well, so I call bullsheepers to the author of this piece and I have registered specifically to say that, not that all the comments will be shown, but on sites that really matter MEGA is a huge success for all, and Kim is again making waves that need to be made in this closed society of ours where greed is the ultimate factor in any decision by those with just a little perceived power. The internet is the great technological levelling of the field, where anyone can say anything they want and share anything with anyone else. It is time for those who thought they had power to fall by the wayside and their cries and desperate attempts are going to be met with the contempt they deserve.
It's convergent encryption. The upload process works like this:
1. Encrypt the file with the file's hash as the key
2. Upload the ciphertext (which will always be the same for the same plaintext)
3. Client-side encrypt the hash (key) with the user's password
4. Upload the encrypted key for storage
The download process then goes:
1. Receive the ciphertext (encrypted file) and the encrypted key to that
2. Client-side decrypt the key with your password
3. Client-side decrypt the ciphertext with the now decrypted key to obtain the original file
Because the encrypted file will *always* be the same from the same source file, they can detect duplicates. Since the hash function is one-way, there's no way to decode the original file server-side. Of course, it does mean that they know which users have uploaded a copyrighted file.
This is why the lesson here is: If you don't want Mega to know you've uploaded a copyrighted file, archive it with some random data before you upload, so the hash will be different.
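The upload and download steps above can be sketched as follows. This is an illustration under assumptions, not Mega's actual code: `keystream_xor` is a toy deterministic cipher built from SHA-256 (the standard library has no AES), the key wrap is a plain XOR with a PBKDF2-derived key, and all function names are invented. Appending random bytes stands in for "archive it with some random data".

```python
import hashlib
import os

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy deterministic cipher for illustration only; applying it twice decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)

def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Step 1: encrypt the file with its own hash as the key."""
    key = hashlib.sha256(plaintext).digest()
    return keystream_xor(key, plaintext), key

def wrap_key(key: bytes, password: str, salt: bytes) -> bytes:
    """Step 3: protect the file key with the user's password (toy XOR wrap)."""
    kek = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return bytes(a ^ b for a, b in zip(key, kek))

film = b"identical source file " * 50
salt = b"demo-salt-16byte"               # fixed salt so the demo is repeatable

ct_alice, key_alice = convergent_encrypt(film)
ct_bob, key_bob = convergent_encrypt(film)   # same ciphertext, so the server can dedupe

wrapped_alice = wrap_key(key_alice, "alice's password", salt)
wrapped_bob = wrap_key(key_bob, "bob's password", salt)   # differs per user

# Download: unwrap the key with the password, then decrypt the shared ciphertext.
restored = keystream_xor(wrap_key(wrapped_alice, "alice's password", salt), ct_alice)

# Random bytes change the hash, hence the key, hence the whole ciphertext.
ct_salted, _ = convergent_encrypt(film + os.urandom(16))
```

Identical plaintexts produce identical ciphertexts, which is also exactly what lets the server link every user who uploaded the same file.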