Cloud startup's business model defies laws of physics

Start-up Bitcasa invites you to shove all your data into the cloud and use your hard drive as a cache. It's offering infinite storage capacity, it says, for 10 bucks a month. Really. Bitcasa says it is different from Nirvanix, Mozy and others because it is not a cloud-based backup company. It's different from Dropbox because …

COMMENTS

This topic is closed for new posts.
  1. 2metoo
    WTF?

    Random data

    Deduping supposedly encrypted data? Yet it can spot 20 copies of the same file from different users?

    You want to patent that. No really...

    1. Kirbini
      Go

      Simple really...

      They use the same cereal box decoder ring for each encryption. Because of this they can't "know anything about files and folders" but they can still spot that template you and 10 others copied from the internet for your PowerPoint slide deck. Template spotted; 10 copies not stored.

      See? Easy peasy.

      Too bad for them that Cap'n Crunch and I have prior art dating to the early '80s.

      1. Ammaross Danan

        Same "template" base?

        Even if they use the same template, the resulting file will be different, so file-level matching based on hashing of the encrypted data is out. The article says that if you share even a single slide between PPTs they dedupe it, which suggests block-level dedup with some understanding of file types. Likely they "stripe" intelligent chunks (slides, pages, or simply data blocks) across HDDs and match those chunk signatures. They may not know what's in the file server-side, but I'm sure the client side is quite aware, to pull this off.

    2. jai

      But I guess that won't occur to the majority of punters, who will be baffled by all the technospeak but understand that when a phone company says 'unlimited' it doesn't actually mean unlimited; so when someone says 'infinite', obviously they're telling the truth.

      1. theblackhand
        Devil

        It's simple really....

        Take 10 punters with DIFFERENT PowerPoint presentations. Hopefully the encryption will turn them into the same bit stream, which can then be de-duplicated so only one copy is stored.

        Either that, or everything is sent to /dev/null and can be retrieved as long as it is cached on your hard drive....

    3. Vision Aforethought
      Unhappy

      Hope not

      I came up with the same idea a while back for my employer and tweeted about it. (Something along the lines of "One day, there will only be one copy of each item of content.") Why replicate when you can stream?

      1. Lou 2

        Because you do not have free internet?

        Because you do not have free internet? Just an example: streaming full-time is going to cost money, unless you have an all-you-can-eat plan.

  2. Peladon

    Those who do not learn from history...

    Here

    We

    Go -

    Again.

    I'm not even going to list the companies. I'd run out of room. Unlimited bandwidth deals. Unlimited storage deals. Infinite (or @#$#$%$^ Unlimited) any bloody thing.

    I'd rant. I'd rave. But I really can't be bothered. Apart from saying that the appearance of 'infinite', 'unlimited' or 'forever' in advertising for just about anything these days is a quick way to stop me ever becoming a customer.

    Sigh.

  3. Arrrggghh-otron

    Mi Bitcasa es su Bitcasa

    That all sounded interesting until the bit about sharing stuff with a URL.

    Here's my data and here is the big red target on the side... have fun!

  4. David Given
    WTF?

    In order to do deduplication they have to know what your data is --- obviously; they need to be able to match the hashes of a block from one user against a block from another user. And, naturally, since they're only storing one instance of the block, that block must be accessible by both users. The only way I can conceive of this working in an encrypted environment is if all users have the same encryption key... which rather defeats the purpose of encryption.

    Or am I missing something?
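
    For what it's worth, there is one well-known construction that squares this circle: "convergent encryption", where the key is derived from the content itself. Identical plaintexts then encrypt identically for every user, so the ciphertext dedups, at the price of exactly the confirmation leaks discussed further down. A toy sketch in Python (using the "cryptography" package; a known technique, not necessarily what Bitcasa actually does):

    import hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def convergent_encrypt(block):
        # Key and IV both derive from the content, so equal blocks encrypt
        # identically for every user and the server can dedup the ciphertext
        # without ever holding a user's key.
        key = hashlib.sha256(block).digest()               # AES-256 key
        iv = hashlib.sha256(b"iv" + block).digest()[:16]   # deterministic IV
        enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
        return enc.update(block) + enc.finalize()

    same = b"same 16-byte blk"    # kept block-aligned for unpadded CBC
    print(convergent_encrypt(same) == convergent_encrypt(same))   # True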

    1. Anonymous Coward
      Anonymous Coward

      Having read the "learn more" link on their site, they appear to suggest that the data is encrypted client side and stored unchanged, i.e. encrypted, on their storage.

      If users do not have the same encryption key then I'd have thought the likelihood of finding any duplicated data that users could share would be minimal. If they do have the same key, then identical files encrypted by 2 users could result in the same encrypted data block. As they have no access to the unencrypted data, they can't know if user1's PowerPoint slide is the same as user2's unless they use the same key.

      I suppose because they are looking below file level, blocks of data could be duplicated across users, and the smaller the block size the likelier that would be, peaking at about 50% when the block size is 1 bit (assuming a random average).

      Perhaps I am missing something too. I share Mr Given's Misgivings.

      1. Tomato42
        Boffin

        @AC 13:38 GMT

        Any proper encryption will give you different output every time you encrypt the file, even if you use the same key, because the Initialization Vector (IV) should be chosen at random.

        Not that it looks any different from simple TLS after reading the features...
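
        To see why that kills ciphertext dedup, a minimal sketch (assuming a reasonably recent version of the Python "cryptography" package): encrypt the same bytes twice under the same key, and the random IV makes the two ciphertexts differ, so server-side matching finds nothing.

        import os
        from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

        key = os.urandom(32)                 # one AES-256 key for both runs
        slide = b"identical slide." * 16     # same plaintext, block-aligned

        def encrypt(data):
            iv = os.urandom(16)              # fresh random IV each time
            enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
            return iv + enc.update(data) + enc.finalize()

        print(encrypt(slide) == encrypt(slide))   # False: nothing to dedup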

  5. Anonymous Coward
    Anonymous Coward

    Limited infinity

    I just tried to register for their beta and got the message that space is limited. So they must be using some new meaning for the word infinite.

  6. Anonymous Coward
    Anonymous Coward

    Impossible, false, or both

    The link has many comments that explain it fairly well.

    Let's see: if you have encrypted my data, it is because it does not bear any reasonable resemblance to the original. Hence, you cannot compare it with someone else's files and determine whether I have an identical copy.

    Unless, of course, you store a signature (hash) of each of my files. That saves you having to look at the file contents: you can just compare the hashes and determine whether two files are identical.

    But if you do that, you have not really encrypted my data: given any file, I can tell whether you have it or not.

    But it's even worse than that, and I don't know why no one has highlighted it. For encryption to be secure, it has to be based on something shared between the cloud and me, plus something only I know and nobody else does. Hence, if encryption keys are shared among clients, you're actually encrypting with the same key for everyone.

    Which is enormously risky, because if that key is revealed, all the encrypted data, not just one user's but everyone's, can be decrypted. Ask the HDMI and Sony guys about that.

    This is such a scam that I don't know why even The Reg is giving it visibility. Oh, well, to allow for some anonymous coward to post.

    1. Spearchucker Jones
      Boffin

      Quite possible, actually.

      All the posts I've read so far seem to talk about asymmetric encryption, or symmetric encryption. It is entirely possible to do what Bitcasa are claiming, although I wonder if they got it right. Anyway, the classic Needham-Schroeder protocol (assuming you replace nonces with timestamps) provides a good basis for it.

      It works like this:

      Alice is a subject that submits a file

      Bob is a subject that has shared access to Alice's file

      Sam is the Bitcasa server

      Alice calls Sam and says she'd like to share a file with Bob.

      Sam makes up a session key message consisting of Alice's name, Bob's name, a key for them to use, and a timestamp.

      Sam encrypts all this under the key he shares with Alice, and he encrypts another copy of it under the key he shares with Bob.

      He gives both ciphertexts to Alice.

      Alice retrieves the key from the ciphertext that was encrypted for her, and passes on to Bob the ciphertext that was encrypted for him.

      Next, Alice creates a hash of the unencrypted file, and sends that to Sam for indexing.

      Alice now uploads her file to Sam, encrypted using the key from the ciphertext that was encrypted for her.

      Bob has access anytime he likes, using the key from the ciphertext that was encrypted for him.

      Simples. If, as I said, (a) they got the protocol right, and (b) the implementation actually reflects the protocol.
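
      For the curious, a toy version of the session-key step in Python (Fernet standing in for "the key Sam shares with each user"; the names come from the walk-through above, not from anything Bitcasa has published):

      from cryptography.fernet import Fernet
      import json, time

      k_alice, k_bob = Fernet.generate_key(), Fernet.generate_key()  # long-term keys
      session_key = Fernet.generate_key()        # the key Alice and Bob will share

      msg = json.dumps({"a": "Alice", "b": "Bob", "key": session_key.decode(),
                        "ts": time.time()}).encode()

      for_alice = Fernet(k_alice).encrypt(msg)   # Sam encrypts one copy for Alice
      for_bob = Fernet(k_bob).encrypt(msg)       # ...and one for Bob, via Alice

      # Each party recovers the same file key from their own ciphertext.
      assert (json.loads(Fernet(k_alice).decrypt(for_alice))["key"] ==
              json.loads(Fernet(k_bob).decrypt(for_bob))["key"])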

      1. Tomato42
        WTF?

        @Spearchucker Jones

        It still doesn't explain how they can deduplicate encrypted files *on a block level*

        1. Spearchucker Jones

          Easy enough...

          ...when your hashes are computed from the unencrypted source. You could split the file into block-sized chunks and hash those. Or you could treat file contents (e.g. individual PowerPoint slides) separately. In fact you could templatize file chunking based on a new policy downloaded from the server for each and every session. If you wanted.

          How the hash values (if that's what they're using) are computed will determine the granularity of deduplication. From there the problem is one of indexing and content management.

          The real security issue such a system faces is key management. There will be a public/private keypair (asymmetric) for every user, and another for every user's device. There will be a symmetric key for every file. That's a lot of key management.

          I guess the final point is that using a well-thunk-through combination of asymmetric, symmetric and one-way encryption, it's entirely possible to compare segments of files you don't know the contents of.
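
          As a minimal sketch, assuming fixed-size chunks and a server-issued HMAC key standing in for the "hashing password" (every name below is made up for illustration):

          import hmac, hashlib

          CHUNK = 4096                        # assumed block size
          hash_key = b"issued-per-session"    # server-supplied, per the above

          def chunk_digests(plaintext):
              # Hash each chunk *before* encryption; dedup on these digests.
              return [hmac.new(hash_key, plaintext[i:i + CHUNK],
                               hashlib.sha256).hexdigest()
                      for i in range(0, len(plaintext), CHUNK)]

          stored = {}                         # digest -> encrypted chunk
          def upload(plaintext, encrypt):
              for i, d in enumerate(chunk_digests(plaintext)):
                  if d not in stored:         # only unique chunks go over the wire
                      stored[d] = encrypt(plaintext[i * CHUNK:(i + 1) * CHUNK])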

      2. Anonymous Coward
        Anonymous Coward

        Invalid example

        The point was about how encryption and deduplication are basically incompatible. Your example only addresses the sharing scenario, assuming of course that one knows before uploading who one is going to share things with. It does not address the situation where one wants to share the file after it is uploaded.

        Nor does it address how the system is going to know that two files from different users are identical. Hashing prior to encryption is at best a leak, because the system knows who else has something that hashes to the same value and thus has a degree of knowledge of the content (i.e. the RIAA/MPAA can ask them who has a file by providing a hash), and at worst a terrible security nightmare, since the hash is calculated by an untrusted party: the client PC.

        The possibilities for wreaking havoc are endless. So yeah: unlimited storage (patently false) without knowing what we are storing (also false). Pretty much invalidates the whole product.

        1. Spearchucker Jones

          Erm...

          Dude, without prejudice, I imagine you're probably not familiar with asymmetric and symmetric encryption. If you're interested, check out how PGP managed session keys. Similar concept, different application.

          If I upload a file it is encrypted using a symmetric key I own. If I then share that file at a later stage, all I have to do is share the symmetric key using my friend's public key. This would be done at the point where I instruct Bitcasa to "Share file.ppt with UserX".

          The system knows two files (or file parts) are identical because their hash values are identical. Yes, this means the hashing password must come from the server.

          I did not specify where keys are generated - I don't know that. Either client or server is a good choice, depending on your objective.

          There are some very sensitive documents that use a similar protocol to make encrypted content searchable. Your risk analysis will highlight the impact and probability of any weakness. It is then a business decision to mitigate (manage), transfer or accept those risks.

          Many problems I've worked on choose to both mitigate and accept - e.g. in the search implementation I did, the search index was also encrypted. AES was fast enough for that. The threat model showed that accepting the remainder of the risk (a shared symmetric key for the index) had business legs.

          YMMV.

          Technology is easy. People and process are not.
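
          To make the sharing step concrete, a sketch of "share the symmetric key using my friend's public key" (RSA-OAEP wrapping a Fernet file key, PGP-style; my guess at the flow, not Bitcasa's actual code):

          from cryptography.hazmat.primitives.asymmetric import rsa, padding
          from cryptography.hazmat.primitives import hashes
          from cryptography.fernet import Fernet

          file_key = Fernet.generate_key()    # the symmetric key I own
          friend = rsa.generate_private_key(public_exponent=65537, key_size=2048)

          oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                              algorithm=hashes.SHA256(), label=None)
          wrapped = friend.public_key().encrypt(file_key, oaep)    # "Share file.ppt"
          assert friend.decrypt(wrapped, oaep) == file_key         # friend can decrypt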

          1. Anonymous Coward
            Anonymous Coward

            No prejudice

            I'm not an expert in encryption, but I'm still not convinced that their claims are true. You seem well versed, so maybe it's a good time to learn something. Let's see.

            Sharing after uploading: so each file shared with each individual has an associated key? Looks like the right way to do it from a security point of view. However, that's an awful lot of keys to handle; it does not look very scalable to me. Way less than "unlimited", which the simple laws of physics said was false from the start.

            Hashing to de-duplicate: if the hashing password comes from the server, I have to upload the file unencrypted, right? Hence, the service knows the unencrypted contents of my file. Fails on the "we don't know what you're storing" part.

            To avoid that, the client can encrypt before uploading and then upload the hashes as part of, say, the metadata. Then the system knows the hash of the raw content only because it trusts the client, so I can make up whatever hash I want and check whether the server has it. Great for content providers, I guess, but it fails again on the "we don't know what you're storing" part.
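
            To make that probe concrete, a toy sketch: if the dedup index trusts client-supplied hashes, its yes/no answer tells an outsider whether anyone has stored a given file (all of it hypothetical, of course):

            import hashlib

            dedup_index = set()    # server side: hashes of stored files

            def upload(file_hash):
                # Returns True ("already stored, skip the upload") on a hit.
                hit = file_hash in dedup_index
                dedup_index.add(file_hash)
                return hit

            h = hashlib.sha256(b"some well-known MP3").hexdigest()
            upload(h)          # an honest user stores the file first
            print(upload(h))   # True: the hash alone confirms who has what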

  7. Graham 32

    I'm deeply suspicious of this too. The encryption just cannot be for everything.

    The techcrunch article even says it "doesn’t know anything about the file itself, really. It doesn’t see the file’s title or know its contents." And you "can share a link (file, folder) with other users". Really? Share a file it "really" doesn't know about? Oh rly! Sounds like marketing bull to me.

    1. Anonymous Coward
      Anonymous Coward

      Good point

      If it is all encrypted client side before upload and the hosts "can't access or see" the encrypted data, then how does it get decrypted when you pass someone a link to it sitting "in the cloud"? Unless the other user also has to have the Bitcasa client, and all clients have to use the same key to encrypt/decrypt...

      Or it is indeed a bucket of still steaming, finest marketing.

      1. User McUser
        Go

        @Mr. Biscuit Barrel

        They just use a really fast and effective encryption. I believe it's called "ROT13".

        It's super effective.

  8. Anonymous Coward
    Anonymous Coward

    FFS

    "Thank you for signing up for the Bitcasa beta. Space is extremely limited and you are at the back of the queue.

    To move yourself up in line, send a tweet or post to Facebook including your personal sharing link below. The more people you get to sign up, the sooner you get Bitcasa."

    So they want you to spam FB and Twitter and get people to sign up after you. How the fuck does that move you up a FIFO Q?

    1. Anonymous Coward
      Anonymous Coward

      pyramid selling?

      Well, if you were, say, at the bottom of the q, and then you persuaded lots of people to sign up, they would be even further down the same q, so relatively speaking you would have moved up a notch in a much bigger q.

  9. launcap Silver badge
    FAIL

    And when you try to sign up..

    They'll basically tell you to spam all your friends[1] in order to increase your chances of actually getting on the beta.

    Fail.

    [1] 'Tell all your Facebook friends and paste the following URL into your Twitter feed' - and what about people that don't have farcebook and twitter?

    1. Kirbini
      Black Helicopters

      Obvious, really.

      This is exactly *why* I do not have Facebook or Twitter...

  10. TimBiller
    WTF?

    Encrypted data is random.

    I'd love to know how they are deduplicating encrypted files.

    I can understand transmitting them as random rubbish but it's not possible to dedup data in this form.

    Tim

    1. Dan 10

      Neither encryption nor dedupe is my strong point, but why couldn't you encrypt at file level and dedupe at block level? If a block of encrypted data looks like "01010111", for example, and you dedupe any blocks with that same sequence, surely that is completely abstracted from (and therefore irrelevant to) the encryption keys/vectors etc. that are in use?

  11. umacf24
    Black Helicopters

    Put your complete hard drive on line

    Crunchfund may be an investor, but this feels like the NSA will be the infrastructure provider! *Adjusts aluminium foil skullcap*

  12. The BigYin

    Err...

    Surely to do the de-dupe the data must be unencrypted at their end (or at least accessible, a la Dropbox)?

    Sure, putting *ALL* my data on-line where any third party could look at it is smart.

    No thanks, I'll pass.

  13. Ross 7

    Infinite?

    Infinite -> unlimited -> broadband. See where I'm going here?...

  14. mraak
    Thumb Down

    Lame

    I signed up for the beta and got this email:

    "Space is extremely limited and you are at the back of the queue. The more people you get to sign up, the sooner you get Bitcasa. Use the link below to share with friends or post to your social networks."

    I think someone attended too many web 2.0 marketing seminars.

  15. Paul C
    Thumb Down

    The dedup and encryption

    Can't work across users unless it's a common keypair - if it can, they can retrieve the data, can't they?

    I'm glad to see they moved on to another slam-dunk product after the raging success that was the CrunchTablet.

    1. T.a.f.T.

      See below

      It can work just fine if you deduplicate stripes of drive data rather than files; there are only so many combinations of #kB of data, so I would have thought that with enough drive space even random data (which is what encrypted data should look like) will start to show a few patterns.

      1. Charles 9

        Geometric complexity.

        The problem is that the number of possible permutations balloons with just one additional byte: each one multiplies the total possible combinations by 2^8, or 256. Put in perspective, actually storing every possible 16-bit word (0 to 65535) once would require 65536 x 2B, or 128KiB - and that's from just 16 bits; double the width and you leapfrog past mebibytes into gibibyte territory.
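
        A quick sanity check of that arithmetic in Python: the bytes needed to hold every possible n-bit block exactly once.

        for bits in (16, 32, 64):
            blocks = 2 ** bits               # distinct possible blocks
            total = blocks * (bits // 8)     # bytes per block, times the count
            print(f"{bits}-bit blocks: {total / 2**30:g} GiB")
        # 16 -> 0.000122 GiB (128KiB); 32 -> 16 GiB; 64 -> ~1.4e+11 GiB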

  16. Pete 2 Silver badge

    Quite easy really

    Just keep sending the data round and round "the internet", using the cache on all the servers and routers it passes through to store it. Then when you want to retrieve it, just wait until it comes round through your servers on its next "orbit".

    It's a mashup of mercury delay line memory, logistics companies using their lorries as warehouses (while they're on the road, delivering your stuff) and the standard internet/cloud marketing BS.

    1. Bronek Kozicki

      seen this before

      here: http://lcamtuf.coredump.cx/juggling_with_packets.txt

      although such capacity is rather small.

  17. Anonymous Coward
    Anonymous Coward

    Oh really?

    "Start-up Bitcasa invites you to shove all your data into the cloud"

    And I invite Bitcasa to shove their idea into another place where the sun doesn't shine.

  18. Jase 1

    Maybe they are doing dedupe on the client, then encrypting the deduped blocks and only sending the unique blocks?

  19. T.a.f.T.
    Facepalm

    Drive level

    If they dedup at the hard drive level then there are only so many sequences of bytes you can have. If they do not know what is a file and what is a directory, then your client must hold all of that information, and they would not be able to use hash keys to identify files.

    With several petabytes of data, even "random" content such as encrypted data is going to have a few matching patterns; true, they cannot identify that you and your Facebook friend have the same slide in a PowerPoint deck, but they don't need to. I don't see why every client needs to share any keys; I think the PowerPoint analogy in the article has sent people down the wrong mental street.

    1. Tomato42
      Boffin

      Read up on the pigeonhole principle: in short, even with 512-bit hashes you can have collisions between blocks of data that are not identical.

      And with petabytes (if not exabytes) of data, hash collisions go from "improbable" to "likely".

  20. Anonymous Coward
    Anonymous Coward

    Patents ?

    Anyone find the patents they have filed? I'm missing all 20. Are my search skills that bad, or...?!

  21. Anonymous Coward
    Boffin

    Size is everything...

    ...on the subject of infinite space availability... They don't have to actually PROVIDE infinite capacity, since they'll have a finite number of customers. They just need enough capacity per customer to cover the average customer - who probably needs very little storage. So they have to add the drive space (and infrastructure) for, say, 50GB for every new customer - or whatever it averages out to; your usual person hasn't got 100GB of MKVs - and really that's pretty cheap. You amortize it and it comes out OK, I bet.

    You can make educated guesses about the transfer costs, too; they already say that high-volume stuff is "cached" on the local drive. So you store DSCN4012.JPG for six months and they download it to show grandma: you've used 4MB of bandwidth for that chunk for a year.

    As long as you can scale your storage to match each customer, and your averages are correct (and presumably they'll get better the more customers you have), there isn't any reason you shouldn't have effectively infinite storage. The service won't be practical for storing big-ass media libraries unless you have an unlimited, *fast* network; it won't be practical for things like movie or audio editing or game development; it won't be practical for loading Crysis Tournament and Conquer of War: Lost Coast.

    So what's going to go on there? Pictures of grandma and the cat, email folders, Word docs, and the odd music collection - tiny, in the scheme of things.

    I don't see why the storage aspect won't work. Encryption, of course, is another matter.

  22. J 3
    Joke

    Random data, really!?

    Someone is selling quantum desktops and no one told me? Mean...

  23. Anonymous Coward
    Anonymous Coward

    Back of queue...

    Yep, I signed up and got the same "back of queue" message.

    It looks as if they failed to plan for demand, so goodness knows what will happen when people start uploading files. Encrypted files that won't de-dupe...

  24. Anonymous Coward
    Anonymous Coward

    Infinite storage ...

    Handy, because I really need to store an 11-dimensional model of the quantum state of every particle in the universe right now.

    1. Anonymous Coward
      Anonymous Coward

      only 11!

      You know, as soon as you put a limit on anything in the universe, it seems to somehow spell out a missing level you need to add.

  25. Jeff 11

    The cloud storage model

    1) Offer infinite storage.

    2) Renege on infinite storage.

    3) Hope that customer has enough crap stored on your servers that it's infeasible for them to jump to storage 2.0 company n+1 currently at phase 1.

  26. a pressbutton

    all possible and might work

    Infinite storage is impossible

    However there will be limited bandwidth, and limited time.

    Like an unlimited mobile phone contract: you cannot talk for more than (60*24*31 =) 44,640 minutes a month, and most people (they hope) talk for about 150-600.

    Unlike voice, you can throttle bandwidth for heavy users (as ISPs do) - or ask heavy users to pay more for more bandwidth - but the storage is still free :)

    Deduplication I leave to others, but would wonder how many of us encrypt our MP3/4 collections.

    Of ~60GB of files, I have 30GB of MP3, 20GB of MP4/AVI, 8GB of photos and 2GB other.

    So, 1/6th cannot be deduped.

    Getting clever, you could look at my music collection and make good guesses about things I would like, if there was enough of a user base. Privacy is a subject for historians.

  27. b166er

    Dedupe methodology

    1) checksum local file

    2) if checksum does not exist in online storage area, transfer checksum and encrypted file as a pair to online storage area.

    3) if checksum does exist in online storage area, store a pointer

    4) profit!

    Only problem being, querying the checksum catalogue is gonna get slower the more data gets stored. Presumably all clients download updates to the checksum catalogue for faster local querying.

    Will be interesting to see if they've done their maths correctly, but when I see broadband being provisioned for £3.25pcm, and the price of S3, it's not out of the question.
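
    As a sketch, steps 1 to 3 come down to a few lines of Python (the "online storage area" reduced to a dict; purely illustrative):

    import hashlib

    remote_store = {}     # checksum -> encrypted file (shared across users)
    my_pointers = []      # this user's references into the store

    def backup(data, encrypt):
        checksum = hashlib.sha256(data).hexdigest()   # 1) checksum local file
        if checksum not in remote_store:              # 2) new: upload the pair
            remote_store[checksum] = encrypt(data)
        my_pointers.append(checksum)                  # 3) known: store a pointer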

    1. Anonymous Coward
      Anonymous Coward

      5) User later discovers that although the checksum matched, there was in fact a hash collision, and when retrieving their files, gets someone else's document. The probability goes up the more files you store.

      6) You are then sued by either customer.

      7) Loss

      In dedupe, hashes are just used as an indicator of which files/blocks to do a bit compare on - if you don't want to lose data, anyway.

      I guess a case could be made for a checksum as large as or larger than the file itself, which could mathematically guarantee no collisions while not being subject to decryption to reveal the file's contents. Not sure such an algorithm exists.
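
      A sketch of that first point, the hash only nominating a candidate and a full byte compare deciding whether it is really a duplicate (illustrative only):

      import hashlib

      store = {}    # digest -> list of blocks sharing that digest

      def dedup_store(block):
          d = hashlib.sha256(block).digest()
          for i, existing in enumerate(store.setdefault(d, [])):
              if existing == block:     # the bit compare confirms the duplicate
                  return (d, i)
          store[d].append(block)        # new block (or a genuine hash collision)
          return (d, len(store[d]) - 1)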

  28. Anonymous Coward
    Anonymous Coward

    Oh yeah? I'll go one better

    What kind of infinity are they allowing? I'll bet it's only aleph-0.

    Well, I'll go one better than that! I'll give you aleph-1 of storage!

    Not just enough storage to store ALL the counting numbers, but enough to store ALL the REAL numbers, including the irrational ones! Act now, and I'll bump it to allow you to store all the COMPLEX numbers!

    That's right! Not just countably infinite storage, but UNCOUNTABLY INFINITE Storage!

  29. Steve May 1
    Happy

    Cashing in

    I can now make a fortune re-selling this service to CERN so they can store all those petabytes of Higgs boson related data.

    IMNSHO the word "infinite" will now go the same way as "unlimited", meaning whatever the seller pleases. Use of either in advertising should be automatic grounds for withdrawal of the ad, much as any reference to perpetual motion invalidates a patent application.

    A useful rule of thumb: "If it sounds too good to be true, it probably IS too good to be true".

    This reminds me dimly of a wonder storage product some years ago, which purported to use some kind of holographic storage method to provide vasty levels of data storage in a shoebox-sized device.

    If this turns out to perform as advertised I will eat any kind of headcovering you care to mention.

  30. This post has been deleted by its author

  31. Dave Culley
    Meh

    Not the first to offer this

    No one has mentioned Livedrive, who have been offering unlimited storage for some time now. They use similar (disclaimer: it seems similar to me...) hashing techniques to avoid file duplication and, as other posters have said, I think they simply rely on the fact that the vast majority of people signing up will probably use less than 50GB. And if you've got security concerns, then no service which can be accessed with a username and password via the web is ever going to be safe enough for you, in my opinion.

  32. bzob
    Go

    Understanding how Bitcasa can do what they say.

    Lots of discussion has been around how they'd store unique files. I think a missing piece is that they can, and probably do, dedupe at the block or sector level, not only at the whole-file level. I wrote a post on how I think they do it: "Understand how Bitcasa can do what they say." http://t.co/nzoSOFy

  33. Microchip
    Alert

    RIAA/MPAA/etc?

    So, these duplicate file hashes... will the MPAA/RIAA be able to get hold of said hashes / create them, then request Bitcasa to hand over everyone who has an MP3/video matching that hash? Possible instant mass lawsuit ensues?

    Other than that, it sounds a great plan, but I suspect a lot of freetards would get potentially nailed.

This topic is closed for new posts.
