Machine needs more Learning: Google Drive dings single-character files for copyright infringement

Google last month announced plans to prevent customer files stored in Google Drive from being shared when the web giant's automated scanning system finds files that violate its abuse prevention rules. "When [a file is] restricted, you may see a flag next to the filename, you won't be able to share it, and your file will no …

  1. Bitsminer Silver badge

    Well, that's one for the history books

    Too bad, Google.

    1. Paul Herber Silver badge

      Re: Well, that's one for the history books

      That's one for the money, ...

      1. Steve Foster
        Joke

        Re: Well, that's one for the history books

        ... 10 for the show...

        (binary joke)

        1. TRT Silver badge

          Re: Snow White and the 111 Nybbles.

          1 0 1 0, it's off to work I go.

        2. Zolko Silver badge

          Re: binary joke

          there are only 10 types of people in the world: those who understand binary, and those who don't.

    2. Blackjack Silver badge

      Re: Well, that's one for the history books

      Is this better or worse than those times Google just decided to delete Google Drive files with no warning whatsoever?

      1. Anonymous Coward

        Re: Well, that's one for the history books

        In both cases the fault is with the users who somehow expected things will necessarily go their way. Google is like that monster wave, you might ride it for a while, but it's at your own risk, don't go crying if it suddenly breaks every bone in your body.

  2. b0llchit Silver badge
    Alien

    I did register the letters G, O, L and E a very long time ago. I remember I had to fight the registration bureau quite a bit because they suggested prior art in literature. But, I've been able to convince them of a time paradox where my thoughts, ideas, designs and all I produced have been scattered over time and have presented definite proof of ownership.

    Now I own the alphabet in all its glory. Therefore, you are all infringing! Stop writing you repeat infringers! You owe me lots and lots of money. Are you reading this? Hello? Hello?

    1. John Brown (no body) Silver badge

      "Therefore, you are all infringing! Stop writing you repeat infringers!"

      OtmMGtm! What a good thing we can all learn to write emoji :-)

      1. b0llchit Silver badge

        O©MG©, yo©u me©an.

        ©!

    2. Anonymous Coward

      Nice try, but I went even further back in time and registered the representation of the electronic states "1" and "0" that modern computers use to represent the letters that you later (and illegally) registered. Sorry, but you owe me at least $3.50 for your infringement.

      1. Jellied Eel Silver badge

        Pfft. I went back even further and registered electrons, photons and time travel. Sadly, this also resulted in discovering the rather mundane origin of the Big Bang. In hindsight, it should have been obvious what would happen when hordes of lawyers and their time machines all converged on a single point in space and time to file the first claim.

        1. MisterHappy

          But I went even further back in time and bought the architect a very nice dinner & suggested he put a lever right... here.

          https://www.youtube.com/watch?v=Do-wDPoC6GM&ab_channel=ComicRelief%3ARedNoseDay

          1. Zolko Silver badge

            I went back in time and patented time-travel itself, so no luck for you

            1. The Indomitable Gall

              A good time traveller should understand the problems of patenting time-travel, given the 20 year term limit on patents.

              Patent a time-machine today, and I'll just pop out the back to get in one that I'll build 20 years from now in order to send back to my back garden, arriving today.

              1. Jellied Eel Silver badge

                Not sure why you think it's 20yrs, when wiki clearly states that patents have always been granted in perpetuity.

                1. doublelayer Silver badge

                  Patents aren't perpetual. You apply for a patent, and you can use it to confirm your ownership until that provision expires, which is at least twenty years though may be longer in some countries. Then it expires. You still have the patent for the record that you invented something, but you can't use it to prevent others from developing your thing anymore. The protection rights entailed in a patent are time limited.

            2. Felonmarmer

              Sorry but I've already patented the business practice of using time travel to set patent start dates. It only applies in this universe though as defined by -infinity < (x,y,z,t) < +infinity so feel free to use higher dimensions or pop over to another universe.

        2. Anonymous Coward

          God IV, is that you? Dad's asking where you left his keys, looked quite upset to me...

          1. Jellied Eel Silver badge

            <delphic>The keys are in the sock.</delphic>

            1. David 132 Silver badge

              Please don't bring Oracle into this, it's unpleasant enough already.

              1. Anonymous Coward

                Nah, not Oracle. Turbo Pascal maybe.

      2. Michael Wojcik Silver badge

        Mani had prior art circa 250 AD. Your "electronic states" bit is just an implementation detail.

        Now, if you'd said "electronic states with rounded corners"...

    3. Paul Herber Silver badge

      So that's what own GOLE means!

  3. DJV Silver badge

    The one piece of text in that Google notice that really pisses me off and demonstrates Google's arrogance is: "A review cannot be requested for this restriction" which roughly translates as "We can do what we want and there's nothing you can do to question it, suckers!"

    I used to merely hate Google, now I totally loathe them.

    1. skeptical i
      Devil

      "Google's ... message ... includes a button labeled "Request a Review""

      BWAH-HAH-HAH-HAH-HAH! Next to the button labeled "Pull my Finger", right?

    2. Anonymous Coward

      Yeah, even MS isn't (usually) that stuck up their own ass. That displays an almost breathtaking amount of conceit by Google.

      1. Mark 85
        Black Helicopters

        Anonymous Coward

        Yeah, even MS isn't (usually) that stuck up their own ass. That displays an almost breathtaking amount of conceit by Google.

        Google conceited? Just because they think they own the world? Hmm... I'm hearing a knock at the door and someone yelling "Open up! Google Police!!!!"

    3. simonb_london

      Not a big fan of more legislation myself, but this issue of companies clobbering users with zero explanation, no chance to appeal and no human being to talk to about it is getting out of hand. Couple that with deflection-mode call centres and websites that just send frustrated customers round in circles to the same inadequate FAQs. It looks like companies now need to be legally bound to be accountable for how they treat their users.

    4. Zolko Silver badge

      I used to merely hate Google, now I totally loathe them

      you should try to ignore them

  4. The Sprocket

    Not likely

    Given the absurdity of this, I'd like to copyright the '@' symbol. (Might get away with it in the nation of Nauru.)

    1. KittenHuffer Silver badge

      Re: Not likely

      I've already copyrighted the © symbol!

      I've also copyrighted the word recursion, as I see that as my next big cash cow!

  5. mark l 2 Silver badge

    I wasn't even aware that Google Drive files were checked for copyright infringement. I've uploaded backups of my music ripped from CDs and some films ripped off DVD to it and never experienced any copyright warning. Or do they only get checked if you are sharing the files publicly?

    1. Mark 85

      Sounds like it's text files only at this point. I wonder what would happen with Roman Numerals?

    2. Anonymous Coward

      It's a revenue opportunity for Google so they can sell the alleged infringers' details to the alleged Lawyers of the alleged copyright owners.

  6. zuckzuckgo Silver badge

    yes no no no yes no yes!

    Since 1 is now off limits I propose we switch from 1 / 0 binary notation to simply yes / no.

    A decimal number like 69 in the new notation would be:

    yes no no no yes no yes!!!

    The triple exclamation marks serve to clearly identify the new notation. This could be the dawn of a new more exciting era in computing!!!
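For anyone wanting to check the arithmetic, here is a minimal sketch of the proposed notation in Python. The function names `to_yes_no` and `from_yes_no` are made up for illustration; the only rule taken from the comment is that 1 becomes "yes", 0 becomes "no", and the number ends in three exclamation marks.

```python
def to_yes_no(n: int) -> str:
    """Render a non-negative integer in the proposed yes/no notation."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    bits = format(n, "b")  # binary digits, most significant first
    words = ["yes" if bit == "1" else "no" for bit in bits]
    return " ".join(words) + "!!!"  # triple exclamation marks flag the notation


def from_yes_no(text: str) -> int:
    """Parse the yes/no notation back into an integer."""
    words = text.rstrip("!").split()
    if not words or any(word not in ("yes", "no") for word in words):
        raise ValueError("expected only 'yes' and 'no' words")
    bits = "".join("1" if word == "yes" else "0" for word in words)
    return int(bits, 2)


print(to_yes_no(69))  # yes no no no yes no yes!!!
```

Decimal 69 is 64 + 4 + 1, i.e. binary 1000101, which is where the seven words come from.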

    1. KarMann Silver badge
      Paris Hilton

      Re: yes no no no yes no yes!

      I'm afraid I've found some prior art going back about two millennia. (NSFW)

      1. John McCallum
        Thumb Down

        Re: yes no no no yes no yes!

        Google wants me to prove I am old enough to see that sub 1 minute video

    2. b0llchit Silver badge
      Big Brother

      Re: yes no no no yes no yes!

      But denial (no) and acceptance (yes) have been appropriated by other forces. You are no longer allowed these traditional methods because they are confusing in the online markets. Denial to accept consumer produce is no longer allowed. Acceptance of TOS and EULA is mandatory to disallow the denial.

      Yes?

      No?

      consume stay asleep buy no imagination no thought conform obey

    3. John Brown (no body) Silver badge

      Re: yes no no no yes no yes!

      Jim Trott might have prior art on that :-)

  7. Anonymous Coward

    Why would anyone in their right mind put copyright material on network drives? Why not just stream from somewhere if you're going to go through network IO to watch or listen to it?

    Sharing said content would be another matter entirely regardless, but I honestly don't understand the point. Surely a USB drive of some sort would be more suitable for backups.

    1. zuckzuckgo Silver badge

      I think the point is that it was obviously not pirated material, highlighting the fact that any automated means to detect pirated material is going to be error prone.

      Does this mean they may also reject academic documents that quote published material or other similar fair use situations?

      1. TRT Silver badge

        They just need to do a bit of fine tuning... single character recognition for the detection of pirate material just needs to detect the one value. R.

      2. ThatOne Silver badge
        Devil

        > may also reject academic documents that quote published material

        We're far from there yet, they apparently just check for the presence of certain letters or numbers...

        What this has to do with copyright is anyone's guess.

        IMHO it's just a case of the old "it's not important to do something, but to be seen doing it": They now can claim to the big copyright owners that they are cracking down on piracy, and as long as they don't need to specify if they are doing it successfully...

    2. Mishak Silver badge

      Surely a USB drive ... for backups

      I guess it depends if "off site" is important to you.

      1. ThatOne Silver badge

        Re: Surely a USB drive ... for backups

        > I guess it depends if "off site" is important to you.

        I don't really see why "USB drive" and "off site" must be mutually exclusive.

        I have disseminated USB drives (protected with strong encryption, not the toy stuff some come equipped with) at several places. Nothing prevents you from having a couple 256 GB USB drives in a drawer at work, and another set at a relative's home. (Note this is for my private backups, my own private stuff. Work backups are an entirely different thing and are handled entirely differently.)

        For private stuff, those 3 sets of USB drives are perfect. Too close to each other? Well, if the whole area gets so utterly obliterated that all your sets of USB drives are trashed, chances are you won't need your backups anymore anyway. And yes, updating is a little more involved, but if you are a little organized it's no big deal to think every now and then to take the backup drives back home to update. It's not like your important private files, those you desperately need to preserve if your house burns down, change daily.

    3. Anonymous Coward

      Practically all original content -- as distinct from *facts* -- that is not in the public domain has a copyright owned by someone. The bad poem you wrote at university? Copyright owned by you. The email you wrote your colleague last week about the TPS reports? Copyright owned by your employer. Those home videos your parents made when you were a baby? Yep, copyright is owned by your parents. I own the copyright to this comment; El Reg has a license I granted as a condition of being allowed to post it here. Your copy of Linux contains files with hundreds of copyright holders. And so on.

      It's common to express ideas vaguely (Google's communication is an example of making this worse), and when most people say "copyrighted content" what they really mean is content to which the speaker doesn't have a license, or has a license that doesn't allow arbitrary distribution. But that's wildly different from what "copyright [sic] material" really means, and in fact nearly everything everyone stores anywhere is subject to copyright. It's true that with the kind of extremely low-entropy content described by this article (as well as performances of the infamous 4'33" etc) things get dicey, but that isn't how I interpreted your question. It's perfectly fine to store works subject to copyright on these services; if it weren't they wouldn't be useful.

      The problem here is missing metadata, specifically (a) whether the piece of data is subject to copyright, (b) the identity of the copyright owner if so, and (c) the terms of any license to it held by the account owner who's storing it. From that it would be possible to determine conclusively whether or not a work's presence at a particular storage location is infringing. Without it, their actions represent at best an unreliable guess and at worst opaque, asymmetrical, and abusive pandering to giant corporations at everyone else's expense. Wikipedia have done this pretty well: they associate this kind of metadata with the objects they store, which makes it easy to detect problems or document the basis for a work's use. Google, of course, provide their customers with no way to record that metadata, which is why anyone serious about document control uses something else.
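As a rough illustration of items (a) to (c), the missing metadata could be as small as the record below. This is a sketch under stated assumptions, not anything Google or Wikipedia actually implement: the class `RightsMetadata`, its field names and the function `may_share` are all hypothetical, and the license terms are reduced to a single yes/no flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RightsMetadata:
    """Hypothetical per-file record covering points (a), (b) and (c)."""
    subject_to_copyright: bool            # (a) is the work copyrightable at all?
    owner: Optional[str] = None           # (b) who holds the copyright, if anyone
    license_allows_sharing: bool = False  # (c) license terms, reduced to one flag


def may_share(meta: Optional[RightsMetadata], uploader: str) -> Optional[bool]:
    """Return True or False when the metadata settles it, None when it cannot."""
    if meta is None:
        return None  # no metadata recorded: any verdict is a guess
    if not meta.subject_to_copyright:
        return True  # nothing to infringe
    if meta.owner == uploader:
        return True  # the owner is sharing their own work
    return meta.license_allows_sharing


print(may_share(None, "alice"))                                  # None
print(may_share(RightsMetadata(False), "alice"))                 # True
print(may_share(RightsMetadata(True, owner="bob"), "alice"))     # False
```

The `None` branch is the point of the comment above: without the record, a scanner can only guess.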

    4. katrinab Silver badge
      Megaphone

      Everything on my NextCloud is copyright material.

      Mostly, it is stuff I wrote, so I own the copyright. Some of it is things that other people sent me, and they own the copyright.

      I do have some copyright-expired books, but that is a tiny fraction of my storage volume.

  8. DS999 Silver badge

    I knew my copyright filing

    On the digit "1" would pay off eventually!

    Where is the "giant bags 'o cash" icon?

    1. zuckzuckgo Silver badge
      Thumb Up

      Re: I knew my copyright filing

      In place of the "1" we will just switch to using emoji like this -->

      Although in Google's case I would prefer a raised middle finger.

      1. DS999 Silver badge

        Re: I knew my copyright filing

        Damn, you foiled my plan! Time to cancel that private jet order I guess..

  9. Spiz

    *chuckles in self-hosted NextCloud*

  10. Terry 6 Silver badge

    Corporate wilful ignorance

    but that's exactly why there needs to be a mechanism for communicating those bugs back to the developers.

    This is an issue that goes far beyond Google or this story. It is a general issue in all large companies (and a good few medium ones, I think).

    Beancounter-led companies do not want to hear from the public that things are going wrong, that there is a fault etc. To do that would cost money providing customer service and hit short term profits, bonuses etc. So they hide or remove contact details, create websites that take customers round in circles from "contact us" to FAQs to "need more help" to "contact us" to..... Or direct users to "Support forums" where people who have nothing to do with the official company and don't get paid can offer amateur advice that may or may not be helpful, but won't make the company aware of or resolve any genuine product issues.

    That they piss off customers and lose them, is a matter for the next financial reporting period.

  11. Natalie Gritpants Jr

    Why not just zip those files up and password protect them? Screw Google's algorithm.

    1. doublelayer Silver badge

      Because, given the quality of this, every encrypted zip file (which has a distinct signature) will become copyright-infringing overnight. If they're using machine learning rather than a big list of hashes, this wouldn't even be surprising as it's exactly the correlation such models tend to identify.
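To make the "big list of hashes" option concrete, here is a minimal sketch of exact-match hash filtering. It is an assumption for illustration only: the article does not say how Google's scanner works, and `BLOCKED_HASHES` and `is_flagged` are hypothetical names. It shows why a one-byte file is a bad candidate for this kind of matching: everyone's file containing just "1" has the same hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical blocklist: pretend a flagged file happened to contain only "1".
BLOCKED_HASHES = {sha256_hex(b"1")}


def is_flagged(file_bytes: bytes) -> bool:
    """Exact-match lookup: True if the file's hash is on the blocklist."""
    return sha256_hex(file_bytes) in BLOCKED_HASHES


print(is_flagged(b"1"))    # True: every file containing just "1" collides
print(is_flagged(b"1\n"))  # False: a trailing newline changes the hash
print(is_flagged(b"2"))    # False
```

A learned model would not need an exact match, which is why it could end up flagging anything that merely looks like an encrypted archive.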

      1. Steve Davies 3 Silver badge

        Don't zip it,

        Encrypt it. Generate your own public/private key pair. Encrypt it with the private key and distribute the public one to the students by email or some non google method.

        It would have the extra benefit of getting the students aware of data encryption.

        Let's go Google!

        1. Cederic Silver badge

          Re: Don't zip it,

          I've been working in IT for decades and have hand crafted HTTPS clients.

          What you've proposed still fills me with fear and horror, and that's before I have to try and explain it to students.

          Frankly it's easier to build my own cloud service with document creation and sharing capabilities.

          1. Not Yb Bronze badge

            Re: Don't zip it,

            "Go to my website and download these files"

            Or, as someone once mentioned

            "Never underestimate the bandwidth of a station wagon full of disk drives"

      2. Norman Nescio Silver badge

        zip archives not entirely secure - lack integrity

        Sigh.

        1) Don't use zip. Files can be replaced in encrypted zip archives with non-encrypted files of the same name, and zip will not complain on unpacking. You need an archiver that doesn't just put a collection of encrypted files in a non-encrypted wrapper that includes metadata such as filenames. 7zip works (with the correct settings).

        https://security.stackexchange.com/questions/35818/are-password-protected-zip-files-secure

        Naïve zip implementations can have other interesting behaviour:

        https://nakedsecurity.sophos.com/2018/06/06/the-zip-slip-vulnerability-what-you-need-to-know/

        2) Assuming you are using something other than zip, then adding a salt file with a few 10s of random bytes to the archive will change the signature of the archive so it doesn't match with other encrypted archives.

        3) If you really, really want to use zip, at least double zip encrypt - create the encrypted archive, then put the encrypted archive you have just created as a single file in its own encrypted archive.
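Point 2 can be checked with a small sketch. Assumptions, stated loudly: this uses an unencrypted in-memory tar archive purely because the Python standard library cannot write encrypted archives, the helper names `build_archive` and `digest` are made up, and "signature" is taken to mean a hash over the whole archive. It only demonstrates that a random salt file changes that hash; it says nothing about encryption strength.

```python
import hashlib
import io
import os
import tarfile


def build_archive(files: dict[str, bytes]) -> bytes:
    """Pack files into an in-memory tar archive with fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp: identical input gives identical bytes
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def digest(data: bytes) -> str:
    """Hex SHA-256 digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()


payload = {"notes.txt": b"1"}
plain = build_archive(payload)
# A few tens of random bytes, as suggested in point 2.
salted = build_archive({**payload, "salt.bin": os.urandom(32)})

print(digest(plain) == digest(build_archive(payload)))  # same input, same hash
print(digest(plain) == digest(salted))                   # the salt changes the hash
```

With a real encrypting archiver the same idea applies: two people packing the same file get different archive hashes once each adds their own salt.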

  12. Anonymous Coward

    If their AI system is such hot spit, how come YouTube is 85% pirated material?

    1. Dan 55 Silver badge

      Because it's fine when YT can monetise it.

      1. ThatOne Silver badge
        Devil

        Yes, and to be more specific, there are two (2) distinct types of copyright infringement:

        1. The one which creates profits = Good

        2. The one which creates lawsuits and makes us lose money = Bad

        Obviously you only crack down on the second one. You wouldn't kill the goose that lays the golden eggs, would you.

        1. TRT Silver badge

          I'm afraid that the phrase you used there is copyright of the Aesop estate and therefore you must pay a licensing fee or cease and desist.

          1. Anonymous Coward

            Aesop estate

            Aesop was working about 2500 years ago. I reckon his estate could make more money selling his longevity treatments than they ever would from copyright infringements, if the copyright still held. Is it still life plus 70 years?

  13. Henry Wertz 1 Gold badge

    And it's not even effective

    And as far as I can tell it's not even effective. I've seen^H^H^H^H heard about (vague rumors, obviously *I* would *never* go to primewire and the like.. yeah..) plenty of google drive links out and about, presumably fully functional.

  14. doublelayer Silver badge

    Separate issue

    Google's system is clearly unfit and a problem, but there's another one. Why can't a university provide sufficient storage for class materials, thus making professors resort to Google Drive? Google Drive is a terrible distribution system as it requires the user to click through to download files, either using an unnecessary web rendering page or presenting a page saying that the web rendering page won't work for this file. It's functional for people who don't want to pay for the bandwidth usage, but a university already has servers that can store some small text files.

    1. 42656e4d203239 Silver badge

      Re: Separate issue

      >>but a university already has servers that can store some small text files.

      Indeed but a university may not want the security risks involved in allowing random students access to their network from random devices in random locations (aka setting themselves up as a private cloud provider)

      Yes there are ways around the problems associated with running a private cloud but all are more expensive than letting staff use Google Drive (or OneDrive) for sharing files.

      If I were a bean counter at the university I know which option I would pick and, perhaps sadly, bean counters tend to get the final say.

      1. Cederic Silver badge

        Re: Separate issue

        Plus of course one of the options for 'provision of online collaboration tools' is build it yourself, but another is 'outsource to Google'.

        Which leaves you back where you started.

      2. doublelayer Silver badge

        Re: Separate issue

        "Indeed but a university may not want the security risks involved in allowing random students access to their network from random devices in random locations (aka setting themselves up as a private cloud provider)"

        You may need to look at what a cloud provider does. Hosting your own website with files uploaded by your employees is not being a cloud provider. The students don't need to have upload access to the system (although many universities give them that and it's just fine). If the access needs to be restricted, put a login page in front of the files. If you only want logins from inside the university network, put a firewall rule on it. If you want authenticated access from outside the network, give the students a VPN option. These things are really basic and the university already has the infrastructure to do it.

  15. Julo

    Google is not stupid - it just ignores, as a rule, anybody else

    I stopped using most of their products and software if I have an alternative. After they took over Waze - Waze on Android was so buggy that I could not use it where I needed it most (in my car). I was publicly admonished for posting a bug report on a developer site after I got no reply at all on the customer service site. I've learned my lesson and I stopped using them even for search. The arrogance pervades all of their communications.

  16. fpx
    Devil

    "Relying on viral social media posts as a sort of backdoor communication channel to the developers should not be the only option."

    Au contraire. That is the modern way of filtering complaints. If social media decides that an issue is not important, then obviously it is not worth spending effort on. Don't worry, once Google is done fixing issues that attract a billion views, they might get concerned about the issues that attract only millions of views.

    Who would ever get rich in a billion user market by fixing issues that only affect a few?

  17. Anonymous Coward

    "A review cannot be requested for this restriction"

    So a giant fuck you, if you're not big enough on social media to have your complaints matter.

  18. Anonymous Coward

    Google is shit at software

    News at 10 headlines.

    Sorry, this is just to be expected from Google.

    Someone needs to sue them into oblivion for false copyright claims. Start at $100B. That just might get them to take notice.

    Oh, and add the likes of Facebook into the mix as well. These 'bots' are getting stupid.

    I had a post get flagged for copyright infringement when I quoted three lines of my first novel. I put in a reference to the novel and under fair use, I could do that but no, I was dinged for copyright violation of my own work. Sorry, how can I be in violation when I wrote the text anyway? The systems see a match and 'ding', go to jail, do not pass go, do not collect $200.

    Cue wind blowing and sagebrush rolling down the street if you try to get it corrected.

    In the end, I had to repost the text, but with several words changed. Then I added a note explaining what the correct words were.

    1. fpx

      Re: Google is shit at software

      Unfortunately, the legal system will first ask you for demonstrable, personal harm that you have suffered. If there is none, if it is not quantifiable, or if it is zero because the service that you were unable to use was free to start with, you are out of luck.

      On the other hand, let's not forget that the bullies in this match are the rights holders. Not the individuals writing books or making movies, but the large agencies. In the fight of Disney, Sony, Random House etc. against Google, we are collateral damage.

  19. Why Not?

    20 years ago I worked for a big American corporate.

    They scanned all network drives and eventually personal drives for audio and video files. If any were found you got a personal meeting with HR, because the RIAA threatened them with massive fines if audited. It wasn't actually too bad a decision overall: the number of pirated songs, films and porn decreased rapidly.

    The thing they hadn't thought about was the technical team generated their own content, our technical manager who recorded videos showing how to operate or fix our products was invited to HR on a daily basis.

    This sort of thing needs to happen because copyright abuse is rife.

  20. Peter D

    The algorithm is correct.

    My seminal work "1" was published in 1973 and in the following years its sequels "2", "3", "4" and many many more proved highly successful.

    1. David 132 Silver badge
      Happy

      Re: The algorithm is correct.

      "I was expecting a knighthood for my efforts but they made me a Count instead..."

  21. Anonymous Coward

    But Google are right and this kind of thing has got to be stopped!... someone could be embedding pr0n in that '1' using crypto steganography

  22. The Empress

    Not good enough

    Google should be given ownership of all your personal papers, financial records and health information so that you can be 'protected'

    ALL HAIL BRANDONIA

  23. chololennon
    Happy

    Use MEGA instead

    After Dropbox started to remove support for Linux users (only ext4 is supported) and Google started to flag my files, I switched to MEGA instantly (well supported on Linux/Android, 15/20 GB of free storage, and content is encrypted).

  24. Anonymous Coward

    >Reader (UK lingo for professor)

    Don't say that too loudly near actual professors.

    1. W.S.Gosset

      Yeah, kinda fell between 2 stools there. In the states, "professor" has been degraded as a title to encompass lecturers and, from what I can gather, even tutors. Whereas in the commonwealth, "professor" is maybe a couple per department, "reader" is basically a professor without a "chair" -- deputy god rather than god.

  25. Sampler

    Clearly

    They can be only 1 (1)
