Getty's image-scraping sueball against Stability AI will go to trial in the UK

Getty's lawsuit against Stability AI, claiming the startup "unlawfully copied and processed millions of images protected by copyright" from its photo archive, will go to trial in the UK. In January, Getty sued Stability and accused it of infringing upon the photo library's intellectual property rights and copyright protections …

  1. Bebu Silver badge

    Having trouble getting my head around what exactly is at issue here

    I am guessing this ruling by the court is pretty much procedural: jurisdiction and non-triviality enabling the parties to slug it out in this court.

    Is Getty claiming their photographic images are in some sense stored in the LLMs and that this violates their intellectual property rights? In my general ignorance I imagine any such storage would have to be pretty holographic and perhaps low fidelity. I don't imagine there is any query or input to the LLM that would exactly recreate or retrieve a particular Getty image.

    If this were still a copyright violation how is it different from an artist, who having seen a large number of Getty images, paints a portrait that accidentally resembles one of those images? Is it premeditation and intent?

    Another possibility is this use contravened the conditions of use imposed by Getty.

    AI, ChatGPT, LLMs and the rest of that scurvy crew can be consigned to the deepest circle of hell as far as I am concerned, but I would worry that any new legal principles in intellectual property law established in this action could have extremely far-reaching and quite insidious consequences.

    1. Jellied Eel Silver badge

      Re: Having trouble getting my head around what exactly is at issue here

      Is Getty claiming their photographic images are in some sense stored in the LLMs and that this violates their intellectual property rights?

      I don't think they need to claim that, simply that the images were used without paying or permission to use those images, and that the usage was for commercial gain. I'm just a photographer and occasional author, not an IPR lawyer. One aspect I am curious about, though, is the IPR issues around creating unlicensed derivative works based on copyright material that's been scraped without permission.

      1. Catkin Silver badge

        Re: Having trouble getting my head around what exactly is at issue here

        >I don't think they need to claim that, simply that the images were used without paying or permission to use those images, and that the usage was for commercial gain.

        If a judgement in their favour were secured on that basis (I don't think this is the case) it would be, in my view, disastrous. For example, that would be the end of any appraisal of copyrighted materials* and any artist would have to be very careful to never mention viewing a copyrighted piece of work.

        *or, worse, the rights holder only granting permission to individuals guaranteed to generate a positive review and/or making the production of a positive review a condition of viewing for the purposes of review

        Then again, this is the same Getty that sued a photographer for using her own images, which she had released for free use (Carol Highsmith).

        1. Peter2 Silver badge

          Re: Having trouble getting my head around what exactly is at issue here

          No, it wouldn't. You're free to look at the images in Getty Images without paying for them, with the intention that you pick one that you wish to use. You're free under existing copyright law to look at an image (or block of text such as yours) and then criticise it without reproducing it.

          You're simply not allowed to put the entire contents of Getty Images in a library and then program something that uses it as training data without paying for using the images. If you paid for the images, then legally you'd probably be fine. The fact that they want to use the images without paying for them is the problem.

          1. Catkin Silver badge

            Re: Having trouble getting my head around what exactly is at issue here

            I recommend reading the CDPA 1988. The work has been made available to the public, regardless of watermarks. Therefore, it comes down to whether the output (the trained model) is infringing. There is a blind spot in the act, wherein decompilation of computer programs is permitted (providing it is to a lower level language) but no provision either way is made for the output of analysis. That's not to say it's necessarily in the clear, more that a legal precedent or relevant legislative addendum has not been made at this time.

            WRT payment, that is immaterial unless there is a dispute as to the existence of a contract (hence the use of token sums in some cases) in the first place. What matters is the contents of any contract and whether it is enforceable under the law. Your basic legal rights cannot be waived by any contract you're signing. In your example, you're free to take a private copy of the material you're criticising for the purposes of personal reference, provided that, as you said, you're not reproducing it in your final work.

            I understand that it seems simple common sense but the legal nuances are as important as the broad strokes. I'm not sure if you remember but it wasn't that long ago that copyright robber barons tried their level best (and, in some areas, succeeded for a time) at preventing individuals from recording live broadcasts for personal use.

            1. Jellied Eel Silver badge

              Re: Having trouble getting my head around what exactly is at issue here

              I recommend reading the CDPA 1988. The work has been made available to the public, regardless of watermarks. Therefore, it comes down to whether the output (the trained model) is infringing.

              I... don't think so, but IANAL, just someone who has to consider IPR stuff and consult lawyers if necessary. Sure, the work has been 'made available', but with limitations. An image library kind of has to do this because otherwise people wouldn't be able to browse, and then licence images. This seems to be about making money from the images that have been made available, without necessarily having the rights to do this. I don't think it's a blind spot in the law, yet, just one that perhaps hasn't properly been tested. Kind of the old adage that something isn't really illegal until someone's been taken to court and found guilty of doing it.

              So I think it's testing the input side, i.e. whether ingesting a slew of copyright works as 'training data' without permission or paying licences is a breach of copyright or not. For the output, that is perhaps clearer, i.e. creating derivative works, plagiarism etc, which has all been pretty well tested in court. Maybe not to the extent of an AI doing it, but if it outputs something like a Hobbit, I'm sure the Tolkien Estate's lawyers will be on them like a ton of Balrogs.

              I'm not exactly a fan of Getty, but I'm not a fan of big tech helping themselves to IPR without compensating the original creators, especially given the potential for displacing human creators.

              1. Catkin Silver badge

                Re: Having trouble getting my head around what exactly is at issue here

                Sorry, by 'blind spot' I meant one that hadn't been tested, rather than a loophole. IANAL either but cases like this worry me because of the broad powers they might grant to companies with a long history of abusing those powers. I only lean slightly towards the model trainers on the basis that, if copyright holders get additional (excessive) rights, it will be almost impossible to roll those back, while a 'no comment' from the judicial system leaves room in the future for some clearer thinking.

          2. David 164

            Re: Having trouble getting my head around what exactly is at issue here

            You should be fine if your AI can view the images for training in the same way as a human does, instead of copying them into a separate training archive.

            1. Jellied Eel Silver badge

              Re: Having trouble getting my head around what exactly is at issue here

              You should be fine if your AI can view the images for training in the same way as a human does

              But it can't, unless they can somehow argue that building the training dataset can be done without copying the images. A key point behind copyright though is allowing the creator exclusive rights to exploit the work. Scrapers are exploiting the work without permission, or compensation. Perhaps a fair compromise might be for all AI developers to put their code on public display so we can train our own 'AI's using their IPR. Somehow, I suspect they wouldn't agree to their code being included in a 'training library'.

      2. David 164

        Re: Having trouble getting my head around what exactly is at issue here

        They seem to claim that the images were copied on to a separate hard drive and then used to train the AI. Stability AI doesn't seem to deny that; in fact they seem to be saying the responsibility for this issue doesn't lie with them but with a non-profit in Germany from which they obtain their training data.

      3. Anonymous Coward

        Re: Having trouble getting my head around what exactly is at issue here

        You should see if you can locate any images that you own that contain Getty Images' watermark. Then you should send that to Stability AI and ask if they trained their junk on your image. If they did, you can side with both Stability and Getty and win no matter what.

    2. werdsmith Silver badge

      Re: Having trouble getting my head around what exactly is at issue here

      If this were still a copyright violation how is it different from an artist, who having seen a large number of Getty images, paints a portrait that accidentally resembles one of those images? Is it premeditation and intent?

      If the art school where the artist was trained used a whole load of Getty images in their training material then what....

      1. Catkin Silver badge

        Re: Having trouble getting my head around what exactly is at issue here

        Wouldn't that be a legal dispute between Getty and the school, independent of the output of any given art student? That is, the infringement that might occur would be down to how the school presented images owned by Getty to their students.

        1. werdsmith Silver badge

          Re: Having trouble getting my head around what exactly is at issue here

          Wouldn't that be a legal dispute between Getty and the school, independent of the output of any given art student?

          In this case, Stability is analogous to the school and their AI product is the student.

    3. Falmari Silver badge

      Re: Having trouble getting my head around what exactly is at issue here

      @ Bebu Getty are claiming 3 counts of copyright infringement.

      A) During the training process.

      B) The pre-trained models.

      C) The output.

      In the case of A) the infringement would be if the images that make up the training data have been stored on servers and/or computers in the UK. If Stability AI are found to have done that and are therefore infringing on copyright that would not be setting new legal principles in intellectual property law as the UK already has laws covering the copying and storing of copyrighted digital data.

      For B) it is for pre-trained models imported into the UK. Not sure if they are claiming that the models are storing the images or that the models are derived from the images, as I have only skimmed the linked pdf.

      For C) some of the output from the models would have to be similar enough to Getty’s copyrighted images to be deemed infringing under current UK copyright laws. The same copyright laws that apply to humans.

      What’s interesting is that this seems to be the first case where there is a specific claim for copyright infringement against the collection, copying and storing of the training data. Separate to any claim that the model contains copyright data.

      Probably because this case is in the UK, and also because Getty have already entered into licensing agreements with tech companies, giving them access to images for training models.

      1. I ain't Spartacus Gold badge

        Re: Having trouble getting my head around what exactly is at issue here

        Falmari,

        On C - there was an article in El Reg within the last week about an "attack" on language "AI"s where you get them to repeat words multiple times - and they sometimes spit out whole chunks of their training data verbatim. Including I believe email addresses - which suggests their data might have been scraped from all sorts of unsavoury places. And also including extracts of copyright work. Something I can imagine ending up in a lawsuit sooner than later.

        I don't know how the image ones work - because language isn't the same as images - but it wouldn't surprise me if there isn't significant amounts of copyright data still hidden away in there. One way to find it might be to come up with particularly specific prompts that can be traced back to only a few images in the training data - and see whether the models just repeat that image back to you or "compose" something similar or derivative.

        1. Falmari Silver badge

          Re: Having trouble getting my head around what exactly is at issue here

          @I ain't Spartacus, I must have missed the Reg article I will have a look for it, thanks. :)

          I have read this paper, "Scalable Extraction of Training Data from (Production) Language Models"; it's probably the paper the Reg article referenced.

          I was programming neural nets to classify crops from SPOT satellite images in the mid 90s and I have always been of the opinion that neural nets and LLMs learn; there is no memorization*. But having read that paper I tend to agree with "but it wouldn't surprise me if there isn't significant amounts of copyright data still hidden away in there.".

          * Is that even a word? ;)

          1. I ain't Spartacus Gold badge

            Re: Having trouble getting my head around what exactly is at issue here

            Falmari,

            That looks like the name of the paper, but I didn't read it. Only this article, summarising it here.

            On your final note, if I can summarise, then there can be summarisation done in order to produce a summary. And thus memorisation in order to produce a memory. Although more z in the way you spell it - possibly being on the other side of the big blue wobbly thing, otherwise known as the Atlantic.

            1. Falmari Silver badge

              Re: Having trouble getting my head around what exactly is at issue here

              @I ain't Spartacus, Thanks for the link.

              BTW I am in the UK just my browser seems to have defaulted to US spelling for some reason.

              1. I ain't Spartacus Gold badge

                Re: Having trouble getting my head around what exactly is at issue here

                Falmari,

                I am in the UK just my browser seems to have defaulted to US spelling for some reason.

                Aha! So you're not in the wrong place, but in the wrong time then.

                From what I can tell those pesky Zs were in the original spelling of a lot of English words - back around the 16th/17th century when spellings were just starting to standardise. Which is right when people were buggering off to found the colonies in America. I guess they were more consistent than us, and have stuck to their Zs ever since - while we went all namby-pamby and slowly changed over to using Ss (should that be essess?) instead. Although the OED says both are still correct in written English.

                Perhaps you've just set your computer's clock to 1623 by mistake?

          2. doublelayer Silver badge

            Re: Having trouble getting my head around what exactly is at issue here

            "I was programming neural nets to classify crops from SPOT satellite images in the mid 90s and I have always been of the opinion that neural nets and LLMs learn; there is no memorization"

            I'm guessing you've still seen it. Even without doing it very often, when I've trained neural networks, I've managed to get them to overfit the training data and start memorizing things. I'm thinking of a model that ended up simply memorizing the training data and expected answers, meaning it would score very highly while training and then fail all the other tests. Of course, something that blatant is rejected as a bad model, but that's when it's small enough that the overfit is obvious and causes problems for intended use. This model is much larger, so whether the pictures are stored in their entirety or not is harder to prove.

            For context, I think this is the company that produced the model that started to introduce Getty Images watermarks into generated pictures. They weren't pixel-for-pixel correct watermarks, but you don't have to copy every pixel to commit a copyright violation. If the model did what it did to the rest of the picture as well as to the watermark, would you decide that doesn't count as storing the image in the model?

      2. katrinab Silver badge

        Re: Having trouble getting my head around what exactly is at issue here

        The actual training model, by which I mean the sort of thing you can download from HuggingFace, is a computer program which is compiled from source code which includes the infringing images.

  2. Anonymous Coward

    Getty, Getty, where do I know that name from?

    Isn't that the name you see written all over copies of images they've scraped and offer to sell back to you?

  3. xyz Silver badge

    I would imagine...

    That this is the same(ish) as sampling music. You need permission.

    Location is irrelevant because you want to profit from your sampling.

    Scraping prices from competitors' websites is one thing, scraping someone's talent (painting/photo/whatever) is totally different.

    IMHO

    1. katrinab Silver badge

      Re: I would imagine...

      The price of an item isn't a creative work protected by copyright. That is the difference.
