GitHub's Copilot flies into its first open source copyright lawsuit

GitHub Copilot, Microsoft's AI-driven, pair-programming service, is already wildly popular. Microsoft broke out GitHub's revenue and subscription numbers in its latest quarterly report for the first time. GitHub now has an annual recurring revenue of $1 billion, up from a reported $200 to $300 million when it was acquired. It …

  1. Natalie Gritpants Jr

    "Open source is a cancer"

    Forgot who said that, but it looks like GitHub has taken up smoking.

    1. Sceptic Tank Silver badge
      Windows

      Re: "Open source is a cancer"

      It was Ballmer. These are not the kind of things you can toss a chair at.

    2. The BigYin

      Re: "Open source is a cancer"

      No one.

      Ballmer did say that Linux was a cancer, though.

      1. Anonymous Coward

        Re: "Open source is a cancer"

        Despite that, Microsquishy seems to have acquired the ability to integrate if not appropriate Linux via WSL. I guess cancer isn't so bad if you can turn a profit...

        1. bombastic bob Silver badge
          Linux

          Re: "Open source is a cancer"

          This asks the additional question: who exactly is going to assimilate WHOM? At least for the OS, a workable Windows running on Linux would be a better model...

          As for open source licensing, in theory this whole situation with CoPilot begs the question of "what exactly is plagiarism"? I'd say if you look at code in a book or online and then write your own it is NOT. But if a machine creates an AI model (like fractals for a photo) and then re-creates that code from the model (in a nearly identical way) it IS plagiarism. Hopefully the courts will agree.

          I am not too happy with such AI writing code. I see gross obvious junior-coder mistakes in THAT future.

          1. Mostly Irrelevant

            Re: "Open source is a cancer"

            "I am not too happy with such AI writing code. I see gross obvious junior-coder mistakes in THAT future."

            How would that be any different from now?

            1. stiine Silver badge

              Re: "Open source is a cancer"

              The junior coder would take much longer.

            2. Anonymous Coward

              Re: "Open source is a cancer"

              > How would that be any different from now?

              There is a greater chance that some senior programmer is going to cast an eye over what the juniors have written, even if it is after the shit has already hit the fan.

  2. ParlezVousFranglais

    "I don't expect to see a definitive answer this decade."

    And therein lies the problem - by the time a court finally agrees that Microsoft are a bunch of thieving b******s, they will have:

    a) ripped off the code of thousands of others for their own financial gain

    b) altered their ML algorithm sufficiently to still take advantage of others' work but recode it when republishing so the theft is impossible to prove

    c) deprecated the current version in favour of a v2/v3/v4 to pretend they didn't benefit all that much

    We all wondered about the real reason for MS's purchase of GitHub: "By joining forces with GitHub," CEO Satya Nadella said, "we strengthen our commitment to developer freedom, openness and innovation."

    At least now we know it was simply to make the theft of all those resources easier for them...

    1. Anonymous Coward

      "Too big to fail"

      Year 2032: "Millions of companies depend on our Copilot, we can't change it now!"

    2. doublelayer Silver badge

      I think Microsoft should and probably will lose this fight as well, but some of your accusations are a bit weak.

      "At least now we know it [the acquisition of GitHub] was simply to make the theft of all those resources easier for them..."

      Come on. It's publicly available. I can clone all of that. It doesn't take an expensive ownership and operation to point a downloader bot at the site and start cloning all the repos meeting some criteria. If that was their reason, not only did they start their evil plan years before they started using it, but they've come up with the least efficient heist ever. This suggests their reasons were probably unrelated, given that they can and did get training data for copilot from locations they don't own.

      1. Grinning Bandicoot

        My claim is "else" and "comment" which was used in a school program back in the 60's

      2. Duke of Source

        The rhetoric is perfectly fine given that copyright infringement is often referred to as theft. Might as well have called it piracy.

    3. Steve Davies 3 Silver badge

      re: And therein lies the problem

      Don't forget...

      - Basically, own Linux despite the wailings of Linus.

      - Moved Windows onto a Linux kernel (why pay to maintain the kernel when others will do it for free...)

      - Started issuing DMCA takedowns for people not subscribing to their business model and being an even bigger PITA than they have been since Gates went dumpster diving.

    4. jerlyn

      Microsoft has stolen software as long as they have been in business. In the beginning they would just integrate their "own version" of the most popular programs running on their O/S. When sued, which has been often, they've always outlasted and/or settled for pennies on the dollars Microsoft earned from their theft of the intellectual property. It's the most lucrative play in Microsoft's playbook.

  3. Mike 137 Silver badge

    Snowballs anyone?

    Not a surprising development (in the legal sense), and I'm not too unhappy as it's a lousy way to write code anyway.

    But I wonder what the minimum code fragment is that can be considered to be copyright. Unless this gets defined clearly very soon, pretty much every developer in the world will be liable to challenge for copyright infringement once the lawyers start to get interested.

    1. that one in the corner Silver badge

      Re: Snowballs anyone?

      > But I wonder what the minimum code fragment is that can be considered to be copyright

      0x5F3759DF ?

      1. stiine Silver badge

        Re: Snowballs anyone?

        One of the more-advanced-math-than-I-understand videos I've ever watched more than once.

    2. Flocke Kroes Silver badge

      Re: Snowballs anyone?

      Ask your search engine about Oracle rangeCheck.

    3. katrinab Silver badge
      Alert

      Re: Snowballs anyone?

      "But I wonder what the minimum code fragment is that can be considered to be copyright."

      I can't give you a straight answer here, but there is plenty of case law on the subject. It depends on a lot of factors.

    4. bombastic bob Silver badge
      Devil

      Re: Snowballs anyone?

      True - imagine the litigation over use of 'for' or 'if', or (worse) variable names! "I had a 'for(i1=blahblah)' in MY code and YOU COPIED IT!"

      1. Anonymous Coward

        Re: Snowballs anyone?

        If you had "i anything" in your code as a loop counter then you copied it off a Fortran programmer :-)

        Remember you can write Fortran in any language

  4. DJO Silver badge

    FOSS conditions

    A bit late for existing code but it shouldn't be too difficult to add something like "Not to be used for AI or ML training" to the FOSS licence for future code.

    1. that one in the corner Silver badge

      Re: FOSS conditions

      A bit of a blunt instrument?

      Like the sentiment, but in this case isn't the *actual* problem the regurgitation of the inputs by the ML, more specifically, without attribution?

      Because it is possible to feed code into an ML whose output does something other than just fling out chunky bits of predigested sources.

      Better ideas for such an ML system exist, but the immediate thought is: how about one that looks for code that is suspiciously close to your copyright material appearing elsewhere, as though it had been spat out by Copilot?

      1. DJO Silver badge

        Re: FOSS conditions

        A bit of a blunt instrument?

        True, but them trawling the entire GitHub code base is also a blunt instrument insofar as it makes attribution (or blame) impossible.

        As AI/ML training falls outside of the uses laid out in the FOSS licence (but is not explicitly excluded) perhaps they should allow GitHub code contributors the choice to opt in or out of having their code used for training.

        1. orphic

          Re: FOSS conditions

          Or as in academia, acknowledge through some form of citation.

      2. John Brown (no body) Silver badge

        Re: FOSS conditions

        "but the immediate thought is: how about one that looks for code that is suspiciously close to your copyright material appearing elsewhere, as though it had been spat out by Copilot?"

        There's already software out there designed to look for plagiarism in exams and academic papers that could probably be fairly easily repurposed for the task. Depending on how it works, e.g. simply looking for matching strings of a certain minimum length, it might well work as is.

        That would certainly find out if Copilot is taking existing chunks of code and regurgitating them. On the other hand, it may well show large chunks of code being reused in other FOSS without acknowledgement and maybe against licensing terms.
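The "matching strings of a certain minimum length" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tokenizer regex, and the default run length of 8 tokens are all hypothetical, and nothing here reflects how any existing plagiarism checker (or Copilot) actually works. It compares runs of tokens rather than raw characters so that whitespace differences do not hide a match.

```python
import re


def tokens(source: str) -> list[str]:
    """Split source text into identifiers, digit runs, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def shared_runs(a: str, b: str, min_len: int = 8) -> set[tuple[str, ...]]:
    """Return every run of min_len consecutive tokens found in both inputs."""

    def shingles(toks: list[str]) -> set[tuple[str, ...]]:
        # All windows of min_len tokens; empty if the input is too short.
        return {
            tuple(toks[i:i + min_len])
            for i in range(len(toks) - min_len + 1)
        }

    return shingles(tokens(a)) & shingles(tokens(b))


if __name__ == "__main__":
    mine = "for (i = 0; i < n; i++) total += values[i];"
    theirs = "int s; for (i = 0; i < n; i++)   sum += values[i];"
    for run in sorted(shared_runs(mine, theirs)):
        print(" ".join(run))
```

Exact matching like this is cheap, but a single renamed variable inside the window breaks the match, so it would miss lightly edited copies.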

        1. Michael Wojcik Silver badge

          Re: FOSS conditions

          It's an alignment-with-errors problem, and there's a ton of existing art and ongoing research, for example in genomics.

          I've known CS grad students to build detectors for this sort of thing with good F0 and throughput, as class projects. If you're only concerned about checking specific Copilot outputs (e.g. those that show up for a set of queries you've defined) against a fairly small codebase (e.g. your personal projects), that ought to suffice. If you want to do bulk scanning you'll likely want to partition the datasets and do some preliminary classification to make the problem tractable.
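For the "alignment with errors" framing, one well-known technique from genomics is Smith-Waterman local alignment, which tolerates substitutions and gaps. The sketch below applies it to token lists; it is a swapped-in illustration rather than the method any particular detector uses, and the function name and the scoring values (match +2, mismatch -1, gap -1) are assumptions chosen for readability.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two token lists."""
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for x in a:
        cur = [0]  # column 0 is always zero for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    original = "for i in range ( n ) : total += values [ i ]".split()
    renamed = "for i in range ( n ) : acc += values [ i ]".split()
    print(local_alignment_score(original, original))
    print(local_alignment_score(original, renamed))
```

A high score relative to the length of the shorter input suggests a near copy even when an identifier has been renamed. The quadratic cost is the reason bulk scanning would need the partitioning and pre-classification mentioned above.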

    2. katrinab Silver badge
      Meh

      Re: FOSS conditions

      Already covered by existing licences surely?

      If you were to train exclusively on GPL licenced code for example, and published the resulting output under the GPL, then that would be fine.

      1. the spectacularly refined chap Silver badge

        Re: FOSS conditions

        You have to retain copyright notices even transplanting code between GPL projects.

        1. yetanotheraoc Silver badge

          Re: FOSS conditions

          If/when Microsoft loses, it will be because CoPilot looked only at the code and completely ignored any software licenses. Of course, due to similar code existing under different licenses, it would have been a nightmare to include licensing. I'm not sure "it would have been really hard to do it the right way" would be a good argument in court, so they will have to come up with a different argument.

      2. T. F. M. Reader

        Re: FOSS conditions

        Already covered by existing licences surely?

        I am not sure about the "surely" part. The upcoming litigation may include a lot of arguments about whether or not AI output should be considered "derived work" (e.g., in the GPL sense) of the training set.

        IANAL

    3. talk_is_cheap

      Re: FOSS conditions

      "shouldn't be too difficult to add something like"

      Copyright does not work that way in most major countries - you do not have to detail that you retain copyright as you automatically own copyright over your own work. Instead, you grant people access to your work, or if you are a developer working at a company you will likely have transferred ownership of your work to the company as part of your contract.

    4. Anonymous Coward

      Re: FOSS conditions

      You're going at it from the wrong direction. If you want to fuck up copilot, just write shit code, and fork it.

      1. Michael Wojcik Silver badge

        Re: FOSS conditions

        Fortunately the community has been doing this for ... well, since the invention of software.

    5. bombastic bob Silver badge
      Devil

      Re: FOSS conditions

      I would rather have the positive affirmation of "OK to use" rather than "NOT OK to use", sorta like "Opt In" vs "Opt Out"

  5. K

    Its not quite as bad as claimed...

    I've been using it for around a year, and even have a subscription. 99% of the time, the code it proposes is literally just statement-completion, i.e. if x or z then y ... and yes, it's very handy; it provides more of a shorthand than anything.

    There have been 2 occasions where it has proposed something that is more than 1 line, and I won't deny this blew my mind, as one of them was a fairly complex recursive function, which did save me a lot of time. But likewise, I knew how to write it myself and it was word for word a standard recursive function.

    But I'm not sure I will renew after this period expires. I'm extremely lazy, so the idea of CoPilot and AWS's CodeWhisperer is amazing, but both absolutely take the piss when it comes to respecting those who built the products which form the core of their services.

    1. Il'Geller

      Re: Its not quite as bad as claimed...

      The solution to the problem is obvious and will soon be presented to us: GitHub and CodeWhisperer are the first step towards abandoning all programming languages. Indeed, any programming language is a formalized natural language, put in the net of the commonly understandable constructions. Microsoft, GitHub and CodeWhisperer formalize the same common, everyday language and get the same constructions, without the hustle and bustle of manual programming. Then why programming languages?

    2. Anonymous Coward

      Re: Its not quite as bad as claimed...

      So for those two occasions that it offered something more complex, I wonder if that code traces back to one or a very few particular sources. If so, co-pilot could just add comments with URL to (an) attribution(s).

      It's not written in this article, but in a previous article someone identified their own sparse matrix inverse code, even using the same variable names - in that case it should have been simple to provide an attribution.

      Adding attribution when possible seems like a very straightforward good faith no-brainer to me. Also knowing the source can be helpful to the copilot user - there may be context, quality may be inferred, if it's a library there may be useful usage pointers, etc.

      Conversely, not adding attribution when it can be seems like dumbing down.

      1. Ken Moorhouse Silver badge

        Re: Its not quite as bad as claimed...

        Developers should somehow encode their copyright message into their code, then look out for fragments of it in regurgitated code.

        I expect though that there will be many, many copies of similar algorithms in the repository, and the AI will be able to strip out the copyright froth, by treating the code as a black box and testing outputs against input combinations.

  6. Martyn2014

    We need more takedowns!

    Have to wonder if this is not going the way of DMCA takedowns...

    So if I publish something GPLv3 to GitHub and it's deemed good enough for Copilot to use and regurgitate, I can see I have some recourse to claim copyright on the created work - but I'm not entirely sure what laws GitHub has broken. The person using Copilot might be able to make a complaint to GitHub that their service lied and led to them getting sued (or a takedown notice or whatever).

    But how about if I publish something GPLv3 to GitHub, and then someone copies it to another part of GitHub but mis-licenses it? They put it up as no-licence, anyone can have it. When they create the repo they promise GitHub that they will not break the law and that they have permission to push the code they are storing. GitHub takes the promise as true and uses the code to train their AI.

    At that point, is it not up to the copyright holder to enforce their license terms on the intermediary work?

    1. robinsonb5

      Re: We need more takedowns!

      I have to hand it to MS, acquiring github was a masterful move. Here's why:

      Copyright exists by default on the created work, and by default a third party (including Github / Microsoft) has no right to copy it, redistribute it or do whatever (except for any fair use provisions that might exist in your jurisdiction.)

      Open source works come with a license which specifies under which conditions you may copy and distribute the work. As the GPL is fond of pointing out, you don't have to accept that license, but nothing else gives you the right to copy and distribute the work... except... the github user agreement does just that. By using github's services you give github the right to use the code however they deem necessary in providing "the service", where "the service" now includes copilot.

      Where things get interesting is the question of whether or not you're actually able to give github that permission.

      If you're uploading entirely your own work then you can license it to whomever you wish under whichever licenses you wish - you can even supply a buffet of contradictory licenses and let people pick one. But if your work derives from someone else's you're not free to grant permissions that contradict the original license. So if you grab some random GPL code and make some changes, you're OK to pass the result onto someone else under GPL terms, but not at liberty to give github permission to mix it into an ML meat grinder that will spit out chunks without attribution or license. Likewise the MIT licenses which require attribution - you can't waive that restriction if you didn't write all the code yourself.

      Unfortunately, the github terms of service also include an indemnity clause, so if they get sued as a result of something you uploaded being absorbed into copilot, they can theoretically shift the liability onto you.

      So the tinfoil-hat interpretation of the situation would be that github's value isn't in the codebase, it's in the army of fall-guys!

      1. Falmari Silver badge

        Re: We need more takedowns!

        @robinsonb5 "Copyright exists by default on the created work"

        Does it?

        Section 102(b) of the Copyright Act excludes copyright protection for “any idea, procedure, process, system, method of operation.”

        "In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work."

        https://www.law.cornell.edu/uscode/text/17/102#b

        Is reproducing the same logic that is in someone else's code copyright infringement?

        1. robinsonb5

          Re: We need more takedowns!

          > "Section 102(b) of the Copyright Act excludes copyright protection for “any idea, procedure, process, system, method of operation.”"

          Copyright protects a particular expression of an idea rather than the idea itself. I don't think anyone's claiming that AI systems have any actual "understanding" of the data they process, so the concept of an idea isn't relevant here.

          > "Is reproducing the same logic that is in someone else's code copyright infringement?"

          No, but it pays to be extremely careful about how you reproduce that logic if you want to avoid being accused of copyright infringement - Compaq's clean room reverse engineering and re-implementation of the PC BIOS for example, where the team writing the new BIOS weren't directly exposed to the disassembly of the original. That separation doesn't (and can't) exist in the Copilot scenario.

      2. Richard 12 Silver badge

        Re: We need more takedowns!

        The service does not include Copilot. That is a separately licensed product.

        Aside from that, you cannot arbitrarily expand the meaning of "the service" at your whim. 99.99...% of the code ingested by Copilot was uploaded to github long before anyone knew it might exist. Nearly all of it predates Microsoft purchase of github.

        Otherwise El Reg can lay claim to your salary for permitting you to post here.

        1. Martyn2014

          Re: We need more takedowns!

          But you do.. it's in their terms of service

          "We need the legal right to do things like host Your Content, publish it, and share it. You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time. This license includes the right to do things like copy it to our database and make backups; show it to you and other users; parse it into a search index or otherwise analyze it on our servers; share it with other users; and perform it, in case Your Content is something like music or video."

          You don't need to use their service, but if you do then you agree to their terms...

          I would even go as far to say that if a copyright holder uploaded something to GitHub then this would trump any licence the copyright holder put on the code.

          1. yetanotheraoc Silver badge

            Re: We need more takedowns!

            "I would even go as far to say that if a copyright holder uploaded something to GitHub then this would trump any licence the copyright holder put on the code."

            Doubtless that will be one of the many arguments put forward by Microsoft's legal team. Whether one thing trumps another thing seems to be the pinnacle of civil disputes.

            1. Martyn2014

              Re: We need more takedowns!

              Unless public release binds the copyright owner to that licence eternally, then the copyright owner has the right to release their work under any licence they want.

              So I will retract my use of the word 'trump' and state that regardless of the licence included in the code by uploading to GitHub copyright holders are exercising their right to licence their work in whichever way they want.

        2. yetanotheraoc Silver badge

          Re: We need more takedowns!

          Right. Whether CoPilot is part of "the service" the end user agreed to will be a bone of contention. Microsoft's opinion is of course the service is whatever they provide, the user agreed to all of it. Same as with their OS.

          1. Martyn2014

            Re: We need more takedowns!

            I mean, it is there in the terms of service...

            "The “Service” refers to the applications, software, products, and services provided by GitHub, including any Beta Previews."

  7. Howard Sway Silver badge

    Is it "fair use" or is it intellectual property theft?

    I think that any claim to be "fair use", when the thing is spitting out copies of billions of lines of code, and making a billion dollars in revenue should be a very easy argument to defeat. This is why I think that the case has a good chance of succeeding : fair use is about using small amounts of copyrighted content, not harvesting it on an industrial scale.

    The other main argument is about whether terms and conditions of Github take priority over established copyright law. Again, this is shaky ground for MS because copyright is a strongly established principle, and separate from licensing, which is a kind of contract.

    To me, this is an important case because of the principles involved, not because of the money MS are making, and could set some horrible precedents if it's not considered properly.

    1. ComputerSays_noAbsolutelyNo Silver badge
      Joke

      Re: Is it "fair use" or is it intellectual property theft?

      If the same standards apply to code as they do for movies, music and books, i.e., copyright extends 70 (?) years after the death of the author, then Co-Pilot might be legal in say 150 years?

      1. Tromos

        Re: Is it "fair use" or is it intellectual property theft?

        "...might be legal in say 150 years?"

        Not if Disney get in on the act.

      2. Bartholomew
        Joke

        Re: Is it "fair use" or is it intellectual property theft?

        > copyright extends 70 (?) years after the death of the author; then Co-Pilot might be legal in say 150 years ?

        Sounds about right except for Mexico, they have death of last author plus 100 years! Oh and Yemen which is life plus 30 years, which sounds more reasonable.

        Imagine if patents had the same duration as copyright: sometime in the next century or two we would probably have our first jet engine. Maybe that could be a solution to global warming *hard to type when laughing*, longer patents. Life of inventor plus 100 years, most of the world would still be using steam power, so yeah, maybe not a solution to global warming :) But it would be a fun fictional parallel universe for a film - Amish world. I can picture the tagline now: "Amish world - there is no Rumspringa"

    2. Graham Cobb Silver badge

      Re: Is it "fair use" or is it intellectual property theft?

      IANAL (but I am guessing you aren't either :-) ). My understanding is that "fair use" has little to do with value or scale. It is the nature of the use, whether it is transforming it into something else or just reproducing it, etc.

      Also, of course, "fair use" is a US legal concept - there is no such established principle in UK law.

      1. rag2

        Re: Is it "fair use" or is it intellectual property theft?

        In the UK there is a concept of Fair Dealing, which isn't quite the same. (Introduction from the British Library at https://www.bl.uk/business-and-ip-centre/articles/fair-dealing-copyright-explained) Of course, it's never quite that simple. For our American cousins (and possibly others) IANAL.

    3. Ken Hagan Gold badge

      Re: Is it "fair use" or is it intellectual property theft?

      " set some horrible precedents if it's not considered properly "

      Well if MS manage to drive a coach and horses through the copyright protection of code, they might be amongst the biggest losers.

    4. Johan-Kristian Wold 1

      Re: Is it "fair use" or is it intellectual property theft?

      It seems to me that this is basically an automatic code obfuscator that also removes identifying features from the original code.

  8. Greybearded old scrote Silver badge

    Really

    Just vote with your (virtual) feet. Other code hosting services are available.

    Oh yes, you let them hold all the metadata hostage didn't you?

    1. Michael Wojcik Silver badge

      Re: Really

      And GitHub is terrible for many other reasons, so if Copilot is the final one that gets you to move, then so much the better.

  9. DJV Silver badge

    Is it "Copilot" or "CoPilot"?

    The author of this piece appears to use both with reckless abandon!

    1. Neil Barnes Silver badge
      Alert

      Re: Is it "Copilot" or "CoPilot"?

      Just call it adaptive cruise control with (some) lane following and be done with it.

      1. fidodogbreath

        Re: Is it "Copilot" or "CoPilot"?

        Both are wrong. It's Copalot. As in, "cop a lot of other people's work."

  10. Will Godfrey Silver badge
    Unhappy

    Getting off github

    Would this actually do any good? It seems they would still have a copy of your code, so could continue to scrape what they like.

    1. Ken Moorhouse Silver badge

      Re: Getting off github

      If one were to replace all their code with code containing subtle mistakes then hopefully the service will latch on to the latest (inferior) version only. Methinks though that if they keep all versions of uploaded code then this will defeat these attempts.

      1. Ken Hagan Gold badge

        Re: Getting off github

        Since the purpose of git is to keep all versions of the code, it would be fairly easy to ignore recent code.

  11. steelpillow Silver badge

    Where is PJ when you need her?

    Seems we need a new PJ for the new millennium.

    Work from home. No national boundaries this time. Any offers?

  12. Sceptic Tank Silver badge
    Windows

    So why doesn't Microsoft publish the source code for all their products online and we'll take what we need, rename variables, shift some lines of code around, add/delete a few comments and there should be no copyright dispute. We won't even fix any bugs.

    I am glad to report that I did my bit many years ago to thwart these robotic coding attempts: I have some buggy, half-arsed projects on GitHub with code that I am not proud of. Whoever gets that forced into their project by the bot net is going to spend more time debugging arcane code than it would have taken to just write the stuff themselves. Hahaha haaa! Despicable!

    1. Grinning Bandicoot

      Remember CPM! The pirate of Redmond, ol' 666 himself, does and how MS sure was a close match. Of course Double Space was sure coincidence.

  13. Spazturtle Silver badge

    Most human programmers were also trained on open source code

    Is all the code they write also copyright infringement?

    1. Anonymous Coward

      Re: Most human programmers were also trained on open source code

      You cannot (or rather, should not be able to) patent software algorithms. This is about copyright (and copyleft).

      The human brain doesn't work like a digital computer because it is constructed differently - it is made from ion pumping analog circuits that are noisy and imprecise. In general human brains cannot remember code photographically - the only choice is to deeply internalize the underlying algorithm in some neural encoding, which can later be used to generate new and unique instantiations of that algorithm.

      Ion pumping analog computers are coming along though - e.g., New Scientist, "‘Artificial synapse’ could make neural networks work more like brains - Networks of nanoscale resistors that work in a similar way to nerve cells in the body could offer advantages over digital machine learning" Imagine the circuitry of the human brain, but built with solid state components making it a billion times faster.

      1. bombastic bob Silver badge
        Devil

        Re: Most human programmers were also trained on open source code

        I'd shorten that down to "concept vs copy". Or in legal specifics, patent vs copyright.

        It is very hard to argue (In My Bombastic Opinion) that an AI-based programming algorithm is anything more than a fuzzy data compression and expansion method. As such the data from the program source it scanned is uncompressed and included in the output. Whereas humans, of course, would have to create something fresh the way it has been done for 100,000 or so years. OK so your neighbor made a wheel. You can make one, too. NOT plagiarism (but maybe violates his 'patent'), etc.

  14. The BigYin

    The training data contained GPL'd code.

    The GPL (with few exceptions) applies to derived works.

    Copilot can thus emit GPL'd code.

    Thus any product that used Copilot generated code may be subject to the GPL.

    This is not a problem with the GPL, it's a fine license and ensures software freedom.

    The problem is bullies like MS not respecting others' licenses.

    1. Anonymous Coward

      Copilot can thus emit GPL'd code.

      The problem is: "Is Copilot emitting the same code, or code that looks alike - even very alike - but is not a verbatim copy but a product of its algorithms?" Because even a human programmer may write code that looks identical but is not a verbatim copy. Code is not a novel; there aren't many innovative ways to write it.

      Many algorithms have common and obvious implementations. Now, if we forbid writing code that looks very alike, programmers would be forced to write bad algorithms - or we may have only one program.

      So the question is simply whether an AI just copies or "creates"....

  15. 桜沢墨

    With copyright, less is more

    While the kneejerk reaction might be to find something in copyright law to hit Microsoft with, you have to remember that this could have consequences that are larger than Copilot. If Microsoft actually gets hit with something and we have a new precedent or law for copyright, it could end up backfiring on the little guy later. While trying to 'regulate' Microsoft, you end up putting down more regulation that gets in the way of everyone else.

    It almost seems this way for all of copyright, really. While it might be nice that people can't 'steal' your ideas from you, you've ended up causing a torrent of problems.

  16. Sudosu Bronze badge

    Right in the name

    Copilot, pronounced Copy Lot

  17. Waryofbigbro

    What about Abstract Semantic Representation?

    If Copilot or competing AI systems generate ASR rather than AST for training their ML systems, is this a violation of copyright? ASR could generate source code in a number of languages from the language-invariant concept in the ASR.

    LFortran, a modern LLVM-targeted Fortran compiler, generates an ASR of the source code. This enables cross-compilation to Python and other languages.

    https://gitlab.com/lfortran/lfortran/-/wikis/GSoD%20Proposal%202022%20-%20LFortran%20Compiler%20Developer%20Documentation

  18. Anonymous Coward
    Anonymous Coward

    Interesting proposal! Notably, ASR models failed miserably for human language and have been superseded by NN models. But at gut level it seems like logical programming ought to be representable. The devil is in the details.

    Where does dumb mapping end and intelligent comprehension begin?

  19. Bartholomew

    Time to go back to the drawing board ?

    How hard would it be to generate different training datasets based on license, and then fully track attribution during the training process?

    "This suggested fragment of code was automagically generated, was and still is GPL 3.0 licensed, and is based on the contributions from this list of 4000 people. Both of which must be included in your code if you use the suggested code."

    Microsoft also probably has enough in-house source code that they could use as a training dataset, as long as they did not mind leaking source code to their core products.
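
    As a rough sketch of the idea above: the snippet below splits a corpus into one training set per license and collects the contributors to credit for each. Everything here is hypothetical (the `SourceFile` record, the function names, the sample paths and authors), and it assumes every file already carries a single, correct license tag and author list. It only tracks attribution per dataset, not per generated suggestion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFile:
    path: str
    license_id: str   # assumed known and correct for the whole file
    authors: tuple    # assumed known contributor names


def partition_by_license(files):
    """Group files into one dataset per license and collect authors to credit."""
    datasets = defaultdict(list)
    credits = defaultdict(set)
    for f in files:
        datasets[f.license_id].append(f.path)
        credits[f.license_id].update(f.authors)
    # Sort author names so the resulting notice is deterministic.
    return dict(datasets), {lic: sorted(names) for lic, names in credits.items()}


def attribution_notice(license_id, credits):
    """Build the notice attached to suggestions from a single-license model."""
    names = ", ".join(credits[license_id])
    return f"Suggested code is licensed {license_id}; contributors: {names}"


# Made-up sample corpus for illustration only.
corpus = [
    SourceFile("a/sort.c", "GPL-3.0-only", ("alice", "bob")),
    SourceFile("b/hash.c", "MIT", ("carol",)),
    SourceFile("c/tree.c", "GPL-3.0-only", ("bob", "dave")),
]
datasets, credits = partition_by_license(corpus)
notice = attribution_notice("GPL-3.0-only", credits)
print(notice)
```

    A model trained only on the GPL-3.0 dataset could then attach that notice to everything it suggests. Working out the license and author tags in the first place is the part this sketch takes for granted.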

    1. Ken Hagan Gold badge

      Re: Time to go back to the drawing board ?

      "How hard would it be to generate different training datasets based on license, and then fully track attribution during the training process?"

      Very, I would have thought. Determining what licences apply to particular bits of code in a mixed project might require natural intelligence. You might end up having to limit the system to code that has been opted-in by its owner (as someone suggested a few million comments ago) and that opens up the possibility of "hostile" training data.

      There may simply be no way for Copilot to be both legal and worth using. Too bad, Microsoft. The world does not owe you a business model.

    2. mark l 2 Silver badge
      Joke

      Re: Time to go back to the drawing board ?

      Imagine the terrible code Copilot would come back with if trained on the in house Microsoft code for Windows though

      1. Bartholomew
        Coat

        Re: Time to go back to the drawing board ?

        Hello - I'm Microsoft Clippy - I'm here to help. Which Zero-day would you like added to your code ?

  20. Anonymous Coward
    Anonymous Coward

    I question the whole concept of "auto-complete" and "auto-generation" approaches

    Copilot's concepts don't appeal to me much. Most of the code I work on is rather specialized and does not follow traditional behavior patterns because it integrates existing systems that the code has no control over. That kind of interface coding doesn't lend itself well to automation.

    Personally I think they're barking up the wrong tree. I absolutely despise the "autocomplete" behavior of IDEs that auto-insert brackets and close-parens and the like. I'm a touch typist; most of those interface "enhancements" slow down my input rather dramatically by inducing typos and "alternative interpretations" of what I was typing. Maybe if I was a hunt-and-peck typist I wouldn't feel that way about autocomplete technology.

    The other problem with approaches like Copilot is that they assume there is nothing more to learn and nothing more to do differently; that all you have to do is regurgitate what was done before, rephrased. I seriously doubt we're at the end of computing history...

  21. captain veg Silver badge

    GitHub Copilot, Microsoft's AI-driven, pair-programming service

    I'm no expert, but I thought that the point of pair programming was to add a second pair of eyes to spot errors rather than to blindly cut and paste code off the internet.

    -A.

    1. Michael Wojcik Silver badge

      Re: GitHub Copilot, Microsoft's AI-driven, pair-programming service

      Yes, I was going to post something similar. Copilot is in no way a pair-programming mechanism. The purpose of pair programming is vigilance, not copy-and-paste.

      You could certainly train a model on common errors and have it flag those – though we already have on-the-fly static analysis without needing vast resources for a transformer architecture with a zillion parameters, so that would be Kind Of Stupid. But that would be something along the lines of mechanized pair programming. Not particularly good pair programming, since pair programming works best with agents that understand the intent of the code; but it would be closer to pair programming than Copilot is.

  22. Mr Anonymous

    If it's trained on open source code

    It can only create open source code.

  23. M.V. Lipvig Silver badge
    Joke

    I OWN THE COPYRIGHT!

    I have copyrighted the 0 in computing!

    I have copyrighted the 1 in computing!

    I AM COMING FOR YOU ALL!

    Unless you are willing to buy licenses to use 0s and licenses to use 1s from me for the low low price of 200 quid/head, my lawyers will be in touch. Don't like it? Use o's and i's instead.

  24. georgezilla

    So tell me ....................

    ....... just how is it that Microsoft isn't evil again/any more.

  25. herberts ghost

    Machine cryptomnesia!

    I can just see the AI under cross on the stand??? Oh wait ... I can't!

    You can imagine the AI's creator being asked why it did whatever it did. It's like asking a dog trainer, why exactly did the dog bite the baby?

    1. yetanotheraoc Silver badge

      Re: Machine cryptomnesia!

      "You can imagine the AI's creator being asked why it did whatever it did."

      Because of the dataset it was trained on.

      "It's like asking a dog trainer, why exactly did the dog bite the baby?"

      Because it has a mind of its own.

  26. Bacon & Eggs

    It's fair to punch others in the face, it's unfair for others to punch back...

    I wonder why Microsoft uses GitHub/open source as a training set, when its own codebases, which must be huge, should offer enough material at least to support MS's own targets like .NET / Windows coding. Wait... no way would they give up all their codebase secrets (and reveal how shifty it is) to the world via AI/ML - but it's okay to do that with other people's code...

    Genuine fair use would be to include your own code in the training set. Doing that would show the world that you really believe in fair use and walk the walk. If MS thinks it is fair to include code from others, then others can rightfully expect to be able to use snippets of MS's own code via Copilot too.
