FauxPilot: It's like GitHub Copilot but doesn't phone home to Microsoft

GitHub Copilot, one of several recent tools for generating programming code suggestions with the help of AI models, remains problematic for some users due to licensing concerns and the telemetry the software sends back to the Microsoft-owned company. So Brendan Dolan-Gavitt, assistant professor in the computer science and …

  1. Il'Geller

    …Copilot relies on OpenAI Codex, a natural language-to-code system based on GPT-3…

    … The models that it's using right now are ones that were trained by Salesforce, and they were again, trained basically on all of GitHub public code…

    I think in the near future there will be a system that uses natural language directly instead of program code. Indeed, it makes no sense to spend a lot of money on translating such a language into a structured format when AI can do it automatically. See how AI translators work? Programs are just such translations!

    1. Anonymous Coward

      My feeling is that natural-to-natural translation is between two domains of near-equal dimension, whereas natural-to-program translation is from a lower to a higher dimension, making far more educated guesses about the details.

      1. Il'Geller

        Such “natural-to-natural translation” exists: look at all these programs that translate from one human language to another. Now it's just a matter of time until a new tool that will "translate natural-to-natural" becomes a commercial product. It shall do everything without the participation of programmers and without code.

    2. Flocke Kroes Silver badge

      Re: Flying pigs build ice rink in hell

      Natural language as used by humans combines vagueness with self-contradiction. This makes it useless as a single source for programming computers. A big part of a programmer's job is to understand what the problem is, so it is then possible to narrow the vagueness where it matters and strip out the defective half of the self-contradiction. (Unless you are on a cost-plus contract, where vagueness is an opportunity to implement the wrong thing and charge extra for a change order.)

      Programmers will not be replaced by AIs with no understanding of the problem converting natural language into code. Far more likely is that humans will learn to understand their own problems and be able to express them clearly and unambiguously. I am sure this will happen as soon as my flying car is powered by a portable fusion reactor.

      1. Il'Geller

        Re: Flying pigs build ice rink in hell

        So what? "... combines vagueness with self contradiction..."? It's enough to hire people who can think but don't know how to program. Programming is dead and the profession of "programmer" is a thing of the past. If there is an opportunity to save money, then that chance will be taken, no matter what.

        I bet that a huge number of companies are working right now on how to displace programming with AI.

        1. Flocke Kroes Silver badge

          Re: Flying pigs build ice rink in hell

          I bet that a huge number of companies are working right now on how to displace programming with AI.

          Yeah, and I bet some are working on EM drive, cold fusion and orbital steam rockets. Want to buy a bridge?

      2. yetanotheraoc Silver badge

        Re: Flying pigs build ice rink in hell

        "Programmers will not be replaced by AIs with no understanding of the problem converting natural language into code."

        s/will not/should not/

        Software architects have been replaced by project managers with no domain knowledge and/or no understanding of business requirements. Since the software is unfit for purpose, maybe an "AI developer" would be a way out of the mess. No?

        Do something, as long as it doesn't cost much.

    3. elsergiovolador Silver badge

      So how are we going to program?

      "Oi mate! Kindly peep for when the geezer clicks that button, yeah? And if he does then ya know, execute that function I told you to write yesterday, that one with the big for loop, yeah? Oh and we need to monitor the clicks so we know our ad campaigns work, do you know what I mean? Yeah? You'll find tags in the campaigns folder on the G drive. Now do your magic and I am off to pub, cheers!"

  2. vekkq


    Surely the code snippets are small enough so that they can't be copyrightable?

    1. Flocke Kroes Silver badge

      Re: Fair use

      Perhaps, but do you really want to discuss your use of rangeCheck with Oracle's lawyers?
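      For the uninitiated: rangeCheck is the nine-line bounds-checking helper that became a flashpoint in Oracle v. Google. A sketch from memory of what such a trivially small function looks like (reconstructed here for illustration, not copied from any codebase):

      ```java
      // rangeCheck: the kind of nine-line bounds check at issue in Oracle v. Google.
      // Reconstructed from memory for illustration only.
      public class RangeCheckDemo {
          static void rangeCheck(int arrayLen, int fromIndex, int toIndex) {
              if (fromIndex > toIndex)
                  throw new IllegalArgumentException(
                          "fromIndex(" + fromIndex + ") > toIndex(" + toIndex + ")");
              if (fromIndex < 0)
                  throw new ArrayIndexOutOfBoundsException(fromIndex);
              if (toIndex > arrayLen)
                  throw new ArrayIndexOutOfBoundsException(toIndex);
          }
      }
      ```

      The point being: even something this small and this obvious ended up in front of lawyers.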

    2. Richard 12 Silver badge

      Re: Licensing

      That question is for lawyers to get rich on.

      License compliance is already difficult enough with the literal myriad of commercial licences that have to be closely inspected.

      Adding snippets of code that were licensed under an attribution clause, yet with no possible way of determining who must be given attribution, makes compliance straight-up impossible.

      The other thing is of course that there's a lot of software patents, and in some jurisdictions even obvious one-liners are patented.

      Of course, patents aren't a problem as long as software using Copilot is not sold in such jurisdictions, but banning it from the USA et al could somewhat limit the market.

  3. vtcodger Silver badge

    Philosophical Question

    If Microsoft doesn't know who you are, where you are, and what you are doing, do you exist?

    1. Il'Geller

      Re: Philosophical Question

      It is too expensive to maintain personal AI centrally, for example on one of Microsoft's computers, especially if the maintenance can be done for free, on a user's own computer. Indeed, Microsoft needs the final product, the result, not the information that was used for it. As I said twelve years ago: the time of absolute privacy has come! Nobody needs to spy on the Internet; it makes no sense anymore.

    2. elsergiovolador Silver badge

      Re: Philosophical Question

      I'll flip it on its head. Do they exist?

  4. Anonymous Coward

    "It would be an immense amount of work to try and curate a data set that was free of security vulnerabilities."

    I'll save you the trouble. There aren't any.

    We've seen vulnerabilities from the bottom all the way up to the Linux kernel.

    Instead of searching for the Holy Grail, how about curating the data set to indicate the licenses of the code it was trained on?

    1. Warm Braw Silver badge

      how about curating the data set to indicate the licenses

      Given the small finite number of boilerplate licences, you'd think it would be an eminently suitable task for ML.
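      Indeed, it may not even need ML: common licences contain distinctive boilerplate phrases, so even naive keyword matching gets you most of the way. A minimal sketch of the idea, using a hypothetical (and far too short) phrase table:

      ```java
      import java.util.LinkedHashMap;
      import java.util.Map;

      // Sketch: tag a source-file header with a licence by matching distinctive
      // boilerplate phrases. The phrase table is hypothetical and minimal; real
      // tools use many more signals (and fuzzier matching).
      public class LicenseGuess {
          private static final Map<String, String> PHRASES = new LinkedHashMap<>();
          static {
              PHRASES.put("GNU General Public License", "GPL");
              PHRASES.put("Apache License, Version 2.0", "Apache-2.0");
              PHRASES.put("Redistribution and use in source and binary forms", "BSD");
              PHRASES.put("Permission is hereby granted, free of charge", "MIT");
          }

          static String guess(String header) {
              for (Map.Entry<String, String> e : PHRASES.entrySet())
                  if (header.contains(e.getKey()))
                      return e.getValue();
              return "unknown";
          }
      }
      ```

      The hard part, of course, is not classifying the licence of a whole file but tracing a five-line generated snippet back to whichever of its many near-identical sources it came from.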

  5. Spazturtle Silver badge

    Why does the licence of the code it was trained on matter?

    If I learn to code by reading GPL code and participating in open source development that doesn't mean that all future code I write is covered by GPL, why should it be different for an AI.

    1. Jimmy2Cows Silver badge

      The AI isn't understanding the spirit of the code and putting its own interpretation on it.

      The AI is copying snippets of code verbatim, scraped from licence-controlled sources. Hence whatever the AI generates could well carry some kind of onerous licensing requirements, attribution clauses, etc.

      1. Il'Geller

        You don't understand that AI is a database of structured texts, which a computer can understand. This means that the AI is able to replace any programmer,
