Even robots have the right to learn from open source

Michael Wojcik Silver badge

Based on anecdotal experience, CVE rates, and some academic studies in related areas, I'd say the overwhelming majority of code on GitHub is crap.

That's also true of most other repositories, of course.

It looks like a prototype of the Codex model used by Copilot was trained on a massive amount of data: "The amount of training data is 54 million public software repositories with 179 gigabytes of unique python files", according to one source. Doing any meaningful data hygiene on that sort of big-data volume is all but impossible. So we can assume Copilot was trained on a great deal of rubbish.

Note, too, that one of Copilot's goals is "filling in repetitive code" – a task that explicitly violates the DRY principle and suggests that a redesign was in order anyway. Copilot appears to be in significant part a tool for creating lousy code.
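
To make the DRY point concrete, here is a minimal sketch in Python. The function names, the form fields, and the validation rule are all hypothetical, invented only for illustration; this is not taken from Copilot's output or from any real project.

```python
# Hypothetical example: every name and field here is illustrative only.

# Repetitive version: the kind of boilerplate an autocomplete tool can keep extending.
def validate_repetitive(form):
    errors = []
    if not form.get("name"):
        errors.append("name is required")
    if not form.get("email"):
        errors.append("email is required")
    if not form.get("phone"):
        errors.append("phone is required")
    return errors


# Redesigned version: the repetition becomes data, so there is nothing left to fill in.
REQUIRED_FIELDS = ("name", "email", "phone")


def validate(form):
    # One rule applied to every required field, in declaration order.
    return [f"{field} is required" for field in REQUIRED_FIELDS if not form.get(field)]
```

Both functions return the same errors for the same input. Adding a field to the first means pasting another three-line block; adding one to the second means extending a tuple.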
