Even robots have the right to learn from open source • The Register Forums

Tuesday 12th July 2022 17:03 GMT Michael Wojcik

Based on anecdotal experience, CVE rates, and some academic studies in related areas, I'd say the overwhelming majority of code on GitHub is crap.

That's also true of most other repositories, of course.

It looks like a prototype of the Codex model used by Copilot was trained on a massive amount of data: "The amount of training data is 54 million public software repositories with 179 gigabytes of unique python files", according to one source. Doing any meaningful data hygiene on that sort of big-data volume is all but impossible. So we can assume Copilot was trained on a great deal of rubbish.

Note, too, that one of Copilot's goals is "filling in repetitive code", a task that by its nature violates the DRY (Don't Repeat Yourself) principle and suggests the code needed a redesign anyway. Copilot appears to be, in significant part, a tool for creating lousy code.
