More than half of GitHub is duplicate code, researchers find

ibmalone

Re: it's part of the design, actually

The article seems to be less about how much space is taken up on GitHub's backend and more about what happens when you sample code from GitHub, in which case forking does matter. Say you want to know how many people use mmap versus malloc for memory allocation. You pick C code at random and count. However, if a large proportion of the repositories are forks, then popular projects are over-represented and your results are skewed towards the properties of those projects.
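
To put rough numbers on that skew, here is a toy sketch in Python. Everything in it is made up for illustration: the project names, the fork counts, and the assumption that every fork keeps its parent's allocation style. It doesn't sample at random; it just computes the share that uniform random sampling would converge to, with and without collapsing forks.

```python
from collections import Counter

# Hypothetical repositories: (root project, allocation style, number of forks).
# All names and numbers are invented for illustration only.
PROJECTS = [
    ("popular-lib", "mmap", 96),
    ("small-tool-a", "malloc", 0),
    ("small-tool-b", "malloc", 0),
    ("small-tool-c", "malloc", 0),
]


def share_using(style, repos):
    """Fraction of repos in the list whose allocation style matches."""
    counts = Counter(s for _, s in repos)
    return counts[style] / len(repos)


# Every repo as a naive sampler would see it: each fork counts as its own repo.
# Assumption: a fork uses the same allocation style as its parent.
all_repos = [
    (name, style)
    for name, style, forks in PROJECTS
    for _ in range(forks + 1)
]

# One entry per root project, i.e. forks collapsed into their origin.
deduplicated = [(name, style) for name, style, _ in PROJECTS]

naive_share = share_using("mmap", all_repos)
dedup_share = share_using("mmap", deduplicated)

print(f"mmap share counting forks: {naive_share:.2f}")
print(f"mmap share after dedup:    {dedup_share:.2f}")
```

With these invented numbers, one heavily forked project makes mmap look like the choice of 97% of repos, when only one in four independent projects actually uses it.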
