Re: it's part of the design, actually
The article seems to be less about how much space is taken up on GitHub's backend and more about what happens when you sample code from GitHub, in which case forking does matter. Say you want to know how many people use mmap versus malloc for memory allocation. You pick C code at random and count. However, if a large proportion of what you pick are forks, then popular projects are over-represented and your results are skewed towards the properties of those projects.
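To make the skew concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `Repo` tuple, the `root` field (standing in for "which project this was forked from"), and the counts are assumptions, not real GitHub data or a real API.

```python
from collections import namedtuple

# Hypothetical record: "root" is the project a repo was forked from
# (or its own name if it is not a fork).
Repo = namedtuple("Repo", ["name", "root", "uses_mmap"])

def mmap_share(repos):
    """Fraction of repos in the sample that use mmap."""
    return sum(r.uses_mmap for r in repos) / len(repos)

def dedupe_forks(repos):
    """Keep the first repo seen for each root project, so forks count once."""
    seen = {}
    for r in repos:
        seen.setdefault(r.root, r)
    return list(seen.values())

# Invented data: one popular mmap-using project with 8 forks,
# plus three unforked projects that only use malloc.
repos = [Repo("popular", "popular", True)]
repos += [Repo(f"fork{i}", "popular", True) for i in range(8)]
repos += [Repo(f"small{i}", f"small{i}", False) for i in range(3)]

naive = mmap_share(repos)                  # counts every fork separately
deduped = mmap_share(dedupe_forks(repos))  # counts each project once

print(f"naive: {naive:.2f}, deduplicated: {deduped:.2f}")
```

With these invented numbers the naive count says 75% of "projects" use mmap, while counting each project once gives 25%. Which of the two is the right answer depends on whether you are asking about repositories or about independent projects.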