Simples
What's so hard about finding copied copyrighted code automatically?
It merely requires all the code that's to be protected to be made available for diffing against GitHub et al. That shouldn't be a problem for those with lots of code: companies like Microsoft, Oracle, etc. already have big server farms, right?