
Copilot
Does the DMCA request also apply to any evaluation of the code by Copilot?
Will GitHub report Copilot?
A California court has granted Twitter's request to unmask the GitHub user who uploaded its source code – along with anyone who "posted, uploaded, downloaded or modified" said code. The subpoena request [PDF], granted yesterday [PDF], is seeking "all identifying information" for GitHub user FreeSpeechEnthusiast, who published …
Gosh, I'd love to see a protracted legal battle between Twitter and Microsoft over that. With a little luck Free Speech Twitler would rage-quit (as he generally does when faced with legal complications) and Twitter would be finished off, while Copilot would get a reputation as a legal minefield. Everyone wins.
Technically, it's unfair to say Twitter has lost half of its value since Elon Musk bought it. It had actually already lost a quarter of its worth during the acquisition since it was an LBO. Twitter essentially borrowed $13B and gave it to shareholders, saddling itself with the debt in order to reduce its own worth, just so that Elon Musk and his pals could buy it for only $31B.
They'll probably have to declare bankruptcy to get out of paying the debt; otherwise the company is as good as dead.
It's also ridiculous for anyone to think it is worth $20 billion. It is nowhere near that given how many advertisers fled and didn't return. Musk will inevitably insert his footgun up his backside again at least one or two more times this year, ensuring those who fled stay gone and possibly shaking more loose depending on how many times he pulls the trigger.
>along with anyone who "posted, uploaded, downloaded or modified" said code.
We can reasonably assume the uploader did it from an untraceable account, but people who merely looked at publicly available code on GitHub probably didn't.
So next time a SCO or Microsoft or Oracle wins a court case, anyone who browsed posted code on GitHub could be in the frame.
Then you have the RIAA / MPAA demanding names of anyone who downloaded makemkv or cdripper.
I'm not sure what they could do with the info about people who downloaded it? I mean, if someone stumbled across it, downloaded it and had a look, what exactly have they done wrong? They would have a defense that they thought it was free to download as it was on a site specifically designed to allow people to download code.
So, what could Twitter do with their info, other than send them a letter saying "you must delete the code you downloaded"?
>So, what could Twitter do with their info, other than send them a letter saying "you must delete the code you downloaded"?
It's Musk so normal logic doesn't apply.
I can see corporations putting in rules about not downloading anything from GitHub.
I work in movie software and we have a corporate rule about only downloading source. So we have to build our toolchain from scratch, including building Python itself. We have arguments about using CUDA drivers.
Just wondering...
You know how some companies will release computer-readable code or documentation, but add subtle steganographic watermarks to each individual copy released, so that later a broadly leaked copy can be tied back to one miscreant?
Well, if you have a large enough sample of a large enough source repository, one which receives dozens or hundreds of changes per day, couldn't you compare any leaked copy of the repository to a particular time and day it was initially copied? That is, when was this snapshot taken? The current set of changes applied *is* the watermark.
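A minimal sketch of that snapshot-dating idea, assuming the repository history is available as a chronological list of (timestamp, {path: content}) states. The names `tree_fingerprint` and `date_snapshot` and the in-memory history format are made up for illustration; a real setup would read the states from the version control system instead.

```python
import hashlib

def tree_fingerprint(files):
    """Hash a {path: content} mapping in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(files[path].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()

def date_snapshot(history, leaked):
    """Find the window during which the leaked tree was the current state.

    history: chronological list of (timestamp, files) tuples (hypothetical format).
    Returns (first_seen, superseded_at) for the first matching state, where
    superseded_at is None if the match is the latest state, or None if no
    state matches at all.
    """
    target = tree_fingerprint(leaked)
    for index, (when, files) in enumerate(history):
        if tree_fingerprint(files) == target:
            has_next = index + 1 < len(history)
            return when, (history[index + 1][0] if has_next else None)
    return None
```

The returned window is the range of access logs worth checking. An exact match only works if the leaker did not touch the files; a leak that drops or edits files would need a fuzzier per-file comparison.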
Sure would narrow down the set of central access logs you would have to look at. And might limit the source files one might feel 'safe' to leak, to those not important?
Space Karen already has form for doing exactly this; the particular example was an internal document. IIRC the "fingerprint" was embedded in the exact spacing used which would work as effectively for source code, although running it through a reformatter would clean up that particular approach.
Yes. There are any number of proposals for steganographically watermarking plain-text documents, including by spacing and other variations. For example, with source code you could inject typos into comments. The principles involved were being discussed at least as far back as the early 1990s.
64 points where you can make a single change gives you 2**64 identifiers. Personally, I'd go with more – perhaps 512 points of change, so I can use a group or erasure code to recover the identifier even if some of the changes are removed or obscured.
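To make the redundancy point concrete, here is a rough sketch that spreads a 64-bit identifier over 512 change points. It uses a plain 8x repetition code with majority voting, which is a stand-in chosen for brevity, not the group or erasure code mentioned above; `encode`, `decode` and the constants are illustrative names.

```python
from typing import List, Optional

ID_BITS = 64   # size of the identifier
COPIES = 8     # 64 * 8 = 512 change points in total

def encode(identifier: int) -> List[int]:
    """Return 512 marks: copy c of bit i lives at position c * ID_BITS + i."""
    bits = [(identifier >> i) & 1 for i in range(ID_BITS)]
    return bits * COPIES

def decode(marks: List[Optional[int]]) -> Optional[int]:
    """Recover the identifier by majority vote.

    Each mark is 0, 1, or None (the change point was removed or obscured).
    Returns None if any bit has no clear majority among its surviving copies.
    """
    identifier = 0
    for i in range(ID_BITS):
        votes = [marks[c * ID_BITS + i] for c in range(COPIES)]
        ones, zeros = votes.count(1), votes.count(0)
        if ones == zeros:
            return None  # tie, or every copy of this bit was erased
        if ones > zeros:
            identifier |= 1 << i
    return identifier
```

With this layout the identifier survives losing several whole copies, as long as each bit keeps a majority of correct marks. A proper erasure code would tolerate more damage for the same 512 points.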
It wouldn't be hard to hack this mechanism into a git or other SCCS server, based on the identity of the person checking out the code. Of course, you still have to prove operation; the identified user could claim their account had been hijacked, for example. That quickly gets you into technical weeds where a judge or jury are unlikely to have expertise, so it's a battle of expert witnesses.
There are, but not all of them would work. You need each of your points to be easily identified by a computer and impossible for the user to identify. If I notice that my comments come back with typos I'd never have let through, I might assume I missed one the first time, but it would become obvious after a few of these showed up. If it was spacing, there are a lot of programs that can respace a file for a specific style, so as soon as I noticed that there were some weird spacing things, I might run one of those, even if I didn't suspect tampering, just to get this weird spacing out of my way. Some techniques that work on a generic text string don't carry over to source code. For example, there was the famous incident where a lyrics site watermarked the lyrics they distributed by using both ASCII and Unicode apostrophes, which has the advantage of being invisible to the reader, but that tactic would break a compiler.
Not only do you have to be careful about where your watermarks are and that they aren't too obvious, they have to work in a file that's constantly being changed. If you planned to have one bit of your identifier be whether the first character of a comment on line 17 is capitalized, then you have to track the comment so that an extra line at the top of the file doesn't break it, and you need a plan for what you will do if the file is changed and there's now a new comment on line 17, and another for if a programmer removes the comment line entirely. A refactor of a module that destroys a lot of your identifier could be hard to deal with automatically. This doesn't make it impossible to do, but it does add difficulty.
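For reference, the apostrophe trick can be sketched in a few lines. This is only an illustration of the general idea, with made-up function names, not a reconstruction of what the lyrics site actually did: each apostrophe carries one bit, ASCII for 0 and U+2019 for 1.

```python
ASCII_APOS = "'"
CURLY_APOS = "\u2019"  # right single quotation mark, looks the same to most readers

def embed(text, bits):
    """Rewrite each apostrophe to carry one bit; apostrophes beyond len(bits) carry 0."""
    remaining = iter(bits)
    out = []
    for ch in text:
        if ch in (ASCII_APOS, CURLY_APOS):
            out.append(CURLY_APOS if next(remaining, 0) else ASCII_APOS)
        else:
            out.append(ch)
    return "".join(out)

def extract(text):
    """Read the bits back from the apostrophes, in order of appearance."""
    return [1 if ch == CURLY_APOS else 0
            for ch in text if ch in (ASCII_APOS, CURLY_APOS)]
```

It also shows how fragile the mark is: a single `marked.replace(CURLY_APOS, ASCII_APOS)` wipes every bit, and in source code the curly apostrophe would not even get that far before the compiler complained.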
You could do that, but if my employer or any other employers I've worked for wanted to, they'd have to change the way they operate. Every employer I've been at has either had source control using git or something equivalent or, in one case many years ago, didn't have anything and when I said that we should be using source control, they said I could use whatever I wanted. This means that I can rewind through all the changes out there, so if I wanted to hide that it was mine, I could artificially discard some commits to make the point harder to identify. Git's commit system is also not going to natively handle watermarks because each modification would change the structures in an obvious way. They would have to patch it to handle them silently and could easily find it hard to do so without breaking things unexpectedly. It could still be done, but it's not going to be a turnkey solution.
Better would be to run it through various filters and leak it piecemeal. The filters should be specific to the languages used and know how to vary things that don't matter, such as whitespace and other formatting aspects, and capitalization in comments. Depending on your purpose, it might be reasonable to do some obfuscation of identifier names.
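The whitespace part of such a filter is the same thing any code formatter does, and it shows why spacing-based fingerprints are so fragile. A naive sketch, assuming space indentation and ignoring string literals (collapsing spaces inside a literal would change the program, so a real tool would need a language-aware parser); `normalize_whitespace` is an illustrative name:

```python
import re

def normalize_whitespace(source, tab_width=4):
    """Map differently spaced copies of the same text to one canonical form."""
    lines = []
    for line in source.splitlines():
        # Tabs to spaces, then drop trailing whitespace.
        line = line.expandtabs(tab_width).rstrip()
        indent = len(line) - len(line.lstrip(" "))
        # Keep the indentation, collapse runs of spaces in the rest of the line.
        body = re.sub(r" {2,}", " ", line[indent:])
        lines.append(line[:indent] + body)
    # Collapse runs of blank lines into a single blank line.
    out = []
    for line in lines:
        if line == "" and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out) + "\n"
```

Two copies that differ only in spacing come out byte-identical, so any identifier encoded purely in the spacing is gone after one pass.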
> GitHub said it treats each fork as a distinct repository that must be identified separately
Which makes sense from a legal standpoint. Not all repositories marked as a "fork" may be infringing, as the fork may have happened before the claimed infringing code was added to the repo (as a trivial example, an un-updated fork of the initially empty repo is not infringing). So only Twitter can say which repos it claims are infringing; GitHub cannot read Twitter's mind, and the law doesn't require it to try.
Twitter also needs to be careful which forked repos it claims are infringing, particularly for private repos. It should not be assumed that the Twitter code is itself non-infringing, and including a party with an infringement counter-claim could be devastating. So Twitter's lawyers are obviously going to be cautious in adding additional parties.
Have you ever met a right winger who wasn't? When rules get in their way, it's a violation of some absolute "right". When that same "right" is used against them, they deploy the full force of the law. Cue Galbraith:
The modern conservative is engaged in one of man's oldest exercises in moral philosophy; that is, the search for a superior moral justification for selfishness.
How are Meta doing with their distributed version of Twitter?
It should be quite easy to fork an open source e-mail client to incorporate about 75% of what Facebook does and most of what Twitter does using the e-mail protocol to create a distributed, encrypted social media system. Get on with it, someone.
"It should be quite easy to fork an open source e-mail client to incorporate about 75% of what Facebook does and most of what Twitter does using the e-mail protocol to create a distributed, encrypted social media system. Get on with it, someone."
How about you? You appear to have an idea for some of the parts I don't have one for, like how to make history public when you're using decentralized resources with no server to store the thing. Even Mastodon needs someone who wants to create a central server to store and connect to the rest of the network. I haven't put any thought into this as I don't use the existing sites and thus have little motivation to make something to do the same thing, but if you have, I'm sure there are some people who would work on it with you.