Hold DeepFake Competition, Get The Source code, Profit !
Did all the entrants have to assign their IPR for their ideas to Facebook as part of the Rules as well?
Five engineers missed out on sharing a top prize of $500,000 in a Facebook-backed AI competition – after they were disqualified for using images scraped from Flickr and YouTube to train their deepfake-detecting system. The contest, dubbed the Deepfake Detection Challenge, was hosted on Kaggle, a Google-owned platform popular …
A machine learning algorithm is largely the result of its training data. Take away the training data and there isn't much left. As such, you might make the argument it actually is a derivative work of that training data. If anyone makes that stick in court there could be some really interesting consequences :)
The *architecture* of the network is left - e.g. if an artificial neural network, the number of layers, the number of nodes in those layers and the sequencing of them, how the training converges, how the nodes are linked, the thresholding function, and so on. I will leave it up to the reader to decide whether it is this or the training data that gives the basis for good results; perhaps you might imagine a simple truth table with good/bad training data on one axis, and good/bad architecture on the other, and imagine what the results would be.
The thing is, the winners used only commercially available datasets (with CC-BY commercial licenses). Facebook then tried to find another rule to disqualify them, because they didn't want to endorse a winning solution which used anything from YouTube.
It would have been fair if they'd said so during the competition. Many people asked about YouTube data and weren't told no.
I'm real conflicted on this one. On the one hand, I really feel for this team; Facebook has never given a crap about privacy so who would expect a clause like this? And, furthermore, the NVidia data set they used is obviously for neural network training (NVidia is not going to collect a data set like this just for the hell of it) so one would assume it was OK.
On the other hand -- I do agree with the goal of explicit consent. I would not mind being involved with a system used for detecting deep fakes, but facial recognition neural networks are a tool of the modern police state and I would NEVER consent to my face being used for that! (That said, I don't throw my photos on Flickr or Youtube, and DEFINITELY not on Facebook!)
"The *architecture* of the network is left - e.g. if an artificial neural network, the number of layers, the number of nodes in those layers and the sequencing of them, how the training converges, how the nodes are linked, the thresholding function, and so on."
But even then, in some cases one neural network is trained, then another with a somewhat different number of nodes, layers, connectivity, etc., then another with different parameters; in other words, besides the network being TRAINED on the image data, the size and shape of the network itself is determined from the data set. This is a relatively recent technique - it would have taken way the hell too long in the past, but Moore's law and all that (plus NVidia shipping cards with like 5000 CUDA cores on them) has helped. Sometimes this technique works great; sometimes it's effectively overfitting the data - the neural network works great trained with THAT data set but not another one.
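A minimal sketch of what that search-over-shapes looks like in practice - the candidate layer shapes, the synthetic dataset, and the use of scikit-learn's MLPClassifier are all made up for illustration, not taken from any competition entry:

```python
# Hypothetical sketch: let the data pick the architecture. Train several
# network shapes and keep whichever scores best on a held-out split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# toy classification problem standing in for the real image data
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

candidates = [(32,), (64,), (32, 32), (64, 32)]  # hidden-layer shapes to try
scores = {}
for shape in candidates:
    clf = MLPClassifier(hidden_layer_sizes=shape, max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    scores[shape] = clf.score(X_val, y_val)  # validation accuracy per shape

best = max(scores, key=scores.get)
```

The point of the comment survives even in the toy: `best` is chosen *by* the data, so the final architecture is itself a function of the training set.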
"I'm real conflicted on this one."
I'm not, because we ought to assume that Facebook will be more than happy to use the winning code...training it with data pilfered from their own service, not from elsewhere.
The excuse given, especially coming from Facebook, is just a way of stiffing the winners.
Have I understood this right? You can use external sets, but you need to get everyone's individual permission before you can use them. But additionally, you can only use sets that other people can also use, which means if they wanted to use the same set, they would also need to get all the permissions. So technically no external sets are allowed: once you have all the permissions, your set is different from the publicly available set, because the publicly available one doesn't have all the permissions.
That's a brilliant Catch 22! You almost have to applaud the pure chicanery!
Same training, same testing, everyone is on a level playing field. If one team can afford to get photos of a million people with consent and another could afford only a thousand, why should the one with more resources potentially win only because it had better training?
Or alternatively, require all contestants to submit their training data (which they MUST have used, no giving data designed to sabotage others) and other entrants are free to use it if they want.
Also, these are training sets - nothing stops them asking the team to retrain on a public training set and redo the competition. That would be more fair. Completely kicking them out, when (if the above comments are true) they were asking questions and being open with FB, is very unfair and suspicious.
[Edit] Ah, reading below, they did. So that seems more fair, and I don't think FB is in the wrong here. An unfortunate result, but a level playing field. The other dataset's results are still an option for other uses, just not the competition.
Not necessarily, one team might be able to use training data from grainy cctv while another team needs pristine mug shots.
If you want the broadest set of research then you need to make it fairly open, otherwise you get a Formula 1 type result where people are merely micro-optimising for the rules
"one team might be able to use training data from grainy cctv while another team needs pristine mug shots"
In that case, mugshot team is inferior. Grainy rubbish shots with poor lighting and wonky colour balance are normal. Failing to handle those is failing.
"where people are merely micro-optimising for the rules"
Not really. There's an easy way around that. Give everybody the same test data. Then make it very clear that when determining the effectiveness of the solution, it will be evaluated with a different set of test data that is not disclosed beforehand, using the data learned from the first data set.
That way, the teams will have incentive to make something that can actually recognise, rather than performing very well with specific predetermined images.
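A toy sketch of that two-stage evaluation. The data and the two "entrants" are invented for illustration: one model learned the underlying rule, the other just memorised the public test answers. The public leaderboard can't tell them apart, but the undisclosed set can:

```python
# Rank entries on a public test set, but decide final standings on a
# hidden set that is only revealed at judging time.
import random

random.seed(0)

def make_split(n):
    # label follows a simple rule (x > 0.5) with 10% label noise
    pts = []
    for _ in range(n):
        x = random.random()
        label = (x > 0.5) != (random.random() < 0.1)
        pts.append((x, label))
    return pts

public_test = make_split(200)
hidden_test = make_split(200)

def accuracy(model, testset):
    return sum(model(x) == label for x, label in testset) / len(testset)

general_model = lambda x: x > 0.5          # actually learned the rule
lookup = dict(public_test)                 # memorised the public answers
overfit_model = lambda x: lookup.get(x, False)

# public leaderboard: the memoriser looks perfect
public_scores = (accuracy(general_model, public_test),
                 accuracy(overfit_model, public_test))
# hidden set: the memoriser collapses to chance, the real model holds up
hidden_scores = (accuracy(general_model, hidden_test),
                 accuracy(overfit_model, hidden_test))
```

The hidden-set re-ranking is exactly the incentive the comment describes: you can't micro-optimise for data you never get to see.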
take away the 1st place team's extra data and they end up 7th - well, this proves they had a worse algorithm than the new first place guy. level the playing field - equal training data - and they fall behind. he had smarter augmentation techniques.
story quotes the disqualified score as 0.423 - it seems Selim Seferbekov's score is 0.42798. we are talking about 42.3% vs 42.8%. so the difference is a fraction of a percentage point!
worst thing is - facebook say no teams used the promising approach of sensor noise forensics. that is the danger of crowdsourcing - groupthink. teams just running the same old basic code over and over, scraping extra data to try and get an edge that way. not really innovative.
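For the curious, the sensor-noise idea mentioned above rests on each camera's photo-response non-uniformity (PRNU) leaving a faint fixed noise pattern in every frame; the residual left after denoising approximates it. This is only a rough toy - the "fingerprint", the smooth fake scene, and the crude 3x3 mean-filter denoiser are all stand-ins, and real forensic work uses far more careful filtering and correlation tests:

```python
import numpy as np

def noise_residual(img):
    # crude denoiser: 3x3 mean filter; the residual keeps the
    # high-frequency noise the sensor stamped onto the image
    padded = np.pad(img, 1, mode="edge")
    smoothed = sum(
        padded[i:i + img.shape[0], j:j + img.shape[1]]
        for i in range(3) for j in range(3)
    ) / 9.0
    return img - smoothed

rng = np.random.default_rng(0)
fingerprint = rng.normal(0.0, 1.0, (64, 64))   # stand-in for a camera's PRNU
scene = np.add.outer(np.arange(64.0), np.arange(64.0))  # smooth fake scene
photo = scene + fingerprint                    # what the "camera" outputs

residual = noise_residual(photo)
# the residual correlates strongly with the fingerprint, far above chance
corr = np.corrcoef(residual.ravel(), fingerprint.ravel())[0, 1]
```

A deepfake that splices in generated pixels breaks this correlation locally, which is why the approach was considered promising.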
is this the same site where a guy scraped the actual answers for a contest and won, banked the cash, then 8 months later got rumbled by a volunteer doing the actual deployment work for free?
google owns this site? they tried to foist a cheating solution onto a charity that had to pay them to run the contest. the volunteer who found the hidden scraped answers was a runner-up in the competition.
didn't this disqualified team read theregister - why invest time into a site like that after reading a story like that? forewarned is forearmed?
yurhur. that's what facebook called it. they were so hoping for something based on that. a hard programming task requiring lots of time and dedicated, intelligent human focus.
what they got is another circus of nvidia employees showing off their banks of computers filled with banks of high end GPUs running the same neural network models - trying to out compute the next guy doing the same thing with less resources.
most of the top ranked competitors there seem to be nvidia employees. the disqualified team had one on it. now they spin the story to get something out of it. plucky underdog team a victim of a sneaky rule change. no.
it's kardashianisation - how do you advertise in a way that gets through adblockers? flood a platform with INFLUENCERS
nvidia = kardashians
But us, because we've now seen your code, and although we won't, obviously, use it code for code, you just know we're gonna rip off the way you went about using your code. You've given us some great ideas for free. Oh... what's that... you'll sue, will you? Have you seen our bank account? It's never-ending, so we'll just make the suit never-ending till you're bankrupt.
Are we cunts? Yes. Do we care? No.
Biting the hand that feeds IT © 1998–2020