Facebook's $500k deepfake-detector AI contest drama: Winning team disqualified on buried consent technicality

Five engineers missed out on sharing a top prize of $500,000 in a Facebook-backed AI competition – after they were disqualified for using images scraped from Flickr and YouTube to train their deepfake-detecting system. The contest, dubbed the Deepfake Detection Challenge, was hosted on Kaggle, a Google-owned platform popular …

  1. Anonymous Coward

    Hold DeepFake Competition, Get The Source code, Profit !

    Did all the entrants have to assign their IPR for their ideas to Facebook as part of the Rules as well ?

    1. Pascal Monett Silver badge
      Flame

      Re: Hold DeepFake Competition, Get The Source code, Profit !

      Indeed, Facebook pulled the rug from under the real winners on a technicality, but it won't be giving the code back now, will it?

      1. LucasNorth

        Re: Hold DeepFake Competition, Get The Source code, Profit !

        A "technicality"? By that I assume you mean the rules that they didn't read.

  2. Nifty Bronze badge

    In other news

    Winner banned from Casino

    1. Sanctimonious Prick
      Coat

      Re: In other news

      And all winnings over the past 90 days are invalid, and must be returned within 7 days.

  3. AVee

    Makes me wonder...

    A machine learning algorithm is largely the result of its training data. Take away the training data and there isn't much left. As such, you might make the argument it actually is a derivative work of that training data. If anyone makes that stick in court there could be some really interesting consequences :)

    1. Paul Kinsler

      Re: Take away the training data and there isn't much left.

      The *architecture* of the network is left - e.g. if an artificial neural network, the number of layers, the number of nodes in those layers and the sequencing of them, how the training converges, how the nodes are linked, the thresholding function, and so on. I will leave it up to the reader to decide whether it is this or the training data that gives the basis for good results; perhaps you might imagine a simple truth table with good/bad training data on one axis, and good/bad architecture on the other, and imagine what the results would be.
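To make the point above concrete, here's a minimal sketch in plain Python (purely illustrative; every name in it is invented for the example) of what the architecture alone amounts to before any training data is involved: the layer sizes, the dense connectivity and sequencing of layers, and a thresholding function, with random, untrained weights:

```python
import random

random.seed(0)

# What remains with no training data at all: layer sizes, dense
# connectivity between them, and the thresholding (activation) function.
layer_sizes = [8, 4, 1]
weights = [[[random.gauss(0, 0.1) for _ in range(n_out)]
            for _ in range(n_in)]
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def relu(v):
    return v if v > 0 else 0.0

def forward(x):
    # The sequencing of layers: alternate linear maps and thresholding.
    for depth, w in enumerate(weights):
        x = [sum(x[i] * w[i][j] for i in range(len(w)))
             for j in range(len(w[0]))]
        if depth < len(weights) - 1:
            x = [relu(v) for v in x]
    return x

print(forward([1.0] * 8))  # untrained output: structured, but meaningless
```

Everything in that block exists before a single training image is seen; what the training data supplies is the values of the weights.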

  4. Steve Graham

    Follow the money

    The rules forbade using any data set which could not be used commercially. That would not be in the interests of Facebook.

    1. Anonymous Coward

      Re: Follow the money

      The thing is, the winners used only commercially usable datasets (with CC BY licences permitting commercial use). Facebook then looked for another rule to disqualify them, because it didn't want to endorse a winning solution that used anything from YouTube.

      It would have been fair if they'd said so during the competition. Many people asked about YouTube data and weren't told no.

      1. LucasNorth

        Re: Follow the money

        any evidence for that claim or are you a mind reader?

  5. Fazal Majid

    Like all bureaucracies, Facebook must have parts that are super-anal about privacy and compliance, and others that couldn't care less. Most likely the R&D arm that sponsored this contest falls within the former, whereas the revenue-generating bits fall within the latter.

    1. iron Silver badge

      > Like all bureaucracies, Facebook must have parts that are super-anal about privacy

      Hahahahahahahahahahahahahahahahahahaha. Good one.

      1. MiguelC Silver badge

        Well, they do, if it concerns Zuck's own privacy

        1. KBeee

          Yeah, YOUR privacy is of no concern to him

          https://www.youtube.com/watch?v=fWJ-S7iLd8M

    2. Woza
      Joke

      > Facebook must have parts that are super-anal about privacy

      I have detected a fake Facebook! Where's my $500k?

  6. IGotOut Silver badge

    tldr

    You used data that Google had harvested, rather than data Facebook had harvested.

    Gotcha.

  7. Henry Wertz 1 Gold badge

    Conflicted

    I'm real conflicted on this one. On the one hand, I really feel for this team; Facebook has never given a crap about privacy so who would expect a clause like this? And, furthermore, the NVidia data set they used is obviously for neural network training (NVidia is not going to collect a data set like this just for the hell of it) so one would assume it was OK.

    On the other hand -- I do agree with the goal of explicit consent. I would not mind being involved with a system used for detecting deep fakes, but facial recognition neural networks are a tool of the modern police state and I would NEVER consent to my face being used for that! (That said, I don't throw my photos on Flickr or Youtube, and DEFINITELY not on Facebook!)

    "The *architecture* of the network is left - e.g. if an artificial neural network, the number of layers, the number of nodes in those layers and the sequencing of them, how the training converges, how the nodes are linked, the thresholding function, and so on."

    But even then, in some cases a neural network is trained, then one with a somewhat different number of nodes, layers, connectivity, etc. is trained, then another with different parameters; in other words, besides the network being TRAINED on the image data, its size and shape are actually determined from the data set too. This is a relatively recent technique; it would have taken way the hell too long in the past, but Moore's law and all that (plus NVidia shipping cards with like 5,000 CUDA cores on them) has helped. Sometimes this technique works great; sometimes it's effectively overfitting the data: the neural network works great trained with THAT data set but not another one.
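A toy sketch of that overfitting risk (assuming nothing about the actual contest; the numbers are invented for illustration): if the network's size and shape are chosen by scoring many candidates against the same data, the best measured score is optimistically biased even when every candidate is equally useless.

```python
import random

random.seed(42)

# 200 candidate "architectures" that are all equally useless (true
# accuracy 50%), each scored on the same validation set of 100 examples.
def measured_accuracy(n=100):
    # Each candidate's measured score is just binomial noise around 50%.
    return sum(random.random() < 0.5 for _ in range(n)) / n

scores = [measured_accuracy() for _ in range(200)]
best = max(scores)
print(best)                 # the "winner" looks comfortably above chance
print(measured_accuracy())  # ...but a fresh sample is back near 50%
```

Selecting the architecture is itself a form of fitting, so the winning configuration has to be validated on data it was never selected against.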

    1. heyrick Silver badge

      Re: Conflicted

      "I'm real conflicted on this one."

      I'm not, because we ought to assume that Facebook will be more than happy to use the winning code...training it with data pilfered from their own service, not from elsewhere.

      The excuse given, especially coming from Facebook, is just a way of stiffing the winners.

  8. lglethal Silver badge
    Go

    Wait a minute...

    Have I understood this right? You can use external sets, but you need everyone's individual permission before you can use them. Additionally, you can only use sets that other people can also use, which means that if they wanted to use the same set, they would also need to get all the permissions. Which means technically no external set is allowed: once you have all the permissions, your set is different from the publicly available set, because the publicly available set doesn't come with all the permissions.

    That's a brilliant Catch 22! You almost have to applaud the pure chicanery!

    1. logicalextreme Bronze badge

      Re: Wait a minute...

      Yeah, reading the article I was trying to figure out who would be eligible to win.

  9. DS999

    Why didn't they supply everyone with the same training data?

    Same training, same testing, everyone is on a level playing field. If one team can afford to get photos of a million people with consent and another could afford only a thousand, why should the one with more resources potentially win only because it had better training?

    Or alternatively, require all contestants to submit their training data (which they MUST have used, no giving data designed to sabotage others) and other entrants are free to use it if they want.

    1. TechnicalBen Silver badge

      Re: Why didn't they supply everyone with the same training data?

      Also, these are training sets; nothing stops Facebook asking them to retrain on a public training set and redo the competition. That would be fairer. Completely kicking them out when, if the above comments are true, they were asking questions and being open with FB, is very unfair and suspicious.

      [Edit] Ah, reading below, they did. So that seems fairer, and I don't think FB is in the wrong here. An unfortunate result, but a level playing field. The other dataset's results are still an option for other uses, just not the competition.

  10. Mark192 Bronze badge

    If the contest was for designing the AI...

    If the contest was for designing the AI, it would have made more sense to limit everyone to the same datasets.

    Still, it's a great demonstration of the importance of good datasets... first down to seventh... ouch. Gotta feel for that team :-/

    1. Yet Another Anonymous coward Silver badge

      Re: If the contest was for designing the AI...

      Not necessarily; one team might be able to use training data from grainy CCTV while another team needs pristine mug shots.

      If you want the broadest set of research then you need to make it fairly open, otherwise you get a Formula 1 type result where people are merely micro-optimising for the rules.

      1. heyrick Silver badge

        Re: If the contest was for designing the AI...

        "one team might be able to use training data from grainy cctv while another team needs pristine mug shots"

        In that case, mugshot team is inferior. Grainy rubbish shots with poor lighting and wonky colour balance are normal. Failing to handle those is failing.

        "where people are merely micro-optmising for the rules"

        Not really. There's an easy way around that: give everybody the same test data, then make it very clear that the solution's effectiveness will be evaluated against a different set of test data that is not disclosed beforehand, using what was learned from the first data set.

        That way, the teams will have incentive to make something that can actually recognise, rather than performing very well with specific predetermined images.
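The two-set scheme described above can be sketched in a few lines; this is a hypothetical illustration, not anything from the actual contest. A "detector" that merely memorises the disclosed set aces it, while the undisclosed set exposes it:

```python
import random

random.seed(7)

# Score against a public set during the contest, but rank on a private,
# undisclosed set drawn the same way.
def make_set(n):
    return [(random.random(), random.random() > 0.5) for _ in range(n)]

public, private = make_set(200), make_set(200)

# A "detector" that just memorises the public answers: the failure mode
# the undisclosed set is meant to catch.
lookup = dict(public)
def memoriser(x):
    return lookup.get(x, False)

def score(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(score(memoriser, public))   # perfect on the disclosed set
print(score(memoriser, private))  # near chance on the undisclosed set
```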

  11. Anonymous Coward

    Take away the first-place team's extra data and they end up seventh. Well, this proves they had a worse algorithm than the new first-place guy: level the playing field with equal training data and they fall behind. He had smarter augmentation techniques.

    The story quotes the disqualified score as 0.423, and it seems Selim Seferbekov's score is 0.42798. We are talking about 0.423 vs 0.428, so there's the typo: a fraction of a percentage point!

    The worst thing is, Facebook says no teams used the promising approach of sensor noise forensics. That is the danger of crowdsourcing: groupthink. Teams just run the same old basic code over and over, scraping extra data to try to get an edge that way. Not really innovative.

    Is this the same site where a guy scraped the actual answers for a contest and won, banked the cash, then 8 months later got rumbled by a volunteer doing the actual deployment work for free?

    https://www.theregister.com/2020/01/21/ai_kaggle_contest_cheat/

    Google owns this site? They tried to foist a cheating solution on to a charity that had to pay them to run the contest. The volunteer who found the hidden scraped answers was a runner-up in the competition.

    Didn't this disqualified team read The Register? Why invest time in a site like that after reading a story like that? Forewarned is forearmed.

    1. DryBones

      Sensor noise forensics? You mean, how changes don't generally match the background patterns? The same thing that makes Photoshops obvious? It needs a fancy name?

      1. Anonymous Coward

        Yurhur, that's what Facebook called it. They were so hoping for something based on that: a hard programming task requiring lots of time and dedicated, focused human intelligence.

        What they got is another circus of Nvidia employees showing off their banks of computers filled with high-end GPUs running the same neural network models, trying to out-compute the next guy doing the same thing with fewer resources.

        Most of the top-ranked competitors there seem to be Nvidia employees. The disqualified team had one on it. Now they spin the story to get something out of it: plucky underdog team, victim of a sneaky rule change. No.

        It's kardashianisation. How do you advertise in a way that gets through adblockers? Flood a platform with INFLUENCERS.

        Nvidia = the Kardashians

  12. KBeee
    Facepalm

    It's that smell again

    Have you ever got home from a walk, and when you get indoors you notice a nasty smell and think

    "Oh God what have I stepped in?"

    I get that every time I see an article about Facebook.

  13. quartzz

    "sorry, we can't pay you". funny that...

  14. steviebuk Silver badge

    None of you will now win

    But us, because we've now seen your code, and although we won't, obviously, use it code for code, you just know we're gonna rip off the way you went about it. You've given us some great ideas for free. Oh... what's that... you'll sue, will you? Have you seen our bank account? It's never-ending, so we'll just make the suit never-ending till you're bankrupt.

    Are we cunts? Yes. Do we care? No.

  15. Anonymous Coward

    Some new kind of rounding there?

    "The All Faces Are Real team scored 0.42320 while Seferbekov scored 0.42798, which rounds up to 0.423."

    Is that some new kind of rounding you're doing there? Do you mean 0.42798 rounds up to 0.428, perhaps?
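For what it's worth, Python's built-in rounding backs the commenter up; a quick check:

```python
# The article's 0.42798 rounded to three decimal places:
print(round(0.42798, 3))  # 0.428, not 0.423
print(round(0.42320, 3))  # 0.423 is the disqualified team's score
```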


Biting the hand that feeds IT © 1998–2020