Meta trains data2vec neural network to grok speech, images, text so it can 'understand the world'

Researchers at Facebook's parent company, Meta, have trained a single AI model capable of processing speech, images, and text in the hope that these so-called multi-modal systems will power the company’s augmented reality and metaverse products. The model, known as data2vec, can perform different tasks. Given an audio snippet, it can …

  1. Anonymous Coward

    So basically they're applying different models based on data type.

    Surely the idea of applying a programming model based on a data type is patentable. Not.

    1. JimboSmith Silver badge

      If it’s fed an image, it can classify objects.

      So that’s captchas buggered then, is it? Prove you’re not a robot or Meta: select the images that show someone handing all their data to Facebook.

    2. teknopaul

      I believe they are using the same model for different data types and that's the new thing.

      Lots of what-ifs and hype in that article. With just 80 days of audio, it is surprising that the audio input helps at all.

      1. cyberdemon Silver badge
        Paris Hilton

        > I believe they are using the same model for different data types and that's the new thing.

        Don't neural networks require the input to be of a given shape (i.e. dimensionality) and size?

        And wouldn't trying to train a general model be horrendously inefficient?

        I wonder how much leccy zuckerborg has spaffed on this pointless exercise, at a time when we are struggling to heat our homes, and (presumably) California is struggling to generate enough power on its grid to run the air con in summer, the poor buggers.

  2. Il'Geller

    Finally! Finally Microsoft will get a worthy competitor.

  3. macjules

    "self-supervised learning"

    Would that apply to Facebook itself?

    I think not.

    1. Il'Geller

      Re: "self-supervised learning"

      Facebook is proactive, trying to get ahead of Microsoft, which very soon will just gobble up Facebook and start its own social service based on text search.

  4. Winkypop Silver badge
    Terminator

    Meta future

    If you want dystopian, if that’s your thing, stick with the Zuckerborg.

  5. Anonymous Coward

    Given the input of an infinite number of Monkeys

    Will it also rewrite the Complete Works of William Shakespeare?

    1. Primus Secundus Tertius

      Re: Given the input of an infinite number of Monkeys

      Perhaps it will rewrite the Words of Zuckerberg to be even more obscure and evasive.

      1. John Brown (no body) Silver badge

        Re: Given the input of an infinite number of Monkeys

        Is that possible?

    2. Il'Geller

      Re: Given the input of an infinite number of Monkeys

      Yep, it’s possible.

  6. _LC_
    Headmaster

    I'm so glad

    I'm so glad that this is not aimed at us. For sure, this will only be used to better our life.

    There is no way that this will be used to “sniff out Khashoggis” or apply censorship. Never!

    1. Suburban Inmate

      Re: I'm so glad

      Pretty much already is. Not this specific AI (probably) but wastebook does already read and classify text in images.

      1. Il'Geller

        Re: I'm so glad

        Yes, this is question-answering search over structured texts and images.

  7. Mike 137 Silver badge

    Wow!!!! But wait a moment...

    "Given an audio snippet, it can recognize speech."

    Depending on the definition of 'recognise', this capacity might be comparable to that of a two to three year old child.

    " If it’s fed an image, it can classify objects."

    Again, depending entirely on the definition of 'classify', a very small kid might be as competent, and numerous AI image classifiers already exist.

    "when faced with text, it can check the grammar or analyse the writing’s tone and emotions"

    Microsoft Word has been able to 'check grammar' (after a fashion) for years. I'd be interested to know the level of refinement with which 'tone' and 'emotions' are analyzed.

    At first reading this sounded to me like Zuckerhype. However, the actual paper (pretty incomprehensible though it is) doesn't include either the word "emotion" or the word "tone", so maybe it's not entirely Metahype.

    1. Primus Secundus Tertius

      Re: Wow!!!! But wait a moment...

      Re: tone and emotions.

      I have toyed with two well-known sites that check one's writing: Grammarly and ProWritingAid. Both assess submitted texts on spelling, grammar, complexity, general readability, etc. They also judge the tone of the work: formal, semi-formal, casual, etc.

  8. Magani
    Facepalm

    Dept of Corrections and Clarifications

    In a previous press release we said: "Meta trains data2vec neural network to understand speech, images, text so it can 'understand the world'"

    That release should have said "Meta trains data2vec neural network to understand speech, images, text so it can 'take over the world'".

    1. Primus Secundus Tertius

      Re: Dept of Corrections and Clarifications

      'take over the world'

      Amend that to 'take over the world and monetize it'.

  9. John Brown (no body) Silver badge

    And yet...

    ..we still continue to see news stories about real people having accounts hijacked, and other shenanigans, and being unable to get a satisfactory response from the various Meta estates because they "don't have a human available and the automated systems can find no breach of community standards".

    Clearly, Meta's response isn't to have more humans; it's to try to improve their automated system, and around we go again while they keep earning $billions, reneging on their responsibility, and telling the legal systems of the world that they are "working on solutions".

  10. DS999 Silver badge
    Facepalm

    So it is basically like a single program

    That has three completely independent procedures that are called based on the file type that is input.

    Wow, they've managed to bring a neural network up to what was possible in regular programming in the 1960s!

  11. Warm Braw

    So it can 'understand the world'

    I assume that training Zuck was too much of a challenge.

    1. Pirate Dave Silver badge

      Re: So it can 'understand the world'

      Yeah, our top researchers still haven't perfected the human-to-space-alien translation software yet.

  12. Tromos

    "noticing if you miss an ingredient"

    I can usually tell, when I'm doing my egg and chips. Anything more sophisticated, I leave to professionals.

    1. Fruit and Nutcase Silver badge

      Re: "noticing if you miss an ingredient"

      Or remind you to heat the food...

      https://www.businessinsider.com.au/steve-jobs-elon-musk-bill-gates-mark-zuckerberg-diets-2019-3

      "Zuckerberg wasn’t shy about sharing the food he had killed himself with friends and house guests. He once hosted Twitter CEO Jack Dorsey and treated him to goat he had killed. Dorsey said he remembers that the goat was served cold, so he stuck to salad for dinner."

  13. Snowy Silver badge

    Yes nice but

    <quote>Given an audio snippet, it can recognize speech. If it’s fed an image, it can classify objects. And when faced with text, it can check the grammar or analyse the writing’s tone and emotions.</quote>

    While it can see objects, understand sounds and turn them into words, and also read words, can it discover the meaning of them, or are they just a list of objects and a lookup table for a dictionary?

  14. LionelB Silver badge

    Thanks for clearing that up, guys

    "It still, however, processes each form, whether its speech, images, and text, separately."

    So it's, erm, three neural networks, then.

    1. T. F. M. Reader

      Re: Thanks for clearing that up, guys

      It may be one neural network but three different training sets...

      1. LionelB Silver badge

        Re: Thanks for clearing that up, guys

        Well yes, maybe... I was equating "neural network" with "trained neural network" (on the grounds of an untrained neural network not being a functional thing at all).

  15. Ian Johnston Silver badge

    I am shocked, shocked to learn that computers can be designed to carry out more than one task. Why didn't I buy shares in ENIAC?

  16. T. F. M. Reader

    So that's what Meta-AI is good for then, eh?

    Making sure we put the correct amount of salt and pepper in our food?

    Seems to be a pretty low bar as far as aspirations go...

  17. RobLang

    Redefined what multi-modal means

    It's a clever idea, but I think they've redefined what multi-modal means to avoid the difficult bit. Three separately trained models whose outputs are combined isn't multi-modal. Multi-modal is desirable because it's one of the hard problems left in classification neural networks.

    Also, this was a red flag:

    "We have not specifically analyzed how our models will react to adversarial examples"

    Then Meta AI shouldn't be releasing news stories until it has properly tested the model. That includes trying to break it to understand its bias and limitations.

    Also, unsurprisingly, the original blog article and this story mention nothing of ethics.
