So basically they're applying different models based on data type.
Surely the idea of applying a programming model based on a data type is patentable. Not.
Researchers at Facebook's parent, Meta, have trained a single AI model capable of processing speech, images, and text in the hope that these so-called multi-modal systems will power the company’s augmented reality and metaverse products. The model, known as data2vec, can perform different tasks. Given an audio snippet, it can …
Don't neural networks require the input to be of a given shape (i.e. dimensionality) and size?
And wouldn't trying to train a general model be horrendously inefficient?
I wonder how much leccy zuckerborg has spaffed on this pointless exercise, at a time when we are struggling to heat our homes, and (presumably) California is struggling to generate enough power on its grid to run the air con in summer, the poor buggers.
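On the shape question above: the usual dodge is to give each data type its own small front end that chops the input into pieces and projects every piece to the same width, so the shared network only ever sees a sequence of equal-width vectors. Here's a toy sketch of that idea in NumPy. To be clear, this is not Meta's code: the function names, the width `D`, the patch and frame sizes, and the random projections are all made up for illustration, and real front ends are learned rather than random.

```python
import numpy as np

D = 16  # shared embedding width (arbitrary, chosen for illustration)
rng = np.random.default_rng(0)


def embed_image(img, patch=4):
    """Cut an (H, W) image into patch x patch tiles and project each tile to D dims."""
    h, w = img.shape
    tiles = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    tiles = tiles.reshape(-1, patch * patch)      # (num_tiles, patch * patch)
    proj = rng.standard_normal((patch * patch, D))
    return tiles @ proj                           # (num_tiles, D)


def embed_audio(wave, frame=8):
    """Split a 1-D waveform into fixed-length frames and project each frame to D dims."""
    n = len(wave) // frame
    frames = wave[: n * frame].reshape(n, frame)  # drop the ragged tail
    proj = rng.standard_normal((frame, D))
    return frames @ proj                          # (num_frames, D)


def embed_text(token_ids, vocab=100):
    """Look up a D-dim vector for each integer token id."""
    table = rng.standard_normal((vocab, D))
    return table[np.asarray(token_ids)]           # (num_tokens, D)


def shared_layer(tokens, w):
    """Stand-in for the shared network: accepts any (length, D) sequence."""
    return np.tanh(tokens @ w)


w = rng.standard_normal((D, D))
for name, tokens in [
    ("image", embed_image(np.zeros((8, 12)))),
    ("audio", embed_audio(np.zeros(50))),
    ("text", embed_text([1, 2, 3])),
]:
    print(name, tokens.shape, "->", shared_layer(tokens, w).shape)
```

The three inputs start out as a 2-D array, a 1-D array and a list of integers, but each comes out of its front end as a `(length, D)` array, which is all the shared layer needs. Whether that makes training efficient is a separate question that this sketch says nothing about.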
"Given an audio snippet, it can recognize speech."
Depending on the definition of 'recognise', this capacity might be comparable to that of a two- to three-year-old child.
"If it’s fed an image, it can classify objects."
Again, depending entirely on the definition of 'classify', a very small kid might be as competent, and numerous AI image classifiers already exist.
"when faced with text, it can check the grammar or analyse the writing’s tone and emotions"
Microsoft Word has been able to 'check grammar' (after a fashion) for years. I'd be interested to know the level of refinement with which 'tone' and 'emotions' are analyzed.
At first reading this sounded to me like Zuckerhype. However, the actual paper (pretty incomprehensible though it is) doesn't include either the word "emotion" or the word "tone", so maybe it's not entirely Metahype.
Re: tone and emotions.
I have toyed with two well-known sites that check one's writing: Grammarly and ProWritingAid. Both assess submitted texts on spelling, grammar, complexity, general readability, etc. They also judge the tone of the work: formal, semi-formal, casual, etc.
In a previous press release we said: "Meta trains data2vec neural network to understand speech, images, text so it can 'understand the world'"
That release should have said "Meta trains data2vec neural network to understand speech, images, text so it can 'take over the world'".
...we still continue to see news stories about real people having accounts hijacked, and other shenanigans, and being unable to get a satisfactory response from the various Meta estates because they "don't have a human available and the automated systems can find no breach of community standards".
Clearly, Meta's response isn't to have more humans, it's to try to improve their automated systems, and around we go again while they keep earning $billions, reneging on their responsibility, and telling the legal systems of the world that they are "working on solutions".
Or remind you to heat the food...
"Zuckerberg wasn’t shy about sharing the food he had killed himself with friends and house guests. He once hosted Twitter CEO Jack Dorsey and treated him to goat he had killed. Dorsey said he remembers that the goat was served cold, so he stuck to salad for dinner."
<quote>Given an audio snippet, it can recognize speech. If it’s fed an image, it can classify objects. And when faced with text, it can check the grammar or analyse the writing’s tone and emotions.</quote>
While it can see objects, understand sounds and turn them into words, and also read words, can it discover the meaning of them, or are they just a list of objects and a look-up table for a dictionary?
It's a clever idea, but I think they've redefined what multi-modal means to avoid the difficult bit. Three separately trained models that have their outputs combined aren't multi-modal. Multi-modal is desirable because it's one of the hard problems left in classification neural networks.
Also, this was a red flag:
"We have not specifically analyzed how our models will react to adversarial examples"
Then Meta AI shouldn't be releasing news stories until they've properly tested it. That includes trying to break it to understand its biases and limitations.
Also, unsurprisingly, the original blog article and this story mention nothing of ethics.