Do you Word2Vec? Google's neural-network bookworm

Several years back, the Google "Brain Team" that was behind TensorFlow hatched another novel neural tool: Word2Vec. Word2Vec is a two-layer neural net for processing text. It swallows a given set of text that it then returns as a set of vectors – turning the words into a numerical form that computers can understand. Word2Vec …

  1. colinb

    Great

    Good stuff.

    Word2Vec has been on my radar for a while but I've not had a chance to play with it; will give this a spin at the weekend.

    I think the best way to handle semantic text would be to parse it into a graph representation combined with vectors for efficiency, something I'm playing around with.

    We're still some way from Beatles+Yoko=Breakup tho (and yes there are other interpretations).

  2. Anonymous South African Coward

    Very interesting read. And fascinating too.

    And no, I'm not Spock. :)

  3. TRT Silver badge

    Artificial intelligence let loose on the internet can go badly wrong

    You can't train an artificial intelligence using real stupidity. ;)

    1. Charlie Clark Silver badge

      Re: Artificial intelligence let loose on the internet can go badly wrong

      I don't think this would really be suitable for a chatbot, which is probably why we've not really seen one from Google. I can think of a number of uses it could be put to, but if I told you I'd have to kill you....

  4. Anonymous Coward

    Free Speech

    Neat technology, but with the right training (Newspeak) isn't it exactly the sort of tool that our governments would like deployed by the internet's big players to censor what people can post online?

  5. Primus Secundus Tertius Silver badge

    Vector spaces

    These 'vectors' remind me of Hilbert space: infinite-dimensioned, with many imaginary components.

  6. John Smith 19 Gold badge

    Reminded me of the MIT

    START Natural Language Question Answering System.

  7. Mike007

    Imagine a baby. Before it opens its eyes for the first time it is connected to some machine that can completely control sensory input, similar to The Matrix.

    Instead of feeding in images of a physical world, the only input it gets is plain white text displayed in front of its "eyes" on a black background. This is the only "external input" it will have for its entire life. It can respond and interact somehow through muscle actions, no need to get hung up on the details.

    Words represent social concepts/ideas. How does it understand "Christmas"?

    There is a major difference between being able to calculate the statistically correct reply to a question and being able to understand a question.

    We might already have all of the "technology" required for sentient AI, it just needs quite a bit more processing power, which will come in time. The problem is, our way of "teaching AIs" means that we are potentially training the next-in-line-for-the-throne to interact with the world without any understanding of it.

    Imagine the difference between an entity in charge of all "law enforcement drones" learning "conflict resolution" from Facebook posts vs court rulings vs Life. You might argue that court rulings are a better training set than interacting with the typical person, but how do you understand the basis for some ruling on torture if you've never felt pain?

    1. Anonymous Coward

      If you were trying to argue for embodied cognition, I might be able to agree with you. But you're really just complaining about the format of the data.

      Think about it: how does a human understand "Christmas"? Internally, it's a bunch of neuronal signals that get transmitted around. Does a single neuron "understand" Christmas? No, but a collection of them will be related to Christmas. The neuron most closely associated with "Christmas" certainly doesn't process the raw data about lights and trees and religious ceremonies, so it must work with some type of representation of "Christmas". Is the format of that data important? Perhaps, but whether it's words or electro-chemical signals, it's still a reduction of all the elements of "Christmas".

      Making sure there's enough data to completely describe the concept is hard using natural language, but not impossible. Certainly that's much easier with a diverse range of datatypes, but we can already convert complex data into binary, so it's not impossible.
