Custom superchippery pulls 3D from 2D images like humans

Computing brainboxes believe they have found a method which would allow robotic systems to perceive the 3D world around them by analysing 2D images as the human brain does - which would, among other things, allow the affordable development of cars able to drive themselves safely. For a normal computer, or even a normal …

COMMENTS

This topic is closed for new posts.
  1. Graham Bartlett
    Jobs Horns

    Extraordinary claims..

    ...demand extraordinary proof. Far too many false dawns on this from the AI fraternity. That said, if they've solved this then I'm massively impressed.

    1. Destroy All Monsters Silver badge
      Terminator

      What's there to solve?

      This is like "solving" the problem of putting 250 hyped horses into an engine block. Bound to be done eventually. "False dawns" only occur if someone gets physics envy and claims to have solved intelligent behaviour in one bold stroke. This only impresses philosophers and the Common Man.

    2. Filippo Silver badge

      not too implausible

      I'm not impressed by the FPS count - they're doing a neural network in hardware instead of doing it in software, a performance increase of several orders of magnitude is to be expected. Neural networks are massively inefficient to emulate.

      Avoiding obstacles in the real world, however, even discounting the computing performance required, is still an unsolved task. It would be interesting to see them do that automated car race next time...

    3. John Smith 19 Gold badge
      Joke

      @Graham Bartlett

      "That said, if they've solved this then I'm massively impressed."

      But will you be massively impressed in parallel?

  2. Anonymous Coward
    Welcome

    3D camera?!

    Forgive my ignorance on the subject.

    I understand that the problem we were facing before is the fact that the AI can't see in 3D the way we do (plus it has a big problem recognizing objects, especially if another object covers part of the target object). But shouldn't the problem, with the inability to see in 3D, be fixed because of the introduction of 3D cameras? By having 2 images, the AI now can see depth the same way we do.

    So why aren't 3D cameras being used for the tests yet? Why are the tests still being conducted on 2D cameras? Is it just because too much time has been spent on the 2D cameras and the 3D camera is viewed as the next stage? Or is it not possible to do the test with a 3D camera?

    1. Anonymous Coward
      Anonymous Coward

      It is working on the way we perceive 3D objects.

      With objects beyond a certain distance (around 3 - 4m) your binocular vision isn't accurate enough to determine their position in 3D, so you fall back on the sort of guesswork that this system is trying to do from the 2D imagery.

      1. Anonymous Coward
        Happy

        @Joefish, 16th September 2010 10:57 GMT

        Thanks for the explanation, never thought about it that way.

      2. Peter H. Coffin

        The difference being...

        That a self-driving car could easily have other senses than just binocular vision to play with, such as laser range-finding and RADAR. Further, the binocular vision can have a MUCH larger inter-axial distance than the Mk I eyeballs do. We get 62mm more or less, while a self-driving car could easily have four "eyes", with clusters of two checking close clearances, and the binocular image between the two clusters serving as a much longer base for measuring parallax.

        The real issue is that all of those solutions involve making it easy with hardware, and that means buying actual bits of things, which costs money. For cars, where one would expect there to be millions of the things, anything you can do in software to save buying hardware returns the savings millions-fold. This isn't about doing these things *at all*, it's about doing them for 50% less hardware and saving pots of money on the manufacturing side.

        But that makes for much less impressive headlines: "Boffins save car manufacturers lots of money in the future" doesn't have the same punch.
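To put rough numbers on the baseline argument above, here is a minimal sketch assuming an idealised, rectified pinhole stereo pair. Only the 62 mm figure comes from the comment; the 1000-pixel focal length, the half-pixel matching error and the 1.5 m "car-width" baseline are made-up illustration values.

```python
def depth_error(depth_m, baseline_m, focal_px=1000.0, disparity_err_px=0.5):
    """Approximate depth uncertainty for a rectified stereo pair.

    From Z = f * B / d, a small disparity error dd gives
    dZ ~= Z**2 / (f * B) * dd.
    focal_px and disparity_err_px are assumed illustration values.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    for depth in (1.0, 4.0, 20.0, 50.0):
        eye = depth_error(depth, baseline_m=0.062)  # human-like 62 mm baseline
        wide = depth_error(depth, baseline_m=1.5)   # hypothetical car-width baseline
        print(f"{depth:5.1f} m: +/-{eye:6.2f} m at 62 mm, +/-{wide:6.2f} m at 1.5 m")
```

Under these assumptions the error grows with the square of the distance and shrinks in proportion to the baseline, which is why widely spaced camera clusters help at range.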

    2. Matt_V

      I *think*

      the reason is that it is still our brains that turn the image into 3D (3D TVs are still flat, i.e. 2D) but the way the 3D camera works allows our brains to work out that the picture is 3D.

      This is how I understand it, I could be spectacularly wrong however! (I'm sure someone will tell me if I am!)

      1. Nick Ryan Silver badge

        Re: I *think*

        You're correct in this. Each eye can only record 2D images; the stereoscopic sight comes into action, as noted above by a previous poster, at only short distances. Generally within arm's reach is considered to be the limit of really accurate 3D positioning - our eyes aren't far enough apart given the resolution to cope with much further.

        It's rather more complicated than that of course: our eyes, even when appearing still, are always flicking slightly back and forth, which generates a little 3D information and a lot of edge information. The brain makes assumptions and remembers the details of objects, which is why objects appear to be in the correct colour even in peripheral vision, which is mostly light / movement based and not colour.

        A brain is a massively parallel, very effective pattern matching computer. A conventional computer is procedural - there's a lot of difference :)

        1. Wize

          If our stereo vision fails at a few feet...

          ...does that make the 3D films unrealistic?

          1. Charles 9

            They already are unrealistic.

            And that's part of the reason many people get headaches when watching "stereoscopic" movies (perhaps the more proper term for them). Stereoscopic vision is just part of the system that allows us to perceive in three dimensions, but by abusing the system and not allowing for the rest of the works to, um, work (for example, you can't adjust the focus of the scene like you can in real life), the brain begins to go spare and we start to get the initial traces of "simulation sickness".

      2. Loyal Commenter Silver badge
        Boffin

        @Matt_V

        I think the reason that this is considered to be a 'hard' problem in AI terms is that the way we perceive 3D is actually pretty complicated. This is why the visual cortex makes up a significant part of the human brain, compared to things like the olfactory bulb, used for smell and taste, or the primary auditory cortex, which processes hearing. IANANS (I am not a neuroscientist!), so I stand to be corrected here.

        As far as I am aware, we determine the position and size of objects in our visual field by a number of methods - we have binocular vision which allows us to use parallax, although as mentioned by someone above, this is only really effective at close range, where the distance to the object is a small multiple of the distance between our eyes.

        We also use things like focal distance to judge how far away more distant objects are, and to determine, for instance, whether something is small and nearby, or large and distant (http://www.facebook.com/pages/These-are-small-but-the-ones-out-there-are-far-away/246777073630).

        For moving objects, we can tell whether they are moving towards us or away from us by changes in size.

        We can also work out the shape and size of objects that are not square-on to us, by using perspective.

        In addition to this, our brains fill in 'missing' information. The classic examples of this are related to the blind spot (http://en.wikipedia.org/wiki/Filling-in) and reification (http://en.wikipedia.org/wiki/Gestalt_psychology#Reification).

        Because we use several methods for determining the distance, size and shape of objects, this allows scope for our minds to be tricked by giving them conflicting or incomplete information.

        So, good luck to anyone who tries to get a computer to 'see' in the same way we do, without first solving the 'hard' problem of AI. In fact, on second thoughts, it may actually be part of the 'hard' problem.
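The "changes in size" cue mentioned above can be sketched in a few lines. This is a toy illustration, assuming a pinhole camera and a constant closing speed; the function name and the numbers are hypothetical. The point is that the time until contact falls out of the growth in apparent size alone, without knowing the object's true size, distance or speed.

```python
def time_to_contact(size_prev_px, size_now_px, dt_s):
    """Seconds until contact, from apparent size in two consecutive frames.

    Assumes a pinhole camera and constant closing speed, so that
    tau = dt * h_prev / (h_now - h_prev), measured from the current frame.
    """
    growth = size_now_px - size_prev_px
    if growth <= 0:
        return float("inf")  # not growing: receding or holding distance
    return dt_s * size_prev_px / growth


if __name__ == "__main__":
    # Hypothetical object: 75.0 px tall, then 78.9 px tall 0.1 s later.
    print(round(time_to_contact(75.0, 78.9, 0.1), 2), "seconds")
```

It says nothing about what the object is, which is where the harder parts of the problem come in.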

    3. Cameron Colley

      3D cameras do not create 3d images.

      3D cameras create two 2D images; these are then displayed one after the other to alternate eyes, which makes humans think that they are seeing depth. The only things that currently create 3D images are the laser-scanner type rigs and the like.

  3. Brangdon

    3D movies by post-processing

    Will this technology make it economical to release 3D versions of classic 2D movies such as Blade Runner? If so, we should burn it now. Burn it with fire.

  4. Loyal Commenter Silver badge
    Terminator

    Soylent Green?

    "But yesterday [they] presented a new system dubbed "Neuflow". This uses custom hardware modelled on the brain's visual-processing centres, all built on a single chip..."

    It's not made from people is it?

  5. Anonymous Coward
    Black Helicopters

    Sooo...

    If it emulates the visual cortex, then surely it will also be susceptible to mirages and any other optical illusions? Meaning that it won't be any safer than an experienced driver (who isn't too tired / distracted). This also means we'll have a very efficient CCTV face recognition system (as this type of network is really good at pattern recognition) - something akin to ANPR, but for just walking on the street. Sounds great.

  6. Ginolard

    Belgium should be the guinea pig

    I would heartily welcome computer-controlled cars here. Hell, hook a ZX81 up to each vehicle and it would probably do a better job than the majority of clowns that get behind a wheel in this country.

  7. Richard Joseph
    Boffin

    Was thinking this would be the way to go...

    ...about 2-3 years back, maybe using a combination of visuals and sonar/ultrasound, like bats.

    For a while I was fascinated by Carnegie Mellon University's research into turning 2D images into 3D (models?) and Microsoft's PhotoSynth picture recognition platform (http://photosynth.net).

    If I remember rightly, the CMU 'pop-up' project (www.cs.cmu.edu/~dhoiem/projects/popup/index.html), and related topics, rely on Fast Fourier Transform (FFT)-like maths and texture gradient analysis (think 'detail density is higher further away'). I'm no Mathematician or Scientist, so do your own digging if you like.

    Always been a 'Dick Tracy and flying cars' kind of guy myself, just waiting to see if I'll see self-driving cars at any point in my lifetime :-]

  8. Tom Chiverton 1

    Controlling a car needs a super computer does it ?

    You'd better tell DARPA, because their last automated car challenge showed you only need a few laptops :-)

    http://www.darpa.mil/grandchallenge/index.asp

    1. jonathanb Silver badge

      re: Controlling a car needs a super computer does it ?

      Controlling a car with only two cameras spaced about 10cm apart as your input device requires a supercomputer.

  9. Neil Barnes Silver badge

    @Brangdon - 3d post processing

    Bad news I'm afraid - it's already being done. Even just using a depth cue based on colour works surprisingly well... the biggest problem with 3-d tv is that people still get sick watching it, because the displays are all universally bodges and because they don't deliver *all* the cues that your brain expects.

  10. t_lark

    Oversold

    I think this article is over hyped.

    Firstly, from the speed-up perspective. From the video: 60x faster than an Intel i7 and, most importantly, 2x faster than a GPU. So I can replicate the functionality with two GPUs in SLI then. Hardly replacing a megaton supercomputer.

    Secondly, the actual science is a little dubious. One reason the brain does well is that it has estimates of how big everything should be, but this chip is just a convolution farm without any shape priors. So at a stretch it mimics the very first neural layer, but not the visual *system*.

    Excellent use of an FPGA though, I like the tech, but not some of the claims. To get better 3D from a moving camera the more accurate way would be to use the separate frames as views from different perspectives (and generalizations thereof). Doing that in real time would be a challenge.
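For anyone wondering what a "convolution farm" actually computes, here is a minimal sketch of a single 2D filter pass in plain Python. It is not the chip's design, just the basic operation such hardware would run many times in parallel; the Sobel kernel and the tiny test image are arbitrary illustrations.

```python
def convolve2d_valid(image, kernel):
    """Slide kernel over image (no padding) and return the filtered map.

    Strictly this is cross-correlation (the kernel is not flipped), which is
    the usual convention in convolutional networks.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Horizontal-gradient (Sobel) kernel: responds strongly to vertical edges.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

if __name__ == "__main__":
    # 4x5 test image: dark on the left, bright on the right.
    image = [[0, 0, 0, 9, 9] for _ in range(4)]
    for row in convolve2d_valid(image, SOBEL_X):
        print(row)
```

The output lights up where the brightness changes and stays at zero elsewhere. Note that nothing in it knows how big a car or a pedestrian ought to be, which is the "no shape priors" objection.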

  11. Bob Merkin
    Terminator

    RotM

    "or permitting smaller robots to get about inside buildings or other cluttered environments"

    Which would of course be the best place for a shiny automaton to train its plasma rifle on the IR signature of a cowering humanoid.

  12. gimbal
    Coat

    Depth perception doesn't end at the visual input...

    Respectfully, I'm sure that the person they interviewed for this story must've been flattered for the attention. That, in itself, could be a warning sign, as far as the credibility of the presumed science involved - but, I digress.

    Reading a couple of the comments, here, it at least adds an interesting point of view about how we perceive what we see. The suggestion that retinal images are two-dimensional, that sounds spot on. I don't believe it really makes it "3D" when the two "2D" images are combined, though - it's just that we perceive depth, as a result of (presumably) how the brain processes the subtle differences between the two visual fields, those meeting the left and the right retinas, respectively.

    I wonder if they've really understood depth perception, enough to emulate it with a computer system - if that would be their approach? How else, then, might they try to emulate depth perception and motion perception? Presumably, one would need to have mastered the emulation of both manners of perception, with silicon kit - and probably a host of some other things, which don't come into the picture, here, but would work out to be necessary, there - in order to perform such a task as to drive a car, successfully, down a highway, with a computer at the wheel.

    But hey, who am I to stand in the way of a perhaps-naive dream, as such?...

    Mine's the coat with nothing science-fiction related in its pocket. Cheers.

  13. Anonymous Coward
    Terminator

    200 tonnes ?

    She wishes! Closer to 230 than 200.

  14. Yet Another Anonymous coward Silver badge

    @3D cameras do not create 3d images

    You can create exactly the same 3D model from a stereo camera as a laser scanner - all you need is to find the same point in both images and know the camera separation.

    This is going to be fun with the fake street scenes painted on the sides of buildings here.

    I also expect a lot of cars to drive into bus shelters that have car ads featuring pictures of the open road on them!
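The "same point in both images plus camera separation" recipe above can be sketched as follows. This assumes idealised, rectified pinhole cameras and, crucially, that the matching has already been done, which is the hard part. The parameter names and the example numbers are hypothetical.

```python
def triangulate(x_left_px, x_right_px, y_px, baseline_m, focal_px, cx_px, cy_px):
    """Recover (X, Y, Z) in metres, in the left camera's frame, from one match.

    Assumes rectified pinhole cameras: the point sits on the same row y_px in
    both images and only its x position shifts (the disparity).
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or a bad match")
    z = focal_px * baseline_m / disparity
    x = (x_left_px - cx_px) * z / focal_px
    y = (y_px - cy_px) * z / focal_px
    return x, y, z


if __name__ == "__main__":
    # Hypothetical rig: 10 cm baseline, 1000 px focal length, 1280x720 images.
    print(triangulate(700, 680, 400, baseline_m=0.10, focal_px=1000,
                      cx_px=640, cy_px=360))
```

A painted street scene produces the disparity of the flat wall it is painted on, so a stereo rig of this kind would see the wall rather than the road, at least in principle.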

  15. Anonymous Coward
    Anonymous Coward

    Neural nets

    I've used software neural nets on several occasions in the creation of accurate forecasting models. They ranged from a few dozen neurons in the neural net to thousands in my epidemiological forecasting model. I think some of the posters are failing to realize that you must train the neural net for the application that you are interested in. Every time one of the net models I created generated a forecast, the first thing it would do was reach back all the way to the first data and sequentially incorporate the rest of the data to generate the non-linear weighting so that it could project the next 12 months. It was extremely processor-intensive, but given the extremely accurate projections it was worth it. It would be really interesting, and I believe I'll try it again, using CUDA and/or Tesla to see how it falls out today.

    After reading the other posts, I observed that many people are ignoring the training requirement for any neural net. For instance, allowing it to develop expectations based on past experience as to the size of objects and other correlations. What I was doing in the past was exactly that.

  16. Tom 7

    I look forward to a claimsRus payout

    when I am emotionally scarred for life by close contact with a self-driving car that fails to calculate my true position due to a simple optical illusion on my t-shirt.

  17. Graham Bartlett

    @Destroy All Monsters

    Not quite. A thousand years ago, would anyone have said "oh sure, getting the power of 200 horses into something the size of a cart is just a matter of time"? Of course they wouldn't - they simply didn't know the technology was possible. Once the problem is understood well enough to make it an engineering issue, then of course it's just a matter of time. But until (a) the problem is understood well enough, and (b) your engineering is sufficiently advanced, it's not at all obvious that it's possible. A ladder to the Moon is an engineering problem, but it doesn't mean it's likely to happen. (Even a ladder to geostationary orbit, AKA a space elevator, has only become even theoretically possible with the discovery of carbon nanotubes.)

    Without full stereoscopy, depth perception relies partly on comparison with known-size objects so that distance can be estimated based on how big they look, and partly on recognising individual objects from only seeing part of the object when multiple objects occlude each other. This can be done statically, but a major part of how humans do it is by watching the scene change and recognising objects by the fact that their various component parts move together.

    So far as I'm aware, getting software to do any of this even slightly reliably is currently the bleeding-edge in vision research. It certainly couldn't run in anything like real-time, even with lots of ultra-fast PCs doing the number-crunching, and it certainly hasn't done a fraction of the tricks that the human brain uses for depth perception and object recognition. Hell, even figuring out *what* tricks the human brain uses is still a matter of serious research. So this isn't just an engineering issue - it is (or was, if these guys have cracked it) still a matter of figuring out what needs to be done.

    So if these guys have managed to do it, they've produced a fully-working answer to something where everyone else is still trying to figure out what the question should be. And yes, a large part of that answer *IS* intelligent behaviour. That's a big deal.
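The "comparison with known-size objects" cue is the easy bit to write down, as the minimal sketch below shows under a pinhole-camera assumption. The function name, focal length and object sizes are hypothetical illustration values. The hard part is knowing which real-world size to plug in, i.e. recognising the object first.

```python
def distance_from_known_size(real_height_m, image_height_px, focal_px):
    """Pinhole-camera estimate of distance: Z = f * H / h."""
    if image_height_px <= 0:
        raise ValueError("image height must be positive")
    return focal_px * real_height_m / image_height_px


if __name__ == "__main__":
    focal_px = 1000.0  # assumed focal length in pixels
    # The same 75-pixel-tall blob, under two different beliefs about its size:
    print(distance_from_known_size(1.5, 75.0, focal_px), "m if it is a real car")
    print(distance_from_known_size(0.15, 75.0, focal_px), "m if it is a toy car")
```

The same image gives a tenfold difference in estimated distance depending on the size prior, so the arithmetic is trivial and the recognition is everything.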

  18. xperroni
    Boffin

    Close, but no cookie

    I took a peek at the researchers' website. <snob>As an MSc in Computer Intelligence with interest in Artificial Vision,</snob> I think they have some really nice ideas going, and some of them may well come together into a very good autonomous system eventually – but right now they're nowhere near delivery. Actually that's the problem with most AI research: everyone has a wonderful proof-of-concept to show, but actual products – stuff you can take to the field and solve real problems with – are few and far between.

    Much of it, I believe, has to do with differing standards of success between academia and industry. Whereas a commercial project, to be considered successful, must produce something that can be readily sold (ideally by the hundred thousands), a research project can "succeed" by providing a "solution" that leaves out practical usability concerns, by "solving" only part of a large problem, or even by posing interesting new questions about the subject, without actually solving anything.

    I once worked for a small ISV that attempted to create a commercial product out of academic research in Artificial Vision. At first we thought we'd only write some user-friendly GUIs around the research system; however, as soon as customer requirements came into play, we realized we only had one piece of a pretty big puzzle, whose overall complexity the original research had conveniently chosen not to deal with. Trying to complement what we had with results from other research projects brought us much of the same – stuff that was great under carefully controlled test conditions, but couldn't on its own live up to the harshness of production environments.

    Over time it became clear that, far from the usual contractor projects we were used to doing, we'd first have to invest real money and real time – years, probably – to turn that pile of academic papers into something approaching a real solution, and only then hope to attract any customers. The project was subsequently put on hold, pending the granting of government research funds; soon after I left the company for a job in another city. Last I heard of them, they did get the funds, but still were nowhere close to delivering an actual product.

    Mind you, I'm not saying that academic research isn't fruitful. We owe academia a lot of useful stuff, from RISC machines to the search algorithms used by Google and Bing, and much more is yet to come out of it. However, researchers alone can rarely pull it off; it's the industry players who most often make the bridge from promising research to usable products. Research announcements are useful to get the gist of where academia is headed, but unless they're backed by a business partner, it's unlikely they're going to bring something to the market anytime soon.
