Machine learning research in acoustics could open up multimodal metaverse

Researchers at MIT and the IBM Watson AI Lab have created a machine learning model to predict what a listener would hear in a variety of locations within a 3D space. The researchers first used the ML model to understand how any sound in a room will propagate through the space, building up a picture of a 3D room in the same way …
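The excerpt doesn't say how a prediction is turned into audible sound. One common way to render what a listener hears at a given spot, however the acoustics were obtained, is to convolve a dry source signal with a room impulse response (RIR) for that source and listener pair. This is a minimal sketch assuming a hypothetical toy RIR; the `convolve` helper and every value in it are illustrative, not taken from the MIT/IBM work.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution: y[n] = sum over k of x[k] * h[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h  # each input sample launches a scaled copy of the RIR
    return out


# Hypothetical toy RIR (one value per sample): direct sound at tap 0,
# a wall echo at tap 3 (gain 0.5) and a fainter one at tap 5 (gain 0.25).
rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]

dry = [1.0, 0.0, -1.0]    # a short dry source signal
wet = convolve(dry, rir)  # what a listener at this position would hear
print(wet)
```

Feeding in a single click (`[1.0]`) returns the RIR itself, which is why an impulse response is a complete description of a linear, time-invariant room for one source and listener position.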

  1. cosmodrome

    Bose developed such a system in the 1990s, IIRC. Without all the ML bells and whistles, but compatible with CAAD systems. You put in a drawing of your building or space, specify the materials used, and get the acoustic model out of it. It used to just work. I don't see why you'd need artificial intelligence for this.
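For a sense of what a drawing plus a materials list gets you without any ML, here is a classic first-order estimate: Sabine's reverberation-time formula, RT60 = 0.161 V / Σ Sᵢαᵢ (SI units). This is a textbook sketch, not Bose's software; the room dimensions and absorption coefficients are illustrative assumptions, and real coefficients vary with frequency.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Surface:
    name: str
    area_m2: float
    absorption: float  # absorption coefficient: 0 = fully reflective, 1 = fully absorbent


def sabine_rt60(volume_m3: float, surfaces: list[Surface]) -> float:
    """Sabine reverberation time in seconds: RT60 = 0.161 * V / sum(S_i * alpha_i)."""
    absorption_area = sum(s.area_m2 * s.absorption for s in surfaces)
    if absorption_area <= 0:
        raise ValueError("a room with no absorption never stops ringing")
    return 0.161 * volume_m3 / absorption_area


# Illustrative 10 m x 8 m x 3 m room; coefficients are rough single-band guesses.
length, width, height = 10.0, 8.0, 3.0
room = [
    Surface("carpeted floor", length * width, 0.30),
    Surface("acoustic-tile ceiling", length * width, 0.60),
    Surface("plaster walls", 2 * (length + width) * height, 0.05),
]
rt60 = sabine_rt60(length * width * height, room)
print(f"RT60 = {rt60:.2f} s")
```

With these made-up values the equivalent absorption area is 77.4 m² and RT60 comes out at about 0.50 s. Sabine's formula assumes a diffuse sound field, so it yields one number for the whole room and says nothing about how the sound differs from one listening position to another.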

    1. Filippo Silver badge

      Because this way you don't need to actually know acoustics. Which is somewhat concerning, actually.

      1. LionelB Silver badge

        To be fair, they do say "... the researchers built into their model features of acoustics."

    2. Little Mouse

      My first thoughts on reading the article were that this should just be a straight-up maths problem, no ML required. The audio equivalent of raytracing, if you like.

      My intuition tells me that this scenario might be too complex for an ML model, as there are simply too many variables: literally any number of objects, of any shape, of any material, in any location, in a room that itself could be any shape, any size and any material. How would you format the training data in any meaningful way?
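On the "audio equivalent of raytracing" point: geometric room acoustics does have such techniques. This is a minimal sketch of one of them, the image-source method, for an empty rectangular ("shoebox") room, in which each wall reflection is modelled as sound arriving from a mirror-image source behind that wall. Only first-order reflections are computed, the single frequency-independent reflection coefficient is an assumption, the name `first_order_paths` is hypothetical, and, like raytracing, it ignores diffraction entirely.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, dry air at about 20 °C


def first_order_paths(room, source, listener, reflection=0.8):
    """Return sorted (delay_s, gain) pairs for the direct path plus the six
    first-order wall reflections in an empty shoebox room.

    room: (Lx, Ly, Lz) in metres, with walls at 0 and L on each axis.
    Gain is 1/r spherical spreading times one reflection coefficient
    (a hypothetical, frequency-independent value).
    """
    images = [(tuple(source), 1.0)]  # direct path: no wall loss
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2 * wall - source[axis]  # mirror the source across the wall plane
            images.append((tuple(image), reflection))

    paths = []
    for position, coeff in images:
        r = math.dist(position, listener)  # path length from image source to listener
        paths.append((r / SPEED_OF_SOUND, coeff / r))
    return sorted(paths)


for delay, gain in first_order_paths((10.0, 8.0, 3.0), (2.0, 4.0, 1.5), (7.0, 4.0, 1.5)):
    print(f"{delay * 1000:6.2f} ms  gain {gain:.3f}")
```

For a source at (2, 4, 1.5) and a listener at (7, 4, 1.5) in a 10 m × 8 m × 3 m room, the direct sound travels 5 m (about 14.6 ms), the floor and ceiling reflections travel √34 ≈ 5.83 m each, and the far end-wall reflection arrives last, after 11 m. Higher orders multiply the number of image sources, and non-rectangular rooms or furniture require a visibility check for every image, which is where the cost starts to climb.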

      1. Dave 126 Silver badge

        > just be a straight-up maths problem, no ML required. The audio equivalent of raytracing, if you like.

        Curiously, both Nvidia and AMD use ML to increase the perceived quality of raytraced (and otherwise rendered) images in games. If we are to continue your analogy, we might consider the possibility that a similar hybrid approach could be applied to spatial audio.

      2. LionelB Silver badge

        > The audio equivalent of raytracing, if you like.

        Much more complex, I'd imagine. Because its frequencies are orders of magnitude lower, sound (especially bass) diffracts far more than light, which is why you can hear someone talking behind a tree even though you can't see them. The exact nature of the diffraction depends on the shape (and, I suspect, even the composition, density and texture) of objects and surfaces. Taking into account all the objects and surfaces in an arena, I would imagine that for reasonably complex situations the maths problem is intractable in practice, even with powerful computing resources. A skilled and experienced acoustic engineer with good intuitions can do a fair job, at least for a static scenario (studio engineers do this all the time; it's part of the job description), but I don't think it's implausible that an ML model, even one trained on limited data, may nevertheless arrive at useful, reasonably general heuristics (after all, the entire point of ML is to generalise from limited training data).
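To put rough numbers on the frequency point: wavelength is λ = c / f. As a rule of thumb, waves bend noticeably around obstacles that are not much larger than their wavelength, so a tree trunk well under a metre across barely shadows bass but blocks light completely. The frequencies below are illustrative picks, not figures from the thread.

```python
SPEED_OF_SOUND = 343.0          # m/s, air at about 20 °C
SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact by definition of the metre


def wavelength(speed_m_s: float, frequency_hz: float) -> float:
    """Wavelength in metres: lambda = c / f."""
    return speed_m_s / frequency_hz


# Illustrative frequencies across the audible range.
for label, f in [("bass, 50 Hz", 50.0), ("speech, 1 kHz", 1_000.0), ("treble, 10 kHz", 10_000.0)]:
    print(f"{label:>15}: {wavelength(SPEED_OF_SOUND, f):.3f} m")

GREEN_LIGHT_HZ = 5.45e14  # roughly 550 nm green light
print(f"{'green light':>15}: {wavelength(SPEED_OF_LIGHT, GREEN_LIGHT_HZ) * 1e9:.0f} nm")
```

That gives about 6.86 m at 50 Hz, 0.343 m at 1 kHz, 0.034 m at 10 kHz and roughly 550 nm for green light: a gap of about seven orders of magnitude at the bass end.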

    3. LionelB Silver badge

      I wonder, though. You're certainly correct that a skilled acoustic engineer can do wonders (see also my reply earlier in this thread), but games and (ahem) "multiverse" thingamies are not going to be static, and I'd imagine the dynamic problem is orders of magnitude harder. Perhaps ML can arrive at some useful heuristics that would elude an engineer trained to deal with static scenarios.

      (I'm quite a fan of Bose acoustical tech, as it happens. The hardware and engineering are fine, but their software frankly sucks. Every goddamn update on my SoundTouch speaker at home breaks something. I try to avoid updates, but it breaks anyway. The sound is great, though, when you can get any out of it.)

  2. vtcodger Silver badge

    A multimodal multiverse?

    A multimodal Multiverse? I'm pretty sure that's exactly what they told Pandora would be in that box. My recommendation -- Don't open it. Sell it to someone on eBay. Or give it anonymously to someone you don't like.

  3. Anonymous Coward

    Who would want, much less need it?

    Other than an audiophile with thousands invested in kit who wants to precisely position their speakers, I can't see a need.

    For most applications, this is just too precise. Certainly any metaverse can do without this level of accuracy.

  4. M.V. Lipvig Silver badge

    So it models what I'd hear, eh? I constantly hear a 2700 Hz tone*, 24x7. Will it model that in too?

    *One of the bennies of being in telecom. I used to work on analog ringdown circuits, and one slow day I booted up a test set and played with its tone generator until the tone matched what I hear all the time. 2700 Hz, all the time.
