Hey Siri, use this ultrasound attack to disarm a smart-home system

Academics in the US have developed an attack dubbed NUIT, for Near-Ultrasound Inaudible Trojan, that exploits vulnerabilities in smart device microphones and voice assistants to silently and remotely access smart phones and home devices. The research team — Guenevere Chen, an associate professor at the University of Texas at …

  1. Ashto5

    Little voice in the back of your mind

    That is a fantastic piece of research. Now they just need to run their detection methodology over the 100 billion TikToks.

    Or

    Switch the assistants off

    1. david 12 Silver badge

      Re: Little voice in the back of your mind

      On a complete tangent: This would frighten and terrify my schizophrenic friend. He's had bad experiences with little voices in the back of his mind.

  2. Winkypop Silver badge
    Stop

    Simple but effective defence

    Don’t allow these “devices” in your house.

    1. Anonymous Coward

      Re: Simple but effective defence

      They have a lot in common with vampires: only a threat if you invite them into your home.

      1. Michael Wojcik Silver badge

        Re: Simple but effective defence

        And they make their victims (even more) listless and idle. You may be on to something.

        Unfortunately it takes a lot of sunlight to kill them. The stake thing probably works though.

    2. Blacklight

      Re: Simple but effective defence

      Not even that.

      Don't allow anything other than a physical code entry or token to disarm an alarm.

  3. Neil Barnes Silver badge

    One C, one R

    Why on earth does the input stage from the microphone require a 20kHz bandwidth? 300 to 3500Hz is where the majority of human speech fits so using anything outside that range... Basically it should never get into the ADC.
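To put rough numbers on the one-R, one-C idea: a single R-C stage is a first-order low-pass with cutoff f_c = 1/(2πRC) and gain 1/sqrt(1 + (f/f_c)^2). The sketch below is only an illustration. The component values (4.7 kΩ and 10 nF, a cutoff near 3.4 kHz) are assumptions chosen for the example, not anything taken from the article or from a real microphone front end.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """Cutoff (-3 dB) frequency of a single R-C low-pass stage."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_gain_db(freq_hz, cutoff_hz):
    """Gain in dB of an ideal first-order low-pass at freq_hz."""
    magnitude = 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** 2)
    return 20.0 * math.log10(magnitude)

# Hypothetical component values, chosen only to land near the top of the speech band.
R_OHMS = 4_700.0
C_FARADS = 10e-9

cutoff = rc_cutoff_hz(R_OHMS, C_FARADS)
print(f"cutoff: {cutoff:.0f} Hz")
for freq in (1_000, 3_500, 16_000, 18_000, 20_000):
    print(f"{freq:>6} Hz: {rc_gain_db(freq, cutoff):6.1f} dB")
```

With these assumed values the stage rolls off at only 20 dB per decade, so a tone near 18 kHz is reduced by roughly 15 dB rather than removed. A steeper, multi-pole filter would be needed for stronger rejection.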

    1. jake Silver badge

      Re: One C, one R

      "300 to 3500Hz is where the majority of human speech fits"

      Good old analog POTS used 300–3300Hz for decades, with no issues to speak of.

    2. IGotOut Silver badge

      Re: One C, one R

      I don't know. Maybe people record things other than the human voice via the phone?

      1. jake Silver badge

        Re: One C, one R

        "Maybe people record things other than the human voice via the phone?"

        Some people use screwdrivers as hammers, too.

        1. Little Mouse

          Re: One C, one R

          The flip side of that concept is known as a "Birmingham screwdriver" here in Blighty.

        2. Anonymous Coward

          Re: One C, one R

          Not me, I use a screw driver as a chisel

          1. Michael Wojcik Silver badge

            Re: One C, one R

            Two screwdrivers and some duct tape, and you have a nice pair of pliers.

            1. 080

              Re: One C, one R

              Or a battery tester

          2. Elongated Muskrat Silver badge

            Re: One C, one R

            I like to live dangerously and use a chisel as a screwdriver

      2. Sorry that handle is already taken. Silver badge

        Re: One C, one R

        People record all sorts of things but the voice assistant shouldn't be responding to them

        1. This post has been deleted by its author

    3. Kevin McMurtrie Silver badge

      Re: One C, one R

      A single R-C filter isn't good for much and these microphones are only 2 to 8 cubic mm. They're already full of MEMS hardware, MEMS electrostatic bias, digitizer, and solder pads.

      There could be a software fix if it's possible to adjust the sample rate. These digital MEMS mics take samples at whatever speed they're clocked for, so varying the clock rate of the interface would scramble sample aliasing attacks. Of course you'd have to resample the result in software without adding any new aliasing bugs. It's easy math but mistakes wouldn't be audible.

    4. DS999 Silver badge

      Re: One C, one R

      Having a microphone with a 20K or higher range is fine - maybe you want to record music with it. The mistake is allowing the full range as input to a voice assistant. Drop a narrowband filter in front of it in the human speech range, problem solved.

      1. YetAnotherLocksmith

        Re: One C, one R

        Next up: ultrasonic downmixing attacks on your voice assistant!

    5. SCP

      Re: One C, one R

      Why on earth does the input stage from the microphone require a 20kHz bandwidth?

      Not a topic I have any particular knowledge about - but might a broader bandwidth allow improvements in speech recognition in noisy situations.

      1. Mast1

        Re: One C, one R

        Yes, 500-3500 Hz on POTS worked OK-ish for decades because you have redundancy in speech as well as (usually) context to help you resolve ambiguities. Add in background noise, and the redundancy degrades, and so does the resulting accuracy of interpretation. Hence the need for Alpha, Bravo, Charlie, Delta, Echo.......

        The range above 3500 Hz is useful for resolving direction, as well as being less prone to corruption by reverberation, and so helps separate out competing sources. As for the remaining range up to 20 kHz, it matters for music, dog whistles, and mosquito tones, but there is negligible energy above 11 kHz in even high-quality recorded speech.

        ........ speaks a person too old to hear above 10 kHz these days.

        A wadge of cotton wool over the microphone would serve as a reasonable low-pass filter for speech.

        1. david1024

          Content in the upper highs

          There is content in the human voice, percussion, and strings up to and past 20 kHz that lots of folks do hear. It is that airy and raspy quality you can hear from a CD or nice record that muddy FM (which stops around 15 kHz) is missing. And then there's AM, which is what is being suggested here. In short, having the ability to record up to and past 20 kHz is very desirable and produces a noticeable improvement to the sound for nearly everyone. It also helps simplify the maths, but that's a whole other ballgame.

      2. AlbertH

        Re: One C, one R

        Not a topic I have any particular knowledge about - but might a broader bandwidth allow improvements in speech recognition in noisy situations.

        Nope. In most instances you want to restrict the bandwidth to improve intelligibility. Obviously there's a limit to bandwidth reduction before you get to the point of reducing intelligibility, but if you cover the "speech band" - which is that which was conveyed by the old telephone network - (300Hz to 3kHz is normally enough, sometimes with slight emphasis around 2kHz), you achieve maximum intelligibility, even in noisy environments. The use of noise-cancelling microphone methods really helps (two microphones in opposite directions - the user talks into one, and both receive the ambient noise. By phase reversing one of the mics, the common-mode signals - ie: the ambient noise - are cancelled, and the difference - the speech - is transmitted).

        The "hi-fi" bandwidth of mobile phone microphones is probably provided to enhance recording. It's trivial to filter the speech input to these "digital assistants", so it should be done!
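The two-microphone cancellation described above can be sketched in a few lines. This is an idealised illustration under loud assumptions: both microphones receive exactly the same ambient noise, only the front one picks up the talker, and pure sine tones stand in for speech and noise. Real devices have neither perfectly matched mics nor perfectly common noise.

```python
import math

SAMPLE_RATE = 16_000   # assumed sample rate for the illustration
NUM_SAMPLES = 160      # 10 ms of audio

def tone(freq_hz, amplitude, count):
    """Generate a sine tone as a list of samples."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(count)]

speech = tone(300, 1.0, NUM_SAMPLES)     # stand-in for the talker's voice
ambient = tone(1_000, 0.8, NUM_SAMPLES)  # stand-in for room noise

# Front mic hears speech plus noise; rear mic (idealised) hears only the noise.
front_mic = [s + n for s, n in zip(speech, ambient)]
rear_mic = list(ambient)

# Phase-reversing the rear mic and summing is the same as subtracting it.
output = [f - r for f, r in zip(front_mic, rear_mic)]

residual = max(abs(o - s) for o, s in zip(output, speech))
print(f"largest leftover noise sample: {residual:.2e}")
```

In this idealised case the common-mode noise cancels to within floating-point rounding and only the speech stand-in remains.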

    6. Anonymous Coward

      Re: One C, one R

      I think you mean R4 and the knights of the round table - Guenevere should fit right in ... what a great name

    7. Snake Silver badge

      Re: One C, one R

      Because I am pretty sure the microphones are off-the-shelf hardware, not custom made per smartphone design. Therefore the manufacturer would choose to design a product that offers the most versatility to designers and the most sales options to potential buyers. The single mic design can be used in everything from voice recorders to smartphones to devices ready to record music, all without incurring the costs of additional SKUs. This allows tremendous volume production and lower inventory costs, which is pretty much high on all these device makers' requirement lists.

      So, yeah. Industrialized capitalism.

    8. ravenviz Silver badge
      Boffin

      Re: One C, one R

      You need to control the Nyquist frequency of the digital sampling, the frequency above which higher analog frequencies start to alias into successively lower digital frequencies (think of wagon wheels seeming to go backwards in movies). Sampling to allow 20 kHz means that interference from frequencies just above that aliases to just below 20 kHz, which still remains well out of audible range.
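The wagon-wheel folding can be worked out directly. The helper below is a minimal sketch that assumes an ideal sampler with no anti-alias filter in front of it. The function name and the sample rates are illustrative choices, not values from the research.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after ideal sampling (no anti-alias filter)."""
    folded = tone_hz % sample_rate_hz
    # Anything above the Nyquist frequency (half the sample rate) folds back down.
    return min(folded, sample_rate_hz - folded)

# Example tones and sample rates, chosen only for illustration.
for tone_hz, rate_hz in [(1_000, 48_000), (30_000, 48_000),
                         (25_000, 44_100), (20_000, 16_000)]:
    print(f"{tone_hz} Hz sampled at {rate_hz} Hz appears at "
          f"{alias_frequency(tone_hz, rate_hz)} Hz")
```

A tone below Nyquist comes through unchanged, while the same out-of-band tone lands at a different apparent frequency for each sample rate. That is why changing the clock rate changes where an alias ends up.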

  4. doublelayer Silver badge

    Voice filtration may help

    They have demonstrated that they can activate a lot of voice assistants, but all but one of them is going to talk to the user while executing the malicious commands. That gives the user a chance to hear that something is going on. More importantly, for most of the interaction they can simply shout "no" to cancel it, because most of the prompts, such as authorizing a transaction or confirming a lock, are going to be yes-or-no questions, and the local voice will be more easily detected than the ultrasound.

    The only one they can activate without making a loud sound is Siri, but that one will pose some extra problems. Unlike some others which listen for anyone saying their wake word, Siri is activated by pressing a button or by a specific voice. Activating the voice wake word requires the user to train the phone to recognize their voice specifically, and it then doesn't generally activate on someone else's voice. If you have a friend with an iPhone, try it and see if theirs turns on. This means that an attacker can't just create a single track to activate Siri on any device, and if they don't already have a recording of the victim saying the wake word, they can only hope to activate with other samples. This might provide some insulation to practical use of the attack.

    1. ChoHag Silver badge

      Re: Voice filtration may help

      > the wake word

      Is this the opposite of the safe word?

    2. Anonymous Coward

      Re: Voice filtration may help

      "Siri is activated by pressing a button or by a specific voice"

      If there's a way to turn off the "specific voice" component, so ONLY pressing the button would activate the assistant, that would pretty well stop these kinds of attacks. Bonus points if the microphone doesn't get turned on unless the button is being pushed, i.e. the "assistant" only listens when the button says to.

      1. doublelayer Silver badge

        Re: Voice filtration may help

        "If there's a way to turn off the "specific voice" component, so ONLY pressing the button would activate the assistant, that would pretty well stop these kinds of attacks."

        There is, and if you don't train it on your voice, that's the default.

        "Bonus points if the microphone doesn't get turned on unless the button is being pushed, i.e. the "assistant" only listens when the button says to."

        Yes, it has that. Because it's on a phone, the microphone is still connected, but if you don't have the voice activation turned on, Siri won't be processing any input from the mic.

    3. david 12 Silver badge

      Re: Voice filtration may help

      but all but one of them is going to talk to the user

      I don't have a connected garage door. Plenty of people do: they use their IOT system to open the door for tradesmen and deliveries when they are not at home

      Couple that with a music system left turned on or a beamed ultrasound attack, and you've got a potential problem.

      1. MachDiamond Silver badge

        Re: Voice filtration may help

        "Plenty of people do: they use their IOT system to open the door for tradesmen and deliveries when they are not at home"

        I do it the old fashioned way by hiding a key outside and telling them where it is. After they've been and done, I retrieve the key. If they don't return the key, they don't get paid. The downside is they could have the physical key duplicated, but if they want to return and nick some things later, they'd be better off breaking in since the lack of a forced entry would put them under suspicion. If I didn't happen to be available to pick up the phone to send a code to let them in when they deigned to show up, they'd just leave and bill me for the visit. At least I won't have hundreds in the tech that's required to do it the electronic way.

    4. YetAnotherLocksmith

      Re: Voice filtration may help

      Did you miss the first command being to almost mute the system?

  5. doublelayer Silver badge

    Amusing typo

    "And finally, iPhone 6 Plus wasn't vulnerable to either attack, likely because it uses a low-gain amplifier while more recent iPhones tested use a high-grain amplifier."

    I'd like to try a high-grain amplifier. Do you think that it's also nutritious?

    1. jake Silver badge
      Pint

      Re: Amusing typo

      "I'd like to try a high-grain amplifier. Do you think that it's also nutritious?"

      That would be called a "yeast starter" in the brewing trade. Yes, it's nutritious ... if a trifle sweet.

      This round's on me.

    2. DS999 Silver badge
      Happy

      Re: Amusing typo

      I only use whole grain amplifiers because I care about my health!

      1. cookieMonster

        Re: Amusing typo

        McVities !!!

    3. Gene Cash Silver badge

      Re: Amusing typo

      There's also the "fist-generation Echo Dot"

      Yeah, I'd put my fist through one too...

      1. diodesign (Written by Reg staff) Silver badge

        Fist in mouth

        Yeah, that's now fixed, too. Managed to sneak past the spellcheck. I suspect we were too focused on figuring out if this attack was legit or not.

        C.

    4. diodesign (Written by Reg staff) Silver badge

      Doh

      Yeah, ha ha, it's fixed now. Don't forget to email corrections@theregister.com if you spot something wrong like this please so we can adjust it right away.

      C.

  6. Lee D Silver badge

    That's alright, because you require authentication to make these devices do anything on your local network or with your local devices right?

    I mean, you can't just say "Do This Stupid Thing" in any voice and have it immediately carry out that command, right?

    You know, where "This Stupid Thing" could include "make unwanted phone calls and money transfers, disable alarm systems, or unlock doors". I mean, you put all those interfaces behind passwords and authentication and two-factor and confirmation that the requestor is the authorised user of the system, right?

    You don't just let someone turn off your alarm system by having a random stranger say "Turn off alarm system", right? That would just be terminally stupid, I think we agree.

    1. DryBones

      I seem to recall multiple Alexa video pranks for just that sort of thing.

    2. david 12 Silver badge

      So that's what's going on...

      Got a google home assistant adjacent to the computer and the TV. It's already spouting random nonsense when the volume is turned up.

      Clearly I need to put a tin foil hat on it.

    3. Michael Wojcik Silver badge

      you require authentication to make these devices do anything on your local network or with your local devices right?

      No, because I don't allow any standalone voice assistants in my home, and I don't enable the ones on my computing devices.

  7. Red Ted

    Near-Ultrasound

    I wonder what frequency they specify Near-Ultrasound as?

    There's a chance that my kids might be able to hear it, but I certainly won't (age and too many rock concerts).

    1. YetAnotherLocksmith

      Re: Near-Ultrasound

      Exactly this.

      Saying "just kill the ability to hear high pitched stuff" isn't really a clever solution.

  8. WolfFan

    Hmmm.

    What happens if you nuke Siri and her pals from MS, Google, and Amazon, from orbit? The first thing I do with a new iDevice is turn Siri off, and, where possible, delete the mouthy bitch. MS seems to have abandoned Cortana, and in any case I nuke that even more mouthy bitch on sight. And I don’t have any of Google’s or Amazon’s mouthy bitches, and never will. If the ‘voice assistant’ is turned off or deleted, it can’t be attacked, right?

    1. Gene Cash Silver badge

      Re: Hmmm.

      I recently had to install Windows 10 for the first time, and as a guy that's played HALO literally since it came out, having Cortana suddenly speak up during the install WAS CREEPY AS FUCK.

      It was about as unnerving as hearing a voice mail from my dead grandmother or something on that order.

      Made my skin crawl.

      I couldn't hit "turn that crap OFF NOW" fast enough.

  9. Neil 44

    Access from outside?

    Could you play "unlock door" to an ultrasonic transducer stuck onto a window so that people in the room wouldn't know the door was unlocked?

    Seems a bit risky because I guess even a normal speaker set so that it would vibrate the glass to transmit sound would make a door very vulnerable

  10. Steve Davies 3 Silver badge
    Childcatcher

    Finally...

    A use for Siri.

    1. fidodogbreath

      Re: Finally...

      Ninja'd

  11. Paul Hovnanian Silver badge

    What does Siri do ...

    ... when my plants start screaming?

  12. _Elvi_

    .. Emotional Support and Ultra-Soniks detector Cat ..

    .. I feel I'm INVINCEABLE from the attacks.. but the cat detector van is right outside

    oh dear. the gigs up I'm afraid..

  13. Pierre 1970

    iDog

    Apple has already patented an iDog, which is a device in the shape of a classic dog (with rounded corners, of course) in order to detect the attack and alert the owner with iBarks. Until now they couldn't manage to solve the issues with Siri ordering tonnes of dog food with each complaint of the puppy.

  14. Ideasource

    Fools will be fools.

    They created their own vulnerability by trading reliable mechanisms for experimentals.

    You can't plug the security holes in their home strategy quick enough for they will install whole new points of exploit frustrating any attempt to help them.

    1. YetAnotherLocksmith

      Re: Fools will be fools.

      As a locksmith, I can assure you that modern high tech enhances security, but don't have a fully digital override on your front door!

  15. Anonymous Coward

    Zzzzzzzz.

    It doesn't have to be silent to work - people sleep. That's the best (for the crooks) time to buzz open the front door, actually. Near me recently, somebody heard a crashing sound in the middle of the night, but went back to sleep. In the morning they discovered their garage door ripped off its hinges and things missing.

  16. Dr Dan Holdsworth
    FAIL

    So nobody at Apple or Amazon has heard of low-pass filters, then?

  17. Richard 12 Silver badge
    Megaphone

    Those who do not learn from Nyquist

    Are doomed to suffer aliasing attacks.

  18. MachDiamond Silver badge

    Odd MEMS effects

    Analog transducers aren't linear. With MEMS, it can be even more odd. While the microphone's response isn't flat up to 30kHz or so, it can have peaks above the audible range and adding a low-pass filter in some of these circuits isn't an option. The effect could be one of sub-harmonic distortion so you tickle the mic above audible and it outputs a signal well below that. I had to do a deep dive into inertial measurement units (IMUs) some years ago and got familiar with MEMS issues in a very basic way.
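One textbook way a nonlinear transducer can turn out-of-band input into an in-band signal is intermodulation: squaring the sum of two tones produces a component at their difference frequency. The sketch below illustrates only that general effect. The quadratic coefficient, the tone frequencies, and the sample rate are all assumptions, and nothing here is a claim about how the NUIT attack or any particular MEMS microphone actually behaves.

```python
import cmath
import math

SAMPLE_RATE = 192_000   # assumed simulation rate, high enough to represent the tones
NUM_SAMPLES = 1_920     # 10 ms, a whole number of cycles for every tone involved

def amplitude_at(signal, freq_hz):
    """Estimate the amplitude of one frequency component by direct correlation."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

F1_HZ, F2_HZ = 25_000, 26_000   # two inaudible drive tones (illustrative values)
drive = [math.cos(2.0 * math.pi * F1_HZ * i / SAMPLE_RATE)
         + math.cos(2.0 * math.pi * F2_HZ * i / SAMPLE_RATE)
         for i in range(NUM_SAMPLES)]

QUADRATIC = 0.1   # hypothetical strength of the second-order nonlinearity
linear_out = drive
nonlinear_out = [x + QUADRATIC * x * x for x in drive]

difference_hz = F2_HZ - F1_HZ   # 1 kHz, well inside the speech band
print("linear response at 1 kHz:   ", amplitude_at(linear_out, difference_hz))
print("nonlinear response at 1 kHz:", amplitude_at(nonlinear_out, difference_hz))
```

The perfectly linear path shows essentially nothing at 1 kHz, while the quadratic term produces a 1 kHz component whose amplitude equals the assumed coefficient (about 0.1 here).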

    Encoding the signals into a YouTube video isn't going to work. The playback device would need to be able to reproduce the sounds and no consumer audio speakers I've ever come across are better than 10dB down at 20kHz on their way to nothing. The target would need to have a rather expensive audiophile system or a professional audio system with TAD Beryllium HF drivers. The source would also need to have no High and Low pass filtering which is very common to prevent overloading by out of band signals. Many modern amplifiers have Low Pass filters as they can have enough bandwidth to transmit if they aren't 'slowed down'.

  19. Anonymous Coward

    "no consumer audio speakers I've ever come across are better than 10dB down at 20kHz on their way to nothing"

    You've seen only crappy stuff then, "not-speakers" so to say. Beepers or buzzers can't, I can admit that.

    Of course it depends on what you mean by 'consumer speakers', but typically anything you can buy from a shop is, by definition, 'consumer' stuff. Like my speakers, and these are ±2 dB from 20 Hz to 20 kHz. Nothing high end, just hifi.

    https://www.whathifi.com/best-buys/hi-fi/best-hi-fi-speakers

    .... and every one of them reaches up to 20kHz ... cheapest with a massive price tag of £250. For a speaker that's not a lot.
