Causing a voice enabled device to defend against inaudible signal attacks

US11264047B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11264047-B2
Application number: US-202016858200-A
Country: US
Kind code: B2
Filing date: Apr 24, 2020
Priority date: Oct 20, 2017
Publication date: Mar 1, 2022
Grant date: Mar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A voice enabled device includes a transducer to capture multiple inaudible signals received from multiple ultrasonic speakers and audio recording electronics to process the multiple inaudible signals to generate digital output samples, which are recorded sound data comprising non-linearities from frequency-shifted versions of the multiple inaudible signals to within an audible frequency range. A processing device is to detect, within the recorded sound data, at least a portion of the non-linearities, e.g., via: comparison of the recorded sound data with expected patterns from an audible audio signal generated by human voice; and detection of non-linear variations within the recorded sound data as compared to the expected patterns. In response to the detection, the processing device is further to suppress an action programmed for response to a voice command corresponding to the recorded sound data.
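
As a rough illustration of how a non-linear recording chain can shift inaudible input into the audible range, the sketch below passes two ultrasonic tones through a hypothetical quadratic non-linearity. The tone frequencies, the sample rate, the coefficients `a1` and `a2`, and the helper `audible_peak_hz` are assumptions made for illustration; none of them come from the patent text.

```python
import numpy as np

def audible_peak_hz(signal, fs, band=(20.0, 8000.0)):
    """Return the frequency (Hz) of the strongest spectral peak inside `band`."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(spectrum[mask])])

fs = 192_000                      # assumed sample rate, high enough for ultrasound
t = np.arange(fs) / fs            # one second of samples
f1, f2 = 25_000.0, 25_400.0       # two assumed inaudible tones, 400 Hz apart

# Sum of the two ultrasonic signals as it arrives at the transducer.
s = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

# Hypothetical recording chain: a linear term plus a small quadratic term.
a1, a2 = 1.0, 0.1
recorded = a1 * s + a2 * s ** 2

# The squared term contains cos(2*pi*(f2 - f1)*t), which lies in the audible band.
peak = audible_peak_hz(recorded, fs)
print(f"strongest audible component: {peak:.1f} Hz")
```

Under these assumptions the strongest audible component sits at the difference frequency `f2 - f1`, even though neither input tone is audible on its own.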

First claim

Opening claim text (preview).

What is claimed is:

1. A voice enabled device comprising: a transducer to capture multiple inaudible signals received from multiple ultrasonic speakers; audio recording electronics coupled to the transducer, the audio recording electronics to process the multiple inaudible signals to generate digital output samples, which are recorded sound data comprising non-linearities from frequency-shifted versions of the multiple inaudible signals to within an audible frequency range; and a processing device coupled to the audio recording electronics, wherein the processing device is to detect, within the recorded sound data, at least a portion of the non-linearities, wherein to detect the at least a portion of the non-linearities, the processing device is to: compare the recorded sound data with expected patterns from an audible audio signal generated by human voice; and detect non-linear variations within the recorded sound data as compared to the expected patterns, wherein the non-linear variations are detected as a result of the at least a portion of the non-linearities located within the recorded sound data corresponding to a strongest portion of the expected patterns and being in a sub-50 hertz (Hz) band; and wherein, in response to the detection, the processing device is further to suppress an action programmed for response to a voice command corresponding to the recorded sound data.

2. The voice enabled device of claim 1, wherein a width of a fundamental frequency combined with widths of corresponding harmonics of the fundamental frequency within the audible audio signal is a time-varying frequency, and to detect the at least a portion of the non-linearities, the processing device is further to: determine a first energy variation over time within a frequency band that is between zero and the time-varying frequency; and correlate the first energy variation with a second energy variation at a fundamental frequency in the recorded sound data that is greater than the frequency band.

3. The voice enabled device of claim 2, wherein to determine the first energy variation, the processing device is to use standard acoustic libraries, the processing device further to: determine a first average power of the fundamental frequency around the width of the time-varying frequency; determine a second average power, over time, of the recorded sound data that is within the frequency band; remove, from the first average power and the second average power, windows of time during which the fundamental frequency falls below the first average power; and compute a correlation coefficient between the first average power and the second average power.

4. The voice enabled device of claim 2, wherein the processing device is further to employ an average width of the frequency band that is approximately 20 hertz (Hz).

5. The voice enabled device of claim 1, wherein, to detect the non-linear variations, the processing device is further to detect that the at least a portion of the non-linearities are at positively-biased harmonics comprising an amplitude skew.

6. The voice enabled device of claim 5, wherein to detect that the at least a portion of the non-linearities are at positively-biased harmonics comprising the amplitude skew, the processing device is further to: determine a first ratio of maximum and minimum amplitude of the audible audio signal; determine a second ratio of maximum and minimum amplitude of the recorded sound data; and compare the second ratio to the first ratio.

7. The voice enabled device of claim 1, wherein the processing device is further to: compare the recorded sound data to pre-recorded voice commands; and determine that the recorded sound data corresponds to the voice command listed among the pre-recorded voice commands.

8. A method comprising: capturing, using a transducer, multiple inaudible signals received from multiple ultrasonic speakers; generating, using audio recording electronics coupled to the transducer, digital output samples of the multiple inaudible signals, wherein the digital output samples are recorded sound data comprising non-linearities from frequency-shifted versions of the multiple inaudible signals to within an audible frequency range; detecting, within the recorded sound data using a processing device, at least a portion of the non-linearities, wherein the detecting comprises: comparing the recorded sound data with expected patterns from an audible audio signal generated by human voice; and detecting non-linear variations within the recorded sound data as compared to the expected patterns, wherein the non-linear variations are detected as a result of the at least a portion of the non-linearities located within the recorded sound data corresponding to a strongest portion of the expected patterns and being in a sub-50 hertz (Hz) band; and in response to the detecting, suppressing, using the processing device, an action programmed for response to a voice command corresponding to the recorded sound data.

9. The method of claim 8, wherein a width of a fundamental frequency combined with widths of corresponding harmonics of the fundamental frequency within the audible audio signal is a time-varying frequency, and detecting the at least a portion of the non-linearities further comprises: determining a first energy variation over time within a frequency band that is between zero and the time-varying frequency; and correlating the first energy variation with a second energy variation at a fundamental frequency in the recorded sound data that is greater than the frequency band.

10. The method of claim 9, further comprising: employing standard acoustic libraries to determine the first energy variation, the fundamental frequency, and the corresponding harmonics; determining a first average power of the fundamental frequency around the width of the time-varying frequency; determining a second average power, over time, of the recorded sound data that is within the frequency band; removing, from the first average power and the second average power, windows of time during which the fundamental frequency falls below the first average power; and compute a correlation coefficient between the first average power and the second average power.

11. The method of claim 9, further comprising employing an average width of the frequency band that is approximately 20 hertz (Hz).

12. The method of claim 8, wherein detecting the non-linear variations further comprises detecting that the at least a portion of the non-linearities are at positively-biased harmonics comprising an amplitude skew.

13. The method of claim 12, wherein detecting that the at least a portion of the non-linearities are at positively-biased harmonics comprising the amplitude skew further comprises: determining a first ratio of maximum and minimum amplitude of the audible audio signal; determining a second ratio of maximum and minimum amplitude of the recorded sound data; and comparing the second ratio to the first ratio.

14. The method of claim 8, further comprising: comparing the recorded sound data to pre-recorded voice commands; and determining that the recorded sound data corresponds to the voice command listed among the pre-recorded voice commands.

15. A system comprising: a microphone comprising: a transducer to capture a combination of multiple inaudible signals received from multiple ultrasonic speakers; and audio recording electronics coupled to the transducer, the audio recording electronics to process the combination of the multiple inaudible signals to generate digital output sam
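
The two detection checks described in the dependent claims (the power correlation of claims 3 and 10, and the amplitude ratio of claims 6 and 13) can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the function names, the fixed fundamental `f0_hz`, the 20 Hz half-width, the frame sizes, and the synthetic test signals are all assumptions, and a real voice has a time-varying fundamental that would need to be tracked.

```python
import numpy as np

def band_power_track(x, fs, lo_hz, hi_hz, frame=4096, hop=1024):
    """Mean spectral power inside [lo_hz, hi_hz] for each analysis frame."""
    if len(x) < frame:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    mask = (freqs >= lo_hz) & (freqs <= hi_hz)
    n_frames = 1 + (len(x) - frame) // hop
    powers = np.empty(n_frames)
    for i in range(n_frames):
        segment = x[i * hop : i * hop + frame] * window
        powers[i] = np.mean(np.abs(np.fft.rfft(segment))[mask] ** 2)
    return powers

def subband_correlation(x, fs, f0_hz, f0_halfwidth_hz=20.0, sub_hi_hz=50.0):
    """Correlate power near the fundamental with power in the sub-50 Hz band."""
    p_f0 = band_power_track(x, fs, f0_hz - f0_halfwidth_hz, f0_hz + f0_halfwidth_hz)
    p_sub = band_power_track(x, fs, 0.0, sub_hi_hz)
    # Drop frames where the fundamental is weaker than its own average power.
    keep = p_f0 >= p_f0.mean()
    if keep.sum() < 2 or np.std(p_f0[keep]) == 0 or np.std(p_sub[keep]) == 0:
        return 0.0
    return float(np.corrcoef(p_f0[keep], p_sub[keep])[0, 1])

def amplitude_skew_ratio(x):
    """Ratio of maximum to |minimum| amplitude; assumes x has negative samples."""
    return float(np.max(x) / abs(np.min(x)))

# Synthetic stand-ins (assumed, not from the patent): a 200 Hz tone with a slow
# loudness envelope plays the role of voice; a quadratic term models the attack.
rng = np.random.default_rng(0)
fs = 16_000
t = np.arange(10 * fs) / fs
envelope = 1.0 + 0.8 * np.sin(2 * np.pi * 0.25 * t)
voice = envelope * np.sin(2 * np.pi * 200.0 * t)
noise = 0.01 * rng.standard_normal(t.size)
clean = voice + noise
attacked = voice + 0.3 * voice ** 2 + noise

clean_score = subband_correlation(clean, fs, f0_hz=200.0)
attack_score = subband_correlation(attacked, fs, f0_hz=200.0)
print(f"correlation  clean={clean_score:.2f}  attacked={attack_score:.2f}")
print(f"skew ratio   clean={amplitude_skew_ratio(clean):.2f}  "
      f"attacked={amplitude_skew_ratio(attacked):.2f}")
```

In this toy setup the quadratic term leaves low-frequency energy that rises and falls with the fundamental, so the attacked signal shows a high correlation and a positively skewed amplitude ratio, while the clean signal shows neither. Any decision threshold applied to these scores would be a further assumption that the claims do not specify.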

Assignees

Board Of Trustees Of The Univ Of Illinois

Inventors

Classifications

  • G10K9/122 · Primary

    using piezoelectric driving means {(G10K9/121 takes precedence)} · CPC title

  • by electro-acoustically regenerating the original acoustic waves in anti-phase · CPC title

  • Synthesis of acoustic waves (synthesis of speech G10L13/00) · CPC title

  • for coupling gramophone pick-up, recorder output, or microphone to receiver · CPC title

  • by combining a number of identical transducers {(specially adapted for hearing aids H04R25/405)} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11264047B2 cover?
A voice enabled device includes a transducer to capture multiple inaudible signals received from multiple ultrasonic speakers and audio recording electronics to process the multiple inaudible signals to generate digital output samples, which are recorded sound data comprising non-linearities from frequency-shifted versions of the multiple inaudible signals to within an audible frequency range. …
Who is the assignee on this patent?
Board Of Trustees Of The Univ Of Illinois
What technology area does this patent fall under?
Primary CPC classification G10K9/122. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 1, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).