Voice data transmission method and apparatus
US-2024363120-A1 · Oct 31, 2024 · US
US9799329B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9799329-B1 |
| Application number | US-201414559687-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 3, 2014 |
| Priority date | Dec 3, 2014 |
| Publication date | Oct 24, 2017 |
| Grant date | Oct 24, 2017 |
This disclosure describes, in part, techniques and devices for identifying recurring environmental sounds in an environment such that these sounds may be canceled out of corresponding audio signals to increase signal-to-noise ratios (SNRs) of the signals and, hence, improve automatic speech recognition (ASR) on the signals. Recurring environmental sounds may include the ringing of a mobile phone, the beeping sound of a microwave, the buzzing of a washing machine, or the like.
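The abstract describes identifying sounds that recur in an environment, and claim 1 adds that a signature built from frequency, amplitude and direction is counted and flagged for cancellation once the count exceeds a threshold. The sketch below illustrates that bookkeeping under stated assumptions: the signature is a quantized (dominant frequency, RMS amplitude, direction) tuple, the direction of arrival is supplied by the caller (for example from a microphone-array estimator that is not modeled here), and exact key equality stands in for signature matching. The names `make_signature` and `RecurringSoundTracker`, the step sizes and the threshold are all hypothetical; the patent text shown here does not specify them.

```python
import cmath
import math
from collections import defaultdict
from dataclasses import dataclass, field


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin, via a naive O(n^2) DFT."""
    n = len(samples)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def rms(samples):
    """Root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def make_signature(samples, sample_rate, direction_deg,
                   freq_step=50.0, amp_step=0.1, dir_step=30.0):
    """Quantize (frequency, amplitude, direction) so repeats of one sound share a key.

    Hypothetical: the step sizes and the quantization scheme are illustrative only.
    """
    freq_bin = round(dominant_frequency(samples, sample_rate) / freq_step)
    amp_bin = round(rms(samples) / amp_step)
    dir_bin = int((direction_deg % 360.0) // dir_step)
    return (freq_bin, amp_bin, dir_bin)


@dataclass
class RecurringSoundTracker:
    """Counts signature occurrences and flags those seen more than `threshold` times."""
    threshold: int = 3
    counts: dict = field(default_factory=lambda: defaultdict(int))
    to_cancel: set = field(default_factory=set)

    def observe(self, signature, voice_command_detected=False):
        """Record one capture; return True if the sound is now flagged for cancellation."""
        # Only count captures that were not accompanied by a voice command,
        # loosely mirroring the condition recited in claims 6 and 14.
        if not voice_command_detected:
            self.counts[signature] += 1
            if self.counts[signature] > self.threshold:
                self.to_cancel.add(signature)
        return signature in self.to_cancel
```

For example, a 1 kHz tone sampled at 8 kHz and arriving from 90 degrees maps to one signature; after it has been observed more than `threshold` times without a voice command, `observe` starts returning `True`, which is the point at which the device would store the "cancel this sound" indication.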
What is claimed is:

1. An electronic device comprising: one or more microphones; one or more processors; and one or more computer-readable media storing computer-executable instructions that, when executed, cause the one or more processors to perform acts comprising: receiving a first audio signal generated by the one or more microphones based on first sound in an environment; determining frequency and amplitude of the first audio signal; determining a direction within the environment from which the first sound originated; creating a signature of the first sound based at least in part on the frequency, the amplitude, and the direction; determining that the signature corresponds to a stored signature associated with an environmental sound; incrementing a number of times that the environmental sound has been captured within the environment; determining that the number of times is greater than a threshold; storing an indication that the environmental sound is to be canceled from a subsequent audio signal that is generated by the microphone and that indicates the environmental sound; receiving a second audio signal generated by the one or more microphones based on second sound in the environment, the second audio signal including a first component corresponding to the environmental sound and a second component corresponding to a voice command uttered by a user; identifying the first component based at least in part on frequency, amplitude, and direction of at least a portion of the second audio signal; removing, from the second audio signal, the first component to generate a modified second audio signal; and performing automatic speech recognition (ASR) on the modified second audio signal to identify the voice command uttered by the user.

2. An electronic device as recited in claim 1, the acts further comprising: performing ASR on the first audio signal; and determining that the first audio signal does not include a voice command from the user.

3. An electronic device as recited in claim 1, the acts further comprising selecting, based at least in part on at least one of frequency or amplitude of the second audio signal, one of multiple methods to implement to remove the first component, wherein the multiple methods include: utilizing a filter to remove at least one specified frequency range from the second audio signal; and subtracting the first component of the second audio signal from the second audio signal.

4. An electronic device as recited in claim 1, the acts further comprising determining that the user uttered a keyword, and wherein the receiving of the first audio signal occurs at least partly in response to determining that the user uttered the keyword.

5. A method comprising: receiving, by a computing device, a first audio signal representative of a first sound in an environment; determining frequency and amplitude of the first audio signal; determining a first signature of the first sound based at least in part on the frequency and the amplitude; determining, using the first signature, a number of times that a second sound has previously been received, the second sound comprising a second signature that matches the first signature; determining that the number of times is greater than a threshold; generating an indication that the first signature corresponds to an environmental sound; storing the indication in a datastore; removing, by a filter of the computing device, the first audio signal corresponding to the environmental sound from subsequent audio signals; and sending the subsequent audio signals for processing.

6. A method as recited in claim 5, wherein the determining the number of times comprises determining the number of times that the second sound has been received without a voice command being detected within a predefined amount of time.

7. A method as recited in claim 5, further comprising determining a direction within the environment from which the first sound originated, and wherein the first signature is further based at least in part on the direction.

8. A method as recited in claim 5, further comprising: receiving a second audio signal having a first component corresponding to the environmental sound and a second component corresponding to a voice command uttered by a user; and determining that the first component corresponds to the environmental sound based at least in part on at least one of frequency or amplitude of the second audio signal, wherein removing the first audio signal corresponding to the environmental sound from subsequent audio signals comprises removing the first component from the second audio signal.

9. A method as recited in claim 5, wherein the filter corresponds to at least one frequency range associated with the environmental sound.

10. A method as recited in claim 8, wherein the removing the first component comprises subtracting an amplitude of the first component from an amplitude of the second audio signal.

11. A method as recited in claim 8, further comprising selecting one of multiple methods to implement to remove the first component from the second audio signal, the selecting based at least in part on at least one of the frequency or the amplitude of the second audio signal.

12. A method as recited in claim 5, further comprising determining that a user in the environment uttered a keyword, and wherein the receiving of the first audio signal occurs at least partly in response to determining that the user uttered the keyword.

13. One or more computing devices comprising: one or more processors; and one or more computer-readable media storing computer-executable instructions that, when executed, cause the one or more processors to perform acts comprising: receiving a first audio signal representative of a first sound in an environment; determining frequency and amplitude of the first audio signal; determining a first signature of the first sound based at least in part on the frequency and the amplitude; determining, using the first signature, a number of times that a second sound has previously been received, the second sound comprising a second signature that matches the first signature; determining that the number of times is greater than a threshold; storing an indication that the first signature corresponds to an environmental sound; removing the environmental sound from a subsequent audio signal; and causing automatic speech recognition to be performed on the subsequent audio signal.

14. One or more computing devices as recited in claim 13, wherein the determining the number of times comprises determining the number of times that the second sound has been received without a voice command being detected within a predefined amount of time.

15. One or more computing devices as recited in claim 13, further comprising determining a direction within the environment from which the first sound originated, and wherein the first signature is further based at least in part on the direction.

16. One or more computing devices as recited in claim 13, the acts further comprising: receiving a second audio signal having a first component corresponding to the environmental sound and a second component corresponding to a voice command uttered by a user; determining that the first component corresponds to the environmental sound based at least in part on at least one of frequency or amplitude of the second audio signal; and removing the first component from the second audio signal.

17. One or more computing devices as recited in claim 16, wherein the removing com
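Claims 3, 9, 10 and 11 recite two ways to remove the environmental component (a filter over a frequency range associated with the sound, and subtracting the component's amplitude) plus a selection between them based on frequency or amplitude. The sketch below illustrates one way this could look, assuming a stored time-domain template of the environmental sound with the same length as the frame. It uses a naive DFT to stay dependency-free, magnitude spectral subtraction as a standard stand-in for "subtracting an amplitude", and a hypothetical "narrowband" rule (`is_narrowband`, `share=0.9`, `max_bins=2`) as the selection criterion. None of these names or thresholds come from the patent.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part (input spectra here are conjugate-symmetric)."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def bin_freq(k, n, sample_rate):
    """Frequency of bin k, folding bins above Nyquist onto their mirror image."""
    return min(k, n - k) * sample_rate / n


def peak_bin(x):
    """Index of the strongest non-DC bin up to Nyquist."""
    mags = [abs(c) for c in dft(x)[: len(x) // 2 + 1]]
    return max(range(1, len(mags)), key=mags.__getitem__)


def band_stop(x, sample_rate, lo_hz, hi_hz):
    """Filter method: zero every bin whose frequency lies in [lo_hz, hi_hz]."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if lo_hz <= bin_freq(k, n, sample_rate) <= hi_hz:
            spec[k] = 0
    return idft(spec)


def spectral_subtract(x, noise_template):
    """Subtraction method: remove the template's magnitude per bin, keep the signal's phase."""
    out = []
    for c, noise in zip(dft(x), dft(noise_template)):
        mag = max(abs(c) - abs(noise), 0.0)  # floor at zero to avoid negative magnitudes
        out.append(cmath.rect(mag, cmath.phase(c)))
    return idft(out)


def is_narrowband(noise_template, share=0.9, max_bins=2):
    """Hypothetical selection rule: True if a few bins hold most of the template's energy."""
    n = len(noise_template)
    power = sorted((abs(c) ** 2 for c in dft(noise_template)[1 : n // 2 + 1]), reverse=True)
    total = sum(power)
    return total > 0 and sum(power[:max_bins]) / total >= share


def remove_environmental_sound(x, sample_rate, noise_template):
    """Pick a removal method from the template's spectrum, then apply it to frame x."""
    n = len(x)
    if len(noise_template) != n:
        raise ValueError("template and frame must have the same length")
    if is_narrowband(noise_template):
        bin_width = sample_rate / n
        centre = peak_bin(noise_template) * bin_width
        return band_stop(x, sample_rate, centre - bin_width / 2, centre + bin_width / 2)
    return spectral_subtract(x, noise_template)
```

A tonal beep concentrates its energy in one bin, so the band-stop path suffices; a broadband sound such as a washing machine's hum spreads across many bins, where notching would also cut speech and subtraction is the gentler choice. A production system would use an FFT library with overlap-add framing rather than this O(n²) transform.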
CPC titles:
- for comparison or discrimination
- Adaptation
- Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence)
- Training
- Interactive procedures; Man-machine interfaces