Audio cancellation for voice recognition
US-11605393-B2 · Mar 14, 2023 · US
US11942083B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11942083-B2 |
| Application number | US-202117303139-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 21, 2021 |
| Priority date | Feb 14, 2014 |
| Publication date | Mar 26, 2024 |
| Grant date | Mar 26, 2024 |
Official abstract text for this publication.
The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
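As a rough illustration of the abstract's last step (lowering the speaker's output level once a user utterance is detected), the control logic might be sketched as below. The names `energy_stub` and `output_levels`, the level values, and the hold counter are all assumptions made for this sketch. In particular, the energy threshold is only a hypothetical stand-in for the trained model described in the abstract and is not the patent's detection method.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def energy_stub(frame: Frame, threshold: float = 0.1) -> bool:
    """Hypothetical stand-in for the trained model.

    Flags a frame whose mean energy exceeds a threshold. The abstract
    describes a model trained to identify speaker output; this stub does
    nothing of the sort and exists only to drive the control logic.
    """
    if not frame:
        return False
    return sum(s * s for s in frame) / len(frame) > threshold


def output_levels(
    frames: Sequence[Frame],
    detector: Callable[[Frame], bool] = energy_stub,
    normal: float = 1.0,   # assumed full output level
    ducked: float = 0.2,   # assumed reduced output level
    hold: int = 2,         # assumed number of frames to stay reduced
) -> List[float]:
    """Return the speaker output level chosen after each captured frame."""
    levels: List[float] = []
    remaining = 0
    for frame in frames:
        if detector(frame):
            remaining = hold  # (re)start the reduced-level window
        levels.append(ducked if remaining > 0 else normal)
        if remaining > 0:
            remaining -= 1
    return levels
```

With the defaults, a single flagged frame keeps the level reduced for that frame and the one after it, then the level returns to normal.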
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query; processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.
2. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.
3. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize output audio from the computing device.
4. The computer-implemented method of claim 1, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.
5. The computer-implemented method of claim 1, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.
6. The computer-implemented method of claim 5, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.
7. The computer-implemented method of claim 1, wherein the data processing hardware is implemented on the computing device.
8. The computer-implemented method of claim 1, wherein the computing device comprises a mobile phone.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker device.
10. The computer-implemented method of claim 1, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query; processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.
12. The system of claim 11, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.
13. The system of claim 11, wherein the neural network-based model is trained to recognize output audio from the computing device.
14. The system of claim 11, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.
15. The system of claim 11, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.
16. The system of claim 15, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.
17. The system of claim 11, wherein the data processing hardware is implemented on the computing device.
18. The system of claim 11, wherein the computing device comprises a mobile phone.
19. The system of claim 11, wherein the computing device comprises a speaker device.
20. The system of claim 11, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.
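To make the final step of claims 1 and 11 more concrete (producing a second audio signal that keeps the query speech and suppresses the played back audio), here is a minimal sketch. The claims do not say how suppression is carried out, so this example swaps in a normalized LMS (NLMS) adaptive filter and assumes the device has a sample-aligned copy of the audio it is playing; both are assumptions of this sketch, not statements about the patented method. The function names, the tap count, and the step size are hypothetical, and the `spoken_by_user` flag merely stands in for the decision of the neural network-based model.

```python
from typing import List, Optional, Sequence


def nlms_suppress(
    mic: Sequence[float],
    playback: Sequence[float],
    taps: int = 4,      # assumed filter length
    mu: float = 0.5,    # assumed NLMS step size (0 < mu < 2)
    eps: float = 1e-8,  # guards against division by zero
) -> List[float]:
    """Subtract an adaptively filtered copy of `playback` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # recent playback samples, newest first
    residual: List[float] = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = m - estimate  # what remains after removing estimated playback
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def second_audio_signal(
    mic: Sequence[float],
    playback: Sequence[float],
    spoken_by_user: bool,  # placeholder for the model's decision
) -> Optional[List[float]]:
    """Return the playback-suppressed signal only when the user spoke."""
    if not spoken_by_user:
        return None
    return nlms_suppress(mic, playback)
```

In this sketch the residual of the adaptive filter plays the role of the second audio signal: whatever part of the microphone capture cannot be explained by the playback reference, ideally the user's speech, is what would be passed on for transcription.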
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
Barge in, i.e. overridable guidance for interrupting prompts · CPC title
Decision making techniques; Pattern matching strategies · CPC title