Recognizing speech in the presence of additional audio

US11942083B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11942083-B2
Application number  US-202117303139-A
Country             US
Kind code           B2
Filing date         May 21, 2021
Priority date       Feb 14, 2014
Publication date    Mar 26, 2024
Grant date          Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
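The flow in the abstract (receive a mixed signal, decide whether the extra audio is a user utterance, then lower the speaker output) can be illustrated with a minimal sketch. This is not the patented method: the abstract relies on a trained model, while the sketch below swaps in a simple least-squares residual check as a stand-in. All function names, the threshold, and the ducked level are hypothetical values chosen for illustration.

```python
import math

def tone(freq_hz, n_samples, rate_hz=16000, amplitude=1.0):
    """Generate a sine tone as a list of float samples (test signal only)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def residual_after_playback(mic, playback):
    """Subtract the best least-squares scaled copy of the playback from the mic.

    Stand-in for the trained model (an assumption): whatever the known
    playback cannot explain is treated as the additional audio signal.
    """
    denom = sum(p * p for p in playback)
    gain = sum(m * p for m, p in zip(mic, playback)) / denom if denom else 0.0
    return [m - gain * p for m, p in zip(mic, playback)]

def is_user_utterance(mic, playback, threshold=0.05):
    """Hypothetical decision rule: residual RMS above a fixed threshold."""
    return rms(residual_after_playback(mic, playback)) > threshold

def next_output_level(current_level, utterance_detected, ducked_level=0.2):
    """Reduce the speaker output level when an utterance is detected."""
    if utterance_detected:
        return min(current_level, ducked_level)
    return current_level

# Synthetic signals: the speaker plays a 440 Hz tone, the user "speaks" at 200 Hz.
playback = tone(440, 1600)
speech = tone(200, 1600, amplitude=0.3)
mic_playback_only = [0.6 * p for p in playback]
mic_with_speech = [0.6 * p + s for p, s in zip(playback, speech)]

level = next_output_level(1.0, is_user_utterance(mic_with_speech, playback))
print(level)
```

A real system would replace `is_user_utterance` with a trained classifier; the point of the sketch is only the control flow from detection to output-level reduction.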

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising:

    while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query;

    processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and

    in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.

2. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.

3. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize output audio from the computing device.

4. The computer-implemented method of claim 1, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.

5. The computer-implemented method of claim 1, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.

6. The computer-implemented method of claim 5, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.

7. The computer-implemented method of claim 1, wherein the data processing hardware is implemented on the computing device.

8. The computer-implemented method of claim 1, wherein the computing device comprises a mobile phone.

9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker device.

10. The computer-implemented method of claim 1, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.

11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising:

    while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query;

    processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and

    in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.

12. The system of claim 11, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.

13. The system of claim 11, wherein the neural network-based model is trained to recognize output audio from the computing device.

14. The system of claim 11, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.

15. The system of claim 11, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.

16. The system of claim 15, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.

17. The system of claim 11, wherein the data processing hardware is implemented on the computing device.

18. The system of claim 11, wherein the computing device comprises a mobile phone.

19. The system of claim 11, wherein the computing device comprises a speaker device.

20. The system of claim 11, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.
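The final step of claim 1 (producing a second audio signal that keeps the query speech and suppresses the played back audio) can be sketched as follows. The claims do not say how suppression is performed, so this example assumes a classic NLMS (normalized least mean squares) adaptive echo canceller as a stand-in. The speaker decision is passed in as a hypothetical callable, `spoken_by_user`, in place of the neural network-based model, and the tap count, step size, and synthetic signals are illustrative assumptions.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def nlms_suppress(mic, playback, taps=4, mu=0.2, eps=1e-8):
    """Return the mic signal with an adaptively estimated echo of playback removed.

    NLMS is used only as an illustrative stand-in; the claims do not name it.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent playback samples, newest first
    out = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # residual: ideally only the user's speech
        norm = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        out.append(error)
    return out

def second_audio_signal(mic, playback, spoken_by_user):
    """Gate suppression on the (hypothetical) speaker decision, as in claim 1."""
    if not spoken_by_user(mic):
        return None
    return nlms_suppress(mic, playback)

# Synthetic scenario: noise-like playback reaches the mic delayed by one sample
# at half amplitude; the user speaks only during the last 1000 samples.
random.seed(0)
n = 3000
playback = [random.uniform(-1.0, 1.0) for _ in range(n)]
echo = [0.0] + [0.5 * p for p in playback[:-1]]
speech = [0.0] * 2000 + [0.2 * math.sin(2 * math.pi * 200 * k / 16000)
                         for k in range(1000)]
mic = [e + s for e, s in zip(echo, speech)]

cleaned = second_audio_signal(mic, playback, spoken_by_user=lambda signal: True)
```

Production echo cancellers typically pause adaptation while the user is talking; this sketch keeps adapting throughout for brevity, so some residual echo remains in the speech region.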

Assignees

Google LLC

Inventors

Classifications

  • G10L15/20 · Primary

    Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Management of the audio stream, e.g. setting of volume, audio stream path · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • G10L15/222 · Primary

    Barge in, i.e. overridable guidance for interrupting prompts · CPC title

  • Decision making techniques; Pattern matching strategies · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 26, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).