Recognizing speech in the presence of additional audio

US11942083B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11942083-B2
Application number	US-202117303139-A
Country	US
Kind code	B2
Filing date	May 21, 2021
Priority date	Feb 14, 2014
Publication date	Mar 26, 2024
Grant date	Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
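As a rough illustration of the flow in the abstract (receive a signal, decide whether it contains audio beyond the speaker output, then lower the output level), here is a minimal Python sketch. It is not the patented method. The abstract relies on a trained model, while this sketch swaps in a simple cosine-similarity heuristic that only flags audio beyond the playback and does not establish that the audio is a user's utterance. All function names, the 0.95 threshold, and the 0.2 ducked level are hypothetical.

```python
import math
from typing import Sequence


def similarity_to_playback(mic: Sequence[float], playback: Sequence[float]) -> float:
    """Cosine similarity between the captured frame and the known speaker output."""
    mic_norm = math.sqrt(sum(m * m for m in mic))
    play_norm = math.sqrt(sum(p * p for p in playback))
    if mic_norm == 0.0:
        return 1.0  # silence: nothing beyond the playback was captured
    if play_norm == 0.0:
        return 0.0  # nothing is playing: everything captured is additional audio
    dot = sum(m * p for m, p in zip(mic, playback))
    return abs(dot) / (mic_norm * play_norm)


def has_additional_audio(mic: Sequence[float], playback: Sequence[float],
                         threshold: float = 0.95) -> bool:
    """Hypothetical stand-in for the trained model named in the abstract."""
    return similarity_to_playback(mic, playback) < threshold


def next_output_level(current_level: float, mic: Sequence[float],
                      playback: Sequence[float], ducked_level: float = 0.2) -> float:
    """Lower the speaker level when the frame holds more than the playback."""
    if has_additional_audio(mic, playback):
        return min(current_level, ducked_level)
    return current_level
```

A caller would evaluate `next_output_level` once per captured frame and pass the result to whatever volume control the device exposes.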

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method when executed on data processing hardware causes the data processing hardware to perform operations comprising: while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query; processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.
2. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.
3. The computer-implemented method of claim 1, wherein the neural network-based model is trained to recognize output audio from the computing device.
4. The computer-implemented method of claim 1, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.
5. The computer-implemented method of claim 1, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.
6. The computer-implemented method of claim 5, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.
7. The computer-implemented method of claim 1, wherein the data processing hardware is implemented on the computing device.
8. The computer-implemented method of claim 1, wherein the computing device comprises a mobile phone.
9. The computer-implemented method of claim 1, wherein the computing device comprises a speaker device.
10. The computer-implemented method of claim 1, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.
11. A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware causes the data processing hardware to perform operations comprising: while audio is being played back from a computing device, receiving a first audio signal captured by a microphone of the computing device, the first audio signal comprising the played back audio and speech audio corresponding to a query, the played back audio different than the speech audio corresponding to the query; processing, using a neural network-based model, the first audio signal to determine that the speech audio corresponding to the query was spoken by a user of the computing device; and in response to determining that the speech audio corresponding to the query was spoken by the user, generating a second audio signal that comprises the speech audio corresponding to the query and suppresses the played back audio from the first audio signal captured by the microphone.
12. The system of claim 11, wherein the neural network-based model is trained to recognize a presence of a voice of the user of the computing device.
13. The system of claim 11, wherein the neural network-based model is trained to recognize output audio from the computing device.
14. The system of claim 11, wherein the neural network-based model is trained to: to recognize a presence of a voice of the user of the computing device; and to recognize output audio from the computing device.
15. The system of claim 11, wherein the operations further comprise processing the second audio signal to generate a transcription of the query spoken by the user.
16. The system of claim 15, wherein the operations further comprise: transforming the transcription of the query into a structured representation; and processing, using a particular application, the structured representation.
17. The system of claim 11, wherein the data processing hardware is implemented on the computing device.
18. The system of claim 11, wherein the computing device comprises a mobile phone.
19. The system of claim 11, wherein the computing device comprises a speaker device.
20. The system of claim 11, wherein the operations further comprise providing, for audible output from the computing device, a text-to-speech (TTS) output conveying a response to the query in a synthesized voice.
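The three operations of claim 1 (receive a first audio signal, decide whether the query speech was spoken by the user, generate a second signal that suppresses the playback) can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the claimed implementation. It assumes the device has a sample-aligned copy of the audio it is playing, it replaces the neural network-based model with a caller-supplied `spoken_by_user` callable, and it suppresses playback with a single least-squares gain rather than any technique described in the patent. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def suppress_playback(first_signal: Sequence[float],
                      playback: Sequence[float]) -> List[float]:
    """Subtract the best-fitting scaled copy of the playback from the capture."""
    energy = sum(p * p for p in playback)
    if energy == 0.0:
        return list(first_signal)  # nothing was playing, so nothing to remove
    # Least-squares gain g that minimizes sum((m - g * p) ** 2).
    gain = sum(m * p for m, p in zip(first_signal, playback)) / energy
    return [m - gain * p for m, p in zip(first_signal, playback)]


def handle_frame(first_signal: Sequence[float],
                 playback: Sequence[float],
                 spoken_by_user: Callable[[Sequence[float]], bool]
                 ) -> Optional[List[float]]:
    """Return a second signal only when the detector attributes speech to the user.

    `spoken_by_user` is a hypothetical placeholder for the neural
    network-based model recited in the claim.
    """
    if not spoken_by_user(first_signal):
        return None
    return suppress_playback(first_signal, playback)
```

The returned second signal is what a downstream recognizer would consume, in line with claim 5, which processes the second audio signal to generate a transcription. A single gain cannot model room echo or delay, so a real device would need something considerably more capable.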

Assignees

Google LLC

Inventors

Classifications

G10L15/20 · Primary
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
G06F3/165
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
G06F3/167
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
G10L15/222 · Primary
Barge in, i.e. overridable guidance for interrupting prompts · CPC title
G10L17/06
Decision making techniques; Pattern matching strategies · CPC title

Patent family

Related publications grouped by family.

View patent family 53798631

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11942083B2 cover? The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additiona…
Who is the assignee on this patent? Google LLC
What technology area does this patent fall under? Primary CPC classification G10L15/20. Mapped technology areas include Physics.
When was this patent published? Publication date Mar 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).