Recognizing speech in the presence of additional audio

US11031002B2 · US · B2

Patent metadata
Publication number: US-11031002-B2
Application number: US-201916548947-A
Country: US
Kind code: B2
Filing date: Aug 23, 2019
Priority date: Feb 14, 2014
Publication date: Jun 8, 2021
Grant date: Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
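
The flow in the abstract can be illustrated with a minimal sketch. It is not the patented model: the "model trained to identify the output of the speaker device" is replaced here by a simple least-squares fit of the microphone frame against a known playback reference, and whatever energy the fit cannot explain is treated as the additional audio signal. The names `residual_ratio`, `Speaker`, `process_frame`, the availability of a playback reference, and the `threshold` and `duck_factor` values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


def residual_ratio(mic: Sequence[float], playback: Sequence[float]) -> float:
    """Fraction of microphone energy not explained by a scaled copy of playback.

    Stand-in for a trained model: 0.0 means the frame looks like pure speaker
    output, values near 1.0 mean most of the energy comes from something else.
    """
    mic_energy = sum(m * m for m in mic)
    if mic_energy == 0.0:
        return 0.0  # silence: nothing to explain
    ref_energy = sum(p * p for p in playback)
    if ref_energy == 0.0:
        return 1.0  # speaker is silent, so all energy is additional audio
    # Least-squares gain that best maps the playback onto the microphone frame.
    gain = sum(m * p for m, p in zip(mic, playback)) / ref_energy
    residual = sum((m - gain * p) ** 2 for m, p in zip(mic, playback))
    return residual / mic_energy


@dataclass
class Speaker:
    """Hypothetical speaker device with an adjustable output level."""
    level: float = 1.0

    def reduce_level(self, factor: float) -> None:
        self.level *= factor


def process_frame(mic, playback, speaker, threshold=0.2, duck_factor=0.25) -> bool:
    """Reduce the speaker level if the frame appears to contain a user utterance."""
    if residual_ratio(mic, playback) > threshold:
        speaker.reduce_level(duck_factor)
        return True
    return False
```

For example, with `playback = [1.0, -1.0, 1.0, -1.0]`, a microphone frame of `[0.5, -0.5, 0.5, -0.5]` is fully explained by the playback and leaves the level unchanged, while `[1.5, 0.5, -0.5, -1.5]` carries extra energy and triggers the reduction.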

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.

2. The method of claim 1, comprising: determining, by the first computing device, that the second audio data include speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data include speech.

3. The method of claim 1, comprising: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data include speech.

4. The method of claim 3, wherein determining that the first audio data includes speech comprises: providing, as an input to a model that is configured to determine whether received audio data includes speech, the first audio data; and receiving, from the model, data indicating that the first audio data includes speech.

5. The method of claim 4, wherein the model configured to determine whether received audio data includes speech using audio fingerprinting or a neural network classifier.

6. The method of claim 1, comprising: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.

7. The method of claim 1, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to mute the second computing device.

8. The method of claim 1, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to reduce a volume of the second audio data.

9. A system comprising: one or more computers; and one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.

10. The system of claim 9, wherein the operations comprise: determining, by the first computing device, that the second audio data include speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data include speech.

11. The system of claim 9, wherein the operations comprise: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data include speech.

12. The system of claim 11, wherein determining that the first audio data includes speech comprises: providing, as an input to a model that is configured to determine whether received audio data includes speech, the first audio data; and receiving, from the model, data indicating that the first audio data includes speech.

13. The system of claim 12, wherein the model configured to determine whether received audio data includes speech using audio fingerprinting or a neural network classifier.

14. The system of claim 9, wherein the operations comprise: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.

15. The system of claim 9, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to mute the second computing device.

16. The system of claim 9, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to reduce a volume of the second audio data.

17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.

18. The medium of claim 17, wherein the operations comprise: determining, by the first computing device, that the second audio data include speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data include speech.

19. The medium of claim 17, wherein the operations comprise: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data include speech.

20. The medium of claim 17, wherein the operations comprise: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.
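
The decision logic of independent claim 1, together with the speech check of claims 3 and 4 and the two suppression variants of claims 7 and 8, might be sketched as follows. This is an illustrative reading, not the patentee's implementation: `SuppressMode`, `SuppressInstruction`, `decide_suppression`, and the `includes_speech` callable (standing in for the model that determines whether audio data includes speech) are hypothetical names, and how the first device learns that the second device is outputting audio is reduced to a boolean argument.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class SuppressMode(Enum):
    MUTE = "mute"                    # variant in claims 7 and 15
    REDUCE_VOLUME = "reduce_volume"  # variant in claims 8 and 16


@dataclass(frozen=True)
class SuppressInstruction:
    """Hypothetical message sent from the first device to the second device."""
    target_device: str
    mode: SuppressMode


def decide_suppression(
    first_audio: bytes,
    includes_speech: Callable[[bytes], bool],
    second_device_id: str,
    second_device_is_outputting: bool,
    mode: SuppressMode = SuppressMode.REDUCE_VOLUME,
) -> Optional[SuppressInstruction]:
    """Return an instruction for the second device, or None if none is needed."""
    # Nothing to suppress if the second device is not outputting audio.
    if not second_device_is_outputting:
        return None
    # Only act when the received audio is judged to include speech.
    if not includes_speech(first_audio):
        return None
    return SuppressInstruction(target_device=second_device_id, mode=mode)
```

A caller would pass its own speech detector, for instance a fingerprinting or neural network classifier as mentioned in claim 5, and forward any returned instruction to the second device over whatever transport the two devices share.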

Assignees

Google LLC

Inventors

Classifications

  • G10L15/222 · Primary

    Barge in, i.e. overridable guidance for interrupting prompts · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • for discriminating voice from noise · CPC title

  • in amplifiers suitable for low-frequencies, e.g. audio amplifiers (H03G3/32, H03G3/34 take precedence) · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11031002B2 cover?
The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additiona…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/222. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).