Recognizing speech in the presence of additional audio
US-10431213-B2 · Oct 1, 2019 · US
US11031002B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11031002-B2 |
| Application number | US-201916548947-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 23, 2019 |
| Priority date | Feb 14, 2014 |
| Publication date | Jun 8, 2021 |
| Grant date | Jun 8, 2021 |
The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
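The abstract describes a loop: separate the speaker device's known output from the rest of the captured audio, decide whether what remains is a user utterance, then turn the speaker down. A minimal Python sketch of that loop follows. It makes one deliberate substitution. The abstract relies on a *model trained to identify the output of the speaker device*, and this sketch replaces that model with a much simpler technique: a least-squares subtraction of the known playback signal followed by a residual-energy threshold. The names `SpeakerDevice`, `reduce_output_level`, `contains_user_utterance`, and `process_block`, the 0.05 threshold, and the 0.2 ducking factor are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeakerDevice:
    """Hypothetical stand-in for the speaker device whose output level is controlled."""
    volume: float = 1.0

    def reduce_output_level(self, factor: float = 0.2) -> None:
        # Scale the current volume down; 0.2 is an arbitrary illustrative ducking factor.
        self.volume *= factor


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def residual_after_reference(mic, reference):
    """Subtract the least-squares best-fit scaled copy of the known speaker output."""
    denom = sum(r * r for r in reference)
    gain = sum(m * r for m, r in zip(mic, reference)) / denom if denom else 0.0
    return [m - gain * r for m, r in zip(mic, reference)]


def contains_user_utterance(mic, reference, threshold=0.05):
    """Stand-in for the trained model: flag energy not explained by the speaker output."""
    return rms(residual_after_reference(mic, reference)) > threshold


def process_block(mic, reference, speaker):
    """Duck the speaker when the additional audio looks like a user utterance."""
    if contains_user_utterance(mic, reference):
        speaker.reduce_output_level()
        return True
    return False
```

With a microphone block that contains only a scaled copy of the playback signal, the residual is near zero and the volume is left alone. Adding a second tone (standing in for a voice) leaves residual energy above the threshold, so the speaker is ducked. A trained model would replace `contains_user_utterance` and would also cope with room echo and nonlinear distortion, which a single scalar gain cannot capture.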
What is claimed is:

1. A computer-implemented method comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.
2. The method of claim 1, comprising: determining, by the first computing device, that the second audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data includes speech.
3. The method of claim 1, comprising: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data includes speech.
4. The method of claim 3, wherein determining that the first audio data includes speech comprises: providing, as an input to a model that is configured to determine whether received audio data includes speech, the first audio data; and receiving, from the model, data indicating that the first audio data includes speech.
5. The method of claim 4, wherein the model is configured to determine whether received audio data includes speech using audio fingerprinting or a neural network classifier.
6. The method of claim 1, comprising: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.
7. The method of claim 1, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to mute the second computing device.
8. The method of claim 1, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to reduce a volume of the second audio data.
9. A system comprising: one or more computers; and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.
10. The system of claim 9, wherein the operations comprise: determining, by the first computing device, that the second audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data includes speech.
11. The system of claim 9, wherein the operations comprise: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data includes speech.
12. The system of claim 11, wherein determining that the first audio data includes speech comprises: providing, as an input to a model that is configured to determine whether received audio data includes speech, the first audio data; and receiving, from the model, data indicating that the first audio data includes speech.
13. The system of claim 12, wherein the model is configured to determine whether received audio data includes speech using audio fingerprinting or a neural network classifier.
14. The system of claim 9, wherein the operations comprise: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.
15. The system of claim 9, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to mute the second computing device.
16. The system of claim 9, wherein providing the instruction to suppress outputting the second audio data comprises: providing an instruction to reduce a volume of the second audio data.
17. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising: receiving, by a first computing device, first audio data that includes an utterance; determining, by the first computing device, that a second computing device is outputting second audio data; and based on determining that the second computing device is outputting the second audio data and based on receiving the audio data that includes the utterance, providing, by the first computing device and for output to the second computing device, an instruction to suppress outputting the second audio data.
18. The medium of claim 17, wherein the operations comprise: determining, by the first computing device, that the second audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the second audio data includes speech.
19. The medium of claim 17, wherein the operations comprise: determining, by the first computing device, that the first audio data includes speech, wherein providing the instruction to suppress outputting the second audio data is based on determining that the first audio data includes speech.
20. The medium of claim 17, wherein the operations comprise: after providing the instruction to suppress outputting the second audio data, obtaining, by the first computing device, a transcription of the utterance; and providing, for output, the transcription.
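Independent claim 1, read together with dependent claims 3, 6, 7, and 8, describes a control flow that can be sketched in Python as below. This is an illustrative sketch, not the patentee's implementation. `SecondDevice`, `Instruction`, `Suppression`, `handle_utterance`, the 0.1 target volume, and the reported `is_outputting` flag are hypothetical. The `has_speech` and `transcribe` callables are placeholders: in a real system `has_speech` would be the model of claims 4 and 5 (audio fingerprinting or a neural network classifier) and `transcribe` a speech recognizer.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Suppression(Enum):
    MUTE = "mute"                    # claims 7 and 15: mute the second device
    REDUCE_VOLUME = "reduce_volume"  # claims 8 and 16: lower its volume


@dataclass
class Instruction:
    """Hypothetical message sent from the first device to the second device."""
    action: Suppression
    level: float = 0.1  # illustrative target volume, used only for REDUCE_VOLUME


@dataclass
class SecondDevice:
    """Hypothetical media device (e.g. a TV) that reports whether it is playing audio."""
    is_outputting: bool
    volume: float = 1.0
    received: List[Instruction] = field(default_factory=list)

    def handle(self, instruction: Instruction) -> None:
        # Record the instruction and apply it to the playback volume.
        self.received.append(instruction)
        if instruction.action is Suppression.MUTE:
            self.volume = 0.0
        else:
            # Never raise the volume when asked to reduce it.
            self.volume = min(self.volume, instruction.level)


def handle_utterance(
    first_audio: bytes,
    second_device: SecondDevice,
    has_speech: Callable[[bytes], bool],
    transcribe: Callable[[bytes], str],
    mode: Suppression = Suppression.REDUCE_VOLUME,
) -> Optional[str]:
    """Suppress the second device's audio, then return a transcription of the utterance."""
    # Claim 3 variant: only act when the captured audio actually contains speech.
    if not has_speech(first_audio):
        return None
    # Claim 1: send a suppression instruction only if the second device is outputting audio.
    if second_device.is_outputting:
        second_device.handle(Instruction(mode))
    # Claim 6: obtain and return the transcription after suppression.
    return transcribe(first_audio)
```

Passing `mode=Suppression.MUTE` exercises the claim 7 variant instead of claim 8. Injecting speech detection and transcription as callables keeps the suppression logic independent of whichever model or recognizer is plugged in.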
Barge in, i.e. overridable guidance for interrupting prompts · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
for discriminating voice from noise · CPC title
in amplifiers suitable for low-frequencies, e.g. audio amplifiers (H03G3/32, H03G3/34 take precedence) · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title