Speech quality under heavy noise conditions in hands-free communication
US-2016260440-A1 · Sep 8, 2016 · US
US9601116B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9601116-B2 |
| Application number | US-201615093309-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 7, 2016 |
| Priority date | Feb 14, 2014 |
| Publication date | Mar 21, 2017 |
| Grant date | Mar 21, 2017 |
Official abstract text for this publication.
The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
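The flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: `SpeakerDevice`, `process_signal`, `user_utterance_score`, `threshold`, and `duck_factor` are all hypothetical names and values. The score is assumed to come from some trained model that recognizes the speaker device's own output and therefore flags the remaining audio as a likely user utterance.

```python
from dataclasses import dataclass


@dataclass
class SpeakerDevice:
    """Hypothetical stand-in for the speaker device's output control."""
    level: float = 1.0

    def reduce_level(self, factor: float) -> None:
        # Scale the current audio output level down by the given factor.
        self.level *= factor


def process_signal(user_utterance_score: float,
                   speaker: SpeakerDevice,
                   threshold: float = 0.5,
                   duck_factor: float = 0.2) -> bool:
    """Reduce the output level when the additional audio looks like a user utterance.

    `user_utterance_score` is an assumed model output in [0, 1]; the threshold
    and duck factor are illustrative values, not taken from the patent.
    """
    is_user_utterance = user_utterance_score >= threshold
    if is_user_utterance:
        # Initiate the reduction in audio output level.
        speaker.reduce_level(duck_factor)
    return is_user_utterance
```

Under these assumptions, a caller would create one `SpeakerDevice`, score each incoming signal segment, and pass the score to `process_signal`; the output level only changes when the score meets the threshold.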
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

2. The method of claim 1, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

3. The method of claim 2, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

4. The method of claim 1, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

5. The method of claim 1, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and determining that the audio signal comprises additional audio other than the synthesized voice based on the first vector satisfying a threshold.

6. The method of claim 1, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is an i-vector based model.

7. The method of claim 1, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is a neural network based model.

8. A non-transitory computer readable storage device storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

9. The computer readable storage device of claim 8, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

10. The computer readable storage device of claim 9, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

11. The computer readable storage device of claim 8, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

12. The computer readable storage device of claim 8, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and determining that the audio signal comprises additional audio other than the synthesized voice based on the first vector satisfying a threshold.

13. The computer readable storage device of claim 8, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is an i-vector based model.

14. The computer readable storage device of claim 8, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is a neural network based model.

15. A system comprising: one or more computers; and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

16. The system of claim 15, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

17. The system of claim 16, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

18. The system of claim 15, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

19. The system of claim 15, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and det
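
The vector comparison recited in claims 4, 11, and 18 could look something like the sketch below. The claims do not say how the two vectors are compared or which direction of the threshold counts as "satisfying" it. Cosine similarity, the 0.8 threshold, and the function names here are assumptions chosen only for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_additional_audio(first_vector: Sequence[float],
                         synthesized_voice_vector: Sequence[float],
                         threshold: float = 0.8) -> bool:
    """Assumed rule: low similarity to the synthesized-voice vector suggests
    the audio contains something other than the synthesized voice."""
    similarity = cosine_similarity(first_vector, synthesized_voice_vector)
    return similarity < threshold
```

In this sketch, `first_vector` stands for the vector obtained from a portion of the audio signal and `synthesized_voice_vector` for the vector associated with the synthesized-voice model; a real system would derive both from its own feature extraction (for example the i-vector or neural network models named in the dependent claims).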
in amplifiers suitable for low-frequencies, e.g. audio amplifiers (H03G3/32, H03G3/34 take precedence) · CPC title
Barge in, i.e. overridable guidance for interrupting prompts · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
for discriminating voice from noise · CPC title
Speaker identification or verification techniques · CPC title