What technology area does this patent fall under?

Primary CPC classification G10L15/222. Mapped technology areas include Physics.

When was this patent published?

Publication date: March 21, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC classification).

Recognizing speech in the presence of additional audio

US9601116B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9601116-B2
Application number	US-201615093309-A
Country	US
Kind code	B2
Filing date	Apr 7, 2016
Priority date	Feb 14, 2014
Publication date	Mar 21, 2017
Grant date	Mar 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described in this document can be embodied in a computer-implemented method that includes receiving, at a processing system, a first signal including an output of a speaker device and an additional audio signal. The method also includes determining, by the processing system, based at least in part on a model trained to identify the output of the speaker device, that the additional audio signal corresponds to an utterance of a user. The method further includes initiating a reduction in an audio output level of the speaker device based on determining that the additional audio signal corresponds to the utterance of the user.
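The abstract's control flow (decide that the extra audio is the user speaking, then lower the speaker's output) can be sketched as below. This is a minimal Python sketch under stated assumptions, not Google's implementation: `SpeakerDevice`, `handle_captured_audio`, the 0.5 decision threshold, and the 0.2 ducked level are all hypothetical, and the trained model is stood in for by any callable that returns a probability that the non-speaker audio is a user utterance.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class SpeakerDevice:
    """Hypothetical stand-in for the device whose output the microphone also picks up."""

    output_level: float = 1.0  # linear gain; 1.0 means full volume

    def reduce_output_level(self, target: float) -> None:
        # Lower (duck) the output to `target`; never raise it, so repeated
        # detections neither compound nor undo an earlier reduction.
        self.output_level = min(self.output_level, target)


def handle_captured_audio(
    frame: Sequence[float],
    user_utterance_probability: Callable[[Sequence[float]], float],
    speaker: SpeakerDevice,
    threshold: float = 0.5,
    ducked_level: float = 0.2,
) -> bool:
    """Duck `speaker` and return True if `frame` likely contains a user utterance.

    `user_utterance_probability` is a hypothetical wrapper around a model trained to
    recognize the speaker device's own output; it returns the probability that the
    remaining audio in the frame is the user speaking.
    """
    if user_utterance_probability(frame) >= threshold:
        speaker.reduce_output_level(ducked_level)
        return True
    return False


if __name__ == "__main__":
    speaker = SpeakerDevice()
    frame = [0.0] * 160  # one 10 ms frame at 16 kHz (placeholder samples)
    # A fake model that is confident the user is talking over the prompt.
    barged_in = handle_captured_audio(frame, lambda f: 0.9, speaker)
    print(barged_in, speaker.output_level)  # True 0.2
```

Ducking to a lower level keeps the prompt faintly audible; the claims below also recite the stronger option of interrupting the speech synthesis output entirely (claim 3).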

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

2. The method of claim 1, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

3. The method of claim 2, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

4. The method of claim 1, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

5. The method of claim 1, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and determining that the audio signal comprises additional audio other than the synthesized voice based on the first vector satisfying a threshold.

6. The method of claim 1, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is an i-vector based model.

7. The method of claim 1, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is a neural network based model.

8. A non-transitory computer readable storage device storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

9. The computer readable storage device of claim 8, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

10. The computer readable storage device of claim 9, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

11. The computer readable storage device of claim 8, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

12. The computer readable storage device of claim 8, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and determining that the audio signal comprises additional audio other than the synthesized voice based on the first vector satisfying a threshold.

13. The computer readable storage device of claim 8, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is an i-vector based model.

14. The computer readable storage device of claim 8, wherein each of the model that is trained to detect a presence of a synthesized voice and the model that is trained to detect a presence of a user's voice is a neural network based model.

15. A system comprising: one or more computers; and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving, by a mobile device, an audio signal; determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice; in response to determining, by the mobile device and using a model that is trained to detect a presence of a synthesized voice and a model that is trained to detect a presence of a user's voice, that the audio signal likely includes both the synthesized voice and the user's voice, suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device; after suppressing operation of the speech synthesis module, obtaining, by the mobile device, a transcription corresponding to the audio signal from an automated speech recognizer; and providing, by the mobile device, the transcription for output.

16. The system of claim 15, wherein suppressing, by the mobile device, operation of a speech synthesis module implemented by the mobile device comprises initiating a reduction in an audio output level of the speech synthesis module.

17. The system of claim 16, wherein initiating a reduction in an audio output level of the speech synthesis module comprises interrupting output of the speech synthesis module.

18. The system of claim 15, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; comparing the first vector to a second vector corresponding to the model that is trained to detect a presence of a synthesized voice; and determining that the audio signal comprises additional audio other than the synthesized voice based on a result of the comparison satisfying a threshold.

19. The system of claim 15, further comprising: obtaining a first vector corresponding to at least a portion of the audio signal; and det
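Claim 4 (mirrored in claims 11 and 18) compares a vector derived from part of the audio signal with a vector for the synthesized-voice model and tests the result against a threshold. The Python sketch below is illustrative only: it assumes both vectors are fixed-length embeddings (claim 6 recites i-vector based models as one option), uses cosine similarity as the comparison, and reads "satisfying a threshold" as the similarity dropping below a hypothetical cutoff of 0.8. The claim fixes none of these choices, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def contains_additional_audio(
    segment_vector: Sequence[float],
    synthesized_voice_vector: Sequence[float],
    threshold: float = 0.8,  # hypothetical cutoff; would be tuned on real data
) -> bool:
    """Return True if the segment is not explained by the synthesized voice alone.

    A segment that closely matches the synthesized-voice model vector is treated
    as speech-synthesis playback only; a low similarity suggests additional audio
    (for example, the user talking over the prompt).
    """
    return cosine_similarity(segment_vector, synthesized_voice_vector) < threshold


if __name__ == "__main__":
    tts_model = [1.0, 0.0, 0.0]  # toy stand-in for a synthesized-voice model vector
    print(contains_additional_audio([0.9, 0.1, 0.0], tts_model))  # False
    print(contains_additional_audio([0.3, 0.9, 0.2], tts_model))  # True
```

Claims 5, 12, and 19 describe a simpler variant that tests the first vector against a threshold directly, without comparing it to a second model vector.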

Assignees

Google Inc

Inventors

Classifications

H03G3/3005: in amplifiers suitable for low-frequencies, e.g. audio amplifiers (H03G3/32, H03G3/34 take precedence)
G10L15/222 (primary): Barge in, i.e. overridable guidance for interrupting prompts
G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
G10L25/84: for discriminating voice from noise
G10L17/00: Speaker identification or verification techniques

Patent family

Related publications grouped by family.

Patent family 53798631

Related patents

Speech quality under heavy noise conditions in hands-free communication

Systems and methods for noise reduction using speech recognition and speech synthesis

Headset Dictation Mode

Content-Aware Speaker Recognition

Headset Interview Mode
