Who is the assignee on this patent?

Weber Frederick V, O'Neill Jeffrey C, Amazon Tech Inc

What technology area does this patent fall under?

Primary CPC classification G10L15/07. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 26, 2016 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Unsupervised acoustic model training

US9401140B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9401140-B1
Application number	US-201213592157-A
Country	US
Kind code	B1
Filing date	Aug 22, 2012
Priority date	Aug 22, 2012
Publication date	Jul 26, 2016
Grant date	Jul 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An unsupervised acoustic modeling service for speech recognition is disclosed. A computing device may be present in a listening zone, such as a household or office, and may receive an audio signal that may include speech. In some instances, the speech is not directed to the computing device. Speech recognition results may be generated from the audio signal, using an acoustic model, and the results used to update the acoustic model. For example, the acoustic model may be updated to reflect a particular speaker's pronunciation of certain sound units, such as phonemes. By using speech that is not necessarily directed to the computing device, more data may be available for updating an acoustic model.
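The feedback loop the abstract describes (recognize speech with the current acoustic model, then use the results to update that model) can be sketched as follows. This is a minimal illustration, not the patented implementation: the 1-D features, the per-unit mean model, the softmax confidence, and the names `ToyAcousticModel`, `adapt`, and `threshold` are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyAcousticModel:
    """Hypothetical model: one mean feature value per sound unit (e.g. phoneme)."""
    means: Dict[str, float]
    counts: Dict[str, int] = field(default_factory=dict)

    def recognize(self, feature: float) -> Tuple[str, float]:
        """Return (best-matching sound unit, confidence in [0, 1])."""
        # Score each unit by negative squared distance, then normalize (softmax).
        scores = {u: -(feature - m) ** 2 for u, m in self.means.items()}
        top = max(scores.values())
        weights = {u: math.exp(s - top) for u, s in scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())

    def update(self, unit: str, feature: float) -> None:
        """Move the unit's mean toward the observed feature (running average)."""
        n = self.counts.get(unit, 1) + 1  # the initial mean counts as one sample
        self.means[unit] += (feature - self.means[unit]) / n
        self.counts[unit] = n


def adapt(model: ToyAcousticModel, features: List[float], threshold: float = 0.8) -> int:
    """Unsupervised adaptation: the model's own confident results act as labels."""
    used = 0
    for f in features:
        unit, confidence = model.recognize(f)
        if confidence >= threshold:  # skip uncertain results (assumed gating rule)
            model.update(unit, f)
            used += 1
    return used


model = ToyAcousticModel(means={"aa": 0.0, "iy": 4.0})
used = adapt(model, [0.5, 0.4, 2.0, 3.8])
print(used, model.means)
```

In this toy run the ambiguous observation `2.0` falls between the two units, so its confidence stays below the threshold and it does not influence the model. No transcript is ever supplied by a person, which is the sense in which the training is unsupervised.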

First claim

Opening claim text (preview).

What is claimed is:
1. A system for unsupervised acoustic modeling, the system comprising: an electronic data store configured to store a plurality of acoustic models; a remote computing device in communication with the electronic data store; and a local computing device in communication with the remote computing device over a network; wherein the system is configured to: obtain a first signal from a microphone, wherein the first signal comprises first speech; generate first speech recognition results using the first signal and an acoustic model, wherein the first speech recognition results comprise a first textual transcription of the first speech in the first signal; subsequent to generating the first speech recognition results: determine a first confidence value indicative of a confidence that the first speech recognition results generated using the acoustic model comprise an accurate transcription of the first speech, generate a first updated acoustic model based at least in part on the first confidence value; generate second speech recognition results using a second signal and the first updated acoustic model, wherein the second speech recognition results comprise a second textual transcription of the second speech in the second signal; and subsequent to generating the second speech recognition results: determine a second confidence value indicative of a confidence that the second speech recognition results generated using the first updated acoustic model comprise an accurate transcription of the second speech; determine that the first speech recognition results were generated outside a window; and generate a second updated acoustic model based at least in part on the second confidence value.
2. A computer-implemented method comprising: under control of a computing device comprising one or more computer processors implemented in hardware and configured to execute specific computer-executable instructions, receiving a first signal; generating first speech recognition results using the first signal and an acoustic model, wherein the first speech recognition results comprise a textual transcription of first speech; and subsequent to generating the first speech recognition results: determining a first confidence value associated with the first speech recognition results; generating a first updated acoustic model based on the first confidence value associated with the first speech recognition results; receiving a second signal; generating second speech recognition results using the second signal and the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech; determining a second confidence value associated with the second speech recognition results; determining that the first speech recognition results were generated outside a window; and generating a second updated acoustic model based at least in part on the second confidence value.
3. The system of claim 1, wherein the system is further configured to monitor microphone output for a period of time exceeding ten minutes.
4. The system of claim 3, wherein the first speech is not directed to the local computing device.
5. The system of claim 1, wherein the system is configured to determine that the first signal comprises the first speech based at least in part on a classification algorithm.
6. The system of claim 1: wherein the acoustic model comprises a Gaussian mixture model; and wherein a Gaussian distribution is included in the Gaussian mixture model.
7. A computer-implemented method comprising: under control of a computing device comprising one or more computer processors implemented in hardware and configured to execute specific computer-executable instructions, receiving a first signal; generating first speech recognition results from the first signal using an acoustic model, wherein the first speech recognition results comprise a textual transcription of first speech; and subsequent to generating the first speech recognition results: determining a first confidence value associated with the first speech recognition results; generating a first updated acoustic model based on the first confidence value associated with the first speech recognition results; receiving a second signal; generating second speech recognition results from the second signal using the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech; determining a second confidence value associated with the second speech recognition results; determining that the first speech recognition results were generated outside a window; and generating a second updated acoustic model based at least in part on the second confidence value.
8. The computer-implemented method of claim 7, further comprising determining that the first signal comprises the first speech by: obtaining an endpointed signal, wherein the endpointed signal comprises one or more sounds and wherein the endpointed signal corresponds to a portion of the first signal; and using a classifier to determine that the one or more sounds comprise the speech.
9. The computer-implemented method of claim 8, wherein receiving a first signal comprises: monitoring a signal output by a microphone; and obtaining the first signal, wherein the first signal comprises one or more sounds; wherein the first signal corresponds to a portion of the signal output by the microphone.
10. The computer-implemented method of claim 8 further comprising: identifying a speaker of the first speech; and transmitting information about the speaker to a server computing device.
11. The computer-implemented method of claim 8 further comprising: identifying a characteristic of a speaker of the first speech; and transmitting information about the characteristic to a server computing device.
12. The computer-implemented method of claim 11, wherein the characteristic is a gender of the speaker.
13. The computer-implemented method of claim 11, wherein the characteristic is information about a household of the speaker.
14. The computer-implemented method of claim 9, wherein monitoring a signal output by a microphone comprises monitoring a signal output by a microphone for a time period exceeding ten minutes.
15. The computer-implemented method of claim 14, wherein the first speech is not directed to the computing device.
16. A system comprising: an electronic data store configured to store a plurality of acoustic models; and one or more computer processors implemented in hardware and in communication with the electronic data store, the one or more computer processors configured to at least: receive a first signal comprising first speech; generate first speech recognition results using the first signal and an acoustic model of the plurality of acoustic models, wherein the first speech recognition results comprise a textual transcription of the first speech; and subsequent to generating the first speech recognition results: determine a first confidence value associated with the first speech recognition results; generate a first updated acoustic model based at least in part on the first confidence value associated with the first speech recognition results; generate second speech recognition results using a second signal and the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech in the second signal; determine a second confidence value associated
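The independent claims combine two ideas: each model update depends on a confidence value, and results generated outside a window no longer contribute to later updates. One possible reading, sketched under stated assumptions, is a confidence-weighted estimate of a single Gaussian mean that drops results older than the window. The class `WindowedMeanAdapter`, the `Result` fields, the prior weighting, and the window length are hypothetical; the claims do not specify the update formula.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class Result:
    timestamp: float   # seconds; when the recognition result was generated
    feature: float     # hypothetical 1-D acoustic feature for one sound unit
    confidence: float  # confidence in [0, 1] that the transcription is accurate


class WindowedMeanAdapter:
    """Confidence-weighted mean of one Gaussian, restricted to a time window."""

    def __init__(self, prior_mean: float, prior_weight: float = 1.0,
                 window_seconds: float = 3600.0) -> None:
        self.prior_mean = prior_mean
        self.prior_weight = prior_weight
        self.window_seconds = window_seconds
        self.results: Deque[Result] = deque()

    def add(self, result: Result, now: float) -> float:
        """Record a result, drop those outside the window, return the new mean.

        Assumes results arrive in increasing timestamp order.
        """
        self.results.append(result)
        while self.results and now - self.results[0].timestamp > self.window_seconds:
            self.results.popleft()
        return self.mean()

    def mean(self) -> float:
        # The prior (original model) mean is blended with in-window observations,
        # each weighted by its confidence value.
        num = self.prior_weight * self.prior_mean
        den = self.prior_weight
        for r in self.results:
            num += r.confidence * r.feature
            den += r.confidence
        return num / den


adapter = WindowedMeanAdapter(prior_mean=0.0, prior_weight=1.0, window_seconds=10.0)
first_mean = adapter.add(Result(timestamp=0.0, feature=2.0, confidence=1.0), now=0.0)
second_mean = adapter.add(Result(timestamp=20.0, feature=4.0, confidence=0.5), now=20.0)
print(first_mean, second_mean)
```

By the time the second result arrives, the first lies outside the 10-second window, so the second estimate is computed from the prior and the second result only. A full Gaussian mixture model would also adapt variances and mixture weights, which this sketch omits.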

Classifications

G10L15/07 (Primary)
to the speaker · CPC title
G10L15/00 (Primary)
Speech recognition (G10L17/00 takes precedence) · CPC title
G10L15/14
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

View patent family 56411271
