US9401140B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9401140-B1 |
| Application number | US-201213592157-A |
| Country | US |
| Kind code | B1 |
| Filing date | Aug 22, 2012 |
| Priority date | Aug 22, 2012 |
| Publication date | Jul 26, 2016 |
| Grant date | Jul 26, 2016 |
Official abstract text for this publication.
An unsupervised acoustic modeling service for speech recognition is disclosed. A computing device may be present in a listening zone, such as a household or office, and may receive an audio signal that may include speech. In some instances, the speech is not directed to the computing device. Speech recognition results may be generated from the audio signal, using an acoustic model, and the results used to update the acoustic model. For example, the acoustic model may be updated to reflect a particular speaker's pronunciation of certain sound units, such as phonemes. By using speech that is not necessarily directed to the computing device, more data may be available for updating an acoustic model.
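The update loop the abstract describes can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a toy acoustic model that stores one scalar mean per phoneme, and it borrows two ideas recited in the claims of this publication: a confidence value for each recognition result, and a time window outside of which older results are ignored. The names `Observation`, `AcousticModel`, `update_model`, `prior_weight`, `window_s`, and `min_confidence` are hypothetical, as is the confidence-weighted averaging rule itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Observation:
    """One recognized sound unit (hypothetical structure for illustration)."""
    phoneme: str        # sound unit label taken from the recognition result
    feature: float      # scalar stand-in for a real acoustic feature vector
    confidence: float   # confidence in the transcription, from 0.0 to 1.0
    timestamp: float    # time the result was generated, in seconds


@dataclass
class AcousticModel:
    """Toy model: one mean per phoneme (assumption, not from the source)."""
    means: Dict[str, float]
    prior_weight: float = 10.0  # how strongly the existing mean resists change


def update_model(model: AcousticModel,
                 observations: List[Observation],
                 now: float,
                 window_s: float = 3600.0,
                 min_confidence: float = 0.5) -> AcousticModel:
    """Return a new model shifted toward recent, confident observations."""
    weighted_sums: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for obs in observations:
        if now - obs.timestamp > window_s:
            continue  # result was generated outside the window: ignore it
        if obs.confidence < min_confidence:
            continue  # transcription is too uncertain to learn from
        if obs.phoneme not in model.means:
            continue  # this sketch only adapts phonemes the model already has
        weighted_sums[obs.phoneme] = (
            weighted_sums.get(obs.phoneme, 0.0) + obs.confidence * obs.feature
        )
        weights[obs.phoneme] = weights.get(obs.phoneme, 0.0) + obs.confidence

    new_means = dict(model.means)  # leave the original model untouched
    for phoneme, weight in weights.items():
        prior = model.means[phoneme]
        new_means[phoneme] = (
            (model.prior_weight * prior + weighted_sums[phoneme])
            / (model.prior_weight + weight)
        )
    return AcousticModel(means=new_means, prior_weight=model.prior_weight)


if __name__ == "__main__":
    base = AcousticModel(means={"ae": 1.0}, prior_weight=1.0)
    heard = [
        Observation("ae", 3.0, 1.0, timestamp=100.0),  # recent and confident
        Observation("ae", 9.0, 1.0, timestamp=0.0),    # outside the window
        Observation("ae", 9.0, 0.2, timestamp=105.0),  # low confidence
    ]
    adapted = update_model(base, heard, now=110.0, window_s=60.0)
    print(adapted.means)
```

Returning a new model rather than mutating the old one mirrors the claim language, which distinguishes the acoustic model from the first and second updated acoustic models. A real system would adapt Gaussian mixture parameters over feature vectors rather than a single scalar mean.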
Opening claim text (preview).
What is claimed is:

1. A system for unsupervised acoustic modeling, the system comprising: an electronic data store configured to store a plurality of acoustic models; a remote computing device in communication with the electronic data store; and a local computing device in communication with the remote computing device over a network; wherein the system is configured to: obtain a first signal from a microphone, wherein the first signal comprises first speech; generate first speech recognition results using the first signal and an acoustic model, wherein the first speech recognition results comprise a first textual transcription of the first speech in the first signal; subsequent to generating the first speech recognition results: determine a first confidence value indicative of a confidence that the first speech recognition results generated using the acoustic model comprise an accurate transcription of the first speech, generate a first updated acoustic model based at least in part on the first confidence value; generate second speech recognition results using a second signal and the first updated acoustic model, wherein the second speech recognition results comprise a second textual transcription of the second speech in the second signal; and subsequent to generating the second speech recognition results: determine a second confidence value indicative of a confidence that the second speech recognition results generated using the first updated acoustic model comprise an accurate transcription of the second speech; determine that the first speech recognition results were generated outside a window; and generate a second updated acoustic model based at least in part on the second confidence value.

2. A computer-implemented method comprising: under control of a computing device comprising one or more computer processors implemented in hardware and configured to execute specific computer-executable instructions, receiving a first signal; generating first speech recognition results using the first signal and an acoustic model, wherein the first speech recognition results comprise a textual transcription of first speech; and subsequent to generating the first speech recognition results: determining a first confidence value associated with the first speech recognition results; generating a first updated acoustic model based on the first confidence value associated with the first speech recognition results; receiving a second signal; generating second speech recognition results using the second signal and the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech; determining a second confidence value associated with the second speech recognition results; determining that the first speech recognition results were generated outside a window; and generating a second updated acoustic model based at least in part on the second confidence value.

3. The system of claim 1, wherein the system is further configured to monitor microphone output for a period of time exceeding ten minutes.

4. The system of claim 3, wherein the first speech is not directed to the local computing device.

5. The system of claim 1, wherein the system is configured to determine that the first signal comprises the first speech based at least in part on a classification algorithm.

6. The system of claim 1, wherein the acoustic model comprises a Gaussian mixture model; and wherein a Gaussian distribution is included in the Gaussian mixture model.

7. A computer-implemented method comprising: under control of a computing device comprising one or more computer processors implemented in hardware and configured to execute specific computer-executable instructions, receiving a first signal; generating first speech recognition results from the first signal using an acoustic model, wherein the first speech recognition results comprise a textual transcription of first speech; and subsequent to generating the first speech recognition results: determining a first confidence value associated with the first speech recognition results; generating a first updated acoustic model based on the first confidence value associated with the first speech recognition results; receiving a second signal; generating second speech recognition results from the second signal using the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech; determining a second confidence value associated with the second speech recognition results; determining that the first speech recognition results were generated outside a window; and generating a second updated acoustic model based at least in part on the second confidence value.

8. The computer-implemented method of claim 7, further comprising determining that the first signal comprises the first speech by: obtaining an endpointed signal, wherein the endpointed signal comprises one or more sounds and wherein the endpointed signal corresponds to a portion of the first signal; and using a classifier to determine that the one or more sounds comprise the speech.

9. The computer-implemented method of claim 8, wherein receiving a first signal comprises: monitoring a signal output by a microphone; and obtaining the first signal, wherein the first signal comprises one or more sounds; wherein the first signal corresponds to a portion of the signal output by the microphone.

10. The computer-implemented method of claim 8 further comprising: identifying a speaker of the first speech; and transmitting information about the speaker to a server computing device.

11. The computer-implemented method of claim 8 further comprising: identifying a characteristic of a speaker of the first speech; and transmitting information about the characteristic to a server computing device.

12. The computer-implemented method of claim 11, wherein the characteristic is a gender of the speaker.

13. The computer-implemented method of claim 11, wherein the characteristic is information about a household of the speaker.

14. The computer-implemented method of claim 9, wherein monitoring a signal output by a microphone comprises monitoring a signal output by a microphone for a time period exceeding ten minutes.

15. The computer-implemented method of claim 14, wherein the first speech is not directed to the computing device.

16. A system comprising: an electronic data store configured to store a plurality of acoustic models; and one or more computer processors implemented in hardware and in communication with the electronic data store, the one or more computer processors configured to at least: receive a first signal comprising first speech; generate first speech recognition results using the first signal and an acoustic model of the plurality of acoustic models, wherein the first speech recognition results comprise a textual transcription of the first speech; and subsequent to generating the first speech recognition results: determine a first confidence value associated with the first speech recognition results; generate a first updated acoustic model based at least in part on the first confidence value associated with the first speech recognition results; generate second speech recognition results using a second signal and the first updated acoustic model, wherein the second speech recognition results comprise a textual transcription of second speech in the second signal; determine a second confidence value associated
to the speaker · CPC title
Speech recognition (G10L17/00 takes precedence) · CPC title
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title