What technology area does this patent fall under?

Primary CPC classification G10L15/187. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 30, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and apparatuses for automatic speech recognition

US9431006B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9431006-B2
Application number	US-49751109-A
Country	US
Kind code	B2
Filing date	Jul 2, 2009
Priority date	Jul 2, 2009
Publication date	Aug 30, 2016
Grant date	Aug 30, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
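As an illustration of the split the abstract describes (a discrete parameter representation plus a continuous representation of residuals), here is a minimal sketch. It assumes a fixed codebook of centroids and nearest-centroid assignment under squared Euclidean distance; the abstract does not say how the discrete parameters are obtained, so the codebook, the distance measure, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_centroid(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to the frame.

    Squared Euclidean distance is an assumption made for this sketch.
    """
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))


def split_discrete_and_residual(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Represent each frame by a cluster label plus a continuous residual."""
    labels: List[int] = []
    residuals: List[List[float]] = []
    for frame in frames:
        k = nearest_centroid(frame, codebook)
        labels.append(k)  # discrete parameter: the cluster label
        # continuous parameters: what the label alone does not explain
        residuals.append([x - c for x, c in zip(frame, codebook[k])])
    return labels, residuals


# Toy data, invented for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.25, -0.5], [1.5, 0.75], [0.75, 1.0]]
labels, residuals = split_discrete_and_residual(frames, codebook)
print(labels)     # [0, 1, 1]
print(residuals)  # [[0.25, -0.5], [0.5, -0.25], [-0.25, 0.0]]
```

Adding each residual back to its centroid recovers the original frame, so the two representations together lose no information in this toy setting.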

First claim

Opening claim text (preview).

What is claimed is:
1. A machine implemented method to perform speech recognition, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.
2. A machine implemented method as in claim 1, further comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.
3. A machine implemented method as in claim 1, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.
4. A machine implemented method as in claim 1, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.
5. A machine implemented method as in claim 1, further comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.
6. A machine implemented method as in claim 1, further comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.
7. A machine implemented method as in claim 1, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.
8. A machine implemented method as in claim 1, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.
9. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.
10. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.
11. A non-transitory machine-readable medium storing executable program instructions which when executed by a data processing system causes the system to perform operations to recognize speech, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.
12. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.
13. A non-transitory machine-readable medium as in claim 11, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.
14. A non-transitory machine-readable medium as in claim 11, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.
15. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.
16. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.
17. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.
18. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.
19. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.
20. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.
21. A data processing system to perform speech recognition, comprising: a memory; and a processor coupled to the memory, the processor is configured to: receive first portions of an acoustic signal; determine a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determine a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determine a likelihood of a recovered word sequence based on the recovered second parameter sequence; and output the recovered word sequence.
22. A data processing system as in claim 21, wherein the processor is further configured to determine a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.
23. A data processing system as in claim 21, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.
24. A data processing system as in claim 21, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic si…
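To make the structure of claim 1 concrete, the following sketch scores candidate word sequences using three likelihoods: one for a frame-level (fine) parameter sequence, one for a phoneme-level (coarse) parameter sequence, and one for the word sequence. The claims state only that these likelihoods are determined; combining them by summing logarithms, the `Hypothesis` type, and every number below are assumptions made for illustration, not the patented method.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hypothesis:
    """One candidate recognition result with hypothetical likelihoods."""
    words: str
    p_frame_seq: float    # likelihood of the recovered first (frame-level) sequence
    p_phoneme_seq: float  # likelihood of the recovered second (phoneme-level) sequence
    p_word_seq: float     # likelihood of the recovered word sequence


def log_score(h: Hypothesis) -> float:
    """Combine the three likelihoods in the log domain (an assumed combination rule)."""
    return math.log(h.p_frame_seq) + math.log(h.p_phoneme_seq) + math.log(h.p_word_seq)


def recognize(hypotheses: Sequence[Hypothesis]) -> str:
    """Output the word sequence whose combined score is highest."""
    return max(hypotheses, key=log_score).words


# Invented numbers, for illustration only.
hypotheses = [
    Hypothesis("recognize speech", 0.5, 0.4, 0.3),
    Hypothesis("wreck a nice beach", 0.5, 0.2, 0.1),
]
print(recognize(hypotheses))  # recognize speech
```

Working in the log domain avoids numerical underflow when many small probabilities are multiplied, which is a common practice in speech decoders generally.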

Assignees

Apple Inc

Inventors

Bellegarda Jerome R

Classifications

G10L15/187 · Primary
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
G10L15/063 · Primary
Training · CPC title
G10L15/08 · Primary
Speech classification or search · CPC title
G10L15/32
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
G10L15/14
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
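The CPC symbols above follow a fixed pattern: a section letter, a two-digit class, a subclass letter, then main group and subgroup separated by a slash. The sketch below splits a symbol into those parts and looks up the section title, which is where a label such as "Physics" for section G comes from. The function name and returned keys are hypothetical, and the lookup covers only sections A to H (section Y is omitted).

```python
import re
from typing import Dict

# Titles of CPC sections A to H; section Y is left out of this sketch.
CPC_SECTIONS: Dict[str, str] = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Section letter, two-digit class, subclass letter, main group "/" subgroup.
_CPC_PATTERN = re.compile(r"([A-H])(\d{2})([A-Z])(\d+)/(\d+)")


def parse_cpc(symbol: str) -> Dict[str, str]:
    """Split a CPC symbol such as 'G10L15/187' into its hierarchical parts."""
    match = _CPC_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


parsed = parse_cpc("G10L15/187")
print(parsed["section_title"])  # Physics
print(parsed["subclass"])       # G10L
```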

Patent family

Related publications grouped by family.

View patent family 43413129

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9431006B2 cover?: Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second represent…
Who is the assignee on this patent?: Apple Inc (inventor: Bellegarda Jerome R)