Methods and apparatuses for automatic speech recognition

US9431006B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9431006-B2
Application number: US-49751109-A
Country: US
Kind code: B2
Filing date: Jul 2, 2009
Priority date: Jul 2, 2009
Publication date: Aug 30, 2016
Grant date: Aug 30, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
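
The split the abstract describes, a discrete representation plus a continuous representation of residuals, can be illustrated with a small sketch. This is not the patent's implementation: the nearest-centroid labeling, the function names (`nearest_centroid`, `encode`, `decode`), and the toy centroids and frames are all assumptions chosen for illustration.

```python
# Minimal sketch (assumed, not from the patent): each frame is replaced by a
# discrete cluster label plus a continuous residual relative to that cluster.

def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(frame, centroids[k])),
    )

def encode(frames, centroids):
    """Split frames into discrete labels and continuous residuals."""
    labels, residuals = [], []
    for frame in frames:
        k = nearest_centroid(frame, centroids)
        labels.append(k)
        # The residual is what the discrete label alone fails to capture.
        residuals.append([x - c for x, c in zip(frame, centroids[k])])
    return labels, residuals

def decode(labels, residuals, centroids):
    """Rebuild the original frames from labels and residuals."""
    return [
        [c + r for c, r in zip(centroids[k], res)]
        for k, res in zip(labels, residuals)
    ]

# Hypothetical toy data: two 2-D centroids and three 2-D feature frames.
centroids = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.25, -0.5], [0.75, 1.5], [1.0, 1.0]]

labels, residuals = encode(frames, centroids)
reconstructed = decode(labels, residuals, centroids)
```

In this toy setup the label sequence plays the role of the discrete parameter representation and the residual vectors play the role of the continuous one; together they carry the same information as the original frames.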

First claim

Opening claim text (preview).

What is claimed is:

1. A machine implemented method to perform speech recognition, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

2. A machine implemented method as in claim 1, further comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

3. A machine implemented method as in claim 1, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

4. A machine implemented method as in claim 1, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

5. A machine implemented method as in claim 1, further comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

6. A machine implemented method as in claim 1, further comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

7. A machine implemented method as in claim 1, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

8. A machine implemented method as in claim 1, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

9. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

10. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

11. A non-transitory machine-readable medium storing executable program instructions which when executed by a data processing system causes the system to perform operations to recognize speech, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

12. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

13. A non-transitory machine-readable medium as in claim 11, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

14. A non-transitory machine-readable medium as in claim 11, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

15. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

16. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

17. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

18. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

19. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

20. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

21. A data processing system to perform speech recognition, comprising: a memory; and a processor coupled to the memory, the processor is configured to: receive first portions of an acoustic signal; determine a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determine a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determine a likelihood of a recovered word sequence based on the recovered second parameter sequence; and output the recovered word sequence.

22. A data processing system as in claim 21, wherein the processor is further configured to determine a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

23. A data processing system as in claim 21, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

24. A data processing system as in claim 21, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic si
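
To make the chain of likelihoods in claim 1 easier to follow, here is a toy sketch that scores candidate words from frame-level labels through phoneme-level units. It is an illustration under strong assumptions, not the claimed method: the probability tables (`FRAME_MODEL`, `LEXICON`, `WORD_PRIOR`), the function names, and all numbers are hypothetical, each word has a single fixed pronunciation, and the frame-to-phoneme alignment is given rather than searched for.

```python
import math

# Hypothetical P(frame cluster label | phoneme); numbers are made up.
FRAME_MODEL = {
    "k":  {0: 0.7, 1: 0.2, 2: 0.1},
    "ae": {0: 0.1, 1: 0.8, 2: 0.1},
    "t":  {0: 0.1, 1: 0.1, 2: 0.8},
    "p":  {0: 0.2, 1: 0.1, 2: 0.7},
}
# Hypothetical lexicon: one fixed phoneme sequence per word.
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}
# Hypothetical prior probability of each word.
WORD_PRIOR = {"cat": 0.6, "cap": 0.4}

def frame_log_likelihood(labels, phonemes, alignment):
    """Log-likelihood of the fine-grained (frame) labels given the coarser
    (phoneme) sequence; alignment[i] is the phoneme index covering frame i."""
    return sum(
        math.log(FRAME_MODEL[phonemes[a]][label])
        for label, a in zip(labels, alignment)
    )

def score(word, labels, alignment):
    """Joint log score: frame labels given phonemes, plus the word prior."""
    phonemes = LEXICON[word]
    return frame_log_likelihood(labels, phonemes, alignment) + math.log(WORD_PRIOR[word])

def recognize(labels, alignment):
    """Return the candidate word with the highest joint log score."""
    return max(LEXICON, key=lambda w: score(w, labels, alignment))

# Five frames: two aligned to the first phoneme, two to the second, one to the third.
labels = [0, 0, 1, 1, 2]
alignment = [0, 0, 1, 1, 2]
best_word = recognize(labels, alignment)
```

A practical recognizer would also have to search over alignments and over multi-word sequences; the fixed alignment here only keeps the arithmetic visible.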

Assignees

Inventors

Classifications

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • G10L15/063 (Primary)

    Training · CPC title

  • G10L15/08 (Primary)

    Speech classification or search · CPC title

  • Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9431006B2 cover?
Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second represent…
Who is the assignee on this patent?
Bellegarda Jerome R, Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 30, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).