Receiving at a device audible input that is spelled
US-2015370530-A1 · Dec 24, 2015 · US
US9431006B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9431006-B2 |
| Application number | US-49751109-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 2, 2009 |
| Priority date | Jul 2, 2009 |
| Publication date | Aug 30, 2016 |
| Grant date | Aug 30, 2016 |
Official abstract text for this publication.
Exemplary embodiments of methods and apparatuses for automatic speech recognition are described. First model parameters associated with a first representation of an input signal are generated. The first representation of the input signal is a discrete parameter representation. Second model parameters associated with a second representation of the input signal are generated. The second representation of the input signal includes a continuous parameter representation of residuals of the input signal. The first representation of the input signal includes discrete parameters representing first portions of the input signal. The second representation includes discrete parameters representing second portions of the input signal that are smaller than the first portions. Third model parameters are generated to couple the first representation of the input signal with the second representation of the input signal. The first representation and the second representation of the input signal are mapped into a vector space.
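The split the abstract describes, a discrete representation plus a continuous representation of residuals, can be illustrated with a small vector-quantization sketch. This is only an assumption about one way such a split might look, not the patent's actual procedure: the `quantize` function, the two-entry `codebook`, and the sample `frames` are hypothetical, and nearest-centroid assignment stands in for whatever clustering the full description uses.

```python
from math import dist  # Euclidean distance, Python 3.8+


def quantize(frames, codebook):
    """Split each frame into a discrete label and a continuous residual.

    The label is the index of the nearest codebook centroid; the residual
    is the frame minus that centroid. Both names are illustrative only.
    """
    labels, residuals = [], []
    for frame in frames:
        # Discrete parameter: index of the closest centroid.
        label = min(range(len(codebook)), key=lambda k: dist(frame, codebook[k]))
        centroid = codebook[label]
        labels.append(label)
        # Continuous parameter: what the discrete label fails to capture.
        residuals.append([x - c for x, c in zip(frame, centroid)])
    return labels, residuals


# Hypothetical toy data, not taken from the patent.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.25, 0.0], [1.0, 0.5]]
labels, residuals = quantize(frames, codebook)
```

In this toy setup each frame can be rebuilt exactly as centroid plus residual, which is why the two representations together lose no information about the frame.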
Opening claim text (preview).
What is claimed is:

1. A machine implemented method to perform speech recognition, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal having a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

2. A machine implemented method as in claim 1, further comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

3. A machine implemented method as in claim 1, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

4. A machine implemented method as in claim 1, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

5. A machine implemented method as in claim 1, further comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

6. A machine implemented method as in claim 1, further comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

7. A machine implemented method as in claim 1, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

8. A machine implemented method as in claim 1, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

9. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

10. A machine implemented method as in claim 1, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

11. A non-transitory machine-readable medium storing executable program instructions which when executed by a data processing system causes the system to perform operations to recognize speech, comprising: receiving first portions of an acoustic signal; determining a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determining a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determining a likelihood of a recovered word sequence based on the recovered second parameter sequence; and outputting the recovered word sequence.

12. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

13. A non-transitory machine-readable medium as in claim 11, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

14. A non-transitory machine-readable medium as in claim 11, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic signal.

15. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising representing the first portions of the acoustic signal by cluster labels, wherein a cluster label is associated with a set of the first portions; computing residuals of the first portions based on the cluster labels; and representing the residuals of the first portions by one or more continuous parameters.

16. A non-transitory machine-readable medium as in claim 11, further comprising instructions that cause the system to perform operations comprising determining a likelihood of a continuous parameter representation of the acoustic signal based on the recovered first parameter sequence.

17. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered first parameter sequence is determined based on a first distortion model.

18. A non-transitory machine-readable medium as in claim 11, wherein the likelihood of the recovered second parameter sequence is determined based on a second distortion model.

19. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered first parameter sequence includes matching the recovered first parameter sequence with a first parameter sequence derived from training data; and selecting the recovered first parameter sequence based on the matching.

20. A non-transitory machine-readable medium as in claim 11, wherein the determining the likelihood of the recovered second parameter sequence includes mapping the recovered first parameter sequence to the recovered second parameter sequence.

21. A data processing system to perform speech recognition, comprising: a memory; and a processor coupled to the memory, the processor is configured to: receive first portions of an acoustic signal; determine a likelihood of a recovered first parameter sequence representing the first portions of the acoustic signal; determine a likelihood of a recovered second parameter sequence associated with the recovered first parameter sequence, wherein the second parameter sequence represents second portions of the acoustic signal that have a coarser granularity than the first portions; determine a likelihood of a recovered word sequence based on the recovered second parameter sequence; and output the recovered word sequence.

22. A data processing system as in claim 21, wherein the processor is further configured to determine a joint likelihood of the recovered first parameter sequence and the recovered second parameter sequence.

23. A data processing system as in claim 21, wherein the first portions of the acoustic signal are associated with frames, and the second portions are associated with phonemes.

24. A data processing system as in claim 21, wherein the recovered second parameter sequence includes a discrete parameter representation of the acoustic signal; and the recovered first parameter sequence includes a continuous parameter representation of residuals of the acoustic si
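
The independent claims chain three likelihoods together: a frame-level sequence, a coarser phoneme-level sequence, and a word sequence, with the dependent claims adding a joint likelihood over the first two. The sketch below is a toy illustration of that kind of layered scoring, not the claimed method itself. Every table and number is hypothetical, the function names `joint_log_likelihood` and `recognize` are invented here, and it assumes exactly one frame label per phoneme, which a real recognizer would not.

```python
from math import log

# Hypothetical toy tables; none of these values come from the patent.
# P(frame cluster label | phoneme)
FRAME_GIVEN_PHONE = {("a", 0): 0.9, ("a", 1): 0.1, ("b", 0): 0.2, ("b", 1): 0.8}
# P(phoneme sequence | word)
PHONES_GIVEN_WORD = {("ab", ("a", "b")): 1.0, ("ba", ("b", "a")): 1.0}
# P(word)
WORD_PRIOR = {"ab": 0.5, "ba": 0.5}


def joint_log_likelihood(word, phones, frame_labels):
    """Sum log-scores across the word, phoneme, and frame levels."""
    score = log(WORD_PRIOR[word]) + log(PHONES_GIVEN_WORD[(word, phones)])
    # Simplifying assumption: one frame label aligned to each phoneme.
    for phone, label in zip(phones, frame_labels):
        score += log(FRAME_GIVEN_PHONE[(phone, label)])
    return score


def recognize(frame_labels):
    """Return the word whose layered score is highest for these labels."""
    candidates = list(PHONES_GIVEN_WORD)  # (word, phoneme sequence) pairs
    best = max(
        candidates,
        key=lambda c: joint_log_likelihood(c[0], c[1], frame_labels),
    )
    return best[0]
```

Working in log space turns the product of the three likelihoods into a sum, which avoids numerical underflow on long sequences.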
Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title
Training · CPC title
Speech classification or search · CPC title
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title