Methods and apparatus for hybrid speech recognition processing
US-2018197545-A1 · Jul 12, 2018 · US
US10614811B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10614811-B2 |
| Application number | US-201715858763-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 29, 2017 |
| Priority date | Dec 29, 2017 |
| Publication date | Apr 7, 2020 |
| Grant date | Apr 7, 2020 |
Official abstract text for this publication.
A system, method, apparatus and computer readable medium for hierarchical speech recognition resolution. The method of hierarchical speech recognition resolution on a platform includes receiving a speech stream from a microphone. The speech stream is resolved using a lowest possible level automatic speech recognition (ASR) engine of multi-level ASR engines. The selection of the lowest possible level ASR engine is based on policies defined for the platform. If resolution of the speech stream is rated less than a predetermined confidence level, the resolution of the speech stream is pushed to a next higher-level ASR engine of the multi-level ASR engines until the resolution of the speech stream meets the predetermined confidence level without violating one or more policies.
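The escalation described in the abstract can be sketched as a loop over engines ordered from lowest to highest level. This is only an illustrative sketch, not the patented implementation: the `Engine` type, the `resolve` function, the `allowed` policy callback, the stub recognizers, and all confidence values are hypothetical names and numbers introduced here. Returning `None` when no permitted engine reaches the threshold is also an assumption, since the abstract does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Engine:
    """Hypothetical stand-in for one ASR engine in the hierarchy."""
    name: str
    level: int  # 0 = lowest level; higher means more compute and vocabulary
    recognize: Callable[[bytes], Tuple[str, float]]  # returns (text, confidence)


def resolve(
    stream: bytes,
    engines: Iterable[Engine],
    threshold: float,
    allowed: Callable[[Engine], bool] = lambda engine: True,
) -> Optional[Tuple[str, str, float]]:
    """Try engines from the lowest level upward until one meets the threshold.

    Engines rejected by the `allowed` policy callback are skipped, so the
    stream is never pushed to an engine that would violate a policy.
    """
    for engine in sorted(engines, key=lambda e: e.level):
        if not allowed(engine):
            continue
        text, confidence = engine.recognize(stream)
        if confidence >= threshold:  # equal to or above the threshold is accepted
            return engine.name, text, confidence
    return None  # assumption: no permitted engine was confident enough


# Stub engines with made-up results, listed out of order on purpose.
ENGINES = [
    Engine("cloud", 3, lambda s: ("turn on the kitchen lights", 0.97)),
    Engine("voice_trigger", 0, lambda s: ("", 0.10)),
    Engine("audio_dsp", 1, lambda s: ("turn on", 0.55)),
    Engine("local_processor", 2, lambda s: ("turn on the kitchen lights", 0.78)),
]

if __name__ == "__main__":
    print(resolve(b"", ENGINES, threshold=0.7))
    print(resolve(b"", ENGINES, threshold=0.9))
    print(resolve(b"", ENGINES, threshold=0.9, allowed=lambda e: e.name != "cloud"))
```

With these stub values, a threshold of 0.7 is satisfied at the local processor level, a threshold of 0.9 is only satisfied by the cloud engine, and excluding the cloud engine at a threshold of 0.9 leaves the stream unresolved.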
Opening claim text (preview).
What is claimed is:
1. A platform having hierarchical speech resolution, comprising: network interface circuitry to receive a speech stream from a microphone; a processor coupled to the network interface circuitry; one or more memory devices coupled to the processor, the one or more memory devices including instructions, which when executed by the processor, cause the platform to: resolve the speech stream using a lowest possible level automatic speech recognition (ASR) engine of multi-level ASR engines, wherein the multi-level ASR engines include a voice trigger based ASR engine for limited keywords and a vocabulary set of 30-40 words, an audio digital signal processor (DSP) based ASR engine having a vocabulary of a few hundred words, a local processor based ASR engine having a large vocabulary, and a cloud based ASR engine having a very large vocabulary with unlimited processing and memory and wherein selection of the lowest possible level ASR engine is based on policies defined for the platform; and when resolution of speech is less than a predetermined confidence rating, push resolution of the speech stream to a next higher-level ASR engine of the multi-level ASR engines until the resolution of the speech stream meets the predetermined confidence rating without violating one or more policies.
2. The platform of claim 1, wherein the multi-level ASR engines comprise a hierarchical structure to provide more compute power and word recognition with each higher-level ASR engine.
3. The platform of claim 2, wherein the hierarchical structure of the multi-level ASR engines comprises additional processing power and a larger vocabulary for each higher-level ASR engine of the multi-level ASR engines.
4. The platform of claim 1, wherein the confidence rating indicates how well an ASR engine resolved the speech stream, wherein if accuracy of the resolved speech stream is below a predefined level, the instructions, when executed, are to push resolution of the speech stream to the next higher-level ASR engine, wherein the next higher-level ASR engine includes more compute power and a larger vocabulary subsystem.
5. The platform of claim 4, wherein if the accuracy of the resolved speech stream is equal to or exceeds the predefined level, the instructions, when executed, are to accept the resolution of the speech stream without pushing resolution of the speech stream to the next higher-level ASR engine.
6. The platform of claim 1, wherein the one or more policies include the confidence rating, a privacy setting, user identity, system connection states, time of day, response time, and other indicators requiring resolution of the speech stream at lower-level or higher-level ASR engines.
7. The platform of claim 6, wherein the privacy setting prevents specific speech from going to a cloud based ASR engine, wherein all data remains local to the platform.
8. An apparatus having hierarchical speech resolution on a platform comprising: one or more substrates; and logic coupled to the one or more substrates, wherein the logic includes one or more of configurable logic or fixed-functionality hardware logic, the logic coupled to the one or more substrates to: receive a speech stream from a microphone; resolve the speech stream using a lowest possible level automatic speech recognition (ASR) engine of multi-level ASR engines, wherein the multi-level ASR engines include a voice trigger based ASR engine for limited keywords and a vocabulary set of 30-40 words, an audio digital signal processor (DSP) based ASR engine having a vocabulary of a few hundred words, a local processor based ASR engine having a large vocabulary, and a cloud based ASR engine having a very large vocabulary with unlimited processing and memory and wherein selection of the lowest possible level ASR engine is based on policies defined for the platform; and when resolution of speech is less than a predetermined confidence rating, push resolution of the speech stream to a next higher-level ASR engine of the multi-level ASR engines until the resolution of the speech stream meets the predetermined confidence rating without violating one or more policies.
9. The apparatus of claim 8, wherein the multi-level ASR engines comprise a hierarchical structure to provide more compute power and word recognition with each higher-level ASR engine.
10. The apparatus of claim 9, wherein the hierarchical structure of the multi-level ASR engines comprises additional processing power and a larger vocabulary for each higher-level ASR engine of the multi-level ASR engines.
11. The apparatus of claim 8, wherein the one or more policies include the confidence rating, a privacy setting, user identity, system connection states, time of day, response time, and other indicators requiring resolution of the speech stream at lower-level or higher-level ASR engines.
12. The apparatus of claim 11, wherein the privacy setting prevents specific speech from going to a cloud based ASR engine, wherein all data remains local to the platform.
13. A method of hierarchical speech resolution on a platform comprising: receiving a speech stream from a microphone; resolving the speech stream using a lowest possible level automatic speech recognition (ASR) engine of multi-level ASR engines, wherein the multi-level ASR engines include a voice trigger based ASR engine for limited keywords and a vocabulary set of 30-40 words, an audio digital signal processor (DSP) based ASR engine having a vocabulary of a few hundred words, a local processor based ASR engine having a large vocabulary, and a cloud based ASR engine having a very large vocabulary with unlimited processing and memory and wherein selection of the lowest possible level ASR engine is based on policies defined for the platform; and when resolution of speech is less than a predetermined confidence rating, pushing resolution of the speech stream to a next higher-level ASR engine of the multi-level ASR engines until the resolution of the speech stream meets the predetermined confidence rating without violating one or more policies.
14. The method of claim 13, wherein the multi-level ASR engines comprise a hierarchical structure to provide more compute power and word recognition with each higher-level ASR engine.
15. The method of claim 14, wherein the hierarchical structure of the multi-level ASR engines comprises additional processing power and a larger vocabulary for each higher-level ASR engine of the multi-level ASR engines.
16. The method of claim 13 wherein the one or more policies include the confidence rating, a privacy setting, user identity, system connection states, time of day, response time, and other indicators requiring resolution of the speech stream at lower-level or higher-level ASR engines.
17. The method of claim 16, wherein the privacy setting prevents specific speech from going to a cloud based ASR engine, wherein all data remains local to the platform.
18. At least one non-transitory computer readable medium, comprising a set of instructions, which when executed by a computing device, cause the computing device to: receive a speech stream from a microphone; resolve the speech stream using a lowest possible level automatic speech recognition (ASR) engine of multi-level ASR engines, wherein the multi-level ASR engines include a voice trigger based ASR engine for limited keywords and a vocabulary set of 30-40 words, an audio digital signal processor (DSP) based ASR engine having a vocabulary of a few hundred words, a local processor based ASR engine having a large vocabu
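The claims recite four engine levels and a set of policies that constrain which levels may be used, including a privacy setting that keeps speech away from the cloud based engine. One way to picture that constraint is a filter over the levels, shown below as a hedged sketch. The `Tier` enum, the `PolicyState` fields, and the `permitted_tiers` function are hypothetical names introduced for illustration. Treating a disconnected network as ruling out the cloud level is an assumed reading of "system connection states", not something the claims spell out.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Tier(IntEnum):
    """The four engine levels recited in the claims, lowest to highest."""
    VOICE_TRIGGER = 0    # limited keywords, vocabulary set of 30-40 words
    AUDIO_DSP = 1        # vocabulary of a few hundred words
    LOCAL_PROCESSOR = 2  # large vocabulary
    CLOUD = 3            # very large vocabulary


@dataclass(frozen=True)
class PolicyState:
    """Hypothetical snapshot of two of the policy inputs named in the claims."""
    privacy_local_only: bool = False  # privacy setting: keep all data on the platform
    network_connected: bool = True    # assumed reading of "system connection states"


def permitted_tiers(state: PolicyState) -> List[Tier]:
    """Return the tiers a speech stream may be pushed to, lowest level first."""
    tiers = list(Tier)  # IntEnum members iterate in definition order
    if state.privacy_local_only or not state.network_connected:
        # The cloud tier is ruled out, so escalation stops at the local processor.
        tiers = [tier for tier in tiers if tier is not Tier.CLOUD]
    return tiers


if __name__ == "__main__":
    print(permitted_tiers(PolicyState()))
    print(permitted_tiers(PolicyState(privacy_local_only=True)))
```

Under this sketch, escalation would walk the returned list in order, so a local-only privacy setting caps resolution at the local processor based engine regardless of the confidence achieved there.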
of application context · CPC title
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title
Word spotting · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title