System for processing user utterance and control method thereof
US-2021151052-A1 · May 20, 2021 · US
US11735179B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11735179-B2 |
| Application number | US-202017031476-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 24, 2020 |
| Priority date | Feb 13, 2020 |
| Publication date | Aug 22, 2023 |
| Grant date | Aug 22, 2023 |
Official abstract text for this publication.
The present disclosure discloses a speech chip and an electronic device. The speech chip includes a first processing module, a second processing module and a third processing module. The first processing module is configured to run an operating system, and to perform data scheduling on modules other than the first processing module in the chip. The second processing module is configured to perform a mutual conversion between speech and text based on a speech model. The third processing module is configured to perform digital signal processing on inputted speech.
Opening claim text (preview).
What is claimed is:
1. A speech chip, comprising: a first processing module, a second processing module, and a third processing module; wherein, the first processing module is configured to run an operating system, and to perform data scheduling on modules other than the first processing module in the chip; the second processing module is configured to perform a mutual conversion between speech and text based on a speech model; and the third processing module is configured to perform digital signal processing on inputted speech.
2. The chip of claim 1, wherein the second processing module comprises a processor and an internal memory; wherein the processor is configured to perform the mutual conversion between speech and text based on the speech model; and the internal memory is connected to the processor and is configured to store data generated during an execution of the speech model.
3. The chip of claim 2, wherein, the second processing module is configured to set configuration information of the speech model based on model data to initialize the speech model; wherein the model data is obtained by the first processing module from an external storage device through a peripheral interface.
4. The chip of claim 3, further comprising a storage module connected to the second processing module and configured to store the model data.
5. The chip of claim 2, wherein the processor is an embedded neural network processor.
6. The chip of claim 1, wherein, the third processing module is configured to perform the digital signal processing on the inputted speech to obtain a speech signal or speech feature data, and to send the speech signal or the speech feature data obtained to the second processing module; the second processing module is configured to recognize the inputted speech based on the speech model; and the first processing module is configured to obtain a response result from an external storage device through a peripheral interface based on a recognition result, and to feed the response result back to a user.
7. The chip of claim 1, comprising a power supply module, wherein the power supply module comprises: a speech detection unit and a power management unit; wherein the speech detection unit is configured to detect speech from a user in real time; and the power management unit is configured to, in response to detecting the speech from the user, supply power to the third processing module, such that the third processing module performs wake-up word detection on the speech from the user; and in response to the speech from the user comprising a wake-up word, supply power to modules other than the power supply module and the third processing module.
8. The chip of claim 1, further comprising an image processing module, configured to process an image collected to broadcast and/or display text information in the image to the user.
9. The chip of claim 8, wherein the image processing module comprises: an image obtaining unit, an image processing unit and an image display unit; wherein the image obtaining unit is configured to obtain the image; the image processing unit is configured to perform text recognition on the image, and the first processing module controls the second processing module to perform speech conversion on a text recognized and broadcasts speech converted to a user through an external device; and the image display unit is configured to display the image and/or the text recognized.
10. The chip of claim 1, wherein the first processing module comprises a multi-core central processing unit; and the third processing module comprises a digital signal processor.
11. An electronic device, comprising an audio interface and a speech chip, wherein, the audio interface is configured to receive inputted speech; the speech chip comprises: a first processing module, a second processing module, and a third processing module; the first processing module is configured to run an operating system, and to perform data scheduling on modules other than the first processing module in the chip; the second processing module is configured to perform a mutual conversion between speech and text based on a speech model; and the third processing module is configured to perform digital signal processing on the inputted speech.
12. The device of claim 11, wherein the second processing module comprises a processor and an internal memory; wherein the processor is configured to perform the mutual conversion between speech and text based on the speech model; and the internal memory is connected to the processor and is configured to store data generated during an execution of the speech model.
13. The device of claim 12, wherein, the second processing module is configured to set configuration information of the speech model based on model data to initialize the speech model; wherein the model data is obtained by the first processing module from an external storage device through a peripheral interface.
14. The device of claim 13, further comprising a storage module, connected to the second processing module and configured to store the model data.
15. The device of claim 12, wherein the processor is an embedded neural network processor.
16. The chip of claim 11, wherein, the third processing module is configured to perform the digital signal processing on the inputted speech to obtain a speech signal or speech feature data and to send the speech signal or the speech feature data obtained to the second processing module; the second processing module is configured to recognize the inputted speech based on the speech model; and the first processing module is configured to obtain a response result from an external storage device through a peripheral interface based on a recognition result, and to feed the response result back to a user.
17. The device of claim 11, further comprising a power supply module, wherein the power supply module comprises a speech detection unit and a power management unit; wherein the speech detection unit is configured to detect speech from a user in real time; and the power management unit is configured to, in response to detecting the speech from the user, supply power to the third processing module, such that the third processing module performs wake-up word detection on the speech from the user; and in response to the speech from the user comprising a wake-up word, supply power to modules other than the power supply module and the third processing module.
18. The device of claim 11, wherein the speech chip further comprises an image processing module, configured to process an image collected to broadcast and/or display text information in the image to the user.
19. The device of claim 18, wherein the image processing module comprises: an image obtaining unit, an image processing unit and an image display unit; wherein the image obtaining unit is configured to obtain the image; the image processing unit is configured to perform text recognition on the image, and the first processing module controls the second processing module to perform speech conversion on a text recognized and broadcasts speech converted to a user through an external device; and the image display unit is configured to display the image and/or the text recognized.
20. The device of claim 11, wherein the first processing module comprises a multi-core central processing unit; and the third processing module comprises a digital signal processor.
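The staged power-up recited in claims 7 and 17 can be read as a small state machine. The sketch below is one possible interpretation, not the claimed circuit: the names `PowerState` and `PowerManager`, the `heard_text` argument (standing in for whatever the third processing module's wake-up word detector produces), and the choice to stay in the intermediate state when no wake-up word is found are all assumptions, since the claims do not specify a fallback.

```python
from enum import Enum, auto


class PowerState(Enum):
    LISTENING = auto()  # only the power supply module (speech detection) is active
    DSP_ON = auto()     # third processing module powered for wake-up word detection
    FULL_ON = auto()    # remaining modules powered as well


class PowerManager:
    """Hypothetical model of the power management unit's two-stage gating."""

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word.lower()
        self.state = PowerState.LISTENING

    def on_audio(self, speech_detected: bool, heard_text: str = "") -> PowerState:
        # Stage 0: without detected speech, nothing changes.
        if not speech_detected:
            return self.state
        # Stage 1: speech detected, so power the DSP for wake-up word detection.
        if self.state is PowerState.LISTENING:
            self.state = PowerState.DSP_ON
        # Stage 2: wake-up word present, so power the other modules.
        if self.state is PowerState.DSP_ON and self.wake_word in heard_text.lower():
            self.state = PowerState.FULL_ON
        return self.state


if __name__ == "__main__":
    pm = PowerManager("hello chip")
    print(pm.on_audio(False))                      # no speech yet
    print(pm.on_audio(True, "what time is it"))    # speech, no wake-up word
    print(pm.on_audio(True, "Hello chip, lights")) # wake-up word heard
```

Keeping the two stages separate reflects the claimed ordering: the third processing module is powered only after speech is detected, and the other modules only after the wake-up word is confirmed.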
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech synthesis; Text to speech systems · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Methods for producing synthetic speech; Speech synthesisers · CPC title
Constructional details of speech recognition systems · CPC title