Electronic device and speech recognition method therefor
US-2018182386-A1 · Jun 28, 2018 · US
US12249336B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12249336-B2 |
| Application number | US-202118573846-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 29, 2021 |
| Priority date | Jun 29, 2021 |
| Publication date | Mar 11, 2025 |
| Grant date | Mar 11, 2025 |
Official abstract text for this publication.
Embodiments are provided for building a configurable multilingual model. A computing system obtains a plurality of language-specific automatic speech recognition modules and a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages. The computing system then compiles the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules to generate a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content.
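The selection behavior described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the class name `ConfigurableMultilingualModel`, the `configure` method, and the use of plain strings as stand-ins for trained ASR modules are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class ConfigurableMultilingualModel:
    """Hypothetical container: one universal module plus per-language modules.

    Strings stand in for trained ASR modules; a real system would hold
    network layers or weights here.
    """
    universal: str
    language_modules: Dict[str, str]  # language code -> module placeholder

    def configure(self, target_languages: Optional[Sequence[str]] = None) -> List[str]:
        """Return the modules that would process audio for this user input."""
        # The universal module is always part of the configured model.
        selected = [self.universal]
        # Add only the language-specific modules the user asked for.
        for lang in target_languages or []:
            if lang not in self.language_modules:
                raise KeyError(f"no language-specific module for {lang!r}")
            selected.append(self.language_modules[lang])
        return selected


model = ConfigurableMultilingualModel(
    universal="universal-asr",
    language_modules={"en": "en-asr", "de": "de-asr", "fr": "fr-asr"},
)
bilingual = model.configure(["en", "fr"])  # universal + English + French
fallback = model.configure()               # no languages given: universal only
```

In this sketch, omitting the target languages falls back to the universal module alone, which mirrors the null-value case recited in claim 5.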
Opening claim text (preview).
What is claimed is:

1. A computing system comprising: one or more processors; and one or more hardware storage devices storing one or more computer-readable instructions that are executable by the one or more processors to configure the computing system to at least: obtain a plurality of language-specific automatic speech recognition modules, each language-specific automatic speech recognition module of the plurality of language-specific automatic speech recognition modules having been trained on a different language-specific training dataset and such that each of the plurality of language-specific automatic speech recognition modules is configured to recognize speech in a correspondingly different language of a plurality of different languages; obtain a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the plurality of different languages and such that the universal automatic speech recognition module is trained to recognize speech in all of the plurality of different languages; compile the universal automatic speech recognition module with the plurality of language-specific automatic speech recognition modules as a configurable multilingual model that is configured to selectively and dynamically utilize a sub-set of the plurality of language-specific automatic speech recognition modules with the universal automatic speech recognition module to process audio content in response to user input identifying one or more target languages associated with the audio content; and training the configurable multilingual model to recognize user input for selecting combinations of the plurality of different languages when configuring the configurable multilingual model into a user-specific automatic speech recognition model by providing the configurable multilingual model with user choice input vectors corresponding to different combinations of the plurality of different languages.

2. The computing system of claim 1, the one or more computer-readable instructions being further executable to further configure the computing system to: obtain a one-hot vector corresponding to a first language; obtain a multi-hot vector corresponding the first language and one or more additional languages; and randomly present the one-hot vector and the multi-hot vector as the user choice input vectors to the configurable multilingual model during training of the configurable multilingual model.

3. The computing system of claim 2, the one or more computer-readable instructions being further executable to further configure the computing system to: apply a language-independent training dataset without language identification data.

4. The computing system of claim 2, as a result of compiling the configurable multilingual model, the configurable multilingual model comprises a language-specific embedding based on the multi-hot vector and an input acoustic feature, a language-specific layer comprising the universal automatic speech recognition module and the plurality of language-specific automatic speech recognition modules, and a language-specific vocabulary that merges one or more language-specific vocabularies in response to user input interpretable for selecting one or more languages, each language corresponding to a different language-specific vocabulary dataset.

5. A computing system comprising: one or more processors; and one or more hardware storage devices storing one or more computer-readable instructions that are executable by the one or more processors to configure the computing system to at least: obtain a configurable multilingual model comprising a universal automatic speech recognition module and a plurality of language-specific automatic speech recognition modules, the configurable multilingual model being trained to dynamically select the universal automatic speech recognition module and a sub-set of language-specific automatic speech recognition modules from the plurality of language-specific automatic speech recognition modules to generate a user-specific automatic speech recognition model configured to recognize spoken utterances in one or more user-identified languages; receive user input comprising (i) a null value corresponding to the universal automatic speech recognition module or (ii) a language identification vector indicating one or more target languages; select the universal automatic speech recognition module; and when the user input comprises the language identification vector, select the sub-set of language-specific automatic speech recognition modules, each language-specific automatic speech recognition modules included in the sub-set of language-specific automatic speech recognition modules trained to recognize spoken utterances in a different language of the one or more target languages.

6. The computing system of claim 5, the one or more computer-readable instructions being further executable to further configure the computing system to: extract the universal automatic speech recognition module and the sub-set of language-specific automatic speech recognition modules from the configurable multilingual model; and at inference time, generate the user-specific automatic speech recognition model by combining the universal automatic speech recognition module and the sub-set of language-specific automatic speech recognition modules.

7. The computing system of claim 6, the one or more computer-readable instructions being further executable to further configure the computing system to: transmit the user-specific automatic speech recognition model to a user device.

8. The computing system of claim 5, the one or more computer-readable instructions being further executable to further configure the computing system to compile the configurable multilingual model by: identifying one or more module languages; obtaining one or more language-specific automatic speech recognition modules, each language-specific automatic speech recognition module of the one or more language-specific automatic speech recognition modules trained on a different language-specific training dataset to train each language-specific automatic speech recognition module to recognize spoken utterances in a different language of the one or more module languages; obtaining a universal automatic speech recognition module trained on a multi-language training dataset comprising training data corresponding to each of the one or more module languages to train the universal automatic speech recognition module to recognize spoken utterances in any of the one or more module languages; and combining the universal automatic speech recognition module and the one or more language-specific automatic speech recognition modules.

9. The computing system of claim 5, the language identification vector comprising a one-hot vector corresponding a single target language.

10. The computing system of claim 5, the language identification vector comprising a multi-hot vector corresponding to a plurality of target languages.

11. The computing system of claim 5, the one or more computer-readable instructions being further executable to further configure the computing system to select the sub-set of language-specific automatic speech recognition modules by: positively weighting each language-specific automatic speech recognition module included in the sub-set of language-specific automatic speech recognition modules; and unweighting each language-specific automatic speech recognition module not included in the sub-set of language-specific automatic speech recognition modules.

12. The computing system of claim 5, the one or more
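The one-hot and multi-hot language vectors (claims 2, 9, and 10) and the weighting of selected modules (claim 11) can be illustrated with a short sketch. Everything here is hypothetical: the language list, the 50/50 sampling probability, the use of weights 1 and 0, and the summation of module outputs are assumptions chosen for the example and are not taken from the patent text.

```python
import random
from typing import List, Sequence

LANGUAGES = ["en", "de", "fr", "es"]  # hypothetical supported languages


def language_vector(targets: Sequence[str], languages: Sequence[str] = LANGUAGES) -> List[int]:
    """Build a one-hot (single target) or multi-hot (several targets) vector."""
    unknown = set(targets) - set(languages)
    if unknown:
        raise ValueError(f"unsupported languages: {sorted(unknown)}")
    return [1 if lang in targets else 0 for lang in languages]


def sample_user_choice_vector(utterance_language: str, rng: random.Random,
                              languages: Sequence[str] = LANGUAGES,
                              p_one_hot: float = 0.5) -> List[int]:
    """Randomly pick a one-hot or multi-hot vector for one training utterance.

    The utterance's own language is always set; the multi-hot case adds a
    random non-empty set of other languages (an assumed sampling scheme).
    """
    others = [lang for lang in languages if lang != utterance_language]
    if not others or rng.random() < p_one_hot:
        return language_vector([utterance_language], languages)
    extra = rng.sample(others, rng.randint(1, len(others)))
    return language_vector([utterance_language] + extra, languages)


def combine_outputs(universal_out: Sequence[float],
                    language_outs: Sequence[Sequence[float]],
                    vector: Sequence[int]) -> List[float]:
    """Add each language module's output to the universal output, scaled by
    its vector entry: weight 1 for selected modules, weight 0 otherwise."""
    combined = list(universal_out)
    for weight, out in zip(vector, language_outs):
        for i, value in enumerate(out):
            combined[i] += weight * value
    return combined


one_hot = language_vector(["en"])
multi_hot = language_vector(["en", "fr"])
mixed = combine_outputs([1.0, 2.0],
                        [[10.0, 10.0], [100.0, 100.0], [0.0, 0.0], [0.0, 0.0]],
                        one_hot)
```

With the one-hot English vector, only the first language-specific output contributes, so the German output is ignored even though it is present.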
updating or merging of old and new templates; Mean values; Weighting · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Training · CPC title
Language recognition · CPC title
using artificial neural networks · CPC title