Speech recognition method and apparatus

US10930268B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10930268-B2
Application number: US-201916244397-A
Country: US
Kind code: B2
Filing date: Jan 10, 2019
Priority date: May 31, 2018
Publication date: Feb 23, 2021
Grant date: Feb 23, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a speech recognition method and apparatus, wherein the apparatus acquires first outputs from sub-models in a recognition model based on a speech signal, acquires a second output including values corresponding to the sub-models from a classification model based on the speech signal, and recognizes the speech signal based on the first outputs and the second output.
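
Claims 1 and 9 narrow the combination step in the abstract to a weighted sum: each sub-model's output is weighted by the classification model's probability for that sub-model, and the pronunciation is estimated from the result (the "third output"). The following is a minimal sketch of that arithmetic only, not the patented implementation. The function names, the list-of-floats representation, and the argmax decision rule are assumptions made for illustration; the patent does not prescribe them.

```python
from typing import List, Sequence


def combine_sub_model_outputs(
    first_outputs: Sequence[Sequence[float]],
    second_output: Sequence[float],
) -> List[float]:
    """Return the weighted sum of the sub-model outputs (the "third output").

    first_outputs[k][i] is the score sub-model k assigns to pronunciation unit i.
    second_output[k] is the classifier's probability for sub-model k.
    """
    if not first_outputs:
        raise ValueError("need at least one sub-model output")
    if len(first_outputs) != len(second_output):
        raise ValueError("need exactly one probability per sub-model")

    num_units = len(first_outputs[0])
    third_output = [0.0] * num_units
    for weight, output in zip(second_output, first_outputs):
        if len(output) != num_units:
            raise ValueError("all sub-models must score the same units")
        for i, score in enumerate(output):
            third_output[i] += weight * score
    return third_output


def estimate_pronunciation(third_output: Sequence[float]) -> int:
    """Assumed decision rule: index of the highest-scoring pronunciation unit."""
    return max(range(len(third_output)), key=lambda i: third_output[i])


if __name__ == "__main__":
    # Two hypothetical sub-models (e.g. two dialect groups), three units each.
    outputs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    probabilities = [0.25, 0.75]  # hypothetical classifier output
    combined = combine_sub_model_outputs(outputs, probabilities)
    print(combined, estimate_pronunciation(combined))
```

With these made-up numbers the second sub-model dominates, so the combined scores favor the unit at index 1. In the patent the outputs come from neural-network layers; plain lists are used here only to keep the sketch self-contained.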

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition method comprising: acquiring first outputs from sub-models in a recognition model based on a speech signal; acquiring, based on the speech signal, a second output comprising respective probabilities that the speech signal belongs to each of the sub-models from a classification model; and recognizing the speech signal based on a weighted sum of the first outputs and the second output, wherein the sub-models comprise models for estimating pronunciations classified into groups based on a similarity of pronunciation for each of the groups, and wherein the classification model is configured to estimate a probability or weight associated with each of the sub-models and nodes in an output layer of the classification model correspond to each of the sub-models.

2. The speech recognition method of claim 1, wherein the groups are classified based on any one or any combination of dialects, regions, and races in a language.

3. The speech recognition method of claim 1, wherein the sub-models are models for estimating pronunciations for each of the users.

4. The speech recognition method of claim 1, wherein the recognition model comprises a neural network for estimating a pronunciation of the speech signal, and the sub-models each include learning hidden unit contributions (LHUCs) or layers trained independently of one another in the neural network.

5. The speech recognition method of claim 4, wherein the sub-models share at least one layer in the neural network.

6. The speech recognition method of claim 5, wherein a feature acquired from a layer shared by the sub-models is applied to the trained layers.

7. The speech recognition method of claim 1, wherein the acquiring of the second output comprises: acquiring a feature from a layer in the recognition model; and acquiring the second output by applying the acquired feature to the classification model.

8. The speech recognition method of claim 1, wherein the acquiring of the second output comprises: generating a feature suitable for an input layer of the classification model based on the speech signal; and acquiring the second output by applying the generated feature to the classification model.

9. The speech recognition method of claim 1, wherein the recognizing of the speech signal comprises: generating a third output based on the weighted sum between the first outputs and the respective probabilities included in the second output; and estimating a pronunciation of the speech signal based on the third output.

10. The speech recognition method of claim 1, wherein the acquiring of the second output comprises: applying a bias to the probabilities included in the second output based on a context associated with the speech signal, and the context comprises any one or any combination of a location and a language of a keyboard of a device to which the speech signal is applied.

11. The speech recognition method of claim 1, wherein an acoustic model comprising the recognition model and the classification model are connected to a language model on an end-to-end basis, and the recognizing of the speech signal comprises: recognizing a word or a sentence of the speech signal based on the first outputs and the second output.

12. The speech recognition method of claim 1, wherein the respective probabilities comprise weights corresponding to each of the sub-models.

13. The speech recognition method of claim 1, wherein the recognition model is trained to recognize a language of the users, and the sub-models are trained to recognize languages corresponding to groups of the users.

14. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 1.

15. A speech recognition method comprising: generating an input feature of a recognition model including sub-models based on a speech signal; acquiring probabilities that the speech signal belongs to the respective sub-models from a classification model based on the speech signal; generating a second input feature by applying the probabilities to the input feature; and recognizing the speech signal by applying the second input feature to the recognition model, wherein the sub-models comprise models for estimating pronunciations classified into groups based on a similarity of pronunciation for each of the groups, and wherein the classification model is configured to estimate a probability or weight associated with each of the sub-models and nodes in an output layer of the classification model correspond to each of the sub-models.

16. The speech recognition method of claim 15, wherein an input layer of the recognition model comprises nodes corresponding to the probabilities.

17. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method of claim 15.

18. A speech recognition apparatus comprising: a processor configured to acquire first outputs from sub-models in a recognition model based on a speech signal, to acquire, based on the speech signal, a second output comprising respective probabilities that the speech signal belongs to each of the sub-models from a classification model, and to recognize the speech signal based on a weighted sum of the first outputs and the second output, wherein the sub-models comprise models for estimating pronunciations classified into groups based on a similarity of pronunciation for each of the groups, and wherein the classification model is configured to estimate a probability or weight associated with each of the sub-models and nodes in an output layer of the classification model correspond to each of the sub-models.

19. A training apparatus for speech recognition, the apparatus comprising: a processor configured to train a recognition model comprising sub-models based on first training speech signals, to train sub-models based on second training speech signals corresponding to the sub-models, and to train a classification model that generates outputs corresponding to the sub-models based on the second training speech signals.

20. A speech recognition method comprising: generating a first input feature for a recognition model comprising sub-models based on a speech signal; acquiring respective probabilities of the speech signal belonging to each of the sub-models from a classification model based on the speech signal; generating a second input feature for the recognition model based on applying the probabilities to the first input feature; and recognizing the speech signal based on an output generated, in response to the second input feature being applied to the classification model, wherein the sub-models comprise models for estimating pronunciations classified into groups based on a similarity of pronunciation for each of the groups, and wherein the classification model is configured to estimate a probability or weight associated with each of the sub-models and nodes in an output layer of the classification model correspond to each of the sub-models.
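
Claims 10 and 15 describe two further uses of the classifier probabilities: biasing them with device context (location, keyboard language), and applying them to the recognition model's input feature instead of to its outputs. The sketch below illustrates one possible reading of each and is not taken from the patent. The multiplicative bias with renormalization and the concatenation of probabilities onto the feature vector are both assumptions (claim 16's input-layer nodes "corresponding to the probabilities" make concatenation plausible, but the claims do not fix the operation), and all function names are hypothetical.

```python
from typing import List, Sequence


def apply_context_bias(
    probabilities: Sequence[float],
    bias: Sequence[float],
) -> List[float]:
    """Assumed form of claim 10: scale each probability, then renormalize.

    bias[k] > 1 favors sub-model k (for example, because the keyboard
    language matches that sub-model's group); bias[k] < 1 penalizes it.
    """
    if len(probabilities) != len(bias):
        raise ValueError("need exactly one bias factor per sub-model")
    scaled = [p * b for p, b in zip(probabilities, bias)]
    total = sum(scaled)
    if total <= 0:
        raise ValueError("biased probabilities must have a positive sum")
    return [s / total for s in scaled]


def build_second_input_feature(
    input_feature: Sequence[float],
    probabilities: Sequence[float],
) -> List[float]:
    """Assumed form of claim 15: append the probabilities to the feature."""
    return list(input_feature) + list(probabilities)


if __name__ == "__main__":
    # Hypothetical values: a 2-dimensional acoustic feature, two sub-models.
    probabilities = apply_context_bias([0.5, 0.5], [3.0, 1.0])
    second_feature = build_second_input_feature([0.1, 0.2], probabilities)
    print(probabilities, second_feature)
```

Under this reading the recognition model's input layer grows by one node per sub-model, which lets a single network condition on the group estimate rather than running every sub-model and mixing their outputs.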

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • G10L15/16 (Primary)

    using artificial neural networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Published Feb 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).