Speech recognition method and apparatus

US11100916B2 · US · B2

Patent metadata
Publication number: US-11100916-B2
Application number: US-201916398482-A
Country: US
Kind code: B2
Filing date: Apr 30, 2019
Priority date: Nov 21, 2018
Publication date: Aug 24, 2021
Grant date: Aug 24, 2021

Abstract

A speech recognition method and apparatus are disclosed. The speech recognition method includes determining a first score of candidate texts based on an input speech, determining a weight for an output of a language model based on the input speech, applying the weight to a second score of the candidate texts output from the language model to obtain a weighted second score, selecting a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text, and determining the target candidate text to correspond to a portion of the input speech.
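The selection step in the abstract (and the "greatest sum" rule in claim 1) can be illustrated with a short sketch. It assumes that both scores are log-probabilities over the same candidate set and that the weight has already been determined from the input speech; the function name, candidate words, and numbers are hypothetical and only show how changing the weight can change which candidate is selected.

```python
from typing import Dict


def select_target_candidate(
    first_scores: Dict[str, float],
    lm_scores: Dict[str, float],
    weight: float,
) -> str:
    """Pick the candidate with the greatest first score + weight * LM score.

    Assumption: scores are log-probabilities for the same candidate set,
    and `weight` has already been determined from the input speech.
    """
    if first_scores.keys() != lm_scores.keys():
        raise ValueError("both score tables must cover the same candidates")

    def fused_score(candidate: str) -> float:
        # Decoder (first) score plus the weighted language-model (second) score.
        return first_scores[candidate] + weight * lm_scores[candidate]

    return max(first_scores, key=fused_score)


# Illustrative numbers only: the decoder slightly prefers "wreck",
# while the language model strongly prefers "recognize".
decoder = {"wreck": -1.2, "recognize": -1.5}
language_model = {"wreck": -6.0, "recognize": -2.0}
print(select_target_candidate(decoder, language_model, weight=0.0))  # wreck
print(select_target_candidate(decoder, language_model, weight=0.5))  # recognize
```

With a weight of 0 the language model is ignored and the decoder's preference wins. At 0.5 the weighted language-model score is large enough to change the outcome, which is why making the weight depend on the input speech matters.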

Claims

Claim text (preview; the listing is cut off partway through claim 21).

What is claimed is:

1. A speech recognition method comprising: determining a first score of candidate texts based on an input speech, wherein the first score of the candidate texts is an output from a neural network-based decoder; determining a weight for an output of a language model based on the input speech and context information associated with the input speech; applying the weight to a second score of the candidate texts output from the language model to obtain a weighted second score; selecting a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text by determining a candidate text having a greatest sum of the first score and the weighted second score as the target candidate text from among the candidate texts; and determining the target candidate text to correspond to a portion of the input speech.

2. The speech recognition method of claim 1, wherein the determining of the weight comprises: determining a weight to be applied to an output of the language model at a current time based on the input speech and a target text determined at a previous time.

3. The speech recognition method of claim 2, wherein the target text determined at the previous time comprises any one or any combination of target texts determined from a time at which speech recognition is initiated to a time immediately before the current time.

4. The speech recognition method of claim 1, wherein the context information comprises any one or any combination of information on a user inputting the input speech, time information, location information, language information, speech recognition history information, and information on a currently operating application program.

5. The speech recognition method of claim 1, wherein the determining of the weight comprises: determining a weight to be applied to an output of the language model at a current time based on the input speech, a target text determined at a previous time, and the context information.

6. The speech recognition method of claim 1, wherein the determining of the weight comprises: extracting a feature value from the input speech; and providing the feature value to a neural network-based weight determiner to determine the weight.

7. The speech recognition method of claim 1, wherein the determining of the first score comprises: extracting a feature value from the input speech using a neural network-based encoder; and determining a first score of each of the candidate texts from the extracted feature value using the neural network-based decoder.

8. The speech recognition method of claim 1, wherein the language model comprises a plurality of language models, wherein the determining of the weight comprises: determining a weight to be applied to an output of each of the plurality of language models.

9. The speech recognition method of claim 8, wherein the plurality of language models comprise a first language model and a second language model, wherein the first language model is configured to output a second score of the candidate texts, and the second language model is configured to output a third score of the candidate texts, wherein the determining of the weight comprises: determining a first weight to be applied to the second score and a second weight to be applied to the third score, and the selecting of the target candidate text comprises: selecting the target candidate text based on the first score, the second score to which the first weight is applied, and the third score to which the second weight is applied.

10. The speech recognition method of claim 1, wherein the language model comprises a plurality of language models, and the determining of the weight comprises: selecting at least one language model from among the plurality of language models; and determining a weight to be applied to an output of the selected at least one language model.

11. The speech recognition method of claim 1, wherein the language model is configured to output a second score corresponding to each of the candidate texts to determine, based on a target text determined at a previous time, a next target text subsequent to the target text determined at the previous time.

12. The speech recognition method of claim 1, wherein each of the candidate texts is one of a word, a subword, a phrase, or a sentence.

13. The speech recognition method of claim 1, wherein the determining of the first score comprises: determining the first score based on the input speech and an output of the neural network-based decoder at a previous time period.

14. The speech recognition method of claim 1, wherein the language model is configured for a syntax of a type of device.

15. The speech recognition method of claim 1, wherein the language model comprises a plurality of language models and a weight for each of the language models is dynamically adjusted based on a type of the input speech.

16. The speech recognition method of claim 15, wherein the type of input speech comprises any one or any combination of a context of the input speech, an environment in which the speech recognition is performed, a type of a word in the input speech, type of device in which the speech recognition is performed, and a type of an utterance for which the speech recognition is performed.

17. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the speech recognition method of claim 1.

18. A speech recognition apparatus comprising: a voice interface configured to receive an input speech from a user; and a processor configured to: determine a first score of candidate texts based on the input speech, wherein the first score of the candidate texts is an output from a neural network-based decoder; determine a weight for an output of a language model based on the input speech and context information associated with the input speech; apply the weight to a second score of the candidate texts output from the language model to obtain a weighted second score; select a target candidate text from among the candidate texts based on the first score and the weighted second score corresponding to the target candidate text by determining a candidate text having a greatest sum of the first score and the weighted second score as the target candidate text from among the candidate texts; and recognize the target candidate text to correspond to a portion of the input speech.

19. The speech recognition apparatus of claim 18, wherein the processor is further configured to: extract a feature value from the input speech; and determine the weight using a neural network-based weight determiner configured to output a weight corresponding to the extracted feature value.

20. The speech recognition apparatus of claim 18, wherein the language model comprises a first language model configured to output a second score of the candidate texts and a second language model configured to output a third score of the candidate texts; and the processor is further configured to: determine a first weight to be applied to the second score and a second weight to be applied to the third score, and select the target candidate text based on the first score, the second score to which the first weight is applied, and the third score to which the second weight is applied.

21. The speech recognition apparatus of claim 18, wherein the languag
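Claims 6, 8, 9, 19 and 20 add a neural network-based weight determiner fed with a feature value from the speech, and separate weights for several language models. The sketch below illustrates that combination under stated assumptions. The claim text shown here does not specify the determiner's architecture, so the single linear layer with a sigmoid (`determine_weights`) is a hypothetical stand-in. Feature extraction is omitted, and the parameters, candidates, and scores are invented for illustration. Summing all weighted language-model scores extends claim 1's greatest-sum rule to several models. That is an assumption, because claims 9 and 20 only say the selection is "based on" the weighted scores.

```python
import math
from typing import Dict, List, Sequence


def determine_weights(
    features: Sequence[float],
    layer: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Hypothetical weight determiner: one linear layer followed by a sigmoid.

    Produces one weight per language model from a speech feature vector.
    """
    if len(layer) != len(biases):
        raise ValueError("need one bias per language model")
    weights = []
    for row, bias in zip(layer, biases):
        if len(row) != len(features):
            raise ValueError("each row must match the feature length")
        z = sum(w * x for w, x in zip(row, features)) + bias
        weights.append(1.0 / (1.0 + math.exp(-z)))  # squash into (0, 1)
    return weights


def select_with_language_models(
    first_scores: Dict[str, float],
    lm_scores: List[Dict[str, float]],
    weights: Sequence[float],
) -> str:
    """Return the candidate with the greatest first score plus all weighted LM scores."""
    if len(lm_scores) != len(weights):
        raise ValueError("need one weight per language model")

    def fused_score(candidate: str) -> float:
        return first_scores[candidate] + sum(
            w * scores[candidate] for w, scores in zip(weights, lm_scores)
        )

    return max(first_scores, key=fused_score)


# Hand-picked, illustrative values (not from the patent).
features = [1.0, 0.0]              # stand-in for an extracted feature value
layer = [[2.0, 0.0], [-2.0, 0.0]]  # hypothetical trained parameters
weights = determine_weights(features, layer, [0.0, 0.0])  # about [0.88, 0.12]

first = {"call mom": -2.0, "call tom": -1.8}
general_lm = {"call mom": -1.0, "call tom": -1.5}
contacts_lm = {"call mom": -3.0, "call tom": -0.5}
print(select_with_language_models(first, [general_lm, contacts_lm], [1.0, 0.0]))  # call mom
print(select_with_language_models(first, [general_lm, contacts_lm], [0.0, 1.0]))  # call tom
```

A sigmoid keeps each weight in (0, 1) independently of the others. A softmax over the layer's outputs would be an equally plausible choice if the weights should trade off against one another. The two hard-coded weight pairs show that shifting weight from a general model to a contacts-style model flips the selected candidate.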

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

Entries give the CPC title; the CPC code is shown where the page provides it.

  • G10L15/16 (primary): using artificial neural networks

  • G10L25/30 (primary): using neural networks

  • Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice (G10L15/14 takes precedence)

  • of application context

  • Language recognition

Frequently asked questions

What does patent US11100916B2 cover?
A speech recognition method and apparatus in which a language model's output is weighted dynamically. A first score for each candidate text is determined from the input speech, a weight for the language model's output is also determined from the input speech, and a target candidate text is selected based on the first score and the weighted language-model (second) score, then recognized as a portion of the input speech.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Published on August 24, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).