Artificial intelligence apparatus for recognizing speech of user using personalized language model and method for the same

US11302311B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11302311-B2
Application number: US-201916546924-A
Country: US
Kind code: B2
Filing date: Aug 21, 2019
Priority date: Jul 23, 2019
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An artificial intelligence apparatus for recognizing speech of a user includes a microphone, and a processor configured to receive, via the microphone, a sound signal corresponding to the speech of the user, acquire personalize identification information corresponding to the speech, recognize the speech from the sound signal using a global language model, calculate a reliability for the recognition, and if the calculated reliability exceeds a predetermined first reference value, update a personalized language model corresponding to the personalize identification information using the recognition result.
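The gating step in the abstract (update the personalized model only when reliability exceeds the first reference value) can be sketched as follows. This is a minimal illustration, not the patented implementation: the threshold value `0.8`, the `PersonalizedLM` class, and the `handle_recognition` function are hypothetical names chosen for this example, and the per-word weight increment is a simplified reading of the weight increase described in claim 2.

```python
from collections import defaultdict

# Hypothetical value; the patent only calls this a
# "predetermined first reference value".
FIRST_REFERENCE_VALUE = 0.8


class PersonalizedLM:
    """Toy stand-in for a personalized language model: one weight per word."""

    def __init__(self):
        self.word_weights = defaultdict(float)

    def update(self, words):
        # Simplified assumption: raise the weight of every recognized word.
        for word in words:
            self.word_weights[word] += 1.0


# One model per personalize identification information (user or device ID).
personalized_models = defaultdict(PersonalizedLM)


def handle_recognition(personal_id, recognized_words, reliability):
    """Update the model for personal_id only if reliability exceeds the threshold.

    Returns True when the personalized model was updated.
    """
    if reliability > FIRST_REFERENCE_VALUE:
        personalized_models[personal_id].update(recognized_words)
        return True
    return False
```

A call such as `handle_recognition("user-1", ["play", "jazz"], 0.9)` updates the model for `"user-1"`, while the same call with a reliability of `0.5` leaves it untouched.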

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence apparatus for recognizing speech of a user, comprising: a microphone; and a processor configured to: receive, via the microphone, a sound signal corresponding to the speech of the user; acquire personalize identification information corresponding to the speech; recognize the speech from the sound signal using a global language model; determine a reliability for the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, update a personalized language model corresponding to the personalize identification information using the speech recognition result; determine a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extract a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information, and determine a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determine a word most similar to the misrecognized word in a lexicon, and correct the speech recognition results by replacing the misrecognized word in the speech recognition result with the most similar word.

2. The artificial intelligence apparatus according to claim 1, wherein the personalized language model is a language model configured to generate a word vector from the personalize vector and phonemes recognized in an acoustic model, and wherein the processor is further configured to update the personalized language model corresponding to the personalize identification information by increasing a weight of a word included in the speech recognition result.

3. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: determine a second LM score for each word in the speech recognition result; and extract words for which the second LM score does not exceed a predetermined second reference value as the misrecognized word.

4. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: determine a third LM score with respect to the corrected recognition result; and based on the determined third LM score exceeding a predetermined third reference value, determine that the speech recognition result is successful, and update the personalized language model corresponding to the personalize identification information using the corrected recognition result.

5. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: based on the determined reliability exceeding the first reference value, determine an intention corresponding to the speech recognition result; and perform an operation corresponding to the determined intention.

6. The artificial intelligence apparatus according to claim 5, wherein the processor is further configured to: project the speech recognition result in a vector space using an intention classifier; and determine the intention of the user by comparing a position of the projected speech recognition result with positions of a plurality of intention groups included in the vector space.

7. The artificial intelligence apparatus according to claim 6, wherein the processor is further configured to determine the intention of the user as an intention corresponding to a nearest intention group nearest from the position of the projected speech recognition result among the plurality of intention groups.

8. The artificial intelligence apparatus according to claim 1, wherein the personalize identification information includes at least one of user identification information for distinguishing each user or device identification information for distinguishing each device, the user identification information is information indicating a user identified according to voice analysis of the speech, and the device identification information is information indicating a device that has received the speech.

9. The artificial intelligence apparatus according to claim 1, wherein the global language model and the personalized language model are a model learned using a machine learning algorithm or a deep learning algorithm, and configured as an artificial neural network.

10. A method for recognizing speech of a user, comprising: receiving a sound signal corresponding to speech of the user; acquiring personalize identification information corresponding to the speech; recognizing the speech from the sound signal using a global language model; determining reliability of the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, updating a personalized language model corresponding to the personalize identification information using the speech recognition result; determining a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extract a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information; and determine a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determine a word most similar to the misrecognized word in a lexicon, and correct the speech recognition result by replacing the misrecognized word in the speech recognition result with the most similar word.

11. A non-transitory recording medium having recorded thereon a program for performing a method for recognizing speech of a user, the method comprising: receiving a sound signal corresponding to speech of the user; acquiring personalize identification information corresponding to the speech; recognizing the speech from the sound signal using a global language model; determining reliability of the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, updating a personalized language model corresponding to the personalize identification information using the speech recognition result; determine a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extracting a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information, and determining a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determining a word most similar to the misrecognized word in a lexicon, and correct the speech recognition result by replacing the misrecognized word in the speech recognition result with the most similar word.
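The correction path in claims 1 and 3 (flag low-scoring words, check their similarity to the personalize vector, then substitute the closest lexicon entry) might look roughly like the sketch below. It covers only the correction step, which the claims apply when overall reliability does not exceed the first reference value. Everything concrete here is an assumption made for illustration: the two threshold values, the `correct` function and its parameters, the use of cosine similarity for the word-to-vector comparison, and the use of spelling similarity (`difflib.SequenceMatcher`) to pick the "most similar" lexicon word. The claims do not specify any of these measures.

```python
import math
from difflib import SequenceMatcher

# Hypothetical cutoffs; the claims only call them "predetermined" values.
SECOND_REFERENCE_VALUE = 0.3  # per-word LM score cutoff (claim 3)
SIMILARITY_THRESHOLD = 0.5    # word vs. personalize-vector similarity cutoff


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def correct(words, word_scores, word_vectors, personalize_vector, lexicon):
    """Return a copy of words with suspected misrecognitions replaced.

    words: recognized words; word_scores: per-word LM scores (same length);
    word_vectors: dict mapping a word to its vector (assumed to be available);
    lexicon: candidate replacement words.
    """
    corrected = []
    for word, score in zip(words, word_scores):
        # Words whose LM score exceeds the cutoff are kept as recognized.
        if score > SECOND_REFERENCE_VALUE:
            corrected.append(word)
            continue
        vec = word_vectors.get(word)
        # Replace only if the word is close enough to the personalize vector.
        if (
            vec is None
            or not lexicon
            or cosine(vec, personalize_vector) <= SIMILARITY_THRESHOLD
        ):
            corrected.append(word)
            continue
        # Assumption: "most similar" is approximated by spelling similarity.
        best = max(lexicon, key=lambda c: SequenceMatcher(None, word, c).ratio())
        corrected.append(best)
    return corrected
```

For example, with words `["call", "jon"]`, scores `[0.9, 0.1]`, a vector for `"jon"` that lies close to the personalize vector, and the lexicon `["john", "jazz", "mary"]`, the low-scoring word `"jon"` is replaced by `"john"` and `"call"` is left unchanged.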

Assignees

LG Electronics Inc.

Inventors

Classifications

  • Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title

  • updating or merging of old and new templates; Mean values; Weighting · CPC title

  • G10L15/07 · Primary

    to the speaker · CPC title

  • G10L15/183 · Primary

    using context dependencies, e.g. language models · CPC title

  • Feedback of the input speech · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11302311B2 cover?
An artificial intelligence apparatus for recognizing speech of a user includes a microphone, and a processor configured to receive, via the microphone, a sound signal corresponding to the speech of the user, acquire personalize identification information corresponding to the speech, recognize the speech from the sound signal using a global language model, calculate a reliability for the recogni…
Who is the assignee on this patent?
LG Electronics Inc.
What technology area does this patent fall under?
Primary CPC classification G10L15/07. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 12, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).