Processing communications using a prototype classifier
US-10747957-B2 · Aug 18, 2020 · US
US11302311B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11302311-B2 |
| Application number | US-201916546924-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 21, 2019 |
| Priority date | Jul 23, 2019 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
Official abstract text for this publication.
An artificial intelligence apparatus for recognizing speech of a user includes a microphone, and a processor configured to receive, via the microphone, a sound signal corresponding to the speech of the user, acquire personalize identification information corresponding to the speech, recognize the speech from the sound signal using a global language model, calculate a reliability for the recognition, and if the calculated reliability exceeds a predetermined first reference value, update a personalized language model corresponding to the personalize identification information using the recognition result.
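The reliability gate described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the patented implementation: the `PersonalizedModels` class, the `process_recognition` function, the `0.8` threshold, and the additive weight step are all assumptions introduced here. The publication states only that the personalized language model is updated when the reliability exceeds a first reference value, and claim 2 describes the update as increasing the weight of words in the recognition result.

```python
from collections import defaultdict

# Hypothetical value; the publication does not give a number for the
# "predetermined first reference value".
FIRST_REFERENCE_VALUE = 0.8


class PersonalizedModels:
    """Toy stand-in for per-identity personalized language models.

    Each identity (a user or device identifier) maps to a table of word
    weights. A real language model would be far richer than this.
    """

    def __init__(self):
        self._weights = defaultdict(lambda: defaultdict(float))

    def update(self, identity, words, step=1.0):
        # Increase the weight of every word in the recognition result.
        for word in words:
            self._weights[identity][word] += step

    def weight(self, identity, word):
        # Read without creating entries for unknown identities or words.
        return self._weights.get(identity, {}).get(word, 0.0)


def process_recognition(models, identity, words, reliability,
                        threshold=FIRST_REFERENCE_VALUE):
    """Update the personalized model only when reliability exceeds threshold.

    Returns True if the model was updated, False otherwise.
    """
    if reliability > threshold:  # "exceeds": strictly greater than
        models.update(identity, words)
        return True
    return False
```

A caller would pass the words recognized by the global model together with the computed reliability. A result at or below the threshold leaves the personalized model untouched and would instead go to the correction path described in the claims.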
Opening claim text (preview).
What is claimed is:
1. An artificial intelligence apparatus for recognizing speech of a user, comprising: a microphone; and a processor configured to: receive, via the microphone, a sound signal corresponding to the speech of the user; acquire personalize identification information corresponding to the speech; recognize the speech from the sound signal using a global language model; determine a reliability for the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, update a personalized language model corresponding to the personalize identification information using the speech recognition result; determine a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extract a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information, and determine a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determine a word most similar to the misrecognized word in a lexicon, and correct the speech recognition results by replacing the misrecognized word in the speech recognition result with the most similar word.
2. The artificial intelligence apparatus according to claim 1, wherein the personalized language model is a language model configured to generate a word vector from the personalize vector and phonemes recognized in an acoustic model, and wherein the processor is further configured to update the personalized language model corresponding to the personalize identification information by increasing a weight of a word included in the speech recognition result.
3. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: determine a second LM score for each word in the speech recognition result; and extract words for which the second LM score does not exceed a predetermined second reference value as the misrecognized word.
4. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: determine a third LM score with respect to the corrected recognition result; and based on the determined third LM score exceeding a predetermined third reference value, determine that the speech recognition result is successful, and update the personalized language model corresponding to the personalize identification information using the corrected recognition result.
5. The artificial intelligence apparatus according to claim 1, wherein the processor is further configured to: based on the determined reliability exceeding the first reference value, determine an intention corresponding to the speech recognition result; and perform an operation corresponding to the determined intention.
6. The artificial intelligence apparatus according to claim 5, wherein the processor is further configured to: project the speech recognition result in a vector space using an intention classifier; and determine the intention of the user by comparing a position of the projected speech recognition result with positions of a plurality of intention groups included in the vector space.
7. The artificial intelligence apparatus according to claim 6, wherein the processor is further configured to determine the intention of the user as an intention corresponding to a nearest intention group nearest from the position of the projected speech recognition result among the plurality of intention groups.
8. The artificial intelligence apparatus according to claim 1, wherein the personalize identification information includes at least one of user identification information for distinguishing each user or device identification information for distinguishing each device, the user identification information is information indicating a user identified according to voice analysis of the speech, and the device identification information is information indicating a device that has received the speech.
9. The artificial intelligence apparatus according to claim 1, wherein the global language model and the personalized language model are a model learned using a machine learning algorithm or a deep learning algorithm, and configured as an artificial neural network.
10. A method for recognizing speech of a user, comprising: receiving a sound signal corresponding to speech of the user; acquiring personalize identification information corresponding to the speech; recognizing the speech from the sound signal using a global language model; determining reliability of the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, updating a personalized language model corresponding to the personalize identification information using the speech recognition result; determining a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extract a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information; and determine a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determine a word most similar to the misrecognized word in a lexicon, and correct the speech recognition result by replacing the misrecognized word in the speech recognition result with the most similar word.
11. A non-transitory recording medium having recorded thereon a program for performing a method for recognizing speech of a user, the method comprising: receiving a sound signal corresponding to speech of the user; acquiring personalize identification information corresponding to the speech; recognizing the speech from the sound signal using a global language model; determining reliability of the speech recognition result; based on the determined reliability exceeding a predetermined first reference value, updating a personalized language model corresponding to the personalize identification information using the speech recognition result; determine a first language model (LM) score corresponding to the speech recognition result for recognition reliability; and based on the determined reliability not exceeding the predetermined first reference value, extracting a misrecognized word lowering the first LM score from the speech recognition result and correct the misrecognized word using the personalized language model corresponding to the personalize identification information, and determining a similarity between the extracted misrecognized word and a personalize vector representing the personalize identification information, based on the determined similarity exceeding a predetermined threshold, determining a word most similar to the misrecognized word in a lexicon, and correct the speech recognition result by replacing the misrecognized word in the speech recognition result with the most similar word.
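The low-reliability path in the independent claims (extract the words that pull the LM score down, compare each against the personalize vector, then substitute the closest lexicon word) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the function names, the use of cosine similarity as the similarity measure, the two-dimensional toy vectors, and both threshold values. The claims do not specify how word vectors are produced or which similarity metric is used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_misrecognized(word_scores, second_reference):
    """Indices of words whose per-word LM score does not exceed the reference."""
    return [i for i, score in enumerate(word_scores) if score <= second_reference]


def correct_result(words, word_scores, word_vectors, personal_vector, lexicon,
                   second_reference=0.3, similarity_threshold=0.5):
    """Return a corrected copy of `words`.

    words            recognized words, in order
    word_scores      hypothetical per-word LM scores, same length as words
    word_vectors     hypothetical mapping word -> vector for recognized words
    personal_vector  vector standing in for the personalize vector
    lexicon          hypothetical mapping candidate word -> vector
    """
    corrected = list(words)
    for i in extract_misrecognized(word_scores, second_reference):
        vec = word_vectors[words[i]]
        # Only attempt a replacement when the suspect word is close enough
        # to the personalize vector.
        if cosine(vec, personal_vector) > similarity_threshold:
            corrected[i] = max(lexicon, key=lambda cand: cosine(vec, lexicon[cand]))
    return corrected
```

With toy inputs such as `words=["play", "jas"]`, a low score on `"jas"`, and a lexicon containing `"jazz"` with a nearby vector, the sketch replaces `"jas"` with `"jazz"`. A low-scoring word that is far from the personalize vector is left unchanged. Claim 4 would then rescore the corrected result against a third reference value before accepting it, a step this sketch omits.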
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title
updating or merging of old and new templates; Mean values; Weighting · CPC title
to the speaker · CPC title
using context dependencies, e.g. language models · CPC title
Feedback of the input speech · CPC title