What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date May 14, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Device for recognizing speech input of user and operating method thereof

US11984126B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11984126-B2
Application number	US-202117439247-A
Country	US
Kind code	B2
Filing date	Aug 10, 2021
Priority date	Aug 12, 2020
Publication date	May 14, 2024
Grant date	May 14, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device for recognizing a speech input and an operating method thereof are provided. The device may be configured to: obtain one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing a speech input using an automatic speech recognition (ASR) model; extract text history information corresponding to the speech input from a database by comparing the speech input with a plurality of speech signals previously stored in the database; and perform training to adjust a weight of each of the one or more text candidates using the extracted text history information. Also, a method in which the device recognizes a speech input using an AI model may be performed.
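The flow summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: speech signals are stood in for by short feature vectors, cosine similarity is an assumed similarity measure, and the "training" step is replaced by a simple boost-and-renormalize rule. The names `HistoryEntry`, `matching_history`, `reweight`, and the `boost` parameter are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HistoryEntry:
    features: list   # assumed: fixed-length feature vector of a stored speech signal
    text: str        # text history paired with that speech signal


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matching_history(input_features, database, threshold):
    """Return the text history of every stored signal whose similarity exceeds the threshold."""
    return [
        entry.text
        for entry in database
        if cosine_similarity(input_features, entry.features) > threshold
    ]


def reweight(candidates, history_texts, boost=2.0):
    """Boost candidates found in the extracted history, then renormalize to sum to 1.

    A stand-in for the weight-adjusting training step; the boost rule is an assumption.
    """
    adjusted = {
        text: weight * (boost if text in history_texts else 1.0)
        for text, weight in candidates.items()
    }
    total = sum(adjusted.values())
    return {text: weight / total for text, weight in adjusted.items()}


# Hypothetical data: two ASR text candidates and a two-entry history database.
candidates = {"call mom": 0.4, "call tom": 0.6}
database = [
    HistoryEntry([1.0, 0.0], "call mom"),
    HistoryEntry([0.0, 1.0], "play music"),
]
history = matching_history([0.9, 0.1], database, threshold=0.8)
print(history)
print(reweight(candidates, history))
```

In this toy run only the first stored signal clears the threshold, so "call mom" is boosted and ends up ranked above "call tom" even though the ASR model initially scored it lower.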

First claim

Opening claim text (preview).

What is claimed is:
1. A method in which a device recognizes a speech input, the method comprising: receiving the speech input; obtaining one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing the speech input using an automatic speech recognition (ASR) model; calculating a similarity between the speech input and each of a plurality of speech signals previously stored in a database; identifying a speech signal for which the calculated similarity exceeds a predetermined threshold, from among the plurality of speech signals; extracting, from the database, text history information corresponding to the identified speech signal from among a plurality of text history information previously stored in the database; and performing training to adjust a weight of each of the one or more text candidates using the extracted text history information.
2. The method of claim 1, wherein the plurality of text history information is obtained by converting the plurality of speech signals using the ASR model, and wherein the plurality of speech signals are respectively paired with the plurality of corresponding text history information and stored in the database.
3. The method of claim 1, further comprising determining, through the training, a text output by converting the speech input.
4. The method of claim 3, further comprising: obtaining at least one text interpretation result candidate by interpreting the output text using a natural language understanding (NLU) model; extracting text interpretation history information corresponding to the text from among a plurality of text interpretation result history information, by comparing the obtained at least one text interpretation result candidate with the plurality of text interpretation result history information previously stored in the database; training a weight for obtaining text interpretation result information from the text through the NLU model, using the extracted text interpretation history information; and updating the NLU model using a result of the training.
5. The method of claim 4, wherein the database stores a plurality of texts previously obtained and a plurality of text interpretation history information obtained by interpreting the plurality of texts using the NLU model, and wherein the plurality of texts are respectively paired with the plurality of corresponding text interpretation history information and stored in the database.
6. The method of claim 5, wherein the extracting of the text interpretation history information corresponding to the text comprises: calculating a similarity between the text and each of the plurality of texts previously stored in the database; identifying a text for which the calculated similarity exceeds a specified threshold, from among the plurality of texts; and extracting the text interpretation history information paired with the identified text from among the plurality of text interpretation history information.
7. A device configured to recognize a speech input, the device comprising: a speech input interface comprising circuitry configured to receive the speech input; a database configured for storing speech recognition result history information comprising a plurality of speech signals received prior to the speech input being received and a plurality of text history information respectively corresponding to the plurality of speech signals; a memory storing a program comprising one or more instructions; and at least one processor configured to execute the one or more instructions of the program stored in the memory, and individually and/or collectively configured to: receive the speech input from the speech input interface; obtain one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing the speech input using an automatic speech recognition (ASR) model; calculate a similarity between the speech input and each of the plurality of speech signals stored in the database, identify a speech signal for which the calculated similarity exceeds a predetermined threshold, from among the plurality of speech signals, extract, from the database, text history information corresponding to the identified speech signal from among the plurality of text history information stored in the database; and perform training to adjust a weight of each of the one or more text candidates using the extracted text history information.
8. The device of claim 7, wherein the plurality of text history information is to be obtained by converting the plurality of speech signals using the ASR model, and wherein the plurality of speech signals are respectively paired with the plurality of corresponding text history information and stored in the database.
9. The device of claim 7, wherein the at least one processor is configured, individually and/or collectively, to determine, through the training, a text output by converting the speech input.
10. The device of claim 9, wherein the at least one processor is configured, individually and/or collectively, to: obtain at least one text interpretation result candidate by interpreting the output text using a natural language understanding (NLU) model; extract text interpretation history information corresponding to the text from among a plurality of text interpretation result history information by comparing the obtained at least one text interpretation result candidate with the plurality of text interpretation result history information previously stored in the database; train a weight for obtaining text interpretation result information from the text through the NLU model using the extracted text interpretation history information; and update the NLU model using a result of the training.
11. The device of claim 10, wherein the database stores a plurality of texts previously obtained and a plurality of text interpretation history information obtained by interpreting the plurality of texts using the NLU model, and wherein the plurality of texts are respectively paired with the plurality of corresponding text interpretation history information and stored in the database.
12. The device of claim 11, wherein the at least one processor is configured, individually and/or collectively, to: calculate a similarity between the text and each of the plurality of texts previously stored in the database; identify a text for which the calculated similarity exceeds a specified threshold, from among the plurality of texts; and extract the text interpretation history information paired with the identified text from among the plurality of text interpretation history information.
13. A computer program product comprising a non-transitory computer-readable storage medium, having stored thereon instructions configured to, which, when executed, cause the electronic device to perform operations comprising: receiving a speech input; obtaining one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing the speech input using an automatic speech recognition (ASR) model; calculating a similarity between the speech input and each of a plurality of speech signals previously stored in a database; identifying a speech signal for which the calculated similarity exceeds a predetermined threshold, from among the plurality of speech signals; extracting, from the database, text history information corresponding to the identified speech signal from among a plurality of text history information previous…
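The text-side lookup described in claims 5 and 6 (compare the recognized text against stored texts, keep those above a threshold, return the paired interpretation history) might look roughly like the sketch below. It is an assumption-laden illustration only: the claims do not specify a similarity measure, so Python's standard `difflib.SequenceMatcher` ratio is swapped in, and the `lookup_interpretations` helper, the 0.8 threshold, and the intent labels are all hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup_interpretations(text, history, threshold=0.8):
    """Return interpretation history paired with stored texts similar to `text`.

    `history` is a list of (stored_text, interpretation) pairs, mirroring the
    paired storage described in claim 5.
    """
    return [
        interpretation
        for stored_text, interpretation in history
        if text_similarity(text, stored_text) > threshold
    ]


# Hypothetical paired history of texts and NLU interpretation results.
history = [
    ("turn on the light", {"intent": "light_on"}),
    ("play some jazz", {"intent": "play_music"}),
]
print(lookup_interpretations("turn on the lights", history))
```

The retrieved interpretations would then feed the NLU weight training of claim 4, which this sketch does not attempt to model.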

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

G10L15/26 · Primary
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06F40/279 · Primary
Recognition of textual entities · CPC title
G10L15/063
Training · CPC title
G10L25/51
for comparison or discrimination · CPC title
G10L15/1822
Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

View patent family 80247457

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11984126B2 cover?: A device for recognizing a speech input and an operating method thereof are provided. The device may be configured to: obtain one or more text candidates comprising a character string in which it is predicted that the speech input is to be converted by recognizing a speech input using an automatic speech recognition (ASR) model; extract text history information corresponding to the speech input…
Who is the assignee on this patent?: Samsung Electronics Co Ltd