Apparatus and method for compositional spoken language understanding

US12211486B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12211486-B2
Application number: US-202217647499-A
Country: US
Kind code: B2
Filing date: Jan 10, 2022
Priority date: May 19, 2021
Publication date: Jan 28, 2025
Grant date: Jan 28, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
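
The two attention constraints in the abstract can be made concrete with a small numerical sketch. The claims name Kullback-Leibler divergence as the measure and state that both divergences are to be increased; everything else below (the function names, the smoothed one-hot stand-in for the "degenerate distribution", the averaging over pairs, the weights, and the toy attention matrices) is an assumption for illustration, not the patent's implementation.

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def degenerate_distribution(index, length, smoothing=0.05):
    """Assumed stand-in: nearly all mass on position `index` (needs length >= 2)."""
    off = smoothing / (length - 1)
    return [1.0 - smoothing if j == index else off for j in range(length)]

def slot_pair_term(attention, slot_labels):
    """Average KL between attention rows of tokens whose slot labels differ."""
    n = len(slot_labels)
    pairs = [(i, j) for i in range(n) for j in range(n)
             if i != j and slot_labels[i] != slot_labels[j]]
    if not pairs:
        return 0.0
    return sum(kl_divergence(attention[i], attention[j]) for i, j in pairs) / len(pairs)

def non_degenerate_term(attention):
    """Average KL between each token's attention row and its degenerate distribution."""
    n = len(attention)
    return sum(kl_divergence(attention[i], degenerate_distribution(i, n))
               for i in range(n)) / n

def attention_regularizer(attention, slot_labels, alpha=1.0, beta=1.0):
    """Term to add to a loss being minimized; negative so both divergences grow."""
    return -(alpha * slot_pair_term(attention, slot_labels)
             + beta * non_degenerate_term(attention))

# Hypothetical utterance "play jazz in kitchen" with made-up slot labels.
labels = ["O", "genre", "O", "room"]

# Every token attends identically: the slot-pair divergence is zero.
uniform = [[0.25] * 4 for _ in range(4)]

# Every token attends almost only to itself: the degenerate case.
collapsed = [degenerate_distribution(i, 4) for i in range(4)]

# Rows differ from each other and put little mass on the diagonal.
spread = [
    [0.1, 0.6, 0.2, 0.1],
    [0.5, 0.1, 0.2, 0.2],
    [0.2, 0.2, 0.1, 0.5],
    [0.2, 0.2, 0.5, 0.1],
]

for name, attn in [("uniform", uniform), ("collapsed", collapsed), ("spread", spread)]:
    print(name, slot_pair_term(attn, labels), non_degenerate_term(attn))
```

In this toy setup the collapsed matrix scores well on the slot-pair term alone, since each row is far from every other row, while its non-degenerate term is zero. That is presumably why the second constraint accompanies the first.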

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying multiple tokens contained in an input utterance; generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determining at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.

2. The method of claim 1, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.

3. The method of claim 1, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions for different tokens; and the machine learning model is trained to increase the first divergence.

4. The method of claim 3, wherein: the non-degenerate objective function is based on a second divergence between the self-attention distribution for each token and its corresponding degenerate distribution; and the machine learning model is trained to increase the second divergence.

5. The method of claim 4, wherein the first divergence and the second divergence comprise Kullback-Leibler (KL) divergences.

6. The method of claim 1, wherein the trained machine learning model is trained to reduce correlations between slot labels.

7. An electronic device comprising: at least one processing device configured to: identify multiple tokens contained in an input utterance; generate slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determine at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.

8. The electronic device of claim 7, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.

9. The electronic device of claim 8, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions for different tokens; and the machine learning model is trained to increase the first divergence.

10. The electronic device of claim 9, wherein: the non-degenerate objective function is based on a second divergence between the self-attention distribution for each token and its corresponding degenerate distribution; and the machine learning model is trained to increase the second divergence.

11. The electronic device of claim 10, wherein the first divergence and the second divergence comprise Kullback-Leibler (KL) divergences.

12. The electronic device of claim 7, wherein the trained machine learning model is trained to reduce correlations between slot labels.

13. A non-transitory machine-readable medium containing instructions that when executed cause at least one processor of an electronic device to: identify multiple tokens contained in an input utterance; generate slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determine at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.

14. The non-transitory machine-readable medium of claim 13, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.

15. The non-transitory machine-readable medium of claim 13, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions
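
The training-data augmentation recited in claims 2, 8, and 14 can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary layout (`intent`, `tokens`, `slots`), the `"O"` label for tokens without a slot, the helper names, and the restriction to pairs (the claims allow "two or more" utterances per combination) are all hypothetical choices, not details taken from the patent.

```python
from itertools import combinations

def slot_types(sample):
    """Set of slot types used in a sample, ignoring the assumed 'O' (no slot) label."""
    return {label for label in sample["slots"] if label != "O"}

def augment_with_pairs(dataset):
    """Return the dataset plus concatenations of same-intent, slot-disjoint pairs."""
    paired = []
    for a, b in combinations(dataset, 2):
        common_intent = a["intent"] == b["intent"]
        if common_intent and slot_types(a).isdisjoint(slot_types(b)):
            # Concatenate tokens and their slot labels in the same order.
            paired.append({
                "intent": a["intent"],
                "tokens": a["tokens"] + b["tokens"],
                "slots": a["slots"] + b["slots"],
            })
    return dataset + paired

# Made-up training utterances for illustration only.
dataset = [
    {"intent": "play_music", "tokens": ["play", "jazz"], "slots": ["O", "genre"]},
    {"intent": "play_music", "tokens": ["play", "in", "kitchen"], "slots": ["O", "O", "room"]},
    {"intent": "play_music", "tokens": ["play", "rock"], "slots": ["O", "genre"]},
    {"intent": "set_alarm", "tokens": ["wake", "me", "at", "seven"], "slots": ["O", "O", "O", "time"]},
]

augmented = augment_with_pairs(dataset)
for sample in augmented[len(dataset):]:
    print(sample["intent"], sample["tokens"], sample["slots"])
```

Here the two `genre` utterances are not paired with each other because their slot types overlap, and the `set_alarm` utterance is not paired with anything because no other sample shares its intent.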

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence)

  • using lexical or orthographic knowledge sources

  • Execution procedure of a spoken command

  • Interactive procedures

  • Procedures used during a speech recognition process, e.g. man-machine dialogue

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12211486B2 cover?
A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained …
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 28, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).