Multi-modal spoken language understanding systems
US-11562735-B1 · Jan 24, 2023 · US
US12211486B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12211486-B2 |
| Application number | US-202217647499-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 10, 2022 |
| Priority date | May 19, 2021 |
| Publication date | Jan 28, 2025 |
| Grant date | Jan 28, 2025 |
Official abstract text for this publication.
A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
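The two attention constraints in the abstract can be illustrated with a small numeric sketch. This is not the patent's implementation: the function names are hypothetical, "dissimilar" is read as simple label inequality, the degenerate distribution for a token is assumed to be the one-hot distribution that places all attention on the token itself, and the KL direction and averaging scheme are assumptions chosen for illustration. Under those assumptions, both quantities below are ones that training would try to increase.

```python
import math

EPS = 1e-9  # smoothing so log() never sees an exact zero


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + EPS) / (qi + EPS))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def slot_pair_term(attn, labels):
    """Average KL between attention rows of tokens whose slot labels differ.

    Assumption: every ordered pair (i, j) with labels[i] != labels[j] counts.
    """
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(n) if labels[i] != labels[j]]
    if not pairs:
        return 0.0
    return sum(kl_divergence(attn[i], attn[j]) for i, j in pairs) / len(pairs)


def non_degenerate_term(attn):
    """Average KL between each token's one-hot 'self' distribution and its attention row.

    Assumption: the degenerate distribution for token i is one-hot at position i,
    so the term grows as token i puts less attention on itself.
    """
    n = len(attn)
    total = 0.0
    for i, row in enumerate(attn):
        one_hot = [1.0 if k == i else 0.0 for k in range(n)]
        total += kl_divergence(one_hot, row)
    return total / n


# Hypothetical 3-token utterance; each row is one token's attention distribution.
attn = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.3, 0.3, 0.4],
]
labels = ["O", "B-city", "B-date"]
print(slot_pair_term(attn, labels), non_degenerate_term(attn))
```

In an actual training setup these terms would presumably be combined with the slot-labeling loss and computed on differentiable tensors; the plain-Python version only shows what each term measures.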
Opening claim text (preview).
What is claimed is:

1. A method comprising: identifying multiple tokens contained in an input utterance; generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determining at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.
2. The method of claim 1, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.
3. The method of claim 1, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions for different tokens; and the machine learning model is trained to increase the first divergence.
4. The method of claim 3, wherein: the non-degenerate objective function is based on a second divergence between the self-attention distribution for each token and its corresponding degenerate distribution; and the machine learning model is trained to increase the second divergence.
5. The method of claim 4, wherein the first divergence and the second divergence comprise Kullback-Leibler (KL) divergences.
6. The method of claim 1, wherein the trained machine learning model is trained to reduce correlations between slot labels.
7. An electronic device comprising: at least one processing device configured to: identify multiple tokens contained in an input utterance; generate slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determine at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.
8. The electronic device of claim 7, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.
9. The electronic device of claim 8, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions for different tokens; and the machine learning model is trained to increase the first divergence.
10. The electronic device of claim 9, wherein: the non-degenerate objective function is based on a second divergence between the self-attention distribution for each token and its corresponding degenerate distribution; and the machine learning model is trained to increase the second divergence.
11. The electronic device of claim 10, wherein the first divergence and the second divergence comprise Kullback-Leibler (KL) divergences.
12. The electronic device of claim 7, wherein the trained machine learning model is trained to reduce correlations between slot labels.
13. A non-transitory machine-readable medium containing instructions that when executed cause at least one processor of an electronic device to: identify multiple tokens contained in an input utterance; generate slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model; and determine at least one action to be performed in response to the input utterance based on at least one of the slot labels; wherein the trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself; and wherein the trained machine learning model is trained using an overall objective function that includes a slot-pair objective function and a non-degenerate objective function, the slot-pair objective function defining self-attention distributions for tokens, the non-degenerate objective function preventing the self-attention distributions for the tokens from converging to a degenerate solution based on, for each token, an average Kullback-Leibler distance between the token and its corresponding degenerate distribution.
14. The non-transitory machine-readable medium of claim 13, wherein the trained machine learning model is further trained by: obtaining a training dataset comprising training utterances; identifying different combinations of the training utterances, each combination having two or more training utterances with a common intent and disjoint sets of slot types; concatenating the training utterances in each combination to generate at least one paired training sample for that combination; adding the paired training samples to the training dataset in order to produce an augmented training dataset; and training the machine learning model using the augmented training dataset.
15. The non-transitory machine-readable medium of claim 13, wherein: the slot-pair objective function is based on a first divergence between different ones of the self-attention distributions
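
The data augmentation recited in claims 2, 8, and 14 (pairing training utterances that share an intent but have disjoint slot types, then concatenating them) might look roughly like the sketch below. The dictionary layout, the BIO-style labels, and the helper names are hypothetical; the sketch only forms pairs, although the claims allow combinations of two or more utterances, and it joins token lists directly without any connecting word.

```python
from itertools import combinations


def slot_types(labels):
    """Slot types present in a label sequence, ignoring the 'O' (outside) tag.

    Assumption: labels follow a BIO-style scheme such as 'B-city' / 'I-city'.
    """
    return {lab.split("-", 1)[-1] for lab in labels if lab != "O"}


def augment(dataset):
    """Return the dataset plus paired samples built from compatible utterances.

    Two utterances are paired when they share an intent and their slot types
    do not overlap; the pair is the concatenation of tokens and labels.
    """
    paired = []
    for a, b in combinations(dataset, 2):
        if a["intent"] != b["intent"]:
            continue  # different intents: not a valid combination
        if slot_types(a["labels"]) & slot_types(b["labels"]):
            continue  # overlapping slot types: not disjoint
        paired.append({
            "intent": a["intent"],
            "tokens": a["tokens"] + b["tokens"],
            "labels": a["labels"] + b["labels"],
        })
    return dataset + paired


# Hypothetical training utterances.
data = [
    {"intent": "book_flight", "tokens": ["fly", "to", "boston"],
     "labels": ["O", "O", "B-city"]},
    {"intent": "book_flight", "tokens": ["leave", "on", "monday"],
     "labels": ["O", "O", "B-date"]},
]
for sample in augment(data):
    print(sample["intent"], " ".join(sample["tokens"]))
```

The augmented dataset keeps the original utterances and appends the paired ones, matching the claim language about adding paired samples to the training dataset.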
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
using lexical or orthographic knowledge sources · CPC title
Execution procedure of a spoken command · CPC title
Interactive procedures · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title