End-to-end integration of dialog history for spoken language understanding

US12119008B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12119008-B2
Application number  US-202217655441-A
Country             US
Kind code           B2
Filing date         Mar 18, 2022
Priority date       Mar 18, 2022
Publication date    Oct 15, 2024
Grant date          Oct 15, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance and text-based content of the utterance into a uniform representation.
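
As a rough illustration of what a uniform representation allows: if the text encoder and the speech encoder both emit vectors of the same size for each utterance, the two can be compared directly, which is what the Euclidean loss named in the claims relies on. The sketch below is not taken from the patent. The function name `euclidean_loss`, the mean-squared form, and the toy numbers are assumptions chosen for illustration.

```python
def euclidean_loss(text_embeddings, speech_embeddings):
    """Mean squared Euclidean distance between paired utterance embeddings.

    Assumption: one text embedding and one speech embedding per utterance,
    all with the same dimensionality (the "uniform representation").
    """
    if not text_embeddings or len(text_embeddings) != len(speech_embeddings):
        raise ValueError("need one text and one speech embedding per utterance")
    total = 0.0
    for text_vec, speech_vec in zip(text_embeddings, speech_embeddings):
        if len(text_vec) != len(speech_vec):
            raise ValueError("embeddings must share one dimensionality")
        # Squared distance between the two views of the same utterance.
        total += sum((t - s) ** 2 for t, s in zip(text_vec, speech_vec))
    return total / len(text_embeddings)


# Toy 3-dimensional embeddings for two utterances (made-up numbers).
text_emb = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
speech_emb = [[1.0, 1.0, 2.0], [0.5, 0.5, 0.5]]
loss = euclidean_loss(text_emb, speech_emb)
print(loss)
```

Minimizing a quantity like this during training would pull the speech embedding of each utterance toward its text embedding; a real system would compute it on tensors inside a training framework rather than on Python lists.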

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a memory that stores computer executable components; a processor that executes at least one of the computer executable components that: trains a hierarchical conversational neural network model to generate spoken language understandings directly of speech dialogs in an audio modality without converting the speech dialogs to a text modality, wherein the training comprises: encoding, using a text encoder of the hierarchical conversational neural network model, utterances of a training speech dialog in the audio modality, converted into the text modality, into first embeddings in a uniform embedding representation; encoding, using a speech encoder of the hierarchical conversational neural network model, the utterances of the training speech dialog in the audio modality, without being converted into the text modality, into second embeddings in the uniform embedding representation; and training, using the first embeddings of the utterances in the text modality and the second embeddings of the utterances in the audio modality, a conversation encoder of the hierarchical conversational neural network model to generate a spoken language understanding of the training speech dialog in the audio modality without converting the training speech dialog to the text modality.

2. The system of claim 1, wherein the training of the hierarchical conversational neural network model further comprises: training, using the first embeddings of the utterances in the text modality, the speech encoder, to encode utterances of the speech dialogs in the audio modality into the second embeddings in the uniform embedding representation.

3. The system of claim 1, wherein the training of the hierarchical conversational neural network model further comprises: using one or more cross-model loss functions to transfer semantic knowledge from the text encoder to at least one of the speech encoder or the conversation encoder.

4. The system of claim 1, wherein the at least one of the computer executable components further: generates, using the hierarchical conversational neural network model, a spoken language understanding of at least a portion of a speech dialog in the audio modality without converting the speech dialog to the text modality.

5. The system of claim 1, wherein the speech encoder is a student in a student-teacher joint training framework; and wherein the text encoder is a teacher in the student-teacher joint training framework.

6. The system of claim 1, wherein the training of the hierarchical conversational neural network model further comprises dropping one or more speech frames of the training speech dialog during the training based on a hyperparameter.

7. The system of claim 3, wherein the one or more cross-model loss functions comprise at least one of a Euclidean loss function or a Contrastive loss function.

8. A computer-implemented method comprising: training, by a system operatively coupled to a processor, a hierarchical conversational neural network model to generate spoken language understandings directly of speech dialogs in an audio modality without converting the speech dialogs to a text modality, wherein the training comprises: encoding, using a text encoder of the hierarchical conversational neural network model, utterances of a training speech dialog in the audio modality, converted into the text modality, into first embeddings in a uniform embedding representation; encoding, using a speech encoder of the hierarchical conversational neural network model, the utterances of the training speech dialog in the audio modality, without being converted into the text modality, into second embeddings in the uniform embedding representation; and training, using the first embeddings of the utterances in the text modality and the second embeddings of the utterances in the audio modality, a conversation encoder of the hierarchical conversational neural network model to generate a spoken language understanding of the training speech dialog in the audio modality without converting the training speech dialog to the text modality.

9. The computer-implemented method of claim 8, wherein the training of the hierarchical conversational neural network model further comprises: training, using the first embeddings of the utterances in the text modality, the speech encoder, to encode utterances of the speech dialogs in the audio modality into the second embeddings in the uniform embedding representation.

10. The computer-implemented method of claim 8, wherein the training of the hierarchical conversational neural network model further comprises: using one or more cross-model loss functions to transfer semantic knowledge from the text encoder to at least one of the speech encoder or the conversation encoder.

11. The computer-implemented method of claim 8, further comprising: generating, by the system, using the hierarchical conversational neural network model, a spoken language understanding of at least a portion of a speech dialog in the audio modality without converting the speech dialog to the text modality.

12. The computer-implemented method of claim 8, wherein the speech encoder is a student in a student-teacher joint training framework, and the text encoder is a teacher in the student-teacher joint training framework.

13. The computer-implemented method of claim 8, wherein the training of the hierarchical conversational neural network model further comprises: dropping one or more speech frames of the training speech dialog during the training based on a hyperparameter.

14. The computer-implemented method of claim 10, wherein the one or more cross-model loss functions comprise at least one of a Euclidean loss function or a Contrastive loss function.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: train a hierarchical conversational neural network model to generate spoken language understandings directly of speech dialogs in an audio modality without converting the speech dialogs to a text modality, wherein the training comprises: encoding, using a text encoder of the hierarchical conversational neural network model, utterances of a training speech dialog in the audio modality, converted into the text modality, into first embeddings in a uniform embedding representation; encoding, using a speech encoder of the hierarchical conversational neural network model, the utterances of the training speech dialog in the audio modality, without being converted into the text modality, into second embeddings in the uniform embedding representation; and training, using the first embeddings of the utterances in the text modality and the second embeddings of the utterances in the audio modality, a conversation encoder of the hierarchical conversational neural network model to generate a spoken language understanding of the training speech dialog in the audio modality without converting the training speech dialog to the text modality.

16. The computer program product of claim 15, wherein the training of the hierarchical conversational neural network model further comprises: training, using the first embeddings of the utterances in the text modality, the speech encoder, to encode utterances of the speech dialogs in the audio modality into the second embeddings in the uniform embedding representation.

17. The computer program product of claim 15 , wherein the training of the hierarchica
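
Two of the training details in the dependent claims can be sketched in a few lines: dropping speech frames according to a hyperparameter, and a contrastive loss between speech and text embeddings. The claim preview does not say how either is computed, so everything below is an assumption made for illustration: the names `drop_frames` and `contrastive_loss`, the use of an independent per-frame drop probability, and the batch-wise softmax form of the contrastive loss with cosine similarity and a `temperature` parameter.

```python
import math
import random


def drop_frames(frames, drop_prob, rng):
    """Randomly drop speech frames; drop_prob is the (assumed) hyperparameter."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    kept = [frame for frame in frames if rng.random() >= drop_prob]
    # Keep at least one frame so the encoder never sees an empty utterance.
    return kept if kept else frames[:1]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return dot / norm


def contrastive_loss(speech_embeddings, text_embeddings, temperature=0.1):
    """Batch-wise contrastive loss (one common form, assumed here).

    For utterance i, the text embedding i is the positive and every other
    text embedding in the batch is a negative.
    """
    if not speech_embeddings or len(speech_embeddings) != len(text_embeddings):
        raise ValueError("need paired, non-empty batches")
    losses = []
    for i, speech_vec in enumerate(speech_embeddings):
        logits = [cosine(speech_vec, t) / temperature for t in text_embeddings]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy usage with made-up data.
rng = random.Random(0)
frames = list(range(10))          # stand-ins for acoustic feature frames
kept = drop_frames(frames, 0.3, rng)

aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = contrastive_loss(aligned, aligned)
loss_mismatched = contrastive_loss(aligned, swapped)
print(len(kept), loss_matched < loss_mismatched)
```

In this sketch the loss is small when each speech embedding is closest to its own text embedding and large when the pairing is wrong, which is the behavior one would want from a loss meant to transfer knowledge from a text encoder to a speech encoder.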

Assignees

Inventors

Classifications

  • Combinations of networks

  • Speech recognition (G10L17/00 takes precedence)

  • Character encoding

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Transfer learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12119008B2 cover?
Systems, computer-implemented methods, and computer program products to facilitate end to end integration of dialogue history for spoken language understanding are provided. According to an embodiment, a system can comprise a processor that executes components stored in memory. The computer executable components comprise a conversation component that encodes speech-based content of an utterance…
Who is the assignee on this patent?
IBM, Univ Ohio State
What technology area does this patent fall under?
Primary CPC classification G10L19/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).