What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 23, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Training end-to-end spoken language understanding systems with unordered entities

US12046236B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12046236-B2
Application number	US-202117458772-A
Country	US
Kind code	B2
Filing date	Aug 27, 2021
Priority date	Aug 27, 2021
Publication date	Jul 23, 2024
Grant date	Jul 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
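The two data-preparation ideas in the abstract, reordering entities into spoken order and perturbing them into random orders, can be illustrated with a minimal sketch. It is not the patented implementation: the `Entity` class, the function names, and the `start_times` mapping are hypothetical, and the sketch assumes that some alignment technique has already produced a start time (in seconds) for the words of each entity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical container for one semantic entity (label plus value)."""
    label: str
    value: str


def reorder_by_alignment(entities, start_times):
    """Return entities sorted by the start time of their aligned words.

    start_times is an assumed mapping from entity to seconds, standing in
    for the output of whatever alignment technique is used.
    """
    return sorted(entities, key=lambda e: start_times[e])


def random_order_variants(entities, n_variants, seed=0):
    """Create n_variants randomly ordered copies of the entity sequence."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        shuffled = list(entities)
        rng.shuffle(shuffled)
        variants.append(shuffled)
    return variants


# Ground-truth entities arrive in an order unrelated to the speech.
entities = [
    Entity("toloc.city", "boston"),
    Entity("depart_date", "monday"),
    Entity("fromloc.city", "denver"),
]
# Made-up alignment times for an utterance such as
# "from denver to boston on monday".
start_times = {entities[0]: 2.1, entities[1]: 3.4, entities[2]: 1.2}

spoken = reorder_by_alignment(entities, start_times)
augmented = random_order_variants(entities, n_variants=3)
```

Here `spoken` holds the entities in the order denver, boston, monday, and each element of `augmented` is a permutation of the same three entities that could be paired with the original speech as an additional training example.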

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: receiving pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reordering the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augmenting the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; pre-training a spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; and training the spoken language understanding machine learning model that is pre-trained, using the pairs of speech and meaning representation having the reordered semantic entities.
2. The method of claim 1, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.
3. The method of claim 1, wherein the alignment technique includes using time markings derived from an attention model.
4. The method of claim 3, wherein the speech includes noisy speech data and the attention model is adapted to the noisy speech data.
5. The method of claim 1, further including fine-tuning the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order; and the training includes training the spoken language understanding machine learning model that is fine-tuned, with the reordered semantic entities.
6. The method of claim 1, wherein the spoken language understanding machine learning model includes a neural network.
7. The method of claim 1, further including inputting a given speech to the trained spoken language understanding machine learning model, wherein the trained spoken language understanding machine learning model outputs a set prediction including an intent label and semantic entities associated with the given speech.
8. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to: receive pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reorder the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augment the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; and pre-train the spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; and train the spoken language understanding machine learning model that is pre-trained, using the pairs of speech and meaning representation having the reordered semantic entities.
9. The computer program product of claim 8, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.
10. The computer program product of claim 8, wherein the alignment technique includes using time markings derived from an attention model.
11. The computer program product of claim 8, wherein the device is further caused to fine-tune the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order, wherein the device caused to train the spoken language understanding machine learning model includes the device caused to train the spoken language understanding machine learning model that is fine-tuned, with the reordered semantic entities.
12. A computer-implemented method comprising: receiving pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reordering the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augmenting the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; pre-training a spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; fine-tuning the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order; and training the spoken language understanding machine learning model that is fine-tuned, using the pairs of speech and meaning representation having the reordered semantic entities.
13. The method of claim 12, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.
14. The method of claim 12, wherein the alignment technique includes using time markings derived from an attention model.
15. The method of claim 14, wherein the speech includes noisy speech data and the attention model is adapted to the noisy speech data.
16. The method of claim 12, wherein the spoken language understanding machine learning model includes a neural network.
17. The method of claim 12, further including inputting a given speech to the trained spoken language understanding machine learning model, wherein the trained spoken language understanding machine learning model outputs a set prediction including an intent label and semantic entities associated with the given speech.
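Claim 12 recites three training stages that differ only in how the target entities are ordered. The sketch below shows one way the three stage datasets could be assembled. It is an illustration under stated assumptions rather than the claimed method itself: `build_curriculum` and the stage names are hypothetical, entities are represented as `(label, value)` tuples, "alphabetical order" is taken to mean sorting by label, the augmented set is assumed to keep the original pair alongside its random-order variants, and `spoken_order` is assumed to come from a separate alignment step. Model training itself is out of scope here.

```python
import random


def build_curriculum(pairs, spoken_order, n_variants=2, seed=0):
    """Assemble per-stage training targets (hypothetical helper).

    pairs: list of (utterance_id, [(label, value), ...]) in unknown order.
    spoken_order: dict mapping utterance_id to entities in spoken order,
        assumed to be produced by an alignment technique.
    """
    rng = random.Random(seed)

    # Stage 1: original pairs plus random-order variations (augmentation).
    pretrain = []
    for utt, ents in pairs:
        pretrain.append((utt, list(ents)))
        for _ in range(n_variants):
            shuffled = list(ents)
            rng.shuffle(shuffled)
            pretrain.append((utt, shuffled))

    # Stage 2: entities sorted alphabetically (by label, then value).
    finetune = [(utt, sorted(ents)) for utt, ents in pairs]

    # Stage 3: entities reordered into spoken order.
    train = [(utt, list(spoken_order[utt])) for utt, _ in pairs]

    return [
        ("pretrain_random_order", pretrain),
        ("finetune_alphabetical", finetune),
        ("train_spoken_order", train),
    ]


pairs = [
    ("utt1", [("toloc", "boston"), ("date", "monday"), ("fromloc", "denver")]),
]
spoken_order = {
    "utt1": [("fromloc", "denver"), ("toloc", "boston"), ("date", "monday")],
}
stages = build_curriculum(pairs, spoken_order)
```

Each element of `stages` is a `(stage_name, examples)` pair, so a training loop could iterate over the stages in order and continue from the model produced by the previous stage.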

Assignees

Inventors

Classifications

Code	CPC title
G10L15/16	using artificial neural networks
G06N3/08	Learning methods
G10L2015/088	Word spotting
G10L15/04	Segmentation; Word boundary detection
G10L15/1822	Parsing for meaning understanding

Patent family

Related publications grouped by family.

View patent family 85292823

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12046236B2 cover?: Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken ord…
Who is the assignee on this patent?: IBM

Related patents

Generation of language understanding systems and methods

Exporting dialog-driven applications to digital communication platforms

Generation of language understanding systems and methods

Location-based conversational understanding

Assignment of semantic labels to a sequence of words using neural network architectures