Training end-to-end spoken language understanding systems with unordered entities

US12046236B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12046236-B2
Application number  US-202117458772-A
Country             US
Kind code           B2
Filing date         Aug 27, 2021
Priority date       Aug 27, 2021
Publication date    Jul 23, 2024
Grant date          Jul 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken order of the associated speech using an alignment technique. A spoken language understanding machine learning model can be trained using the pairs of speech and meaning representation having the reordered semantic entities. The meaning representation, e.g., semantic entities, in the received training data can be perturbed to create random order sequence variations of the semantic entities associated with speech. Perturbed meaning representation with associated speech can augment the training data.
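
The reordering and perturbation steps described in the abstract can be illustrated with a minimal Python sketch. It assumes an alignment step (not shown) has already produced one start time per entity. The `Entity` type, the function names, the slot labels, and the example utterance are all hypothetical and are not taken from the patent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # hypothetical slot name, e.g. "toloc.city_name"
    value: str  # entity value as spoken, e.g. "Denver"


def reorder_to_spoken_order(entities, start_times):
    """Return entities sorted by the time at which each one was spoken.

    start_times[i] is the start time in seconds for entities[i], assumed
    to come from an alignment technique (keyword spotting or attention
    time markings); producing those times is outside this sketch.
    """
    if len(entities) != len(start_times):
        raise ValueError("need exactly one start time per entity")
    order = sorted(range(len(entities)), key=lambda i: start_times[i])
    return [entities[i] for i in order]


def random_order_variations(entities, n_variations, seed=0):
    """Create shuffled copies of the entity sequence for augmentation."""
    rng = random.Random(seed)
    variations = []
    for _ in range(n_variations):
        shuffled = list(entities)
        rng.shuffle(shuffled)
        variations.append(shuffled)
    return variations


# Ground truth listed in an order unrelated to the utterance
# "I want to fly from Boston to Denver on Monday".
ground_truth = [
    Entity("depart_date.day_name", "Monday"),
    Entity("fromloc.city_name", "Boston"),
    Entity("toloc.city_name", "Denver"),
]
start_times = [3.1, 1.4, 2.2]  # made-up alignment output, in seconds

spoken = reorder_to_spoken_order(ground_truth, start_times)
augmented = random_order_variations(ground_truth, n_variations=3)
```

Here `spoken` holds the entities in the order Boston, Denver, Monday, and each list in `augmented` pairs with the same speech as an extra training example.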

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reordering the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augmenting the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; pre-training a spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; and training the spoken language understanding machine learning model that is pre-trained, using the pairs of speech and meaning representation having the reordered semantic entities.

2. The method of claim 1, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.

3. The method of claim 1, wherein the alignment technique includes using time markings derived from an attention model.

4. The method of claim 3, wherein the speech includes noisy speech data and the attention model is adapted to the noisy speech data.

5. The method of claim 1, further including fine-tuning the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order; and the training includes training the spoken language understanding machine learning model that is fine-tuned, with the reordered semantic entities.

6. The method of claim 1, wherein the spoken language understanding machine learning model includes a neural network.

7. The method of claim 1, further including inputting a given speech to the trained spoken language understanding machine learning model, wherein the trained spoken language understanding machine learning model outputs a set prediction including an intent label and semantic entities associated with the given speech.

8. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to: receive pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reorder the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augment the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; and pre-train the spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; and train the spoken language understanding machine learning model that is pre-trained, using the pairs of speech and meaning representation having the reordered semantic entities.

9. The computer program product of claim 8, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.

10. The computer program product of claim 8, wherein the alignment technique includes using time markings derived from an attention model.

11. The computer program product of claim 8, wherein the device is further caused to fine-tune the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order, wherein the device caused to train the spoken language understanding machine learning model includes the device caused to train the spoken language understanding machine learning model that is fine-tuned, with the reordered semantic entities.

12. A computer-implemented method comprising: receiving pairs of speech and meaning representation associated with the speech, the meaning representation including at least semantic entities associated with the speech, wherein spoken order of the semantic entities is unknown; reordering the semantic entities into spoken order of words associated with the semantic entities in the speech using an alignment technique; augmenting the received pairs of speech and meaning representation to include random order sequence variations of the semantic entities; pre-training a spoken language understanding machine learning model using the augmented pairs of speech and meaning representation; fine-tuning the spoken language understanding machine learning model that is pre-trained, using the semantic entities in alphabetical order; and training the spoken language understanding machine learning model that is fine-tuned, using the pairs of speech and meaning representation having the reordered semantic entities.

13. The method of claim 12, wherein the alignment technique includes acoustic keyword spotting used with a hybrid speech recognition model.

14. The method of claim 12, wherein the alignment technique includes using time markings derived from an attention model.

15. The method of claim 14, wherein the speech includes noisy speech data and the attention model is adapted to the noisy speech data.

16. The method of claim 12, wherein the spoken language understanding machine learning model includes a neural network.

17. The method of claim 12, further including inputting a given speech to the trained spoken language understanding machine learning model, wherein the trained spoken language understanding machine learning model outputs a set prediction including an intent label and semantic entities associated with the given speech.
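
The staged training recited in claim 12 (pre-train on random order variations, fine-tune on alphabetical order, then train on spoken order) can be sketched as three sets of target sequences for one utterance. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the alphabetical sort key (label, then value) is an assumption because the claims do not specify one, and `train_step` stands in for whatever model update a real system would perform.

```python
import random


def build_stage_targets(entities, spoken_order, n_variations=2, seed=0):
    """Build target entity sequences for the three training stages.

    entities     -- (label, value) tuples in unknown order
    spoken_order -- the same tuples, already reordered by an alignment step
    """
    rng = random.Random(seed)
    pretrain = []
    for _ in range(n_variations):
        shuffled = list(entities)
        rng.shuffle(shuffled)
        pretrain.append(shuffled)
    return {
        "pretrain": pretrain,            # random order variations
        "finetune": [sorted(entities)],  # alphabetical; sort key is assumed
        "train": [list(spoken_order)],   # spoken order from alignment
    }


def run_curriculum(stage_targets, train_step):
    """Call train_step(stage, target) stage by stage, in the claimed order."""
    for stage in ("pretrain", "finetune", "train"):
        for target in stage_targets[stage]:
            train_step(stage, target)


# Made-up example data for a single utterance.
entities = [("toloc", "Denver"), ("fromloc", "Boston"), ("day", "Monday")]
spoken_order = [("fromloc", "Boston"), ("toloc", "Denver"), ("day", "Monday")]

# A stand-in train_step that only records what it was asked to train on.
log = []
targets = build_stage_targets(entities, spoken_order)
run_curriculum(targets, lambda stage, target: log.append((stage, target)))
```

The recorded `log` makes the stage ordering explicit: every pre-training target is visited before the alphabetical target, and the spoken-order target comes last.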

Assignees

IBM

Inventors

Classifications

  • Using artificial neural networks

  • Learning methods

  • Word spotting

  • Segmentation; Word boundary detection

  • Parsing for meaning understanding

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12046236B2 cover?
Training data can be received, which can include pairs of speech and meaning representation associated with the speech as ground truth data. The meaning representation includes at least semantic entities associated with the speech, where the spoken order of the semantic entities is unknown. The semantic entities of the meaning representation in the training data can be reordered into spoken ord…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 23, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).