Systems and methods for natural language processing using joint energy-based models

US11934952B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11934952-B2
Application number  US-202017124317-A
Country             US
Kind code           B2
Filing date         Dec 16, 2020
Priority date       Aug 21, 2020
Publication date    Mar 19, 2024
Grant date          Mar 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide natural language processing (NLP) systems and methods that utilize energy-based models (EBMs) to compute an exponentially-weighted energy-like term in the loss function to train an NLP classifier. Specifically, noise contrastive estimation (NCE) procedures are applied together with the EBM-based loss objectives for training the NLPs.
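
The abstract does not spell out the NCE objective. As a minimal sketch, assuming the standard binary NCE form with a noise-to-data ratio K and an energy term defined relative to the noise distribution, the loss can be written as a weighted sum of two softplus expectations, which matches the shape described in claim 5. The names `softplus`, `nce_loss`, and `noise_ratio` are illustrative assumptions and do not come from the patent.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def nce_loss(data_energies, noise_energies, noise_ratio=1.0):
    """Assumed binary NCE loss (to be minimized) over two batches of energies.

    data_energies:  energy terms computed on real data samples
    noise_energies: energy terms computed on noise samples
    noise_ratio:    hypothetical noise-to-data ratio K
    """
    log_k = math.log(noise_ratio)
    # Expectation over the data distribution: pushes data energies down.
    data_term = sum(softplus(e + log_k) for e in data_energies) / len(data_energies)
    # Expectation over the noise distribution: pushes noise energies up.
    noise_term = sum(softplus(-e - log_k) for e in noise_energies) / len(noise_energies)
    # Weighted sum of the two expectations.
    return data_term + noise_ratio * noise_term


# Low energy on data and high energy on noise gives the smaller loss.
good = nce_loss([-5.0], [5.0])
bad = nce_loss([5.0], [-5.0])
```

Under this assumed form, the loss falls as the classifier assigns lower energy to data samples and higher energy to noise samples.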

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a natural language processing (NLP) classifier, comprising:
    receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution;
    generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution;
    inputting a data sample and the respective noise sample to the NLP classifier;
    encoding the respective data sample into an encoded data sample representation;
    encoding the respective noise sample into an encoded noise sample representation;
    generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation;
    computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function;
    computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function;
    computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and
    training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

2. The method of claim 1, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

3. The method of claim 1, wherein the first energy term is computed according to the hidden function by applying a multivariable softplus transformation to a plurality of logits of the first classification output.

4. The method of claim 1, wherein the first energy term is computed according to the sharp-hidden function by applying a negative maximum transformation to at least a plurality of logits of the first classification output.

5. The method of claim 1, wherein the NCE loss objective is computed by:
    computing a first expectation of a first weighted softplus component based on the first energy term, wherein the first expectation is taken over the data distribution;
    computing a second expectation of a second weighted softplus component based on the second energy term, wherein the second expectation is taken over the noise distribution; and
    computing a weighted sum of the first expectation and the second expectation.

6. A system for training a natural language processing (NLP) classifier, comprising: a non-transitory memory; and one or more processor coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising:
    receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution;
    generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution;
    inputting a data sample and the respective noise sample to the NLP classifier;
    encoding the respective data sample into an encoded data sample representation;
    encoding the respective noise sample into an encoded noise sample representation;
    generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation;
    computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function;
    computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function;
    computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and
    training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

7. The system of claim 6, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

8. The system of claim 6, wherein the first energy term is computed according to the hidden function by applying a multivariable softplus transformation to a plurality of logits of the first classification output.

9. The system of claim 6, wherein the first energy term is computed according to the sharp-hidden function by applying a negative maximum transformation to at least a plurality of logits of the first classification output.

10. The system of claim 6, wherein the NCE loss objective is computed by:
    computing a first expectation of a first weighted softplus component based on the first energy term, wherein the first expectation is taken over the data distribution;
    computing a second expectation of a second weighted softplus component based on the second energy term, wherein the second expectation is taken over the noise distribution; and
    computing a weighted sum of the first expectation and the second expectation.

11. A non-transitory, machine-readable medium having stored thereon machine-readable instructions executable to cause a system to perform operations comprising:
    receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution;
    generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution;
    inputting a data sample and the respective noise sample to a natural language processing (NLP) classifier;
    encoding the respective data sample into an encoded data sample representation;
    encoding the respective noise sample into an encoded noise sample representation;
    generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation;
    computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function;
    computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function;
    computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and
    training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

12. The non-transitory, machine-readable medium of claim 11, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

13. The non-transitory, machine-readable medium of claim 11, wherein the first energy term is
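
Claims 2 to 4 name three energy functions, and the independent claims combine the NCE objective with a cross-entropy loss on the same classification output. The following is a rough sketch of how those pieces might look, assuming the hidden function is the negative log-sum-exp of the logits (the usual convention for joint energy-based models; the claim itself only says "multivariable softplus transformation") and that the scalar function is a single linear layer with hypothetical parameters `weights` and `bias`. All function names are illustrative and not taken from the patent.

```python
import math


def logsumexp(values):
    # Stable log(sum(exp(v))), the multivariable generalization of softplus.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def scalar_energy(encoding, weights, bias=0.0):
    # Scalar function: a linear layer mapping the encoded sample to one number.
    return sum(w * h for w, h in zip(weights, encoding)) + bias


def hidden_energy(logits):
    # Hidden function (assumed sign): negative log-sum-exp over class logits.
    return -logsumexp(logits)


def sharp_hidden_energy(logits):
    # Sharp-hidden function: negative maximum over class logits.
    return -max(logits)


def cross_entropy(logits, label):
    # Standard softmax cross-entropy for one sample, from the same logits.
    return logsumexp(logits) - logits[label]


logits = [1.0, 3.0, 2.0]            # hypothetical classifier output
e_hidden = hidden_energy(logits)
e_sharp = sharp_hidden_energy(logits)
ce = cross_entropy(logits, 1)
```

Because log-sum-exp is never smaller than the maximum, the hidden energy in this sketch is always at or below the sharp-hidden energy for the same logits. Both reuse the classifier's logits, whereas the scalar variant needs its own linear parameters.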

Assignees

Salesforce Inc

Inventors

Classifications

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

  • Semantic analysis · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11934952B2 cover?
Embodiments described herein provide natural language processing (NLP) systems and methods that utilize energy-based models (EBMs) to compute an exponentially-weighted energy-like term in the loss function to train an NLP classifier. Specifically, noise contrastive estimation (NCE) procedures are applied together with the EBM-based loss objectives for training the NLPs.
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 19, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).