Domain specific language for generation of recurrent neural network architectures
US-2018336453-A1 · Nov 22, 2018 · US
US11934952B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11934952-B2 |
| Application number | US-202017124317-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2020 |
| Priority date | Aug 21, 2020 |
| Publication date | Mar 19, 2024 |
| Grant date | Mar 19, 2024 |
Abstract
Embodiments described herein provide natural language processing (NLP) systems and methods that utilize energy-based models (EBMs) to compute an exponentially-weighted energy-like term in the loss function to train an NLP classifier. Specifically, noise contrastive estimation (NCE) procedures are applied together with the EBM-based loss objectives for training the NLPs.
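The abstract pairs an energy-based model with NCE. For reference only, the standard NCE objective can be written in terms of an energy function $E_\theta$. This assumes an unnormalized model $p_\theta(x) \propto \exp(-E_\theta(x))$, a noise distribution $q$, and $\nu$ noise samples per data sample; these symbols are assumptions, not definitions from the patent text shown here. $D_\theta(x)$ is the estimated probability that $x$ came from the data rather than the noise. The identity $-\log\sigma(z) = \operatorname{softplus}(-z)$ turns the usual logistic loss into two softplus expectations:

$$
D_\theta(x) = \sigma\Bigl(-E_\theta(x) - \log\bigl(\nu\, q(x)\bigr)\Bigr)
$$

$$
\mathcal{L}_{\mathrm{NCE}}(\theta) = \mathbb{E}_{x \sim p_{\mathrm{data}}}\Bigl[\operatorname{softplus}\Bigl(E_\theta(x) + \log\bigl(\nu\, q(x)\bigr)\Bigr)\Bigr] + \nu\, \mathbb{E}_{x \sim q}\Bigl[\operatorname{softplus}\Bigl(-E_\theta(x) - \log\bigl(\nu\, q(x)\bigr)\Bigr)\Bigr]
$$

Minimizing this objective lowers the energy of data samples and raises the energy of noise samples. It has the same shape as the structure recited in claim 5: one softplus expectation over the data distribution, one over the noise distribution, and a weighted sum of the two. However, the exact weights the patent uses are not given in this excerpt.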
Claims (preview)
What is claimed is:

1. A method for training a natural language processing (NLP) classifier, comprising: receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution; generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution; inputting a data sample and the respective noise sample to the NLP classifier; encoding the respective data sample into an encoded data sample representation; encoding the respective noise sample into an encoded noise sample representation; generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation; computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function; computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function; computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

2. The method of claim 1, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

3. The method of claim 1, wherein the first energy term is computed according to the hidden function by applying a multivariable softplus transformation to a plurality of logits of the first classification output.

4. The method of claim 1, wherein the first energy term is computed according to the sharp-hidden function by applying a negative maximum transformation to at least a plurality of logits of the first classification output.

5. The method of claim 1, wherein the NCE loss objective is computed by: computing a first expectation of a first weighted softplus component based on the first energy term, wherein the first expectation is taken over the data distribution; computing a second expectation of a second weighted softplus component based on the second energy term, wherein the second expectation is taken over the noise distribution; and computing a weighted sum of the first expectation and the second expectation.

6. A system for training a natural language processing (NLP) classifier, comprising: a non-transitory memory; and one or more processor coupled to the non-transitory memory and configured to read instructions from the non-transitory memory to cause the system to perform operations comprising: receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution; generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution; inputting a data sample and the respective noise sample to the NLP classifier; encoding the respective data sample into an encoded data sample representation; encoding the respective noise sample into an encoded noise sample representation; generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation; computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function; computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function; computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

7. The system of claim 6, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

8. The system of claim 6, wherein the first energy term is computed according to the hidden function by applying a multivariable softplus transformation to a plurality of logits of the first classification output.

9. The system of claim 6, wherein the first energy term is computed according to the sharp-hidden function by applying a negative maximum transformation to at least a plurality of logits of the first classification output.

10. The system of claim 6, wherein the NCE loss objective is computed by: computing a first expectation of a first weighted softplus component based on the first energy term, wherein the first expectation is taken over the data distribution; computing a second expectation of a second weighted softplus component based on the second energy term, wherein the second expectation is taken over the noise distribution; and computing a weighted sum of the first expectation and the second expectation.

11. A non-transitory, machine-readable medium having stored thereon machine-readable instructions executable to cause a system to perform operations comprising: receiving, via a data interface, a training dataset of data samples that correspond to a data probability distribution; generating, for at least one data sample from the training dataset, a respective noise sample according to a noise probability distribution; inputting a data sample and the respective noise sample to a natural language processing (NLP) classifier; encoding the respective data sample into an encoded data sample representation; encoding the respective noise sample into an encoded noise sample representation; generating, by the NLP classifier, a first classification output corresponding to the encoded data sample representation and a second classification output corresponding to the encoded noise sample representation; computing a first energy term based at least in part on the first classification output and the encoded data sample representation according to an energy function selected from the group consisting of a scalar function, a hidden function, and a sharp-hidden function; computing a second energy term based at least in part on the second classification output and the encoded noise sample representation according to the energy function; computing a noise contrastive estimation (NCE) loss objective based at least in part on the first energy term and the second energy term; and training the NLP classifier based at least in part on a combination of the NCE loss objective and a cross-entropy loss computed based on the first classification output conditioned on a respective data input sample.

12. The non-transitory, machine-readable medium of claim 11, wherein the first energy term is computed according to the scalar function by a linear layer transformation of the encoded data sample representation.

13. The non-transitory, machine-readable medium of claim 11, wherein the first energy term is
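The following is a minimal, dependency-free sketch of the loss computations recited in the claims. It covers the three energy variants of claims 2 to 4, the softplus form of the NCE objective in claim 5, and the cross-entropy combination in claim 1. It is illustrative only.

Several details are assumptions rather than details taken from the patent:

- all function names;
- the negative sign on the hidden variant, which mirrors the "negative maximum" of the sharp-hidden variant;
- the noise-to-data ratio `nu`;
- the noise log-probabilities `*_log_q`;
- the `weight` on the NCE term.

"Multivariable softplus" is implemented here as LogSumExp, which reduces to ordinary softplus for the pair (0, x).

```python
import math
from typing import Sequence


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def logsumexp(xs: Sequence[float]) -> float:
    """log(sum(exp(x))): the 'multivariable softplus', a smooth maximum."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


# Energy variants (claims 2-4). Names and sign conventions are illustrative assumptions.
def energy_scalar(h: Sequence[float], w: Sequence[float], b: float) -> float:
    """Scalar: a linear layer maps the encoded representation h to one number."""
    return sum(hi * wi for hi, wi in zip(h, w)) + b


def energy_hidden(logits: Sequence[float]) -> float:
    """Hidden: negative LogSumExp of the classifier logits (assumed sign)."""
    return -logsumexp(logits)


def energy_sharp_hidden(logits: Sequence[float]) -> float:
    """Sharp-hidden: negative maximum of the classifier logits."""
    return -max(logits)


def nce_loss(
    data_energies: Sequence[float],
    noise_energies: Sequence[float],
    data_log_q: Sequence[float],
    noise_log_q: Sequence[float],
    nu: float = 1.0,
) -> float:
    """Standard NCE loss for an unnormalized model p(x) ~ exp(-E(x)).

    *_log_q hold log q(x) under the noise distribution for each sample;
    nu is the (assumed) number of noise samples per data sample.
    """
    log_nu = math.log(nu)
    # Data term: -log sigmoid(-E - log(nu*q)) == softplus(E + log(nu*q)), averaged over data.
    data_term = sum(
        softplus(e + log_nu + lq) for e, lq in zip(data_energies, data_log_q)
    ) / len(data_energies)
    # Noise term: -log sigmoid(E + log(nu*q)) == softplus(-E - log(nu*q)), averaged over noise.
    noise_term = sum(
        softplus(-e - log_nu - lq) for e, lq in zip(noise_energies, noise_log_q)
    ) / len(noise_energies)
    # Weighted sum of the two expectations.
    return data_term + nu * noise_term


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label]."""
    return logsumexp(logits) - logits[label]


def combined_loss(ce: float, nce: float, weight: float = 1.0) -> float:
    """Cross-entropy plus a weighted NCE term (the weight is an assumption)."""
    return ce + weight * nce


if __name__ == "__main__":
    data_logits, noise_logits = [2.0, -1.0, 0.5], [0.1, 0.2, 0.0]
    e_data = energy_hidden(data_logits)
    e_noise = energy_hidden(noise_logits)
    # Hypothetical noise log-probabilities for the two samples.
    nce = nce_loss([e_data], [e_noise], [-3.0], [-1.0])
    total = combined_loss(cross_entropy(data_logits, label=0), nce)
    print(f"NCE={nce:.4f} total={total:.4f}")
```

LogSumExp is never smaller than the maximum. As a result, for the same logits the hidden energy is never larger than the sharp-hidden energy. The sharp-hidden variant replaces the smooth maximum with a hard one. In a real classifier these functions would operate on batched tensors from the encoder rather than Python lists.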