Deep neural network model for processing data through multiple linguistic task hierarchies

US11222253B2 · US · B2

Patent metadata
Publication number: US-11222253-B2
Application number: US-201715421424-A
Country: US
Kind code: B2
Filing date: Jan 31, 2017
Priority date: Nov 3, 2016
Publication date: Jan 11, 2022
Grant date: Jan 11, 2022

Abstract

The technology disclosed provides a so-called “joint many-task neural network model” to solve a variety of increasingly complex natural language processing (NLP) tasks using a growing depth of layers in a single end-to-end model. The model is successively trained by considering linguistic hierarchies, directly connecting word representations to all model layers, explicitly using predictions in lower tasks, and applying a so-called “successive regularization” technique to prevent catastrophic forgetting. Three examples of lower-level model layers are the part-of-speech (POS) tagging layer, the chunking layer, and the dependency parsing layer. Two examples of higher-level model layers are the semantic relatedness layer and the textual entailment layer. The model achieves state-of-the-art results on chunking, dependency parsing, semantic relatedness, and textual entailment.
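The abstract names successive regularization but this page does not define it. The following is a minimal sketch, assuming the form given in the published JMT paper (Hashimoto et al.). While a task is being trained, its loss adds a penalty δ‖θ − θ′‖². This penalty keeps parameters shared with lower-layer tasks close to a previously saved snapshot θ′, which limits catastrophic forgetting. Parameters are flattened to plain lists of floats, and the function names are hypothetical.

```python
from typing import List, Sequence


def successive_regularization(params: Sequence[float],
                              prev_params: Sequence[float],
                              delta: float) -> float:
    """Assumed form (from the JMT paper, not this page): delta * ||params - prev||^2."""
    if len(params) != len(prev_params):
        raise ValueError("parameter vectors must have the same length")
    return delta * sum((p - q) ** 2 for p, q in zip(params, prev_params))


def sgd_step(params: Sequence[float], task_grad: Sequence[float],
             prev_params: Sequence[float], delta: float,
             lr: float) -> List[float]:
    """One gradient-descent step on task_loss + successive_regularization.

    The penalty's gradient, 2 * delta * (p - q), pulls each shared parameter
    back toward the snapshot taken before the current task was trained.
    """
    return [p - lr * (g + 2.0 * delta * (p - q))
            for p, g, q in zip(params, task_grad, prev_params)]


# Toy example (hypothetical values): parameters that drifted from the snapshot.
snapshot = [0.5, 2.0]
current = [1.0, 2.0]
penalty = successive_regularization(current, snapshot, delta=0.1)
# With a zero task gradient, the step moves only the drifted parameter.
updated = sgd_step(current, [0.0, 0.0], snapshot, delta=0.1, lr=1.0)
```

With δ = 0 the step reduces to plain gradient descent on the task loss. Larger δ trades accuracy on the new task for stability on the tasks trained before it.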

Claims (preview)

What is claimed is:

1. A neural network that processes words in an input sentence, the neural network comprising a processor that processes: a part-of-speech (POS) label embedding layer that produces, using the processor, POS label embeddings from word embeddings generated from the words in the input sentence; a chunk label embedding layer overlaying the POS label embedding layer, the chunk label embedding layer receives, using a first bypass connection, the POS label embeddings from the POS label embedding layer, and, using a second bypass connection, the word embeddings, and produces, using the processor, chunk label embeddings and chunk state vectors from the POS label embeddings and the word embeddings; a dependency parsing layer overlaying the chunk label embedding layer, the dependency parsing layer comprising: a bi-directional long-short term memory (LSTM) that receives, using the first bypass connection, the POS label embeddings, using the second bypass connection, the word embeddings, and, using a third bypass connection, the chunk label embeddings and the chunk state vectors from the chunk label embedding layer, and processes, using the processor, the word embeddings, the POS label embeddings, the chunk label embeddings and the chunk state vectors, to produce parent label state vectors; an attention encoder that, using the processor: produces parent label probability mass vectors from the parent label state vectors; and produces parent label embedding vectors from the parent label probability mass vectors; and a dependency relationship label classifier that: exponentially normalizes the parent label state vectors and the parent label embedding vectors to produce dependency relationship label probability mass vectors; and produces dependency relationship label embedding vectors from the dependency relationship label probability mass vectors; and an output, using the processor, that outputs the dependency relationship label embedding vectors.

2. The neural network of claim 1, wherein the parent label state vectors produced by the bi-directional LSTM are forward and backward parent label state vectors for each respective word in the input sentence, which represent forward and backward progressions of interactions among the words in the input sentence from which the parent label probability mass vectors are produced; and wherein the attention encoder processes the forward and backward parent label state vectors for each respective word in the input sentence, encodes attention as vectors of inner products between each respective word and other words in the input sentence, with a linear transform applied to the forward and backward parent label state vectors for the word or the other words, and produces the parent label embedding vectors from the encoded attention vectors.

3. The neural network of claim 2, wherein the linear transform is trainable during training of the dependency relationship label classifier.

4. The neural network of claim 2, wherein a number of available analytical framework labels, over which the parent label probability mass vectors are calculated, is one-fifth or less of a dimensionality of the forward and backward parent label state vectors, thereby forming a dimensionality bottleneck that reduces overfitting when training a neural network stack of bi-directional LSTMs.

5. A neural network system that processes words in an input sentence, the neural network system comprising: at least one memory configured to store a dependency parsing layer, a chunk label embedding layer, and a POS label embedding layer; the dependency parsing layer that overlies the chunk label embedding layer that produces chunk label embeddings and chunk state vectors from part-of-speech (POS) label embeddings and word embeddings of the words in the input sentence, the POS label embeddings received from the POS label embedding layer using a first bypass connection and the word embeddings received using a second bypass connection; the chunk label embedding layer, in turn, overlies the POS label embedding layer, the POS label embedding layer that produces the POS label embeddings from the word embeddings; the dependency parsing layer including a dependency parent layer and a dependency relationship label classifier, wherein the dependency parent layer includes: a dependency parent analyzer, implemented as a bi-directional long-short term memory (LSTM), that: receives, using the second bypass connection, the word embeddings, using the first bypass connection, the POS label embeddings from the POS label embedding layer, and, using a third bypass connection, the chunk label embeddings and the chunk state vector from the chunk label embedding layer; and processes the words in the input sentence, including processing, for each word, the word embeddings, the POS label embeddings, the chunk label embeddings, and the chunk state vector to accumulate forward and backward state vectors that represent forward and backward progressions of interactions among the words in the input sentence; and an attention encoder that: processes the forward and backward state vectors for each respective word in the input sentence, and encodes attention as inner products between each respective word and other words in the input sentence, with a linear transform applied to the forward and backward state vectors for the word or the other words prior to the inner products; applies exponential normalization to vectors of the inner products to produce parent label probability mass vectors and projects the parent label probability mass vectors to produce parent label embedding vectors; and wherein the dependency relationship label classifier, for each respective word in the input sentence: processes the forward and backward state vectors and the parent label embedding vectors, to produce dependency relationship label probability mass vectors; and projects the dependency relationship label probability mass vectors to produce dependency relationship label embedding vectors; and an output processor that outputs at least the dependency relationship label probability mass vectors, or the dependency relationship label embedding vectors.

6. The neural network system of claim 5, wherein the linear transform applied prior to the inner products is trainable during training of the dependency parent layer and the dependency relationship label classifier.

7. The neural network system of claim 5, wherein a number of available analytical framework labels, over which the dependency relationship label probability mass vectors are calculated, is one-fifth or less of a dimensionality of the forward and backward state vectors, thereby forming a dimensionality bottleneck that reduces overfitting when training a neural network stack of the bi-directional LSTMs.

8. A method for parsing words in an input sentence using a neural network device, the method comprising: producing, at a part-of-speech (POS) label embedding layer, POS label embeddings from word embeddings of the words in the input sentence; producing, at a chunk label embedding layer that overlies the POS label embedding layer, chunk label embeddings and chunk state vectors from the POS label embeddings received from the POS embedding layer using a first bypass connection and the word embeddings received using a second bypass connection; receiving, at a dependency parsing layer that overlies the chunk label embedding layer, the POS label embeddings using the first bypass connection, the word embeddings using the second bypass connection, and the chunk label embeddings and the chunk state vectors from the chunk label embedding layer using a third bypass connection, the dependency parsing layer including a dependency parent layer and a depend…
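The claims describe the dependency layer only in functional terms. The following is a minimal pure-Python sketch of that data flow. It rests on several assumptions that the claims do not state:

- The bypass inputs are joined by concatenation.
- The trainable "linear transform" is a matrix W_d, so each attention score is h_t · (W_d h_j). This follows the published JMT paper.
- "Projecting" a probability mass vector means taking the probability-weighted sum of a table of vectors. For parent embeddings the table is the candidate parents' states; for relation labels it is a hypothetical `label_table`.
- There is no root candidate.

The biLSTM itself is not implemented: `states` stands in for its per-word forward/backward outputs. All names and toy values are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major; number of rows = output dimension


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def softmax(scores: Vector) -> Vector:
    """Exponential normalization; scores of -inf get probability 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_embedding(probs: Vector, table: List[Vector]) -> Vector:
    """Assumed 'projection': probability-weighted sum of the rows of table."""
    return [sum(p * row[i] for p, row in zip(probs, table))
            for i in range(len(table[0]))]


def bypass_input(word: Vector, pos_emb: Vector, chunk_emb: Vector,
                 chunk_state: Vector) -> Vector:
    """Per-word biLSTM input: the bypass inputs concatenated (an assumption)."""
    return word + pos_emb + chunk_emb + chunk_state


def parent_probabilities(states: List[Vector], w_d: Matrix) -> List[Vector]:
    """For each word t, softmax over the other words j of h_t . (W_d h_j)."""
    if len(states) < 2:
        raise ValueError("need at least two words to score candidate parents")
    transformed = [matvec(w_d, h) for h in states]
    result = []
    for t, h_t in enumerate(states):
        # A word cannot be its own parent, so its score is masked with -inf.
        scores = [-math.inf if j == t else dot(h_t, transformed[j])
                  for j in range(len(states))]
        result.append(softmax(scores))
    return result


def relation_labels(states: List[Vector], parent_embs: List[Vector],
                    w_r: Matrix, label_table: List[Vector]
                    ) -> Tuple[List[Vector], List[Vector]]:
    """Softmax over relation labels from [h_t; parent embedding], then project."""
    probs, embs = [], []
    for h_t, e_t in zip(states, parent_embs):
        p = softmax(matvec(w_r, h_t + e_t))  # list "+" concatenates the vectors
        probs.append(p)
        embs.append(soft_embedding(p, label_table))
    return probs, embs


# Toy example (hypothetical values): three words with 2-dimensional states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_d = [[1.0, 0.0], [0.0, 1.0]]
parent_p = parent_probabilities(states, w_d)
parent_e = [soft_embedding(p, states) for p in parent_p]
w_r = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
label_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel_p, rel_e = relation_labels(states, parent_e, w_r, label_table)
```

The same `soft_embedding` step is one plausible reading of how the POS and chunk layers turn label probabilities into label embeddings. Claims 4 and 7 add a size constraint: the number of labels must be at most one-fifth of the state dimensionality. With hypothetical 600-dimensional state vectors, at most 120 labels would satisfy it.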

Assignees

Salesforce Com Inc

Classifications

  • G06N3/084 (primary): Backpropagation, e.g. using gradient descent

  • Recurrent networks, e.g. Hopfield networks

  • Combinations of networks

  • Learning methods

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

Frequently asked questions

Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Its primary CPC classification is G06N3/084 (backpropagation). Mapped technology areas include Physics.
When was this patent published?
It was published on Jan 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).