US11816581B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11816581-B2 |
| Application number | US-202017014435-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2020 |
| Priority date | Sep 8, 2020 |
| Publication date | Nov 14, 2023 |
| Grant date | Nov 14, 2023 |
Official abstract text for this publication.
A fast neural transition-based parser. The fast neural transition-based parser includes a decision tree-based classifier and a state vector control loss function. The decision tree-based classifier is dynamically used to replace a multilayer perceptron in the fast neural transition-based parser, and the decision tree-based classifier increases speed of neural transition-based parsing. The state vector control loss function trains the fast neural transition-based parser, the state vector control loss function builds a vector space favorable for building a decision tree that is used for the decision tree-based classifier in the neural transition-based parser, and the state vector control loss function maintains accuracy of neural transition-based parsing while the decision tree-based classifier is used to increase the speed of the neural transition-based parsing.
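The dynamic replacement mentioned in the abstract is detailed in claims 1 and 2: the decision tree's prediction is applied unless the Gini coefficient exceeds a threshold or the number of samples falls below a threshold, in which case the multilayer perceptron predicts the action. The following is a minimal Python sketch of that gating, not the patented implementation. It assumes the two quantities are read from the leaf that the state vector reaches and that the Gini coefficient is the usual Gini impurity of that leaf's training labels. The names `Node`, `find_leaf`, `predict_action`, `max_gini`, and `min_samples` are hypothetical, as are the default threshold values.

```python
class Node:
    """Binary decision tree node. A leaf carries `counts`; an internal node carries a split."""

    def __init__(self, feature=-1, threshold=0.0, left=None, right=None, counts=None):
        self.feature = feature      # index of the state-vector component to test
        self.threshold = threshold  # go left when x[feature] <= threshold
        self.left = left
        self.right = right
        self.counts = counts        # leaf only: {action: number of training samples}


def find_leaf(node, x):
    """Walk from the root to the leaf that the state vector x falls into."""
    while node.counts is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def gini(counts):
    """Gini impurity of a leaf (ASSUMPTION: this is the claimed 'Gini coefficient')."""
    n = sum(counts.values())
    if n == 0:
        return 1.0  # treat an empty leaf as maximally uncertain
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def predict_action(tree, mlp, x, max_gini=0.2, min_samples=5):
    """Return (action, source). Thresholds are illustrative, not from the patent."""
    leaf = find_leaf(tree, x)
    n_samples = sum(leaf.counts.values())
    # Either condition met: the leaf is impure or thinly supported, so use the MLP.
    if gini(leaf.counts) > max_gini or n_samples < min_samples:
        return mlp(x), "mlp"
    # Neither condition met: apply the decision tree's (majority) prediction.
    return max(leaf.counts, key=leaf.counts.get), "tree"


# Usage with a one-split toy tree and a stand-in MLP.
tree = Node(
    feature=0,
    threshold=0.0,
    left=Node(counts={"SHIFT": 50}),
    right=Node(counts={"LEFT-ARC": 3, "RIGHT-ARC": 3}),
)
mlp = lambda x: "RIGHT-ARC"  # placeholder for the slower neural classifier
```

In this sketch the speed gain comes from the fact that confident leaves are resolved with a handful of comparisons, while the multilayer perceptron is evaluated only for states that reach an impure or sparsely populated leaf.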
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for using a neural transition-based parser to parse a sentence, the method comprising: training, by a server, the neural transition-based parser by clustering state vectors, distributing centroids of the state vectors, gathering the state vectors in a same action class into a hyperrectangle, and determining, based on a set of action classes, an optimized set of trainable parameters of the neural transition-based parser; receiving, by the server, a vector representation of a state of parsing the sentence, the vector representation being in a vector space built by a state vector control loss function in training the neural transition-based parser; predicting, by the server, by using a decision tree-based classifier in the neural transition-based parser, a parsing action based on the vector representation; calculating, by the server, by using the decision tree-based classifier, a Gini coefficient and a number of samples, based on the vector representation; determining, by the server, whether either of two conditions is met, the two conditions being that the Gini coefficient is greater than a predetermined threshold of the Gini coefficient and the number of samples is less than a predetermined threshold of the number of samples; and in response to determining that neither of the two conditions is met, applying, by the server, the parsing action predicted by the decision tree-based classifier to the state of parsing the sentence by using the neural transition-based parser.

2. The computer-implemented method of claim 1, further comprising: in response to determining that either of the two conditions is met, using, by the server, a multilayer perceptron in the neural transition-based parser to predict the parsing action based on the vector representation; and applying, by the server, the parsing action predicted by the multilayer perceptron to the state of parsing the sentence by using the neural transition-based parser.

3. The computer-implemented method of claim 1, wherein the vector space is built by the state vector control loss function such that the state vectors in the same action class are clustered and the centroids of the state vectors are distributed in different action classes.

4. The computer-implemented method of claim 1, wherein the vector space is built by the state vector control loss function such that the vector space is for building a decision tree that is used for the decision tree-based classifier and the state vectors in the same action class are gathered into the hyperrectangle by using an L_p-norm and adjusting p.

5. The computer-implemented method of claim 1, wherein, with each of given sets of trainable parameters of neural networks in the neural transition-based parser, training the neural transition-based parser comprises: calculating, by the server, a centroid vector for an action class by averaging the state vectors in the action class; calculating, by the server, an intra-class distance loss for the action class by calculating an averaged L_p-norm of distances between the centroid vector and each of the state vectors in the action class; calculating, by the server, intra-class distance losses for respective action classes and a sum of the intra-class distance losses; calculating, by the server, an inter-class distance loss between a pair of action classes by considering an L_p-norm of a difference between centroid vectors of the pair of action classes; calculating, by the server, inter-class distance losses for respective pairs of action classes and a sum of the inter-class distance losses; calculating, by the server, an additional loss, which includes the sum of the intra-class distance losses and the sum of the inter-class distance losses; and calculating, by the server, a training loss of the neural transition-based parser, which includes the additional loss and a standard cross-entropy loss, wherein the standard cross-entropy loss is computed from action probabilities.

6. The computer-implemented method of claim 5, further comprising: determining, by the server, the optimized set of trainable parameters by minimizing the training loss.

7. A computer program product for using a neural transition-based parser to parse a sentence, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by one or more processors, the program instructions executable to: train, by a server, the neural transition-based parser by clustering state vectors, distributing centroids of the state vectors, gathering the state vectors in a same action class into a hyperrectangle, and determining, based on a set of action classes, an optimized set of trainable parameters of the neural transition-based parser; receive, by the server, a vector representation of a state of parsing the sentence, the vector representation being in a vector space built by a state vector control loss function in training the neural transition-based parser; predict, by the server, by using a decision tree-based classifier in the neural transition-based parser, a parsing action based on the vector representation; calculate, by the server, by using the decision tree-based classifier, a Gini coefficient and a number of samples, based on the vector representation; determine, by the server, whether either of two conditions is met, the two conditions being that the Gini coefficient is greater than a predetermined threshold of the Gini coefficient and the number of samples is less than a predetermined threshold of the number of samples; and in response to determining that neither of the two conditions is met, apply, by the server, the parsing action predicted by the decision tree-based classifier to the state of parsing the sentence by using the neural transition-based parser.

8. The computer program product of claim 7, further comprising the program instructions executable to: in response to determining that either of the two conditions is met, use, by the server, a multilayer perceptron in the neural transition-based parser to predict the parsing action based on the vector representation; and apply, by the server, the parsing action predicted by the multilayer perceptron to the state of parsing the sentence by using the neural transition-based parser.

9. The computer program product of claim 7, wherein the vector space is built by the state vector control loss function such that state vectors in the same action class are clustered and the centroids of the state vectors are distributed in different action classes.

10. The computer program product of claim 7, wherein the vector space is built by the state vector control loss function such that the vector space is for building a decision tree that is used for the decision tree-based classifier and the state vectors in the same action class are gathered into the hyperrectangle by using an L_p-norm and adjusting p.

11. The computer program product of claim 7, for training the neural transition-based parser with each of given sets of trainable parameters of neural networks in the neural transition-based parser, further comprising the program instructions executable to: calculate, by the server, a centroid vector for an action class by averaging the state vectors in the action class; calculate, by the server, an intra-class distance loss for the action class by calculating an averaged L_p-norm of distances between the centroid vector and each of the state vectors in the action class; calculate, by the server, intra-class distance losses for respective action classes and a sum of the intra-class d
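
The training loss recited in claims 5 and 6 can be illustrated with a short, hedged Python sketch. It follows the claimed steps (per-class centroids, averaged L_p-norm intra-class losses, pairwise inter-class losses on centroids, and a cross-entropy term) but fills one gap with an assumption: the claims only say the inter-class loss "considers" the L_p-norm of the centroid difference, so a hinge `max(0, margin - distance)` is used here so that minimizing the loss pushes centroids apart. The function names, the `margin` parameter, the unweighted sum of terms, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def lp_norm(v, p):
    """L_p-norm of a vector given as a list of floats."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def centroid(vectors):
    """Centroid of an action class: the component-wise mean of its state vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def intra_class_loss(vectors, mu, p):
    """Averaged L_p distance between the centroid and each state vector in the class."""
    dists = [lp_norm([a - b for a, b in zip(v, mu)], p) for v in vectors]
    return sum(dists) / len(dists)


def inter_class_loss(mu_a, mu_b, p, margin):
    """ASSUMPTION: hinge on the centroid distance; the claims do not give the exact form."""
    d = lp_norm([a - b for a, b in zip(mu_a, mu_b)], p)
    return max(0.0, margin - d)


def cross_entropy(gold_probs):
    """Standard cross-entropy from the probabilities assigned to the gold actions."""
    return -sum(math.log(q) for q in gold_probs) / len(gold_probs)


def training_loss(states_by_action, gold_probs, p=2.0, margin=1.0):
    """Additional loss (intra-class sum + inter-class sum) plus cross-entropy."""
    centroids = {a: centroid(vs) for a, vs in states_by_action.items()}
    intra = sum(intra_class_loss(vs, centroids[a], p)
                for a, vs in states_by_action.items())
    inter = sum(inter_class_loss(centroids[a], centroids[b], p, margin)
                for a, b in combinations(sorted(centroids), 2))
    return intra + inter + cross_entropy(gold_probs)


# Toy example: two action classes with two 2-dimensional state vectors each.
states = {
    "SHIFT": [[0.0, 0.0], [2.0, 0.0]],
    "REDUCE": [[10.0, 0.0], [10.0, 2.0]],
}
loss = training_loss(states, gold_probs=[1.0, 1.0, 1.0, 1.0])
```

In a real parser the state vectors and action probabilities would come from the network and the loss would be minimized by gradient descent over the trainable parameters, as claim 6 describes; the sketch only evaluates the loss for fixed inputs. Raising `p` moves the L_p-norm toward the max-norm, whose unit ball is an axis-aligned hypercube, which is one plausible reading of "adjusting p" to gather a class into a hyperrectangle in claim 4.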
Feedforward networks · CPC title
Supervised learning · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
Architecture, e.g. interconnection topology · CPC title
Learning methods · CPC title