System and method for information extraction with character level features
US-11055527-B2 · Jul 6, 2021 · US
US11748613B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748613-B2 |
| Application number | US-201916409148-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 10, 2019 |
| Priority date | May 10, 2019 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Described herein are embodiments for a deep level-wise extreme multi-label learning and classification (XMLC) framework to facilitate the semantic indexing of literature. In one or more embodiments, the Deep Level-wise XMLC framework comprises two sequential modules, a deep level-wise multi-label learning module and a hierarchical pointer generation module. In one or more embodiments, the first module decomposes terms of domain ontology into multiple levels and builds a special convolutional neural network for each level with category-dependent dynamic max-pooling and macro F-measure based weights tuning. In one or more embodiments, the second module merges the level-wise outputs into a final summarized semantic indexing. The effectiveness of Deep Level-wise XMLC framework embodiments is demonstrated by comparing it with several state-of-the-art methods of automatic labeling on various datasets.
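The first step of the framework, decomposing ontology terms into multiple levels, can be illustrated with a small sketch. It assumes, hypothetically, that every label carries a dot-separated hierarchical code (in the style of MeSH tree numbers), so a label's level is simply the number of segments in its code. The patent text shown here does not say how levels are computed, and `split_by_level` is an illustrative name, not part of the disclosure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_level(label_codes: Iterable[str], sep: str = ".") -> Dict[int, List[str]]:
    """Group hierarchical label codes by their depth in the ontology.

    Hypothetical convention: depth is the number of sep-separated segments,
    so 'C04' is level 1 and 'C04.557.470' is level 3.
    Levels are returned in ascending order; within a level, input order is kept.
    """
    levels: Dict[int, List[str]] = defaultdict(list)
    for code in label_codes:
        depth = len(code.split(sep))
        levels[depth].append(code)
    return dict(sorted(levels.items()))


if __name__ == "__main__":
    print(split_by_level(["C04", "C04.557", "C04.557.470", "D12.776", "C23"]))
```

Each resulting level would then get its own level-wise model, trained only on the labels of that level.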
What is claimed is:
1. A computer-implemented method for multi-label learning and classification using one or more processors to cause steps to be performed comprising: processing raw training texts into cleaned training texts; parsing training labels into level-wise labels at multiple levels based on their ontological hierarchies; training a set of two or more level-wise models of a level-wise multi-label classification model based on at least the level-wise labels and the cleaned texts, with each level-wise model related to a corresponding level of labels; obtaining, using the trained set of two or more level-wise models, level-wise predictions from one or more inputs; and using the level-wise predictions as inputs into a point generation model to train the point generation model to generate a reduced set of the level-wise predictions comprising a set of relevant labels.
2. The computer-implemented method of claim 1 wherein the one or more inputs comprise word embeddings for documents, word embeddings for keywords, upper-level embedding, and lower-level embedding.
3. The computer-implemented method of claim 2 wherein obtaining level-wise predictions comprises: receiving, at convolutional neural networks (CNNs) within each level-wise model, inputs of word embeddings for documents, word embeddings for keywords, upper-level label embedding, and lower-level label embedding, for feature representation extraction from each input; obtaining concatenated embeddings using the extracted feature representations from each input; performing, at a max-pooling layer, a dynamic max-pooling to select desired features from the concatenated embeddings; obtaining a compact representation from the desired features by applying batch normalization and one or more fully connected layers; and employing a binary cross-entropy loss over an output layer and a hidden bottleneck layer based on at least the obtained compact representation to train each level-wise model.
4. The computer-implemented method of claim 3 wherein a bi-directional Long Short-Term Memory (Bi-LSTM) is constructed over the feature representations extracted from the word embeddings for documents to keep language order before concatenation.
5. The computer-implemented method of claim 3 wherein in performing dynamic max-pooling, level-wise related information of labels is incorporated into neural structures of at least the max-pooling layer to capture both label co-occurrences and categorical relations among labels for dynamic selection of max-pooling dimension.
6. The computer-implemented method of claim 1 wherein the step of obtaining, using the trained set of level-wise models, level-wise predictions from one or more inputs further comprises using one or more refining strategies, in which the one or more refining strategies comprise a macro F-measure optimization to enable each level-wise model to refine level-wise predictions in an incremental manner through threshold tuning.
7. The computer-implemented method of claim 1 wherein using the level-wise predictions as inputs into a point generation model to train the point generation model to generate a reduced set of the level-wise predictions comprising a set of relevant labels comprises: encoding, using an encoder within the point generation model, the level-wise predictions to multiple sequences of encoder hidden states corresponding to the multiple levels respectively; deriving a plurality of attention generators from the multiple sequences of encoder hidden states to generate an attention distribution and a context representation for each of the multiple levels; obtaining a generation probability from the context representation, predicted label sequence representations, and decoder input to generate multiple sequences of decoder hidden states; and generating an output of final summarized semantic indexing labels based on at least the decoder hidden states.
8. The computer-implemented method of claim 7 wherein a coverage mechanism is combined with the point generation model to remove repetitive terms in each level and across levels.
9. A system of multi-label learning and classification for large scale semantic indexing, the system comprising: a level-wise multi-label classification model decomposing labels in a high dimensional space into level-wise labels in multiple levels based on ontological hierarchies of the labels, the level-wise multi-label classification model comprises multiple neural network (NN) models, with a NN model for each level, each NN model extracts feature representations from inputs of word embeddings for documents, word embeddings for keywords, an upper-level label embedding, and a lower-level label embedding, each NN model comprises: a max-pooling layer for dynamic max-pooling to select features from concatenated embeddings concatenated from feature representations extracted from inputs; one or more normalization layers and one or more fully connected layers for batch normalization and obtaining a compact representation from the selected features; and an output layer outputting level-wise predictions for the level; and a point generation model that receives the level-wise predictions for each of the multiple levels as inputs and generates a unified label set for the documents, the point generation model comprises: an encoder to encode the level-wise predictions to multiple sequences of encoder hidden states corresponding to the multiple levels; a plurality of attention generators derived from the multiple sequences of encoder hidden states to generate an attention distribution and a context representation for each of the multiple levels; and a decoder to generate multiple sequences of decoder hidden states based on at least the generated context representation for each of the multiple levels, the decoder generates the unified label set using at least the decoder hidden states.
10. The system of claim 9 wherein a bi-directional Long Short-Term Memory (Bi-LSTM) is constructed over the feature representations extracted from the word embeddings for documents to keep language order before concatenation.
11. The system of claim 9 wherein in performing dynamic max-pooling, level-wise related information of labels is incorporated into neural structures of the max-pooling layer to dynamically select max-pooling dimension.
12. The system of claim 9 wherein the level-wise multi-label classification model uses an online F-measure optimization (OFO) to enable each NN model to refine level-wise predictions in an incremental manner through tuning a threshold for the OFO.
13. The system of claim 12 wherein the threshold is updated according to an inter-iteration rule within a same iteration and a cross-iteration rule between iterations.
14. The system of claim 9 wherein the point generation model incorporates a coverage mechanism to remove repetitive labels in each level and across levels.
15. The system of claim 9 wherein each NN model further comprises a bottleneck layer with an activation function, the NN model is pre-trained by employing a binary cross-entropy loss over the output layer and the bottleneck layer.
16. The system of claim 15 wherein the binary cross-entropy loss is a function involving weight matrices associated with the bottleneck layer and output layer.
17. A computer-implemented method for multi-label learning and classification for one or more documents using one or more processors to cause steps to be performed comprising: applying a first module comprising a set of two or more level-wise neural network (NN) models, in wh
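Claims 3, 5, and 11 describe a max-pooling layer whose pooling dimension is selected dynamically using level-wise label information, but the selection rule itself is not given in the claims. The sketch below shows only the generic chunk-wise ("dynamic") max-pooling used in earlier XML-CNN-style text classifiers: split the feature map along the sequence axis into p chunks and keep each chunk's per-channel maximum. The per-level table `CHUNKS_PER_LEVEL` is a hypothetical stand-in for the patent's dynamic choice of p.

```python
import numpy as np

# Hypothetical mapping from label level to number of pooling chunks. The
# patent selects this dimension dynamically, by a rule not reproduced here.
CHUNKS_PER_LEVEL = {1: 2, 2: 3, 3: 4}


def chunked_max_pool(feature_map: np.ndarray, num_chunks: int) -> np.ndarray:
    """Max-pool a (seq_len, channels) feature map into num_chunks segments.

    The sequence axis is split into num_chunks nearly equal chunks, the
    maximum of each channel is taken within each chunk, and the results are
    concatenated into a vector of length num_chunks * channels.
    """
    seq_len = feature_map.shape[0]
    if not 1 <= num_chunks <= seq_len:
        # Guard against empty chunks, whose max() would raise.
        raise ValueError("num_chunks must be between 1 and the sequence length")
    chunks = np.array_split(feature_map, num_chunks, axis=0)
    return np.concatenate([chunk.max(axis=0) for chunk in chunks])


def pool_for_level(feature_map: np.ndarray, level: int) -> np.ndarray:
    # Look up the (hypothetical) chunk count for this level, then pool.
    return chunked_max_pool(feature_map, CHUNKS_PER_LEVEL[level])
```

Keeping one maximum per chunk rather than one per document preserves some positional information, which is the usual motivation for this kind of pooling in long-text classification.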
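Claims 3, 15, and 16 train each level-wise model with a binary cross-entropy loss over a hidden bottleneck layer and an output layer, after batch normalization of the pooled features. A minimal NumPy forward-pass sketch of that loss follows. The ReLU activation, the omission of learned batch-norm scale and shift, and the weight names `w_bottleneck` and `w_out` are assumptions for illustration; the claims only say the bottleneck has "an activation function" and that the loss involves the bottleneck and output weight matrices.

```python
import numpy as np


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Normalize each feature over the batch; learned scale/shift omitted.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def bce_with_logits(logits: np.ndarray, targets: np.ndarray) -> np.ndarray:
    # Numerically stable elementwise binary cross-entropy on raw logits:
    # max(x, 0) - x*y + log(1 + exp(-|x|)).
    return np.maximum(logits, 0.0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))


def level_loss(features, targets, w_bottleneck, w_out) -> float:
    """Mean BCE for one level-wise model.

    features: (batch, d) pooled features; targets: (batch, n_labels) in {0, 1};
    w_bottleneck: (d, k) bottleneck weights; w_out: (k, n_labels) output weights.
    """
    h = batch_norm(np.asarray(features, dtype=float))
    bottleneck = np.maximum(h @ w_bottleneck, 0.0)  # hidden bottleneck layer (ReLU assumed)
    logits = bottleneck @ w_out                      # one logit per label at this level
    return float(bce_with_logits(logits, np.asarray(targets, dtype=float)).mean())
```

Because each label gets an independent sigmoid, one document can receive any number of labels at a given level, which is what multi-label classification requires.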
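Claims 6, 12, and 13 refine level-wise predictions through F-measure-driven threshold tuning, with claim 12 naming online F-measure optimization (OFO). The sketch below implements only the generic OFO update from the literature, under the assumption that it is representative: it accumulates a = Σ y·ŷ and b = Σ y + Σ ŷ, so the running F-measure is 2a/b and the threshold for the next prediction is a/b (half the running F-measure). The patent's inter-iteration and cross-iteration rules from claim 13 are not reproduced, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OnlineFMeasureThreshold:
    """Generic OFO threshold tracker for a single label."""

    a: float = 0.0  # running sum of y * y_hat (true positives)
    b: float = 0.0  # running sum of y + y_hat

    @property
    def threshold(self) -> float:
        # Half of the running F-measure; 0.0 before anything has been counted,
        # so the first positive score is predicted as a positive.
        return self.a / self.b if self.b > 0 else 0.0

    @property
    def f_measure(self) -> float:
        return 2 * self.a / self.b if self.b > 0 else 0.0

    def step(self, score: float, label: int) -> int:
        # Predict with the current threshold, then update the statistics.
        pred = 1 if score > self.threshold else 0
        self.a += label * pred
        self.b += label + pred
        return pred


def macro_f(trackers: Sequence[OnlineFMeasureThreshold]) -> float:
    # Macro F-measure: unweighted mean of the per-label F-measures.
    return sum(t.f_measure for t in trackers) / len(trackers)
```

For multi-label output, one tracker per label gives each label its own threshold, and `macro_f` averages their F-measures, in the spirit of the macro F-measure optimization of claim 6.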
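Claims 7, 8, and 14 describe the "point generation model" in terms of attention distributions, a generation probability, and a coverage mechanism; these match the pointer-generator network with coverage from the summarization literature, and the abstract calls this component a hierarchical pointer generation module. The single-level sketch below shows that standard technique: the output distribution mixes generating from the label vocabulary with copying input labels in proportion to attention, and the coverage loss penalizes re-attending to labels already covered. How the patent combines the per-level attention distributions is not stated in the claims, so it is not modeled; function names are hypothetical.

```python
import numpy as np


def final_distribution(p_vocab, attention, source_ids, p_gen: float) -> np.ndarray:
    """Mix generating from the label vocabulary with copying from the input.

    p_vocab: (V,) generation distribution over the label vocabulary.
    attention: (S,) attention over the S input labels (level-wise predictions).
    source_ids: (S,) vocabulary index of each input label.
    p_gen: probability of generating rather than copying.
    """
    out = p_gen * np.asarray(p_vocab, dtype=float)
    # np.add.at accumulates correctly when the same label id occurs twice.
    np.add.at(out, np.asarray(source_ids), (1.0 - p_gen) * np.asarray(attention, dtype=float))
    return out


def coverage_step(coverage: np.ndarray, attention):
    """Return (coverage loss at this step, updated coverage vector).

    The coverage vector is the running sum of past attention. The loss
    sum_i min(a_i, c_i) grows when the decoder re-attends to input labels it
    has already covered, which discourages repeated labels.
    """
    attention = np.asarray(attention, dtype=float)
    loss = float(np.minimum(attention, coverage).sum())
    return loss, coverage + attention
```

If `p_vocab` and `attention` each sum to 1, the mixed distribution also sums to 1, so it can be used directly as the decoder's output distribution.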
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Learning methods · CPC title