US11507828B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11507828-B2 |
| Application number | US-201916666800-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 29, 2019 |
| Priority date | Oct 29, 2019 |
| Publication date | Nov 22, 2022 |
| Grant date | Nov 22, 2022 |
Official abstract text for this publication.
Training a machine learning model such as a neural network, which can automatically extract a hypernym from unstructured data, is disclosed. A preliminary candidate list of hyponym-hypernym pairs can be parsed from the corpus. A preliminary super-term—sub-term glossary can be generated from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs. A super-term—sub-term pair can be filtered from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary. The preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary can be combined to generate a final list of hyponym-hypernym pairs. An artificial neural network can be trained using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given new text data.
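The data-preparation steps in the abstract (parse candidate pairs, build a glossary, filter it, combine) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the single "Y such as X" pattern, the containment rule for building the glossary, and the head-word filter are all assumptions chosen for brevity, and every function name is hypothetical. Following claim 6, the super-term is treated as the hyponym and the sub-term as the hypernym.

```python
import re

def parse_pattern_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the assumed pattern 'Y such as X'."""
    return {(m.group(2).lower(), m.group(1).lower())
            for m in re.finditer(r"(\w+) such as (\w+)", text)}

def build_glossary(terms):
    """Pair each term (super-term) with every shorter term it contains (sub-term)."""
    pairs = set()
    for sup in terms:
        sup_words = sup.split()
        for sub in terms:
            sub_words = sub.split()
            n = len(sub_words)
            # The sub-term must be strictly shorter and appear as a contiguous word run.
            if n < len(sup_words) and any(
                sup_words[i:i + n] == sub_words
                for i in range(len(sup_words) - n + 1)
            ):
                pairs.add((sup, sub))
    return pairs

def filter_glossary(glossary):
    """Assumed heuristic: keep a pair only if the sub-term is the trailing head of the super-term."""
    return {(sup, sub) for sup, sub in glossary
            if sup.split()[-len(sub.split()):] == sub.split()}

def combine(pattern_pairs, final_glossary):
    """Union both sources into one sorted list of (hyponym, hypernym) pairs."""
    return sorted(pattern_pairs | final_glossary)

# Toy inputs, invented for illustration only.
text = "The store sells drinks such as juice and snacks such as chips."
terms = ["apple juice", "juice", "apple"]

candidates = parse_pattern_pairs(text)
preliminary = build_glossary(terms)
final_glossary = filter_glossary(preliminary)
final_pairs = combine(candidates, final_glossary)
print(final_pairs)
```

With these toy inputs, the pair ("apple juice", "apple") is dropped by the filter because "apple" is a modifier rather than the head of the phrase, while ("apple juice", "juice") survives and joins the pattern-derived pairs.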
Opening claim text (preview).
We claim:

1. A computer-implemented method comprising: receiving a corpus of electronic text; parsing a preliminary candidate list of hyponym-hypernym pairs from the corpus; generating a preliminary super-term—sub-term glossary from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs; filtering out a super-term—sub-term pair from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary; combining the preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary to generate a final list of hyponym-hypernym pairs; performing a transitive closure technique on at least the final list of hyponym-hypernym pairs to extract at least one additional hyponym-hypernym pair; and adding the extracted at least one additional hyponym-hypernym pair to the final list of hyponym-hypernym pairs; and training an artificial neural network using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given a new electronic text.
2. The method of claim 1, wherein the electronic text is unstructured text data.
3. The method of claim 1, wherein a linguistic pattern matching technique is performed to identify the preliminary candidate list of hyponym-hypernym pairs in the electronic text.
4. The method of claim 1, wherein a linguistic pattern matching technique is performed to generate the preliminary super-term—sub-term glossary from the corpus containing one or more super-term—sub-term pairs.
5. The method of claim 1, wherein the performing a transitive closure technique includes performing a transitive closure technique on the final super-term—sub-term glossary and the final list of hyponym-hypernym pairs to extract at least one additional hyponym-hypernym pair.
6. The method of claim 1, further including training a sequence-to-sequence artificial neural network using the final list of hyponym-hypernym pairs as a training data set to learn a hypernym sub-term from a hyponym super-term.
7. The method of claim 6, wherein the sequence-to-sequence artificial neural network is a long short term memory (LSTM).
8. The method of claim 6, further including applying noun phrases extracted from the corpus to the trained sequence-to-sequence artificial neural network to infer at least one new hyponym-hypernym pair, not in the final list of hyponym-hypernym pairs.
9. The method of claim 8, further including updating the final list of hyponym-hypernym pairs with addition of the new inferred hyponym-hypernym pair.
10. The method of claim 1, further including running the artificial neural network in inference phase to identify a hypernym given a new electronic text.
11. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: receive a corpus of electronic text; parse a preliminary candidate list of hyponym-hypernym pairs from the corpus; generate a preliminary super-term—sub-term glossary from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs; filter out a super-term—sub-term pair from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary; combine the preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary to generate a final list of hyponym-hypernym pairs; perform a transitive closure technique on at least the final list of hyponym-hypernym pairs to extract at least one additional hyponym-hypernym pair, wherein the extracted at least one additional hyponym-hypernym pair is added to the final list of hyponym-hypernym pairs; and train an artificial neural network using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given a new electronic text.
12. The computer program product of claim 11, wherein the electronic text is unstructured text data.
13. The computer program product of claim 11, wherein the device caused to perform a transitive closure technique includes the device caused to perform a transitive closure technique on the final super-term—sub-term glossary and the final list of hyponym-hypernym pairs.
14. The computer program product of claim 11, wherein the device is further caused to train a sequence-to-sequence artificial neural network using the final list of hyponym-hypernym pairs as a training data set to learn a hypernym sub-term from a hyponym super-term.
15. The computer program product of claim 14, wherein the sequence-to-sequence artificial neural network is a long short term memory (LSTM).
16. The computer program product of claim 14, the device is further caused to apply noun phrases extracted from the corpus to the sequence-to-sequence artificial neural network to infer at least one new hyponym-hypernym pair, not in the final list of hyponym-hypernym pairs.
17. The computer program product of claim 16, wherein the final list of hyponym-hypernym pairs are updated with addition of the new inferred hyponym-hypernym pair.
18. The computer program product of claim 11, wherein the device is further caused to run the artificial neural network in inference phase to identify a hypernym given a new electronic text.
19. A system comprising: a hardware processor; a memory device coupled with the hardware processor; the hardware processor configured to at least: receive a corpus of electronic text; parse a preliminary candidate list of hyponym-hypernym pairs from the corpus; generate a preliminary super-term—sub-term glossary from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs; filter out a super-term—sub-term pair from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary; combine the preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary to generate a final list of hyponym-hypernym pairs; and train an artificial neural network using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given a new electronic text, wherein the hardware processor is further configured to train a sequence-to-sequence artificial neural network using the final list of hyponym-hypernym pairs as a training data set to learn a hypernym sub-term from a hyponym super-term, and apply noun phrases extracted from the corpus to the sequence-to-sequence artificial neural network to infer at least one new hyponym-hypernym pair, not in the final list of hyponym-hypernym pairs.
20. The system of claim 19, wherein the hardware processor is further configured to: update the final list of hyponym-hypernym pairs with addition of the new inferred hyponym-hypernym pair.
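The transitive closure step recited in claims 1, 5, 11, and 13 can be illustrated with a short sketch: if A is a hyponym of B and B is a hyponym of C, the pair (A, C) is inferred. The claims do not specify an algorithm, so the fixed-point loop below is one simple assumption; the function name and the toy pairs are hypothetical, and skipping self-pairs is a design choice made here to tolerate cyclic input.

```python
def transitive_closure(pairs):
    """Return the input (hyponym, hypernym) pairs plus every pair implied by chaining them."""
    closure = set(pairs)
    while True:
        # Chain (a, b) with (c, d) whenever b == c; ignore self-pairs such as (a, a).
        new = {(a, d)
               for a, b in closure
               for c, d in closure
               if b == c and a != d} - closure
        if not new:
            return closure
        closure |= new

# Toy final list, invented for illustration only.
final_pairs = {
    ("apple juice", "juice"),
    ("juice", "drinks"),
    ("drinks", "products"),
}

closed = transitive_closure(final_pairs)
additional = sorted(closed - final_pairs)
print(additional)
```

The pairs in `additional` correspond to the "at least one additional hyponym-hypernym pair" that the claims add back to the final list before training. The loop is quadratic per pass, which is acceptable for a sketch but would likely need a graph-based approach on a large corpus.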