US2020125648A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2020125648-A1 |
| Application number | US-201816168129-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 23, 2018 |
| Priority date | Oct 23, 2018 |
| Publication date | Apr 23, 2020 |
| Grant date | — |
Official abstract text for this publication.
Methods and systems for using machine learning to determine electronic document similarity include extracting entities and corresponding relationships from each of two electronic documents of a corpus of electronic documents based on word embedding, computing an entity distance between the extracted entities and a relationship distance between the extracted relationships based on knowledge graph embedding, combining the entity and relationship distances to generate a similarity score between the electronic documents, and implementing the similarity score to perform a task associated with the electronic documents.
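The combining step in the abstract can be illustrated with a minimal sketch. It assumes the weighted-sum form named in the dependent claims. The function names, the default weight of 0.5, and the mapping from a combined distance to a similarity score in (0, 1] are illustrative assumptions and are not taken from the publication.

```python
def combine_distances(entity_distance, relationship_distance, entity_weight=0.5):
    """Weighted sum of the entity and relationship distances.

    entity_weight is a hypothetical tuning parameter in [0, 1]; the
    relationship distance receives the remaining weight.
    """
    if not 0.0 <= entity_weight <= 1.0:
        raise ValueError("entity_weight must lie in [0, 1]")
    return (entity_weight * entity_distance
            + (1.0 - entity_weight) * relationship_distance)


def similarity_from_distance(distance):
    """Map a non-negative distance to a score in (0, 1].

    This mapping is an assumption made for illustration: a distance of 0
    gives a similarity of 1, and larger distances give smaller scores.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    combined = combine_distances(2.0, 4.0, entity_weight=0.25)
    print(combined, similarity_from_distance(combined))
```

A larger `entity_weight` makes the score depend more on which entities two documents mention, while a smaller one emphasizes how those entities are related.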
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for using machine learning to determine electronic document similarity, comprising: extracting entities and corresponding relationships from each of two electronic documents of a corpus of electronic documents based on word embedding; computing an entity distance between the extracted entities and a relationship distance between the extracted relationships based on knowledge graph embedding; combining the entity and relationship distances to generate a similarity score between the electronic documents; and implementing the similarity score to perform a task associated with the electronic documents.
2. The method of claim 1, further comprising training the word embedding and the knowledge graph embedding.
3. The method of claim 1, wherein extracting the entities and relationships further includes extracting the entities and relationships using a rule-based information extraction method.
4. The method of claim 1, wherein extracting the entities and relationships further includes extracting the entities and relationships using a deep learning method.
5. The method of claim 1, wherein the entity and relationship distances are computed based on Earth Mover's Distance.
6. The method of claim 1, wherein combining the entity and relationship distances to generate the similarity score further includes combining the entity and relationship distances to generate the similarity score as a weighted sum.
7. The method of claim 1, wherein implementing the similarity score further includes performing at least one action selected from the group consisting of: electronic document clustering to classify different types of electronic documents for quick review; a search for electronic documents based on the similarity score in response to receiving a search query; electronic document de-duplication based on the similarity score; and an electronic document answer provision based on the similarity score in response to receiving a question query associated with a question-answering system.
8. A system for using machine learning to determine electronic document similarity, comprising: a memory device for storing program code; and at least one processor device operatively coupled to the memory device and configured to execute program code stored on the memory device to: extract entities and corresponding relationships from each of two electronic documents of a corpus of electronic documents based on word embedding; compute an entity distance between the extracted entities and a relationship distance between the extracted relationships based on knowledge graph embedding; combine the entity and relationship distances to generate a similarity score between the electronic documents; and implement the similarity score to perform a task associated with the electronic documents.
9. The system of claim 8, wherein the at least one processor is further configured to execute program code stored on the memory device to train the word embedding and the knowledge graph embedding.
10. The system of claim 8, wherein the at least one processor is further configured to extract the entities and relationships further by extracting the entities and relationships using at least one of a rule-based information extraction method and a deep learning method.
11. The system of claim 8, wherein the entity and relationship distances are computed based on Earth Mover's Distance.
12. The system of claim 8, wherein the at least one processor is further configured to combine the entity and relationship distances to generate the similarity score by combining the entity and relationship distances to generate the similarity score as a weighted sum.
13. The system of claim 8, wherein the at least one processor is further configured to implement the similarity score by performing at least one action selected from the group consisting of: electronic document clustering to classify different types of electronic documents for quick review; a search for electronic documents based on the similarity score in response to receiving a search query; electronic document de-duplication based on the similarity score; and an electronic document answer provision based on the similarity score in response to receipt of a question query associated with a question-answering system.
14. A computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code executable by a computer to cause the computer to perform a method for using machine learning to determine electronic document similarity, the method performed by the computer comprising: extracting entities and corresponding relationships from each of two electronic documents of a corpus of electronic documents based on word embedding; computing an entity distance between the extracted entities and a relationship distance between the extracted relationships based on knowledge graph embedding; combining the entity and relationship distances to generate a similarity score between the electronic documents; and implementing the similarity score to perform a task associated with the electronic documents.
15. The computer program product of claim 14, wherein the method further comprises training the word embedding and the knowledge graph embedding.
16. The computer program product of claim 14, wherein extracting the entities and relationships further includes extracting the entities and relationships using a rule-based information extraction method.
17. The computer program product of claim 14, wherein extracting the entities and relationships further includes extracting the entities and relationships using a deep learning method.
18. The computer program product of claim 14, wherein the entity and relationship distances are computed based on Earth Mover's Distance.
19. The computer program product of claim 14, wherein combining the entity and relationship distances to generate the similarity score further includes combining the entity and relationship distances to generate the similarity score as a weighted sum.
20. The computer program product of claim 14, wherein implementing the similarity score further includes performing at least one action selected from the group consisting of: electronic document clustering to classify different types of electronic documents for quick review; a search for electronic documents based on the similarity score in response to receiving a search query; electronic document de-duplication based on the similarity score; and an electronic document answer provision based on the similarity score in response to receiving a question query associated with a question-answering system.
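Several dependent claims compute the distances "based on Earth Mover's Distance". The following is a minimal sketch of that distance under simplifying assumptions that are not stated in the claims: both inputs are equal-size lists of embedding vectors, every vector carries the same mass, and the ground cost is Euclidean distance. Under those assumptions the optimal transport plan is a one-to-one matching, so the sketch searches all matchings by brute force, which is only practical for a handful of vectors. The function name and the toy vectors are hypothetical.

```python
from itertools import permutations
from math import dist


def earth_movers_distance(source, target):
    """EMD between two equal-size sets of vectors with uniform weights.

    Assumption: with equal sizes and uniform mass, the optimal flow is a
    permutation, so the minimum is taken over all one-to-one matchings.
    Runs in factorial time; intended for illustration only.
    """
    if not source or len(source) != len(target):
        raise ValueError("inputs must be non-empty and of equal size")
    n = len(source)
    # Total ground cost of the cheapest one-to-one matching.
    best_total = min(
        sum(dist(source[i], target[j]) for i, j in enumerate(matching))
        for matching in permutations(range(n))
    )
    # Each vector holds mass 1/n, so the distance is the average move cost.
    return best_total / n


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for extracted entities.
    entities_a = [(0.0, 0.0), (1.0, 0.0)]
    entities_b = [(0.0, 1.0), (1.0, 1.0)]
    print(earth_movers_distance(entities_a, entities_b))
```

For sets of different sizes or non-uniform weights, the general form requires solving a transportation linear program, which this sketch does not attempt.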
using ranking · CPC title
Knowledge engineering; Knowledge acquisition · CPC title
Machine learning · CPC title
Clustering or classification · CPC title
Physics · mapped topic