Filtering spurious knowledge graph relationships between labeled entities

US11080491B2 · US · B2

Patent metadata
Publication number: US-11080491-B2
Application number: US-201916600774-A
Country: US
Kind code: B2
Filing date: Oct 14, 2019
Priority date: Oct 14, 2019
Publication date: Aug 3, 2021
Grant date: Aug 3, 2021

Abstract

Official abstract text for this publication.

Systems and techniques that facilitate spurious relationship filtration from external knowledge graphs based on distributional semantics of an input corpus are provided. In one or more embodiments, a context component can generate a context-based word embedding of one or more first terms in a document collection. The embedding can yield vector representations of the one or more first terms. The one or more first terms can correspond to knowledge terms in one or more first nodes of a knowledge graph. In one or more embodiments, a filtering component can filter out a relationship between the one or more first nodes and a second node of the knowledge graph based on a similarity value being less than a threshold. The similarity value can be a function of the vector representations of the one or more first terms. In various embodiments, cosine similarity can be used to compute the similarity value.
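
The filtering step described in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the toy vectors, the term names, the helper name filter_edges, the threshold of 0.5, and the choice to keep edges whose terms have no vector are all hypothetical. In practice the vectors would come from a context-based embedding trained on the document collection.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def filter_edges(edges, vectors, threshold):
    """Keep only edges whose endpoint terms are similar enough in the corpus.

    edges     -- list of (first_term, second_term) pairs from the knowledge graph
    vectors   -- dict mapping a term to its embedding vector
    threshold -- edges with similarity below this value are filtered out
    """
    kept = []
    for first_term, second_term in edges:
        if first_term not in vectors or second_term not in vectors:
            # Assumption: with no corpus evidence, leave the edge untouched.
            kept.append((first_term, second_term))
            continue
        similarity = cosine_similarity(vectors[first_term], vectors[second_term])
        if similarity >= threshold:
            kept.append((first_term, second_term))
    return kept


# Hypothetical toy embedding: in this corpus "apple" is used like "fruit",
# not like "company".
toy_vectors = {
    "apple": [0.9, 0.1, 0.0],
    "fruit": [0.8, 0.2, 0.1],
    "company": [0.0, 0.1, 0.9],
}
toy_edges = [("apple", "fruit"), ("apple", "company")]

kept_edges = filter_edges(toy_edges, toy_vectors, threshold=0.5)
print(kept_edges)
```

With these toy values the apple-to-company edge falls below the threshold and is treated as spurious for this corpus, while the apple-to-fruit edge is kept.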

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory that stores computer-executable components; and a processor, operably coupled to the memory, that executes the computer-executable components stored in the memory, wherein the computer-executable components comprise: a context component that: generates a context-based embedding of one or more first terms in a document collection, thereby yielding vector representations of the one or more first terms, wherein the one or more first terms correspond to knowledge terms in one or more first nodes of a knowledge graph; and determines that at least two of the one or more first terms that have vector representations that have a similarity value that meets a defined criterion have a hypernymy relation between them, and determines that at least two of the one or more first terms that have vector representations that fail to have a similarity value that meets the defined criterion do not have a hypernymy relation between them, wherein the vector representations are compared to determine whether the hypernymy relation from the knowledge graph for the at least two of the one or more first terms is spurious; and a filtering component that filters out a relationship between the one or more first nodes and a second node of the knowledge graph based on the similarity value.

2. The system of claim 1, wherein: the similarity value is based on a cosine similarity between the vector representations of the one or more first terms and a vector representation of a second term in the document collection corresponding to the second node.

3. The system of claim 1, wherein: the similarity value is based on average pairwise cosine similarities between the vector representations of the one or more first terms.

4. The system of claim 1, wherein: the similarity value is based on cosine similarities between the vector representations of the one or more first terms and a vector representation of a prototypical term in the one or more first terms.

5. The system of claim 1, wherein the knowledge terms are part of one or more labels, descriptions, definitions, or other text associated with the one or more first nodes.

6. The system of claim 1, wherein the relationship is one from the group consisting of a hypernym-hyponym relation, a synonymy relation, an antonymy relation, an entailment relation, and a partonomy relation.

7. The system of claim 1, wherein the one or more first terms correspond to the knowledge terms lexically, orthographically, morphologically, syntactically, or semantically.

8. The system of claim 1, wherein the context component generates the embedding of the one or more first terms via a neural network that employs a Continuous Bag of Words or Skip Gram methodology.

9. A computer-implemented method, comprising: generating, by a device operatively coupled to a processor, a context-based embedding of one or more first terms in a document collection, thereby yielding vector representations of the one or more first terms, wherein the one or more first terms correspond to knowledge terms in one or more first nodes of a knowledge graph; determining, by the device, that at least two of the one or more first terms that have vector representations that have a similarity value that meets a defined criterion have a hypernymy relation between them, and determine that at least two of the one or more first terms that have vector representations that fail to have a similarity value that meets the defined criterion do not have a hypernymy relation between them, wherein the vector representations are compared to determine whether the hypernymy relation from the knowledge graph for the at least two of the one or more first terms is spurious; and filtering out, by the device, a relationship between the one or more first nodes and a second node of the knowledge graph based on the similarity value.

10. The computer-implemented method of claim 9, wherein: the similarity value is based on a cosine similarity between the vector representations of the one or more first terms and a vector representation of a second term in the document collection corresponding to the second node.

11. The computer-implemented method of claim 9, wherein: the similarity value is based on average pairwise cosine similarities between the vector representations of the one or more first terms.

12. The computer-implemented method of claim 9, wherein: the similarity value is based on cosine similarities between the vector representations of the one or more first terms and a vector representation of a prototypical term in the one or more first terms.

13. The computer-implemented method of claim 9, wherein the knowledge terms are part of one or more labels, descriptions, definitions, or other text associated with the one or more first nodes.

14. The computer-implemented method of claim 9, wherein the relationship is one from the group consisting of a hypernym-hyponym relation, a synonymy relation, an antonymy relation, an entailment relation, and a partonomy relation.

15. The computer-implemented method of claim 9, wherein the one or more first terms correspond to the knowledge terms lexically, orthographically, morphologically, syntactically, or semantically.

16. The computer-implemented method of claim 9, wherein the generating the embedding of the one or more first terms is performed with a neural network that employs a Continuous Bag of Words or Skip Gram methodology.

17. A computer program product for facilitating spurious relationship filtration, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processing component to cause the processing component to: generate a context-based embedding of one or more first terms in a document collection, thereby yielding vector representations of the one or more first terms, wherein the one or more first terms correspond to knowledge terms in one or more first nodes of a knowledge graph; determine that at least two of the one or more first terms that have vector representations that have a similarity value that meets a defined criterion have a supplier relation between them, and determine that at least two of the one or more first terms that have vector representations that fail to have a similarity value that meets the defined criterion do not have a supplier relation between them, wherein the vector representations are compared to determine whether the supplier relation from the knowledge graph for the at least two of the one or more first terms is spurious; and filter out a relationship between the one or more first nodes and a second node of the knowledge graph based on the similarity value.

18. The computer program product of claim 17, wherein: the similarity value is based on a cosine similarity between the vector representations of the one or more first terms and a vector representation of a second term in the document collection corresponding to the second node.

19. The computer program product of claim 17, wherein: the similarity value is based on average pairwise cosine similarities between the vector representations of the one or more first terms.

20. The computer program product of claim 17, wherein: the similarity value is based on cosine similarities between the vector representations of the one or more first terms and a vector representation of a prototypical term in the one or more first terms.
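
The dependent claims name three ways of deriving the similarity value: cosine similarity against a second term, average pairwise cosine similarity among the first terms, and cosine similarity against a prototypical first term. The Python sketch below shows one possible reading of each. The function names are hypothetical, and several details are assumptions the claims do not specify: averaging is used to combine multiple cosine values in the first and third variants, the prototypical term is chosen by the caller, and a single-term input returns 1.0.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_to_second_term(first_vectors, second_vector):
    """Variant of claims 2, 10, 18: compare first terms with the second term.

    Assumption: the per-term cosine values are combined by averaging.
    """
    return mean(cosine(v, second_vector) for v in first_vectors)


def average_pairwise_similarity(first_vectors):
    """Variant of claims 3, 11, 19: mean cosine over all pairs of first terms."""
    pairs = list(combinations(first_vectors, 2))
    if not pairs:
        return 1.0  # assumption: a single term is trivially coherent
    return mean(cosine(u, v) for u, v in pairs)


def similarity_to_prototype(first_vectors, prototype_index):
    """Variant of claims 4, 12, 20: compare first terms with a prototypical one.

    Assumption: the caller picks the prototype; the others are averaged.
    """
    prototype = first_vectors[prototype_index]
    others = [v for i, v in enumerate(first_vectors) if i != prototype_index]
    if not others:
        return 1.0  # assumption: a lone prototype is trivially coherent
    return mean(cosine(v, prototype) for v in others)


# Hypothetical two-dimensional vectors for three first terms and one second term.
first_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
second_vector = [1.0, 0.0]

print(similarity_to_second_term(first_vectors, second_vector))
print(average_pairwise_similarity(first_vectors))
print(similarity_to_prototype(first_vectors, prototype_index=2))
```

Whichever variant is used, the resulting value would then be checked against the defined criterion (for example, a threshold) to decide whether the relationship is filtered out.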

Assignees

IBM

Inventors

Classifications

  • Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536)

  • Graphs; Linked lists (G06F16/9027 takes precedence)

  • Named entity recognition

  • Morphological analysis

  • G06F40/30 (Primary): Semantic analysis

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11080491B2 cover?
Systems and techniques that facilitate spurious relationship filtration from external knowledge graphs based on distributional semantics of an input corpus are provided. In one or more embodiments, a context component can generate a context-based word embedding of one or more first terms in a document collection. The embedding can yield vector representations of the one or more first terms. The…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 3, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).