Compound discovery via information divergence with knowledge graphs

US11789991B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11789991-B2
Application number: US-201916256424-A
Country: US
Kind code: B2
Filing date: Jan 24, 2019
Priority date: Jan 24, 2019
Publication date: Oct 17, 2023
Grant date: Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Complex computer system architectures are described for utilizing a knowledge data graph comprised of elements, and selecting a discovery element to replace an existing element of a formulation depicted in the knowledge data graph. The substitution process takes advantage of the knowledge data graph structure to improve the computing capabilities of a computing device executing a substitution calculation by translating the knowledge data graph into an embedding space, and determining a discovery element from within the embedding space.
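
As a rough illustration of "determining a discovery element from within the embedding space", the sketch below ranks elements by cosine similarity to a selected element. It is a minimal sketch under stated assumptions, not the patented method: the element names, the 3-dimensional vectors, and the helper `nearest_elements` are all hypothetical, and cosine similarity is only one plausible choice of proximity measure.

```python
import math

# Hypothetical hand-written embeddings; a real system would learn these
# vectors from the knowledge graph rather than hard-code them.
EMBEDDINGS = {
    "sugar": [0.9, 0.1, 0.0],
    "honey": [0.8, 0.2, 0.1],
    "stevia": [0.7, 0.0, 0.3],
    "salt": [0.0, 0.9, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_elements(selected, embeddings, top_n=2):
    """Return the top_n (name, similarity) pairs closest to the selected element."""
    target = embeddings[selected]
    scored = [
        (name, cosine(target, vec))
        for name, vec in embeddings.items()
        if name != selected  # an element is never its own replacement
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


ranked = nearest_elements("sugar", EMBEDDINGS, top_n=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these toy vectors, the two elements nearest to "sugar" are "honey" and then "stevia"; "salt" points in a different direction and ranks last.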

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a knowledge graph reception circuitry configured to receive an original knowledge graph including a set of structured data; a knowledge graph embedding circuitry configured to convert the original knowledge graph to an embedding space comprising nodes by converting the set of structured data into sets of vector triples and modelling the vector triples with a neural network architecture to learn the representations of the knowledge graph; a region slicing circuitry configured to: identify a selected node within the embedding space; determine a region of interest within the embedding space containing the selected node, and calculate a similarity score for each node within the region of interest, wherein the similarity score represents a similarity between each node within the region of interest and the selected node; calculate a weighted look up score for nodes within the region of interest, and select a predetermined number of candidate nodes having a highest weighted look up score; and calculate an information divergence score for the predetermined number of candidate nodes having the highest weighted look up score, wherein the information divergence score represents a divergence of a discovery candidate node from the selected node; and computation circuitry configured to: calculate a discovery score for at least one candidate node within the predetermined number of candidate nodes having the highest weighted look up score; and select a discovery node from the at least one candidate node according to a respective discovery score.

2. The system of claim 1, wherein the at least one candidate node represents a compound included in a formulation.

3. The system of claim 1, wherein the region slicing circuitry is configured to identify the selected node within the embedding space based on at least part of a received user query.

4. The system of claim 1, wherein the region slicing circuitry is configured to determine the region of interest based on at least part of a received user query.

5. The system of claim 1, wherein the computation circuitry is configured to calculate the discovery score by a weighted sum of the similarity score, the weighted look up score, and the information divergence score for a respective candidate node.

6. The system of claim 1, wherein the region slicing circuitry is configured to determine the region of interest as including nodes within a predetermined vector distance from the selected node.

7. The system of claim 1, wherein the computation circuitry is configured to select a candidate node having a highest discovery score as the discovery node.

8. The system of claim 1, wherein the region slicing circuitry is configured to calculate the information divergence score by: determining a first link prediction probability for each node linked to the selected node based on at least part of a received user query; determining a second link prediction probability for each node linked to the nodes within the region of interest based on at least part of the received user query; and determining the information divergence score based on the first link prediction probability and the second link prediction probability.

9. The system of claim 1, wherein the region slicing circuitry is configured to calculate the information divergence score using a Kullback-Leibler (KL) divergence technique.

10. A method comprising: receiving, by a knowledge graph reception circuitry, an original knowledge graph including a set of structured data; converting, by a knowledge graph embedding circuitry, the original knowledge graph to an embedding space comprising nodes by converting the set of structured data into sets of vector triples and modelling the vector triples with a neural network architecture to learn the representations of the knowledge graph; identifying, by a region slicing circuitry, a selected node within the embedding space; determining, by the region slicing circuitry, a region of interest within the embedding space containing the selected node, and calculating a similarity score for each node within the region of interest, wherein the similarity score represents that depicts a similarity between each node within the region of interest and the selected node; calculating, by the region slicing circuitry, a weighted look up score for nodes within the region of interest, and selecting a predetermined number of candidate nodes having a highest weighted look up score; and calculating, by the region slicing circuitry, an information divergence score for the predetermined number of candidate nodes having the highest weighted look up score, wherein the information divergence score represents a divergence of a discovery candidate node from the selected node; calculating, by a computation circuitry, a discovery score for at least one candidate node within the predetermined number of candidate nodes having the highest weighted look up score; and selecting, by the computation circuitry, a discovery node from the at least one candidate node according to a respective discovery score.

11. The method of claim 10, wherein identifying, by the region slicing circuitry, the selected node within the embedding space is based on at least part of a received user query.

12. The method of claim 10, wherein determining the region of interest comprises including nodes within a predetermined vector distance from the selected node, wherein the predetermined vector distance is included in a received user query.

13. The method of claim 10, wherein calculating, by the computation circuitry, the discovery score comprises calculating a weighted sum of the similarity score, the weighted look up score, and the information divergence score for a respective candidate node.

14. The method of claim 10, wherein calculating the information divergence score comprises: determining a first link prediction probability for each node linked to the selected node based on at least part of a received user query; determining a second link prediction probability for each node linked to the nodes within the region of interest based on at least part of the received user query; and determining the information divergence score based on the first link prediction probability and the second link prediction probability.

15. The method of claim 10, wherein selecting, by the computation circuitry, the discovery node comprises selecting a candidate node having a highest substitution score as the discovery node.

16. A product comprising: a machine-readable medium, other than a transitory signal; and instructions stored on the machine-readable medium, the instructions configured to, when executed, cause processing circuitry to: receive an original knowledge graph including a set of structured data; convert the original knowledge graph to an embedding space comprising nodes by converting the set of structured data into sets of vector triples and modelling the vector triples with a neural network architecture to learn the representations of the knowledge graph; identify a selected node within the embedding space for substitution; determine a region of interest within the embedding space containing the selected node, and calculating a similarity score for each node within the region of interest wherein the similarity score represents a similarity between each node within the region of interest and the selected node; calculate a weighted look up score for nodes within the region of interest, and selecting a predetermined number of candidate nodes having a highest weigh
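
The scoring steps recited in the claims (region of interest, similarity score, weighted look up score, information divergence score, discovery score) can be sketched roughly as follows. This is an illustrative sketch only, and several parts are assumptions that the claim text does not specify: the similarity is taken as 1 / (1 + Euclidean distance), the weighted look up score as similarity times a hypothetical per-node weight, the link prediction probabilities are supplied as toy lists rather than predicted by a model, the KL divergence is computed from the selected node to the candidate, and all three terms enter the weighted sum with positive coefficients. The function and parameter names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def select_discovery_node(selected, embeddings, lookup_weights, link_probs,
                          radius, top_k, coeffs=(1.0, 1.0, 1.0)):
    target = embeddings[selected]
    # Region of interest: nodes within a fixed vector distance (as in claim 6).
    region = {n: v for n, v in embeddings.items()
              if n != selected and euclidean(target, v) <= radius}
    if not region:
        raise ValueError("no nodes fall inside the region of interest")
    # Similarity score (assumed form): closer nodes score higher.
    similarity = {n: 1.0 / (1.0 + euclidean(target, v)) for n, v in region.items()}
    # Weighted look up score (assumed form): similarity times a per-node weight.
    lookup = {n: similarity[n] * lookup_weights.get(n, 1.0) for n in region}
    # Keep a predetermined number of candidates with the highest look up score.
    candidates = sorted(region, key=lambda n: lookup[n], reverse=True)[:top_k]
    # Information divergence score via KL divergence (as in claim 9).
    divergence = {n: kl_divergence(link_probs[selected], link_probs[n])
                  for n in candidates}
    # Discovery score as a weighted sum of the three scores (as in claim 5).
    a, b, c = coeffs
    discovery = {n: a * similarity[n] + b * lookup[n] + c * divergence[n]
                 for n in candidates}
    # Select the candidate with the highest discovery score (as in claim 7).
    return max(discovery, key=discovery.get), discovery


# Toy data: node names, vectors, weights, and probabilities are all made up.
embeddings = {"A": [0.0, 0.0], "B": [0.1, 0.0], "C": [0.0, 0.2],
              "D": [0.3, 0.0], "E": [2.0, 2.0]}
lookup_weights = {"B": 1.0, "C": 1.0, "D": 0.1}
link_probs = {"A": [0.7, 0.2, 0.1], "B": [0.7, 0.2, 0.1], "C": [0.1, 0.2, 0.7],
              "D": [0.3, 0.4, 0.3], "E": [0.2, 0.2, 0.6]}

best, scores = select_discovery_node("A", embeddings, lookup_weights,
                                     link_probs, radius=0.5, top_k=2)
print(best, scores)
```

In this toy run, node E lies outside the region of interest and node D is dropped by the top-k cut. Node B is closest to A but has an identical link distribution, so under the assumed positive weighting of divergence the more divergent node C ends up with the higher discovery score.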

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

  • G06F16/36 · Primary

    Creation of semantic tools, e.g. ontology or thesauri · CPC title

  • with adaptation to user needs · CPC title

  • for solving equations {, e.g. nonlinear equations, general mathematical optimization problems (optimization specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title

  • Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11789991B2 cover?
Complex computer system architectures are described for utilizing a knowledge data graph comprised of elements, and selecting a discovery element to replace an existing element of a formulation depicted in the knowledge data graph. The substitution process takes advantage of the knowledge data graph structure to improve the computing capabilities of a computing device executing a substitution c…
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/36. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 17, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).