Systems and methods for drug design and discovery comprising applications of machine learning with differential geometric modeling
US-2021027862-A1 · Jan 28, 2021 · US
US2021287762A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021287762-A1 |
| Application number | US-202117200836-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 14, 2021 |
| Priority date | Mar 16, 2020 |
| Publication date | Sep 16, 2021 |
| Grant date | — |
Official abstract text for this publication.
A system for finding similar molecules to a query molecule includes a GCN, a PFS vector extractor, a compensated vector comparator (CVC) and a candidate vector selector. The GCN has been trained to output a molecular property vector from an input query or input candidate molecular vectors, respectively. The GCN transforms query atomic feature set (AFS) vectors and candidate AFS vectors into query property feature set (PFS) embedding vectors and candidate PFS embedding vectors. The PFS vector extractor extracts query PFS embedding vectors and candidate PFS embedding vectors from hidden layers of the trained GCN. The CVC calculates a compensated similarity metric (CSM) for at least one pair of a query PFS embedding vector and a candidate PFS embedding vector. The candidate vector selector selects only those candidate molecular vectors which have a value of the CSM above a pre-defined threshold value.
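The extraction step in the abstract can be illustrated with a toy forward pass that keeps every hidden-layer activation. This is only a sketch under several assumptions not stated in the abstract: the propagation rule (symmetrically normalized adjacency with self-loops), the ReLU activation, the mean pooling before the output layer, the choice of four hidden layers, and the random, untrained weights. All function and variable names are hypothetical.

```python
import numpy as np

def normalized_adjacency(adj):
    """Add self-loops and symmetrically normalize (an assumed propagation rule)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(afs, adj, hidden_weights, out_weights):
    """Return (molecular property vector, list of hidden-layer activations)."""
    a_norm = normalized_adjacency(adj)
    h = afs
    hidden = []
    for w in hidden_weights:
        h = np.maximum(a_norm @ h @ w, 0.0)  # graph convolution followed by ReLU
        hidden.append(h)
    # Assumed readout: mean-pool the atoms, then apply a linear output layer.
    prop = h.mean(axis=0) @ out_weights
    return prop, hidden

def extract_pfs(afs, adj, hidden_weights, out_weights, layer=-1):
    """Take the activations of one hidden layer as the PFS embedding."""
    _, hidden = gcn_forward(afs, adj, hidden_weights, out_weights)
    return hidden[layer]

# Toy molecule: 3 atoms in a chain, 5 atomic features per atom (made-up values).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
afs = rng.normal(size=(3, 5))

# Four hidden layers of width 8; weights are random stand-ins for trained ones.
dims = [5, 8, 8, 8, 8]
hidden_weights = [rng.normal(size=(dims[i], dims[i + 1])) for i in range(4)]
out_weights = rng.normal(size=(8, 1))

prop, hidden = gcn_forward(afs, adj, hidden_weights, out_weights)
pfs = extract_pfs(afs, adj, hidden_weights, out_weights)
print(pfs.shape)  # (3, 8): one 8-dimensional property feature set per atom
```

In this layout each row of `pfs` plays the role of one property feature set, so a molecule with more atoms would yield an embedding with more rows.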
Opening claim text (preview).
What is claimed is:
1. A method for finding similar molecules to a query molecule, the method comprising: transforming query atomic feature set (AFS) vectors and candidate AFS vectors into query property feature set (PFS) embedding vectors and candidate PFS embedding vectors, utilizing a GCN that has been trained to output a molecular property vector from an input query or input candidate molecular vectors, respectively; extracting query and candidate PFS embedding vectors from hidden layers of said trained GCN; calculating a compensated similarity metric (CSM) for at least one pair of said query PFS embedding vector and one said candidate PFS embedding vector; and selecting only such said candidate molecular vectors which have a value of said CSM above a pre-defined threshold value.
2. The method according to claim 1 wherein said compensating attempts to compensate for inaccuracies caused by a varying position of said atomic feature sets at an input layer of said trained GCN.
3. The method according to claim 1 wherein said calculating comprises: for each candidate PFS embedding vector: summing all possible combinations of dot products between property feature sets in said query PFS embedding vector and property feature sets in said candidate PFS embedding vector; and normalizing said dot product sum, by dividing said dot product sum by the number of said property feature sets in said candidate PFS embedding vector.
4. The method according to claim 1 wherein said trained GCN comprises an input layer, four hidden layers and an output layer.
5. The method according to claim 1 wherein each said PFS embedding vector comprises a plurality of property feature sets.
6. The method according to claim 1 wherein said trained GCN is trained to one of the following properties: solubility, blood brain barrier and toxicity.
7. The method according to claim 4 wherein said extracting query and candidate PFS embedding vectors is performed at the output of the fourth said hidden layer.
8. The method according to claim 1 wherein said candidate AFS vectors are vectors used to train said GCN.
9. The method according to claim 1 wherein adjusting said predefined threshold value changes the number of said candidate molecular vectors deemed similar to said query molecular vector.
10. A system for finding similar molecules to a query molecule, the system comprising: a GCN that has been trained to output a molecular property vector from an input query or input candidate molecular vectors, respectively, to transform query atomic feature set (AFS) vectors and candidate AFS vectors into query property feature set (PFS) embedding vectors and candidate PFS embedding vectors; a PFS vector extractor to extract query PFS embedding vectors and candidate PFS embedding vectors from hidden layers of said trained GCN; a compensated vector comparator (CVC) to calculate a compensated similarity metric (CSM) for at least one pair of said query PFS embedding vector and one said candidate PFS embedding vector; and a candidate vector selector to select only such said candidate molecular vectors which have a value of said CSM above a pre-defined threshold value.
11. The system according to claim 10 wherein said compensated vector comparator (CVC) attempts to compensate for inaccuracies caused by a varying position of said atomic feature sets at an input layer of said trained GCN.
12. The system according to claim 11 wherein said CVC comprises: a dot product summer to sum all possible combinations of dot products between property feature sets in said query PFS embedding vector and property feature sets in said candidate PFS embedding vector, for each candidate PFS embedding vector; and a DPS normalizer to normalize said DPS, by dividing said DPS by the number of said property feature sets in said candidate PFS embedding vector, for each candidate PFS embedding vector.
13. The system according to claim 10 wherein said trained GCN comprises an input layer, four hidden layers and an output layer.
14. The system according to claim 10 wherein each said PFS embedding vector comprises a plurality of property feature sets.
15. The system according to claim 10 wherein said trained GCN is trained to one of the following properties: solubility, blood brain barrier and toxicity.
16. The system according to claim 13 wherein said PFS vector extractor extracts query and candidate PFS embedding vectors from the output of the fourth said hidden layer.
17. The system according to claim 10 wherein said candidate AFS vectors are vectors used to train said GCN.
18. The system according to claim 10 wherein said candidate vector selector to change the value of said predefined threshold value in order to change the number of said candidate molecular vectors deemed similar to said query molecular vector.
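The calculation recited in claims 3 and 12, together with the threshold selection of claims 1 and 10, might be sketched as follows. The sketch assumes each PFS embedding is stored as a 2-D array with one property feature set per row; that layout, the function names, the toy vectors and the threshold value are illustrative assumptions, not details taken from the claims.

```python
import numpy as np

def compensated_similarity(query_pfs, candidate_pfs):
    """Sum every query-row/candidate-row dot product, then divide by the
    number of property feature sets (rows) in the candidate."""
    q = np.asarray(query_pfs, dtype=float)
    c = np.asarray(candidate_pfs, dtype=float)
    dot_product_sum = (q @ c.T).sum()  # all pairwise dot products at once
    return dot_product_sum / c.shape[0]

def select_similar(query_pfs, candidates, threshold):
    """Keep only candidates whose score is above the threshold."""
    scores = {name: compensated_similarity(query_pfs, pfs)
              for name, pfs in candidates.items()}
    return {name: s for name, s in scores.items() if s > threshold}

# Made-up embeddings: rows are property feature sets.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = {
    "A": np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),  # sum 4, 3 rows
    "B": np.array([[0.1, 0.0], [0.0, 0.1]]),              # sum 0.2, 2 rows
}

selected = select_similar(query, candidates, threshold=1.0)
print(sorted(selected))  # ['A']

# Reordering a candidate's rows leaves the score unchanged, because the
# sum runs over every pair regardless of position.
permuted = candidates["A"][[2, 0, 1]]
same = np.isclose(compensated_similarity(query, candidates["A"]),
                  compensated_similarity(query, permuted))
```

Because the sum covers all row pairs, the score does not depend on the order in which the feature sets appear, which is one way to read the position compensation mentioned in claims 2 and 11. Raising or lowering `threshold` changes how many candidates are returned, as in claims 9 and 18.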
Combinations of networks · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Learning methods · CPC title