Systems and methods for drug design and discovery comprising applications of machine learning with differential geometric modeling

US2021027862A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2021027862-A1
Application numberUS-201917043551-A
CountryUS
Kind codeA1
Filing dateApr 1, 2019
Priority dateMar 30, 2018
Publication dateJan 28, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Characteristics of molecules and/or biomolecular complexes may be predicted using differential geometry based methods in combination with trained machine learning models. Element specific and element interactive manifolds may be constructed using element interactive number density and/or element interactive charge density to represent the atoms or the charges in selected element sets. Feature data may include element interactive curvatures of various types derived from element specific and element interactive manifolds at various scales. Element interactive curvatures computed from various element interactive manifolds may be input to trained machine learning models, which may be derived from corresponding machine learning algorithms. These machine learning models may be trained to predict characteristics such as protein-protein or protein-ligand/protein/nucleic acid binding affinity, toxicity endpoints, free energy changes upon mutation, protein flexibility/rigidity/allosterism, membrane/globular protein mutation impacts, plasma protein binding, partition coefficient, permeability, clearance, and/or aqueous solubility, among others.

First claim

Opening claim text (preview).

1 . A system comprising: a non-transitory computer-readable memory; and a processor configured to execute instructions stored on the non-transitory computer-readable memory which, when executed, cause the processor to: identify a set of compounds based on one or more of a defined target clinical application, a set of desired characteristics, and a defined class of compounds; pre-process each compound of the set of compounds to generate respective sets of feature data; process the sets of feature data with one or more trained machine learning models to produce predicted characteristic values for each compound of the set of compounds for each of the set of desired characteristics, wherein the one or more trained machine learning models are selected based on at least the set of desired characteristics, wherein the sets of feature data comprise a first set of feature data comprising one or more element interactive curvatures; identify a subset of the set of compounds based on the predicted characteristic values; and display an ordered list of the subset of the set of compounds via an electronic display. 2 . The system of claim 1 , wherein the instructions, when executed, further cause the processor to: assign rankings to each compound of the set of compounds for each characteristic of the set of desired characteristics, wherein assigning a ranking to a given compound of the set of compounds for a given characteristic of the set of desired characteristics comprises: comparing a first predicted characteristic value of the predicted characteristic values corresponding to the given compound to other predicted characteristic values of other compounds of the set of compounds, wherein the ordered list is ordered according to the assigned rankings. 3 . The system of claim 1 , wherein the set of compounds includes protein-ligand complexes, and wherein the instructions, when executed, further cause the processor to, for a first protein-ligand complex of the protein-ligand complexes: determine an element interactive density for the first protein-ligand complex; identify a family of interactive manifolds for the first protein-ligand complex; determine an element interactive curvature based on the element interactive density; and generate a set of feature vectors based on the element interactive curvature, wherein the first set of feature data includes the set of feature vectors, wherein the one or more element interactive curvatures comprise the element interactive curvature, wherein the set of desired characteristics comprises protein binding affinity, wherein the one or more trained machine learning models comprise a machine learning model that is trained to predict protein binding affinity values based on the set of feature vectors, and wherein the predicted characteristic values comprise the predicted protein binding affinity values. 4 . The system of claim 1 , wherein the instructions, when executed, further cause the processor to: determine an element interactive density for a first compound of the set of compounds; identify a family of interactive manifolds for the first compound; determine an element interactive curvature based on the element interactive density; and generate a set of feature vectors based on the element interactive curvature, wherein the first set of feature data includes the set of feature vectors, wherein the one or more element interactive curvatures comprise the element interactive curvature, wherein the set of desired characteristics comprises one or more toxicity endpoints, wherein the one or more trained machine learning models comprise a machine learning model that is trained to output predicted toxicity endpoints values corresponding to the one or more toxicity endpoints based on the set of feature vectors, and wherein the predicted characteristic values comprise the predicted toxicity endpoint values. 5 . The system of claim 1 , wherein the instructions, when executed, further cause the processor to: determine an element interactive density for a first compound of the set of compounds; identify a family of interactive manifolds for the first compound; determine an element interactive curvature based on the element interactive density; and generate a set of feature vectors based on the element interactive curvature, wherein the one or more element interactive curvatures comprise the element interactive curvature, wherein the first set of feature data includes the set of feature vectors, wherein the set of desired characteristics comprises solvation free energy, wherein the one or more trained machine learning models comprise a machine learning model that is trained to output predicted solvation free energy values corresponding to a solvation free energy of the first compound based on the set of feature vectors, and wherein the predicted characteristic values comprise the predicted solvation free energy values. 6 . The system of claim 1 , wherein the one or more trained machine learning models are selected from a database of trained machine learning models, and wherein the one or more trained machine learning models comprises at least one trained machine learning model corresponding to a machine learning algorithm selected from the group comprising: a gradient boosted regression trees algorithm, a deep neural network, and a convolutional neural network. 7 . The system of claim 1 , wherein the one or more element interactive curvatures comprise at least one element interactive curvature selected from the group comprising: a Gaussian curvature, a mean curvature, a minimum curvature, and a maximum curvature. 8 . A method comprising: with a processor, identifying a set of compounds based on one or more of a defined target clinical application, a set of desired characteristics, and a defined class of compounds; with the processor, pre-processing each compound of the set of compounds to generate respective sets of feature data; with the processor, processing the sets of feature data with one or more trained machine learning models to produce predicted characteristic values for each compound of the set of compounds for each of the set of desired characteristics, wherein the one or more trained machine learning models are selected from a database of trained machine learning models based on at least the set of desired characteristics, wherein the sets of feature data comprise a first set of feature data comprising one or more element interactive curvatures; with the processor, identifying a subset of the set of compounds based on the predicted characteristic values; and with the processor, causing an ordered list of the subset of the set of compounds to be displayed via an electronic display. 9 . The method of claim 8 , further comprising: with the processor, assigning rankings to each compound of the set of compounds for each characteristic of the set of desired characteristics, wherein assigning a ranking to a given compound of the set of compounds for a given characteristic of the set of desired characteristics comprises: with the processor, comparing a first predicted characteristic value of the predicted characteristic values corresponding to the given compound to other predicted characteristic values of other compounds of the set of compounds, wherein the ordered list is ordered according to the assigned rankings. 10 . The method of claim 8 , wherein the set of compounds includes protein-ligand complexes, and wherein pre-processing each compound of the set of compounds to generate respective sets of feature data comprises: with the processor, determining an element interactive density for a first protein-ligand complex of the protein-ligand complexes; with the proc

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • G16B40/20Primary

    Supervised data analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021027862A1 cover?
Characteristics of molecules and/or biomolecular complexes may be predicted using differential geometry based methods in combination with trained machine learning models. Element specific and element interactive manifolds may be constructed using element interactive number density and/or element interactive charge density to represent the atoms or the charges in selected element sets. Feature d…
Who is the assignee on this patent?
Univ Michigan State
What technology area does this patent fall under?
Primary CPC classification G16B40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Jan 28 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).