Generative machine learning systems for drug design

US2017161635A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2017161635-A1
Application number  US-201615015044-A
Country             US
Kind code           A1
Filing date         Feb 3, 2016
Priority date       Dec 2, 2015
Publication date    Jun 8, 2017
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection; read this to see what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, the systems and methods described herein relate to generative models. The generative models may be trained using machine learning approaches, with training sets comprising chemical compounds and biological or chemical information that relate to the chemical compounds. Deep learning architectures may be used. In various embodiments, the generative models are used to generate chemical compounds that have desired characteristics, e.g. activity against a selected target. The generative models may be used to generate chemical compounds that satisfy multiple requirements.
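The abstract says the generative models can produce compounds that satisfy several requirements at once. The claims describe conditioning generation on a target label (claim 3) and an optional predictor of label elements (claim 14). One simple way to enforce multiple requirements, assumed here rather than taken from the patent, is to screen generated candidates with such a predictor and keep only those whose predicted values fall inside every required range. In this sketch, `screen_generated`, `toy_generate`, `toy_predict`, the 8-bit fingerprints, and the thresholds are all hypothetical stand-ins, not a real model.

```python
import random
from typing import Callable, Dict, List, Tuple

Fingerprint = Tuple[int, ...]
# Each requirement maps a label element (e.g. "toxicity") to an allowed [low, high] range.
Requirements = Dict[str, Tuple[float, float]]


def satisfies(predicted: Dict[str, float], requirements: Requirements) -> bool:
    """True only if every required label element lies inside its allowed range."""
    return all(low <= predicted[name] <= high for name, (low, high) in requirements.items())


def screen_generated(
    generate: Callable[[random.Random], Fingerprint],
    predict: Callable[[Fingerprint], Dict[str, float]],
    requirements: Requirements,
    n_samples: int,
    seed: int = 0,
) -> List[Fingerprint]:
    """Draw n_samples candidates and keep the distinct ones that meet all requirements."""
    rng = random.Random(seed)
    seen = set()
    hits = []
    for _ in range(n_samples):
        fp = generate(rng)
        if fp in seen:  # skip duplicate candidates
            continue
        seen.add(fp)
        if satisfies(predict(fp), requirements):
            hits.append(fp)
    return hits


# Hypothetical stand-ins: a random 8-bit "generator" and a toy label predictor.
def toy_generate(rng: random.Random) -> Fingerprint:
    return tuple(rng.randint(0, 1) for _ in range(8))


def toy_predict(fp: Fingerprint) -> Dict[str, float]:
    return {"activity": sum(fp[:4]) / 4, "toxicity": sum(fp[4:]) / 4}


if __name__ == "__main__":
    reqs = {"activity": (0.75, 1.0), "toxicity": (0.0, 0.25)}
    found = screen_generated(toy_generate, toy_predict, reqs, n_samples=500)
    print(len(found), found[:3])
```

Filtering is a blunt complement to label conditioning: it wastes samples when the required region is rare, which is one motivation for steering the generator with a target label instead.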

Claims

Claim text as published (independent claims 1, 16, and 17; claims 13 and 22 to 84 canceled).

1. A computer system for generation of chemical compound representations, the system comprising: (i) a probabilistic autoencoder comprising: (1) a probabilistic encoder configured to encode chemical compound fingerprints as latent variables; (2) a probabilistic decoder configured to decode latent representations and to generate random variables over values of fingerprint elements; and (3) one or more sampling modules configured to sample from a latent variable or a random variable; wherein the system is trained by feeding it chemical compound fingerprints and training labels associated with the chemical compound fingerprints and generating reconstructions of chemical compound fingerprints, wherein the system's training is constrained by a reconstruction error.

2. The computer system of claim 1, wherein the training labels comprise one or more label elements having predetermined values.

3. The computer system of claim 1, wherein the system is configured to receive a target label comprising one or more label elements and generate chemical compound fingerprints that satisfy a specified value for each of the one or more label elements.

4. The computer system of claim 3, wherein the training labels do not comprise the target label.

5. The computer system of claim 1, wherein each chemical compound fingerprint uniquely identifies a chemical compound.

6. The computer system of claim 1, wherein the training further constrains the total information flow between the probabilistic encoder and the probabilistic decoder.

7. The computer system of claim 1, wherein the probabilistic encoder is configured to provide an output comprising a pair of a vector of means and a vector of standard deviations.

8. The computer system of claim 7, wherein the sampling module is configured to receive the output of the encoder, define the latent variable based on the output of the encoder, and generate one or more latent representations, wherein the latent variable is modeled by a probability distribution.

9. The computer system of claim 8, wherein the probability distribution is selected from the group consisting of Normal distribution, Laplace distribution, Elliptical distribution, Student's t distribution, Logistic distribution, Uniform distribution, Triangular distribution, Exponential distribution, Invertible cumulative distribution, Cauchy distribution, Rayleigh distribution, Pareto distribution, Weibull distribution, Reciprocal distribution, Gompertz distribution, Gumbel distribution, Erlang distribution, Logarithmic Normal distribution, Gamma distribution, Dirichlet distribution, Beta distribution, Chi-Squared distribution, F distribution, and variations thereof.

10. The computer system of claim 1, wherein the probabilistic encoder comprises an inference model.

11. The computer system of claim 9, wherein the inference model comprises a multi-layer perceptron.

12. The computer system of claim 1, wherein the probabilistic autoencoder comprises a generative model.

13. (canceled)

14. The computer system of claim 1, further comprising a predictor that is configured to predict values of selected label elements for chemical compound fingerprints.

15. The computer system of claim 2, wherein the label comprises one or more label elements selected from the group consisting of bioassay results, toxicity, cross-reactivity, pharmacokinetics, pharmacodynamics, bioavailability, and solubility.

16. A training method for generation of chemical compound representations, the training method comprising training a generative model, the training comprising: (i) inputting to the generative model chemical compound fingerprints and associated training labels, and (ii) generating reconstructions of chemical compound fingerprints; wherein the generative model comprises a probabilistic autoencoder comprising (1) a probabilistic encoder configured to encode chemical compound fingerprints as latent variables; (2) a probabilistic decoder configured to decode latent representations as random variables over values of fingerprint elements; and (3) a sampling module configured to sample from the latent variables to generate latent representations or from a random variable to generate a reconstruction of a fingerprint; and wherein the training labels comprise one or more label elements having empirical or predicted values, wherein the system's training is constrained by a reconstruction error.

17. A computer system for drug prediction, the system comprising (i) a machine learning model comprising a generative model; wherein the generative model is trained with a training data set comprising chemical compound fingerprint data and associated training labels comprising one or more label elements.

18. The system of claim 17, wherein the generative model comprises a neural network having at least 6 layers of units.

19. The system of claim 17, wherein the label elements comprise one or more elements selected from the group consisting of bioassay results, toxicity, cross-reactivity, pharmacokinetics, pharmacodynamics, bioavailability, and solubility.

20. The system of claim 17, wherein the generative model comprises a probabilistic autoencoder.

21. The system of claim 17, wherein the generative model comprises a probabilistic or variational autoencoder having a probabilistic encoder, a probabilistic decoder, and a sampling module.

22-84. (canceled)
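Claims 1 and 6 to 8 describe the core pipeline: an encoder that maps a fingerprint to a vector of means and a vector of standard deviations, a sampling module that draws a latent representation, and a decoder that outputs a random variable for each fingerprint element, with training constrained by reconstruction error and by the information flowing between encoder and decoder. The following is a minimal, untrained forward-pass sketch under stated assumptions: binary fingerprints with a Bernoulli output per bit, the label vector concatenated to both encoder and decoder inputs (which is what lets a target label steer generation, as in claim 3), one-hidden-layer MLPs (claim 11 mentions a multi-layer perceptron), a Normal latent distribution (one option in claim 9), and the standard variational-autoencoder KL penalty against a unit Normal prior as the information-flow constraint. The encoder emits log standard deviations, which are exponentiated to keep them positive. All class and method names are hypothetical, and gradient-based training is omitted.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights; a trained system would fit these by backpropagation."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out


def dense(x: Vector, layer: Layer) -> Vector:
    """Affine map W @ x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


class ConditionalFingerprintVAE:
    """Hypothetical label-conditioned probabilistic autoencoder over binary fingerprints."""

    def __init__(self, n_bits: int, n_labels: int, n_hidden: int, n_latent: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.n_latent = n_latent
        self.enc_hidden = init_layer(n_bits + n_labels, n_hidden, self.rng)
        self.enc_out = init_layer(n_hidden, 2 * n_latent, self.rng)  # means + log std devs
        self.dec_hidden = init_layer(n_latent + n_labels, n_hidden, self.rng)
        self.dec_out = init_layer(n_hidden, n_bits, self.rng)  # one logit per fingerprint bit

    def encode(self, fp: Vector, label: Vector) -> Tuple[Vector, Vector]:
        h = [math.tanh(v) for v in dense(fp + label, self.enc_hidden)]
        out = dense(h, self.enc_out)
        mu = out[: self.n_latent]
        sigma = [math.exp(v) for v in out[self.n_latent :]]
        return mu, sigma

    def sample_latent(self, mu: Vector, sigma: Vector) -> Vector:
        # Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1).
        return [m + s * self.rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

    def decode(self, z: Vector, label: Vector) -> Vector:
        # Bernoulli probability for each fingerprint bit.
        h = [math.tanh(v) for v in dense(z + label, self.dec_hidden)]
        return [sigmoid(v) for v in dense(h, self.dec_out)]

    def loss_terms(self, fp: Vector, label: Vector) -> Tuple[float, float]:
        mu, sigma = self.encode(fp, label)
        probs = self.decode(self.sample_latent(mu, sigma), label)
        eps = 1e-7
        # Reconstruction error: binary cross-entropy between input bits and decoded probabilities.
        recon = -sum(x * math.log(p + eps) + (1 - x) * math.log(1 - p + eps) for x, p in zip(fp, probs))
        # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.
        kl = 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s) for m, s in zip(mu, sigma))
        return recon, kl

    def generate(self, target_label: Vector) -> List[int]:
        # Sample z from the unit Normal prior, decode with the target label, then sample bits.
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.n_latent)]
        probs = self.decode(z, target_label)
        return [1 if self.rng.random() < p else 0 for p in probs]


if __name__ == "__main__":
    model = ConditionalFingerprintVAE(n_bits=16, n_labels=2, n_hidden=8, n_latent=4)
    fp = [1, 0] * 8
    label = [1.0, 0.0]  # hypothetical label elements, e.g. active / non-toxic flags
    recon, kl = model.loss_terms(fp, label)
    print(f"reconstruction error {recon:.3f}, KL term {kl:.3f}")
    print(model.generate(target_label=label))
```

Training would minimize `recon + kl` averaged over the training set; the KL term is what limits how much information the latent code can carry, while the shared label input lets `generate` request fingerprints for a chosen target label.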

Assignees

Preferred Networks Inc

Inventors

Classifications

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Combinations of networks · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2017161635A1 cover?
Generative machine learning models, in the claims a probabilistic (variational) autoencoder, trained on chemical compound fingerprints and associated labels such as bioassay results, toxicity, or solubility, and used to generate chemical compounds with desired characteristics, e.g. activity against a selected target, including compounds that satisfy multiple requirements.
Who is the assignee on this patent?
Preferred Networks Inc
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Published June 8, 2017 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).