Generative machine learning systems for drug design

US10776712B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10776712-B2
Application number  US-201615015044-A
Country             US
Kind code           B2
Filing date         Feb 3, 2016
Priority date       Dec 2, 2015
Publication date    Sep 15, 2020
Grant date          Sep 15, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, the systems and methods described herein relate to generative models. The generative models may be trained using machine learning approaches, with training sets comprising chemical compounds and biological or chemical information that relate to the chemical compounds. Deep learning architectures may be used. In various embodiments, the generative models are used to generate chemical compounds that have desired characteristics, e.g. activity against a selected target. The generative models may be used to generate chemical compounds that satisfy multiple requirements.

First claim

Opening claim text (preview).

What is claimed is:
1. A computer system for generation of representations obtained from inputs, the system comprising: (i) an autoencoder comprising: an encoder including a neural network and a decoder including a neural network, wherein (1) the encoder is configured to encode the inputs as latent variables, the inputs being information about chemical compounds, and (2) the decoder is configured to decode information based on the latent variables and to output random variables, wherein the system is trained by causing the encoder to encode both the inputs and training labels associated with the inputs and by causing the decoder to generate reconstructions of the inputs, wherein the system's training is constrained by a reconstruction error, and wherein the inputs and the training labels are fed into, and encoded by, the encoder to cause the autoencoder to model a joint probability distribution with respect to the inputs and the training labels during the system's training.
2. The computer system of claim 1, wherein the training labels comprise one or more label elements having predetermined values.
3. The computer system of claim 1, wherein the inputs include chemical compound fingerprints, and the system is configured to receive a target label comprising one or more label elements and generate chemical compound fingerprints that satisfy a specified value for each of the one or more label elements.
4. The computer system of claim 3, wherein the training labels do not comprise the target label.
5. The computer system of claim 1, wherein the inputs include chemical compound fingerprints, and each chemical compound fingerprint uniquely identifies a chemical compound.
6. The computer system of claim 1, wherein the training further constrains the total information flow between the encoder and the decoder.
7. The computer system of claim 1, wherein the encoder comprises a probabilistic encoder configured to provide an output comprising a pair of a vector of means and a vector of standard deviations.
8. The computer system of claim 7, further comprising a sampling module, wherein the sampling module is configured to receive the output of the encoder, define one of the latent variables based on the output of the encoder, and generate the information based on the latent variables, wherein the one of the latent variables is modeled by a probability distribution.
9. The computer system of claim 8, wherein the probability distribution is selected from the group consisting of Normal distribution, Laplace distribution, Elliptical distribution, Student's t distribution, Logistic distribution, Uniform distribution, Triangular distribution, Exponential distribution, Invertible cumulative distribution, Cauchy distribution, Rayleigh distribution, Pareto distribution, Weibull distribution, Reciprocal distribution, Gompertz distribution, Gumbel distribution, Erlang distribution, Logarithmic Normal distribution, Gamma distribution, Dirichlet distribution, Beta distribution, Chi-Squared distribution, F distribution.
10. The computer system of claim 1, wherein the encoder comprises an inference model.
11. The computer system of claim 10, wherein the inference model comprises a multi-layer perceptron.
12. The computer system of claim 1, wherein the autoencoder comprises a generative model.
13. The computer system of claim 1, further comprising a predictor that is configured to predict values of selected label elements for inputs.
14. The computer system of claim 2, wherein the training labels comprise one or more label elements selected from the group consisting of bioassay results, toxicity, cross-reactivity, pharmacokinetics, pharmacodynamics, bioavailability, and solubility.
15. A computer-implemented training method for generation of representations obtained from inputs, the training method comprising training a generative model, the training comprising: (i) inputting to the generative model both inputs and associated training labels, the inputs being information about chemical compounds, and (ii) generating reconstructions of the inputs; wherein the generative model comprises an autoencoder comprising an encoder including a neural network and a decoder including a neural network, wherein (1) the encoder is configured to encode the inputs and the training labels as latent variables; and (2) the decoder is configured to decode information based on the latent variables as random variables, wherein the training is constrained by a reconstruction error, and wherein the inputs and the training labels are fed into, and encoded by, the encoder to cause the autoencoder to model a joint probability distribution with respect to the inputs and the training labels during the training.
16. A computer system for drug prediction, the system comprising: (i) a machine learning model comprising a generative model including one or more neural networks; wherein the generative model is trained with a training data set comprising input data and associated training labels comprising one or more label elements, the input data being information about chemical compounds, wherein the generative model is trained by feeding the input data and the training labels as input into an encoder and by feeding an output of the encoder and the training labels, supplied from other than the encoder, into a decoder for the training of the generative model, the encoder and the decoder constituting an autoencoder that is trained by a reconstruction error, and wherein the input data and the training labels are fed into, and encoded by, the encoder to cause the generative model to model a joint probability distribution with respect to the input data and the training labels during the training of the generative model.
17. The system of claim 16, wherein the label elements comprise one or more elements selected from the group consisting of bioassay results, toxicity, cross-reactivity, pharmacokinetics, pharmacodynamics, bioavailability, and solubility.
18. The system of claim 16, wherein the generative model comprises a probabilistic autoencoder.
19. The system of claim 16, wherein the generative model comprises a probabilistic or variational autoencoder.
20. A computer system for generation of a chemical compound representation, the computer system comprising: a decoder implemented as a neural network, the decoder being a generative model configured to receive a latent representation and a label to generate a random variable corresponding to the chemical compound representation, wherein the decoder is trained by causing an encoder implemented as a neural network to encode labels and information about chemical compounds to generate latent variables for providing latent representations, and by causing the decoder to decode the latent representations and the labels to generate random variables corresponding to reconstructions of the information about the chemical compounds, while the training of the decoder is constrained by a reconstruction error, and wherein the labels and the information about the chemical compounds are fed into, and encoded by, the encoder to cause the encoder and the decoder to model a joint probability distribution with respect to the labels and the information about the chemical compounds during the training.
21. The computer system as claimed in claim 20, wherein the labels comprise one or more label elements selected from the group consisting of bioassay results, toxicity, cross-reactivity, pharmacokin
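
Illustrative sketch (not part of the patent text). The claims describe an encoder that receives both compound information and labels, a probabilistic output of means and standard deviations (claim 7), a sampling step (claim 8), and a decoder that receives the latent variables together with the labels (claims 16 and 20). A minimal NumPy sketch of one possible reading follows. Everything specific in it is an assumption made for illustration: the layer sizes, the function names, the Normal latent distribution (one of the options listed in claim 9), the binary cross-entropy used as the reconstruction error, and the KL term used as one common way to limit information flow between encoder and decoder (claim 6). The weights are random and untrained, and no optimizer is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 64-bit fingerprint, 3 label elements, 8 latent dimensions.
N_BITS, N_LABELS, N_LATENT, N_HIDDEN = 64, 3, 8, 32

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

enc_h = init_layer(N_BITS + N_LABELS, N_HIDDEN)
enc_mu = init_layer(N_HIDDEN, N_LATENT)
enc_log_sigma = init_layer(N_HIDDEN, N_LATENT)
dec_h = init_layer(N_LATENT + N_LABELS, N_HIDDEN)
dec_out = init_layer(N_HIDDEN, N_BITS)

def encode(x, y):
    """Probabilistic encoder: maps [x, y] to a vector of means and a vector of standard deviations."""
    h = np.tanh(np.concatenate([x, y], axis=1) @ enc_h[0] + enc_h[1])
    mu = h @ enc_mu[0] + enc_mu[1]
    sigma = np.exp(h @ enc_log_sigma[0] + enc_log_sigma[1])  # exp keeps sigma positive
    return mu, sigma

def sample_latent(mu, sigma):
    """Sampling step: z = mu + sigma * eps, with eps drawn from a standard Normal."""
    return mu + sigma * rng.standard_normal(mu.shape)

def decode(z, y):
    """Decoder: maps [z, y] to per-bit probabilities of the fingerprint."""
    h = np.tanh(np.concatenate([z, y], axis=1) @ dec_h[0] + dec_h[1])
    return 1.0 / (1.0 + np.exp(-(h @ dec_out[0] + dec_out[1])))

def loss(x, y):
    """Reconstruction error plus a KL penalty (assumed form of the training objective)."""
    mu, sigma = encode(x, y)
    p = decode(sample_latent(mu, sigma), y)
    eps = 1e-7  # avoids log(0)
    recon = -np.sum(x * np.log(p + eps) + (1 - x) * np.log(1 - p + eps), axis=1)
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0, axis=1)
    return np.mean(recon + kl), np.mean(recon), np.mean(kl)

def generate(y_target, n_samples):
    """Draw latents from the prior and decode them together with a target label."""
    z = rng.standard_normal((n_samples, N_LATENT))
    y = np.repeat(y_target[None, :], n_samples, axis=0)
    return decode(z, y)

# Toy data: 4 random fingerprints with 3 label elements each.
x = rng.integers(0, 2, size=(4, N_BITS)).astype(float)
y = rng.random((4, N_LABELS))
total, recon, kl = loss(x, y)
probs = generate(np.array([1.0, 0.0, 0.5]), n_samples=5)
```

In this sketch, `generate` corresponds loosely to the use described in claim 3: a target label is supplied and the decoder returns per-bit probabilities from which candidate fingerprints could be drawn. A trained system would fit the weights by minimizing `total` over a labeled compound data set.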

Assignees

Preferred Networks Inc

Inventors

Classifications

  • Probabilistic or stochastic networks

  • Probabilistic graphical models, e.g. probabilistic networks

  • Combinations of networks

  • Auto-encoder networks; Encoder-decoder networks

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10776712B2 cover?
In various embodiments, the systems and methods described herein relate to generative models. The generative models may be trained using machine learning approaches, with training sets comprising chemical compounds and biological or chemical information that relate to the chemical compounds. Deep learning architectures may be used. In various embodiments, the generative models are used to gener…
Who is the assignee on this patent?
Preferred Networks Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 15, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).