Adversarial autoencoder architecture for methods of graph to sequence models
US-2023075100-A1 · Mar 9, 2023 · US
US2024428897A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024428897-A1 |
| Application number | US-202218274057-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 27, 2022 |
| Priority date | Jan 27, 2021 |
| Publication date | Dec 26, 2024 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods for generating potential medicinal molecules using memory networks are described. A method for generating analogs of a molecule includes: receiving one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure. Generating the token string representations of analogs includes, for each further token string representation: sequentially processing a token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the token string representation of a substructure, sampling one or more additional tokens using the memory network. The token string representations each comprise a plurality of tokens representing predefined structures of a molecule. The memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network.
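The two steps in the abstract (read a substructure's tokens into a memory network, then sample further tokens from it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the publication: the toy vocabulary `VOCAB`, the class `ToyMemoryNetwork` (a small Elman-style recurrent cell with random, untrained weights), and the helper `sample_analog`. The sampled strings are not guaranteed to be chemically valid; the sketch only shows how an internal state carries the prefix into the sampling phase.

```python
import math
import random

# Toy token set (assumption); a real system would use a full SMILES or amino acid vocabulary.
VOCAB = ["C", "N", "O", "(", ")", "=", "1", "<eos>"]


class ToyMemoryNetwork:
    """Hypothetical Elman-style recurrent cell with random, untrained weights."""

    def __init__(self, vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        n = len(vocab)
        self.vocab = vocab
        self.hidden = hidden
        self.W_in = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(hidden)]
        self.W_h = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(hidden)]
        self.W_out = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n)]
        self.state = [0.0] * hidden

    def read(self, token):
        """Update the internal state after one token is read in."""
        i = self.vocab.index(token)
        self.state = [
            math.tanh(self.W_in[r][i] + sum(w * s for w, s in zip(self.W_h[r], self.state)))
            for r in range(self.hidden)
        ]

    def next_token_probs(self):
        """Softmax over the vocabulary, conditioned on the current internal state."""
        logits = [sum(w * s for w, s in zip(row, self.state)) for row in self.W_out]
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]


def sample_analog(net, prefix, max_len=20, rng=None):
    """Read the substructure tokens, then sample until <eos> or the size limit."""
    rng = rng or random.Random(0)
    net.state = [0.0] * net.hidden       # start from a fresh internal state
    tokens = list(prefix)
    for tok in prefix:                   # phase 1: process the substructure
        net.read(tok)
    while len(tokens) < max_len:         # phase 2: sample additional tokens
        tok = rng.choices(net.vocab, weights=net.next_token_probs(), k=1)[0]
        if tok == "<eos>":
            break
        tokens.append(tok)
        net.read(tok)                    # state is updated after each sampled token
    return tokens
```

Calling `sample_analog(ToyMemoryNetwork(VOCAB), ["C", "C", "N"])` returns a token list that always begins with the given prefix. Because the prefix is fed through the same state update as the sampled tokens, no separate conditioning mechanism is needed in this sketch.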
Opening claim text (preview).
1.-15. (canceled)

16. A computer implemented method of generating analogs of a molecule, the method comprising: receiving one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure, wherein generating the one or more token string representations for each initial molecular structure comprises: sequentially processing a substructure token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the substructure token string representation of the substructure, sampling one or more additional tokens using the memory network, wherein each of the one or more token string representations comprises a plurality of tokens representing predefined structures of a molecule; and wherein the memory network encodes a sequential probability distribution on the plurality of tokens using an internal state of the memory network.

17. The method of claim 16, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position using a traversal rule to generate the token string representation of the corresponding initial molecular structure; and generating the substructure token string representation of a substructure of the corresponding initial molecular structure by taking a sequence of tokens from the token string representation of the corresponding initial molecular structure, wherein the sequence of tokens begins at a first token corresponding to the starting position.

18. The method of claim 16, wherein sequentially processing the substructure token string representation of a substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position for a plurality of steps using a traversal rule to generate a sequence of tokens corresponding to the substructure token string representations of a substructure of the corresponding initial molecular structure.

19. The method of claim 18, wherein a length of the sequence of tokens is selected from a pre-defined distribution over substructure sizes; wherein the traversal rule is selected from a plurality of valid traversal rules using a selection rule; and/or wherein the starting position is a randomly selected atom in the first molecular structure.

20. The method of claim 16, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises sequentially reading the substructure token string representation of the substructure into the memory network, wherein the internal state of the memory network is updated after each token of the substructure token string representation of the substructure is read in; and wherein sampling one or more additional tokens using the memory network comprises updating the internal state of the memory network after each additional token is sampled.

21. The method of claim 20, wherein the one or more further tokens are sampled until an end of string token is sampled and/or a string size limit is reached.

22. The method of claim 16, wherein receiving one or more initial molecular structures comprises generating one or more token string representations of initial molecular structures using a memory network.

23. The method of claim 16, wherein the memory network is a memory neural network.

24. The method of claim 23, wherein the memory neural network is a recurrent neural network.

25. A computer implemented method of generating potentially biologically or medically active molecules, the method comprising: generating a plurality of analogs of one or more initial molecular structures, comprising: receiving the one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure, wherein generating the one or more token string representations for each initial molecular structure comprises: sequentially processing a substructure token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the substructure token string representation of the substructure, sampling one or more additional tokens using the memory network, wherein each of the one or more token string representations comprises a plurality of tokens representing predefined structures of a molecule; and wherein the memory network encodes a sequential probability distribution on the plurality of tokens using an internal state of the memory network; determining a score for each of the generated initial molecular structures and analogs of the initial molecular structures using an objective function; updating parameters of the memory network based on one or more of the scores of the initial molecular structures and the analogs of the initial molecular structures; and outputting one or more token string representations of potentially biologically or medically active molecules based on the memory network with the updated parameters.

26. The method of claim 25, wherein updating parameters of the memory network based on one or more of the scores of the initial molecular structures and the analogs of the initial molecular structures comprises: generating an ordered list of molecular structures by ordering the initial molecular structures and the analogs of the initial molecular structures based on the determined scores; and updating parameters of the memory network based on a predefined number of highest-scoring molecular structures in the ordered list of molecular structures.

27. The method of claim 25, wherein the token string representations are SMILES representations or comprise 1-letter or 3-letter amino acid representations.

28. The method of claim 25, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position using a traversal rule to generate the token string representation of the corresponding initial molecular structure; and generating the substructure token string representation of the substructure of the corresponding initial molecular structure by taking a sequence of tokens from the token string representation of the corresponding initial molecular structure, wherein the sequence of tokens begins at a first token corresponding to the starting position.

29. The method of claim 25, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position for a plurality of steps using
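The ranking step recited in the claim preview (score every structure with an objective function, order the pool, keep a predefined number of top scorers) can be sketched as below. This is a hedged illustration only: `select_top_k` and `toy_score` are hypothetical names, and the scoring rule (fraction of nitrogen characters in a SMILES-like string) is a stand-in chosen for simplicity, not an objective described in the publication. The parameter update that would consume the selected structures is deliberately left out.

```python
from typing import Callable, List, Sequence, Tuple


def toy_score(smiles: str) -> float:
    """Hypothetical objective: fraction of characters that are nitrogen atoms."""
    return smiles.count("N") / max(len(smiles), 1)


def select_top_k(
    candidates: Sequence[str],
    score_fn: Callable[[str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Score each structure, order by score (highest first), keep the top k."""
    scored = [(score_fn(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in input order
    return scored[:k]


# Pool of initial structures plus sampled analogs (illustrative strings only).
pool = ["CCO", "CCN", "NCCN", "CCCC"]
top = select_top_k(pool, toy_score, k=2)
```

In a full training loop, the structures in `top` would be the ones used to update the memory network's parameters, for example by fitting the network to those token strings; that choice of update rule is an assumption here, since the preview text does not specify it.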
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Reinforcement learning · CPC title
Supervised learning · CPC title
Generative networks · CPC title
Machine learning, data mining or chemometrics · CPC title