Methods, systems and non-transitory computer readable media for automated design of molecules with desired properties using artificial intelligence
US-2020168302-A1 · May 28, 2020 · US
US2023420084A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023420084-A1 |
| Application number | US-202118022805-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 25, 2021 |
| Priority date | Aug 26, 2020 |
| Publication date | Dec 28, 2023 |
| Grant date | — |
Official abstract text for this publication.
A computer-implemented method of generating a molecule includes sequentially processing, by a memory network, each token in a token string representation of a molecular scaffold to generate a molecule, wherein the token string representation comprises a plurality of tokens representing predefined structures of the molecular scaffold and one or more tokens representing open positions of the molecular scaffold, and wherein the memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network. The method further includes outputting, from the memory network, a token string representation of the generated molecule. Sequentially processing each token in the token string representation of the molecular scaffold includes determining whether or not a current token being processed is a token representing an open position of the molecule.
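The control flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: `ToyMemoryNetwork`, the `*` open-position marker, and the `is_done` callback are hypothetical names, and the toy model samples uniformly where a trained memory network would sample from its learned distribution.

```python
import random
from typing import Callable, List, Sequence

OPEN = "*"  # hypothetical marker for an open position in the scaffold


class ToyMemoryNetwork:
    """Stand-in for a trained recurrent model (assumption, not from the source).

    The internal state is simply the token history. A real memory network
    would keep a learned hidden state and sample from p(token | state).
    """

    def __init__(self, vocab: Sequence[str], seed: int = 0) -> None:
        self.vocab = list(vocab)
        self.rng = random.Random(seed)
        self.state: List[str] = []

    def update(self, token: str) -> None:
        # Update the internal state based on a token that was read or sampled.
        self.state.append(token)

    def sample(self) -> str:
        # Uniform sampling as a placeholder for the learned distribution.
        return self.rng.choice(self.vocab)


def generate(
    scaffold: Sequence[str],
    net: ToyMemoryNetwork,
    is_done: Callable[[List[str]], bool],
) -> List[str]:
    """Process the scaffold token by token, filling each open position."""
    output: List[str] = []
    for token in scaffold:
        if token != OPEN:
            # Predefined structure: read the token in and update the state.
            net.update(token)
            output.append(token)
            continue
        # Open position: sample one or more tokens until termination,
        # updating the state after each sampled token.
        sampled: List[str] = []
        while True:
            candidate = net.sample()
            net.update(candidate)
            sampled.append(candidate)
            if is_done(sampled):
                break
        output.extend(sampled)
    return output


scaffold = list("CC(*)N")  # illustrative token string with one open position
net = ToyMemoryNetwork(vocab=["C", "N", "O"])
molecule = generate(scaffold, net, is_done=lambda s: len(s) >= 2)
print("".join(molecule))
```

The fixed scaffold tokens pass through unchanged and only the open position is filled. The termination rule is passed in as a callback because the claims make it depend on the type of open position.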
Opening claim text (preview).
1. A computer-implemented method of generating a molecule, the computer-implemented method comprising: sequentially processing, by a memory network, each token in a token string representation of a molecular scaffold to generate a molecule, wherein the token string representation comprises a plurality of tokens representing predefined structures of the molecular scaffold and one or more tokens representing open positions of the molecular scaffold, and wherein the memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network; and outputting, from the memory network, a token string representation of the generated molecule, wherein sequentially processing each token in the token string representation of the molecular scaffold comprises: determining whether a current token being processed is a token representing an open position of the molecule; and if the current token does not represent an open position of the molecular scaffold, reading the current token into the memory network and updating the internal state of the memory network based on the current token, or if the current token does represent an open position of the molecular scaffold, sampling one or more candidate tokens for the open position based on a current internal state of the memory network until a termination condition is satisfied and updating the internal state of the memory network after each sampled token based on the sampled token.

2. The computer-implemented method of claim 1, wherein the one or more tokens representing open positions respectively represent one or more different open position types, and wherein the termination condition for sampling one or both of the one or more candidate tokens at an open position and tokens available for sampling at an open position are dependent on an open position type of the open position.

3. The computer-implemented method of claim 1, further comprising resuming sequential processing of the token string representation of the molecular scaffold after the termination condition for an open position is satisfied.

4. The computer-implemented method of claim 1, wherein at least one of the tokens representing an open position of the molecular scaffold represents an open branch of the molecular scaffold.

5. The computer-implemented method of claim 4, wherein a set of allowed tokens for the token string representation comprises a branch-opening token and a branch-closing token, and wherein sampling one or more candidate tokens at an open position representing the open branch of the molecular scaffold comprises sampling candidate tokens until a number of sampled branch-closing tokens is equal to a number of branch-opening tokens in the candidate tokens.

6. The computer-implemented method of claim 4, wherein one or more tokens of the branch are constrained.

7. The computer-implemented method of claim 1, wherein at least one of the tokens representing an open position of the molecular scaffold represents an open linker within the molecular scaffold.

8. The computer-implemented method of claim 7, wherein sampling one or more candidate tokens at an open position representing an open linker of the molecular scaffold comprises: determining a threshold number of candidate tokens to be sampled based on a pre-defined probability distribution; and sampling candidate tokens until a number of sampled tokens reaches the threshold number of candidate tokens.

9. The computer-implemented method of claim 8, wherein a set of allowed tokens for the token string representation comprises one or both of a branch-opening token and a branch-closing token and one or more cycle opening and cycle closing tokens, and wherein sampling one or more candidate tokens at an open position representing an open linker of the molecular scaffold further comprises continuing to sample candidate tokens beyond the threshold number of sampled tokens until all branches are closed and all cycles are closed.

10. The computer-implemented method of claim 1, wherein the token string representation is a SMILES representation.

11. The computer-implemented method of claim 1, wherein the token string representation comprises a 1-letter or 3-letter amino acid representation.

12. The computer-implemented method of claim 1, further comprising iteratively: generating one or more molecular structures; scoring each molecular structure of the one or more molecular structures using a scoring function representing one or more target properties of a target molecule; storing the molecular structure as part of a set of potentially biologically or medically active molecules if a threshold condition relating to the scoring function is satisfied; and fine tuning parameters of the memory network based on the score for the molecular structure.

13. The computer-implemented method of claim 12, further comprising synthesizing the molecule based on the molecular structure.

14. (canceled)

15. A system comprising one or more processors and a memory, the memory comprising computer-readable instructions that, when executed by the one or more processors, causes the system to perform steps comprising: sequentially processing, by the memory, each token in a token string representation of a molecular scaffold to generate a molecule, wherein the token string representation comprises a plurality of tokens representing predefined structures of the molecular scaffold and one or more tokens representing open positions of the molecular scaffold, and wherein the memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network; and outputting, from the memory, a token string representation of the generated molecule, wherein sequentially processing each token in the token string representation of the molecular scaffold comprises: determining whether a current token being processed is a token representing an open position of the molecule; and if the current token does not represent an open position of the molecular scaffold, reading the current token into the memory network and updating the internal state of the memory network based on the current token, or if the current token does represent an open position of the molecular scaffold, sampling one or more candidate tokens for the open position based on a current internal state of the memory network until a termination condition is satisfied and updating the internal state of the memory network after each sampled token based on the sampled token.

16. A drug generated according to a computer-implemented method, the computer-implemented method comprising: sequentially processing, by a memory network, each token in a token string representation of a molecular scaffold to generate a molecule, wherein the token string representation comprises a plurality of tokens representing predefined structures of the molecular scaffold and one or more tokens representing open positions of the molecular scaffold, and wherein the memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network; and outputting, from the memory network, a token string representation of the generated molecule, wherein sequentially processing each token in the token string representation of the molecular scaffold comprises: determining whether a current token being processed is a token representing an open position of the molecule; and if the current token does not represent an open position of the molecular scaffold, reading the current token into
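The type-dependent termination conditions recited in claims 5, 8 and 9 can be illustrated with small predicates over the tokens sampled so far. The following is a hedged sketch, not the claimed implementation: the function names are hypothetical, tokens are assumed to be single SMILES characters with `(` and `)` as branch tokens and single digits as ring-bond labels (two-digit `%nn` labels are ignored), and the counts are assumed to be taken over the sampled candidate tokens only.

```python
import random
from typing import List, Sequence

BRANCH_OPEN, BRANCH_CLOSE = "(", ")"  # assumed SMILES-style branch tokens


def branches_closed(sampled: List[str]) -> bool:
    """Open-branch rule: as many branch-closing as branch-opening tokens."""
    return sampled.count(BRANCH_CLOSE) == sampled.count(BRANCH_OPEN)


def cycles_closed(sampled: List[str]) -> bool:
    """A ring label that has appeared an odd number of times is still open."""
    digits = [t for t in sampled if t.isdigit()]
    return all(digits.count(d) % 2 == 0 for d in set(digits))


def draw_threshold(
    rng: random.Random, lengths: Sequence[int], weights: Sequence[float]
) -> int:
    """Draw the linker length threshold from a pre-defined distribution."""
    return rng.choices(lengths, weights=weights, k=1)[0]


def linker_done(sampled: List[str], threshold: int) -> bool:
    """Open-linker rule: reach the threshold, then keep sampling until
    all branches and all cycles are closed."""
    return (
        len(sampled) >= threshold
        and branches_closed(sampled)
        and cycles_closed(sampled)
    )


# Illustrative checks on hand-written token lists.
print(branches_closed(list("(C(O))")))    # balanced branches
print(linker_done(list("CC1CC"), 3))      # ring 1 still open
print(linker_done(list("CC1CC1"), 3))     # threshold met, ring closed
```

Either predicate could serve as the termination test inside a sampling loop, with the choice made per open position according to its type.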