Molecular Design Using Local Exploration

US2024428897A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2024428897-A1
Application number  US-202218274057-A
Country             US
Kind code           A1
Filing date         Jan 27, 2022
Priority date       Jan 27, 2021
Publication date    Dec 26, 2024
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating potential medicinal molecules using memory networks are described. A method for generating analogs of a molecule includes: receiving one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure. Generating the token string representations of analogs includes, for each further token string representation: sequentially processing a token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the token string representation of a substructure, sampling one or more additional tokens using the memory network. The token string representations each comprise a plurality of tokens representing predefined structures of a molecule. The memory network encodes a sequential probability distribution on the tokens using an internal state of the memory network.
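The two steps in the abstract (read a substructure prefix into the network, then sample further tokens) can be illustrated with a small sketch. It is not the patent's implementation: `ToyMemoryModel` is a hypothetical stand-in whose internal state is just the previous token, where the patent describes a memory network such as a recurrent neural network. The tokenizer regex, the `<bos>`/`<eos>` markers, and the `max_len` limit are also assumptions made for illustration, and the sampled strings are not checked for chemical validity.

```python
import random
import re

# Simplified SMILES tokenizer (assumption: covers only common token types).
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#()+\-/\\.@]")
BEGIN, END = "<bos>", "<eos>"  # hypothetical start and end-of-string markers


def tokenize(smiles):
    """Split a SMILES string into tokens such as 'C', 'Cl', '1', '('."""
    return TOKEN_RE.findall(smiles)


class ToyMemoryModel:
    """Hypothetical stand-in for a recurrent network.

    The internal state is only the previous token, and the next-token
    distribution is estimated from transition counts in a small corpus.
    """

    def __init__(self, corpus):
        self.counts = {}
        for smiles in corpus:
            tokens = [BEGIN] + tokenize(smiles) + [END]
            for prev, nxt in zip(tokens, tokens[1:]):
                row = self.counts.setdefault(prev, {})
                row[nxt] = row.get(nxt, 0) + 1

    def initial_state(self):
        return BEGIN

    def update(self, state, token):
        # New internal state after a token is read in or sampled.
        return token

    def next_token_weights(self, state):
        # Unknown states fall back to ending the string.
        return self.counts.get(state, {END: 1})


def sample_analog(model, prefix_tokens, rng, max_len=40):
    """Read the prefix into the model, then sample tokens until END or max_len."""
    state = model.initial_state()
    for token in prefix_tokens:  # sequential processing of the substructure
        state = model.update(state, token)
    out = list(prefix_tokens)
    while len(out) < max_len:  # string size limit
        weights = model.next_token_weights(state)
        token = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
        if token == END:  # end-of-string token sampled
            break
        out.append(token)
        state = model.update(state, token)
    return "".join(out)


if __name__ == "__main__":
    model = ToyMemoryModel(["CCO", "CCN", "c1ccccc1O"])
    prefix = tokenize("c1ccccc1O")[:3]
    print(sample_analog(model, prefix, random.Random(0), max_len=12))
```

Because every sampled string shares the prefix taken from the initial molecule, the samples stay close to it, which is one way to read the "local exploration" in the title.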

First claim

Opening claim text (preview).

1.-15. (canceled)

16. A computer implemented method of generating analogs of a molecule, the method comprising: receiving one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure, wherein generating the one or more token string representations for each initial molecular structure comprises: sequentially processing a substructure token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the substructure token string representation of the substructure, sampling one or more additional tokens using the memory network, wherein each of the one or more token string representations comprises a plurality of tokens representing predefined structures of a molecule; and wherein the memory network encodes a sequential probability distribution on the plurality of tokens using an internal state of the memory network.

17. The method of claim 16, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position using a traversal rule to generate the token string representation of the corresponding initial molecular structure; and generating the substructure token string representation of a substructure of the corresponding initial molecular structure by taking a sequence of tokens from the token string representation of the corresponding initial molecular structure, wherein the sequence of tokens begins at a first token corresponding to the starting position.

18. The method of claim 16, wherein sequentially processing the substructure token string representation of a substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position for a plurality of steps using a traversal rule to generate a sequence of tokens corresponding to the substructure token string representations of a substructure of the corresponding initial molecular structure.

19. The method of claim 18, wherein a length of the sequence of tokens is selected from a pre-defined distribution over substructure sizes; wherein the traversal rule is selected from a plurality of valid traversal rules using a selection rule; and/or wherein the starting position is a randomly selected atom in the first molecular structure.

20. The method of claim 16, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises sequentially reading the substructure token string representation of the substructure into the memory network, wherein the internal state of the memory network is updated after each token of the substructure token string representation of the substructure is read in; and wherein sampling one or more additional tokens using the memory network comprises updating the internal state of the memory network after each additional token is sampled.

21. The method of claim 20, wherein the one or more further tokens are sampled until an end of string token is sampled and/or a string size limit is reached.

22. The method of claim 16, wherein receiving one or more initial molecular structures comprises generating one or more token string representations of initial molecular structures using a memory network.

23. The method of claim 16, wherein the memory network is a memory neural network.

24. The method of claim 23, wherein the memory neural network is a recurrent neural network.

25. A computer implemented method of generating potentially biologically or medically active molecules, the method comprising: generating a plurality of analogs of one or more initial molecular structures, comprising: receiving the one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a corresponding initial molecular structure, wherein generating the one or more token string representations for each initial molecular structure comprises: sequentially processing a substructure token string representation of a substructure of the corresponding initial molecular structure using a memory network; and subsequent to processing the substructure token string representation of the substructure, sampling one or more additional tokens using the memory network, wherein each of the one or more token string representations comprises a plurality of tokens representing predefined structures of a molecule; and wherein the memory network encodes a sequential probability distribution on the plurality of tokens using an internal state of the memory network; determining a score for each of the generated initial molecular structures and analogs of the initial molecular structures using an objective function; updating parameters of the memory network based on one or more of the scores of the initial molecular structures and the analogs of the initial molecular structures; and outputting one or more token string representations of potentially biologically or medically active molecules based on the memory network with the updated parameters.

26. The method of claim 25, wherein updating parameters of the memory network based on one or more of the scores of the initial molecular structures and the analogs of the initial molecular structures comprises: generating an ordered list of molecular structures by ordering the initial molecular structures and the analogs of the initial molecular structures based on the determined scores; and updating parameters of the memory network based on a predefined number of highest-scoring molecular structures in the ordered list of molecular structures.

27. The method of claim 25, wherein the token string representations are SMILES representations or comprise 1-letter or 3-letter amino acid representations.

28. The method of claim 25, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position using a traversal rule to generate the token string representation of the corresponding initial molecular structure; and generating the substructure token string representation of the substructure of the corresponding initial molecular structure by taking a sequence of tokens from the token string representation of the corresponding initial molecular structure, wherein the sequence of tokens begins at a first token corresponding to the starting position.

29. The method of claim 25, wherein sequentially processing the substructure token string representation of the substructure of the corresponding initial molecular structure comprises: selecting a starting position in the corresponding initial molecular structure; traversing the corresponding initial molecular structure from the starting position for a plurality of steps using
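The scoring and update steps recited in claims 25 and 26 can be sketched as one optimization round: expand each initial structure into analogs, score everything with an objective function, order the results, and update the model on a predefined number of top scorers. The sketch below is only an illustration under stated assumptions. `hill_climb_round`, `toy_analogs`, and `toy_score` are hypothetical names, the analog generator appends random atoms instead of sampling from a memory network, the prefix length is drawn uniformly as one arbitrary choice of distribution over substructure sizes, and the "parameter update" merely records the selected batch.

```python
import random


def hill_climb_round(population, make_analogs, score, update_model, top_k):
    """Run one round: expand, score, rank, then update on the best top_k."""
    candidates = list(population)
    for molecule in population:
        candidates.extend(make_analogs(molecule))
    # Drop duplicates while keeping first-seen order, so ranking is reproducible.
    unique = list(dict.fromkeys(candidates))
    # Ordered list of initial structures and analogs, highest score first.
    ranked = sorted(unique, key=score, reverse=True)
    best = ranked[:top_k]
    update_model(best)  # stand-in for updating the network parameters
    return best


# --- Hypothetical stand-ins so the sketch runs on its own ---
rng = random.Random(0)


def toy_analogs(tokens, n=4):
    """Keep a random-length prefix of the token tuple, then append random atoms."""
    analogs = []
    for _ in range(n):
        keep = rng.randint(1, len(tokens))  # prefix length, uniform here
        tail = tuple(rng.choice("CNO") for _ in range(rng.randint(0, 3)))
        analogs.append(tokens[:keep] + tail)
    return analogs


def toy_score(tokens):
    """Placeholder objective: number of oxygen tokens."""
    return tokens.count("O")


training_batches = []  # each round's selected molecules are recorded here
best = hill_climb_round(
    population=[("C", "C", "O"), ("C", "N")],
    make_analogs=toy_analogs,
    score=toy_score,
    update_model=training_batches.append,
    top_k=3,
)

if __name__ == "__main__":
    for tokens in best:
        print("".join(tokens), toy_score(tokens))
```

Passing the generator, objective, and update step as callables keeps the loop independent of any particular model, so a real sampler and a real training step could be substituted without changing the ranking logic.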

Assignees

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Reinforcement learning · CPC title

  • Supervised learning · CPC title

  • Generative networks · CPC title

  • Machine learning, data mining or chemometrics · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024428897A1 cover?
Systems and methods for generating potential medicinal molecules using memory networks are described. A method for generating analogs of a molecule includes: receiving one or more initial molecular structures; generating one or more token string representations for each of the one or more initial molecular structures, each token string representation corresponding to an analog of a correspond…
Who is the assignee on this patent?
Sanofi Sa, Ecole Normale Superieure, Univ Sorbonne, and 1 more
What technology area does this patent fall under?
Primary CPC classification G16C20/50. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 26, 2024 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).