Complex molecule substructure identification systems, apparatuses and methods

US11854664B2 · US · B2

Patent metadata
Publication number: US-11854664-B2
Application number: US-201916973175-A
Country: US
Kind code: B2
Filing date: Jun 11, 2019
Priority date: Jun 11, 2018
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide a computer-implemented system and method for generating and searching a database containing all of the potential substructures (e.g., metabolites) of a chosen complex molecule based on minimum cleavable units (MCUs) of the chosen complex molecule, wherein each record in the generated database suitably defines the molecular weight and physical arrangement of each substructure. Embodiments of the invention also provide a user interface and a search engine for searching the database based on a query molecular weight (or query molecular weight range) to identify all of the substructures having a total molecular weight matching the query molecular weight or range. Embodiments of the invention are also capable of transmitting to a display device operated by an end user a description and/or a graphical representation of every identified substructure of the chosen complex molecule.
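As a rough illustration of the weight-based lookup described in the abstract, the Python sketch below keeps substructure records sorted by total molecular weight and returns every record inside a query window. The record layout, the function name `search_by_weight`, and the sample weights are assumptions made for this example; none of them come from the patent text.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstructureRecord:
    """Hypothetical record: a total weight plus the units and bonds it contains."""
    total_mw: float
    units: tuple
    bonds: tuple


def search_by_weight(records, query_mw, tolerance=0.0):
    """Return records whose total_mw lies in [query_mw - tolerance, query_mw + tolerance].

    Assumes `records` is already sorted by total_mw, so a binary search
    can locate both ends of the window.
    """
    weights = [r.total_mw for r in records]
    lo = bisect_left(weights, query_mw - tolerance)
    hi = bisect_right(weights, query_mw + tolerance)
    return records[lo:hi]


# Illustrative data only: weights are made up, not real chemistry.
sample = sorted(
    [
        SubstructureRecord(300.0, ("A", "B", "C"), (("A", "B"), ("B", "C"))),
        SubstructureRecord(150.5, ("A", "B"), (("A", "B"),)),
        SubstructureRecord(151.0, ("B", "C"), (("B", "C"),)),
    ],
    key=lambda r: r.total_mw,
)

for hit in search_by_weight(sample, query_mw=151.0, tolerance=0.5):
    print(hit.total_mw, hit.units)
```

With a tolerance of zero the same function performs the exact-match search; a non-zero tolerance gives the range search mentioned in the abstract.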

First claim

Opening claim text (preview).

We claim:

1. A system for identifying substructures of a chosen molecule, the system comprising: a) a microprocessor; b) a memory; c) an application program, in the memory, comprising program instructions that, when executed by the microprocessor, will cause the microprocessor to (i) receive and store in the memory chosen molecule data representing (A) a set of minimum cleavable units in the chosen molecule, (B) a set of bonds connecting the set of minimum cleavable units in the chosen molecule, (C) molecular weights for each minimum cleavable unit, and (D) a connectivity profile for the chosen molecule, the connectivity profile indicating relative positions of minimum cleavable units and bonds and connections therebetween, (ii) based on the chosen molecule data, create and store in the memory a minimum cleavable unit graph data structure for the chosen molecule, the minimum cleavable unit graph data structure being populated with MCU graph data representing an MCU graph for the chosen molecule, the MCU graph having a plurality of MCU graph vertices and a plurality of MCU graph edges, each MCU graph vertex corresponding to a minimum cleavable unit of the chosen molecule and each MCU graph edge corresponding to a bond connecting minimum cleavable units in the chosen molecule, (iii) based on the MCU graph data, generate and store in the memory a line graph data structure, the line graph data structure being populated with line graph data representing a line graph for the MCU graph, the line graph having a plurality of LG vertices and a plurality of LG edges, each LG vertex corresponding to an MCU graph edge in the MCU graph and each LG edge corresponding to a pair of MCU graph vertices in the MCU graph that are connected together by said MCU graph edge, (iv) execute a graph traversal algorithm against the line graph data in the line graph data structure to determine a plurality of induced connected subgraphs for the line graph, each induced connected subgraph comprising a connected subset of LG vertices and LG edges in the line graph, and a physical arrangement of said connected subset of LG vertices and LG edges, that together uniquely corresponds to a connected subset of the set of minimum cleavable units and bonds, and the relative positions of said connected subset of minimum cleavable units and bonds in the chosen molecule, (v) for each induced connected subgraph represented in the line graph data structure, create in a database an ICS record comprising a molecular weight field, a vertex data field and an edge data field, wherein the vertex data field is populated with vertex values configured to indicate a vertex position for every LG vertex in the induced connected subgraph, and the edge data field is populated with edge values configured to indicate an edge position of every LG edge in the induced connected subgraph relative to the LG vertices, and (vi) for each ICS record in the line graph data structure, calculate and store in the molecular weight field a total molecular weight for the induced connected subgraph of that ICS record based on the chosen molecule data for the chosen molecule and the vertex values and the edge values in the ICS record; d) a user interface for communication with an end user; and e) program instructions in the user interface that, when executed by the microprocessor, will cause the microprocessor to (i) receive a query molecular weight from the end user, (ii) search the database based on the query molecular weight to identify an ICS record having a total molecular weight in the molecular weight field that matches the query molecular weight, and (iii) transmit the vertex values in the vertex data field and the edge values in the edge data field for the identified ICS record to the user interface for presentation on a display device operated by the end user.

2. The system according to claim 1, further comprising program instructions in the user interface that, when executed by the microprocessor, will cause the microprocessor to: a) use the vertex values in the vertex data field, the edge values in the edge data field and the chosen molecule data to produce a graphical representation of an induced connected subgraph of the line graph; and b) transmit the graphical representation to the display device operated by the end user.

3. The system of claim 1, further comprising program instructions in the application program that, when executed by the microprocessor, causes the microprocessor to c) receive a specified tolerance for the molecular weight; d) use the specified tolerance to calculate and define a range of molecular weights for the search of the database; e) search the database based on the query molecular weight and the range to identify each ICS record in the database that has a total molecular weight in the molecular weight field that falls within the range of molecular weights; and f) for said each identified ICS record, transmit the vertex values in the vertex data field and the edge values in the edge data field to the user interface for presentation to the end user.

4. The system according to claim 1, wherein the chosen molecule data is received by parsing information stored in a linked list, or an array, or an adjacency matrix, or a graphic image file, or a chemical drawing file, or a spreadsheet file, or a text file, or a CSV file, or a .CDX file, or a .CDXML file, or a .MOL file, or a .SDM file, or a CAD file, or a binary data file.

5. The system according to claim 1, wherein the connected subset of the set of minimum cleavable units and bonds is a metabolite of the chosen molecule, or a catabolite of the chosen molecule, or a gas phase fragmentation of the chosen molecule, or a degradant of the chosen molecule, or a substructure of the chosen molecule.

6. The system according to claim 1, wherein the graph traversal algorithm is a depth-first search algorithm, or a breadth-first search algorithm, or a reverse-search algorithm, or a tree-search algorithm, or a combination of two or more of the graph traversal algorithms recited herein.

7. The system according to claim 1, wherein: a) the chosen molecule data includes elemental composition data representing (A) a set of elemental units in each minimum cleavable unit, (B) a set of elemental bonds connecting the set of elemental units in the minimum cleavable unit, (C) elemental molecular weights for each elemental unit, and (D) an MCU connectivity profile for the minimum cleavable unit, the MCU connectivity profile indicating relative positions of elemental units and elemental bonds in the minimum cleavable units and connections therebetween; and b) the ICS record created in the database further comprises an elemental unit field populated with one or more elemental unit identifiers; and c) the application program further includes program instructions that, when executed by the microprocessor, will cause the microprocessor to (i) receive a query elemental unit from the end user, (ii) search the database based on the query elemental unit to identify an ICS record having an elemental unit identifier in the elemental unit field that matches the query elemental unit, and (iii) transmit the vertex values in the vertex data field and the edge values in the edge data field for the identified ICS record to the user interface for presentation on a display device operated by the end user.

8. A system for generating a database comprising substructures of a chosen molecule using a microprocessor, the system comprising: a) a memory; b) a microprocessor; c) an input module in the memory that, when executed by the microprocessor, causes the microprocessor to receive and store chosen molecule data representing (A) a set of minimum clea…
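
The database-generation steps recited in claim 1 (MCU graph, line graph, induced connected subgraphs, molecular weight per record) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it uses the standard line-graph definition (two bonds are adjacent when they share a minimum cleavable unit), a simple depth-first growth of vertex subsets as the traversal, and a total weight that is just the sum of the weights of the units covered. The function names, the dictionary record layout, and the sample molecule are all hypothetical.

```python
from itertools import combinations


def line_graph(bonds):
    """Build the line graph: LG vertex i stands for bonds[i].

    Two LG vertices are adjacent when their bonds share a unit
    (standard line-graph definition, assumed here).
    """
    adj = {i: set() for i in range(len(bonds))}
    for i, j in combinations(range(len(bonds)), 2):
        if set(bonds[i]) & set(bonds[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_induced_subgraphs(adj):
    """Enumerate every connected vertex subset of the graph `adj`.

    Depth-first growth: start from each single vertex and repeatedly
    add one neighbouring vertex; `seen` removes duplicates.
    """
    seen = set()
    stack = [frozenset([v]) for v in adj]
    while stack:
        subset = stack.pop()
        if subset in seen:
            continue
        seen.add(subset)
        frontier = set().union(*(adj[v] for v in subset)) - subset
        for neighbour in frontier:
            stack.append(subset | {neighbour})
    return seen


def build_records(mcu_weights, bonds):
    """Create one record per connected bond subset, with its summed weight."""
    records = []
    for subset in connected_induced_subgraphs(line_graph(bonds)):
        used_bonds = sorted(bonds[i] for i in subset)
        mcus = sorted({unit for bond in used_bonds for unit in bond})
        total = sum(mcu_weights[unit] for unit in mcus)
        records.append({"mw": total, "mcus": mcus, "bonds": used_bonds})
    return sorted(records, key=lambda r: (r["mw"], r["mcus"]))


# Hypothetical four-unit linear molecule A-B-C-D with made-up weights.
weights = {"A": 100, "B": 50, "C": 75, "D": 25}
bonds = [("A", "B"), ("B", "C"), ("C", "D")]
for record in build_records(weights, bonds):
    print(record["mw"], record["mcus"])
```

Because every subgraph of the line graph contains at least one LG vertex, this sketch only yields substructures with at least one bond; single isolated units would need separate handling. The subset-growing enumeration is exponential in the number of bonds, which is why the records are computed once and stored for later weight queries.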

Assignees

Inventors

Classifications

  • G16B15/00 · Primary

    ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment · CPC title

  • Query execution · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • G16C20/20 · Primary

    Identification of molecular entities, parts thereof or of chemical compositions · CPC title

  • Searching chemical structures or physicochemical data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854664B2 cover?
Embodiments of the present invention provide a computer-implemented system and method for generating and searching a database containing all of the potential substructures (e.g., metabolites) of a chosen complex molecule based on minimum cleavable units (MCUs) of the chosen complex molecule, wherein each record in the generated database suitably defines the molecular weight and physical arrange…
Who is the assignee on this patent?
Merck Sharp & Dohme, Merck Sharp & Dohme LLC
What technology area does this patent fall under?
Primary CPC classification G16B15/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).