Complex molecule substructure identification systems, apparatuses and methods
US-11854664-B2 · Dec 26, 2023 · US
US12068058B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12068058-B2 |
| Application number | US-201916973197-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 11, 2019 |
| Priority date | Jun 11, 2018 |
| Publication date | Aug 20, 2024 |
| Grant date | Aug 20, 2024 |
Official abstract text for this publication.
Embodiments of the present invention avoid the processing problems associated with using conventional computer systems for identifying and characterizing all of the substructures (e.g., metabolites) of large complex molecules by using a defined minimum cleavable unit (MCU) and an MCU graph for a chosen molecule, as well as a “cut vertex” in the MCU graph for the chosen molecule. The system splits the MCU graph of the chosen molecule at the specified cut vertex to produce two separate MCU graph components (i.e., a first MCU subgraph and a second MCU subgraph) of the chosen molecule, and generates and traverses a first line graph component and a second line graph component, respectively, for the two MCU subgraph components with a graph traversing algorithm to generate and store in memory a first database of substructures and molecular weights for the first component, and a second database of substructures and molecular weights for the second line graph component. Subsequently, embodiments of the present invention can perform binary searches on the two databases (or the two subsections of a single database) to identify and produce graphic representations of all of the substructures of the chosen molecule that have molecular weights that match the query molecular weight (or range of query molecular weights), including the substructures of the chosen molecule that straddle (i.e., include) the cut vertex.
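The search stage at the end of the abstract can be sketched in Python. It covers binary searching weight-sorted databases and then catching substructures that straddle the cut vertex. This is a minimal illustration under stated assumptions, not the patented implementation. Each database is modelled as an in-memory list of hypothetical `(weight, label)` records. "Matches" is taken to mean "within a hypothetical tolerance `tol`". A straddling substructure is assumed to be a first-component substructure, plus the cut vertex, plus a second-component substructure, with both sides adjacent to the cut vertex. The claim preview cuts off before it spells out how straddling matches are assembled, so that pairing rule is an assumption. `WeightIndex` and `straddling_matches` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# Hypothetical record shape: (total molecular weight, substructure label).
Record = Tuple[float, str]


class WeightIndex:
    """A weight-sorted list of records that supports binary search."""

    def __init__(self, records: List[Record]) -> None:
        self.records = sorted(records)
        self.weights = [w for w, _ in self.records]

    def matches(self, query: float, tol: float = 0.01) -> List[Record]:
        """Return every record whose weight lies in [query - tol, query + tol]."""
        lo = bisect_left(self.weights, query - tol)
        hi = bisect_right(self.weights, query + tol)
        return self.records[lo:hi]


def straddling_matches(first_adjacent: List[Record],
                       second_adjacent: WeightIndex,
                       query: float,
                       cut_weight: float,
                       tol: float = 0.01) -> List[Tuple[float, str, str]]:
    """Find S1 + cut vertex + S2 combinations whose total weight matches query.

    Assumption: both sides' records are substructures adjacent to the cut
    vertex, and a straddling weight is w(S1) + cut_weight + w(S2).
    """
    adjusted = query - cut_weight  # adjusted query molecular weight
    hits = []
    for w1, s1 in first_adjacent:
        # One binary search per first-side record for the complementary weight.
        for w2, s2 in second_adjacent.matches(adjusted - w1, tol):
            hits.append((w1 + cut_weight + w2, s1, s2))
    return hits


# Usage with made-up weights.
first = [(162.14, "A1"), (324.28, "A1-A2")]
second = WeightIndex([(146.14, "B1"), (292.28, "B1-B2")])
print(WeightIndex(first).matches(162.14))  # [(162.14, 'A1')]
print([(s1, s2) for _, s1, s2 in straddling_matches(first, second, 488.44, 180.16)])
# [('A1', 'B1')]
```

Sorting once and then using `bisect` makes each lookup O(log n). The pairing loop runs one binary search per first-side record, so its cost grows with the size of the first-side list rather than with the product of both lists.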
Opening claim text (preview).
We claim:
1. A system for identifying substructures of a chosen molecule, the system comprising:
a) a microprocessor;
b) a memory;
c) an application program, in the memory, comprising program instructions that, when executed by the microprocessor, will cause the microprocessor to
(i) receive and store in the memory chosen molecule data representing (A) a set of minimum cleavable units (MCUs) in the chosen molecule, (B) a set of bonds connecting the set of MCUs in the chosen molecule, (C) molecular weights for each MCU, (D) a connectivity profile for the chosen molecule, the connectivity profile indicating relative positions of MCUs and bonds and connections therebetween, and (E) a cut vertex in the chosen molecule, wherein removal of the cut vertex separates the molecule into a first component and a second component,
(ii) based on the chosen molecule data, create and store in the memory a first MCU graph data structure for the first component of the chosen molecule, the first MCU graph data structure being populated with first MCU graph data representing a first MCU graph for the first component, the first MCU graph having a plurality of first MCU graph vertices and a plurality of first MCU graph edges, each first MCU graph vertex corresponding to a MCU of the first component and each first MCU graph edge corresponding to a first bond connecting MCUs in the first component,
(iii) based on the first MCU graph data, generate and store in the memory a first line graph (“LG”) data structure for the first component of the chosen molecule, the first LG data structure being populated with first LG data representing a first LG for the first MCU graph, the first LG having a plurality of first LG vertices and a plurality of first LG edges, each first LG vertex corresponding to a first MCU graph edge in the first MCU graph and each first LG edge corresponding to a pair of first MCU graph vertices in the first MCU graph that are connected together by said first MCU graph edge,
(iv) execute a graph traversal algorithm against the first LG data in the first LG data structure for the first component of the chosen molecule to determine a plurality of first induced connected subgraphs (“ICSs”) for the first LG, each first ICS comprising a first connected subset of first LG vertices and first LG edges in the first LG, and a first physical arrangement of said first connected subset of first LG vertices and first LG edges, that together uniquely corresponds to a first connected subset of the set of MCUs and bonds, and the relative positions of said first connected subset of MCUs and bonds in the chosen molecule,
(v) for each first ICS represented in the first LG data structure for the first component of the chosen molecule, create and store in a database a first ICS record comprising a first molecular weight field, a first vertex data field and a first edge data field, wherein the first vertex data field is populated with first vertex values configured to indicate a first vertex position for every first LG vertex in the first ICS, and the first edge data field is populated with first edge values configured to indicate the first edge position of every first LG edge in the first ICS relative to the first LG vertices,
(vi) for each first ICS record in the first LG data structure for the first component of the chosen molecule, calculate and store in the first molecular weight field a first total molecular weight for the first ICS of that first ICS record based on the chosen molecule data for the chosen molecule and the first vertex values and the first edge values in the first ICS record,
(vii) based on the chosen molecule data, create and store in the memory a second MCU graph data structure for the second component of the chosen molecule, the second MCU graph data structure being populated with second MCU graph data representing a second MCU graph for the second component, the second MCU graph having a plurality of second MCU graph vertices and a plurality of second MCU graph edges, each second MCU graph vertex corresponding to a MCU of the second component and each second MCU graph edge corresponding to a second bond connecting MCUs in the second component,
(viii) based on the second MCU graph data, generate and store in the memory a second LG data structure for the second component of the chosen molecule, the second LG data structure being populated with second LG data representing a second LG for the second MCU graph, the second LG having a plurality of second LG vertices and a plurality of second LG edges, each second LG vertex corresponding to a second MCU graph edge in the second MCU graph and each second LG edge corresponding to a pair of second MCU graph vertices in the second MCU graph that are connected together by said second MCU graph edge,
(ix) execute the graph traversal algorithm against the second LG data in the second LG data structure for the second component of the chosen molecule to determine a plurality of second ICSs for the second LG, each second ICS comprising a second connected subset of second LG vertices and second LG edges in the second LG, and a second physical arrangement of said second connected subset of second LG vertices and second LG edges, that together uniquely corresponds to a second connected subset of the set of MCUs and bonds, and the relative positions of said second connected subset of MCUs and bonds in the chosen molecule,
(x) for each second ICS represented in the second LG data structure for the second component of the chosen molecule, create and store in the database a second ICS record comprising a second molecular weight field, a second vertex data field and a second edge data field, wherein the second vertex data field is populated with second vertex values configured to indicate a second vertex position for every second LG vertex in the second ICS, and the second edge data field is populated with second edge values configured to indicate the second edge position of every second LG edge in the second ICS relative to the second LG vertices, and
(xi) for each second ICS record in the second LG data structure for the second component of the chosen molecule, calculate and store in the second molecular weight field a second total molecular weight for the second ICS of that second ICS record based on the chosen molecule data for the chosen molecule and the second vertex values and the second edge values in the second ICS record; and
d) a user interface comprising program instructions that, when executed by the microprocessor, will cause the microprocessor to
(i) receive a query molecular weight from an end user,
(ii) search the database to identify a first ICS record having a first total molecular weight in the first molecular weight field that matches the query molecular weight,
(iii) search the database to identify a second ICS record having a second total molecular weight in the second molecular weight field that matches the query molecular weight,
(iv) use the first vertex values in the first vertex data field and the first edge values in the first edge data field of the identified first ICS records to produce and display on a display device a first graphical representation of the first ICS corresponding to the first ICS record having the first total molecular weight that matches the query molecular weight,
(v) use the second vertex values in the second vertex data field and the second edge values in the second edge data field of the identified second ICS records to generate and display on the display device a second graphical representation of the second ICS corresponding to the second ICS record having the second total molecular weight that matches the query molecular weight,
(vi) calculate an adjusted query molecular weight by subtracting a molecular weight for the cut vertex from the query molecular weight,
(vii) identify, for the first component of the chosen mo…
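Steps (iii) through (vi) of claim 1 build a line graph from an MCU graph, enumerate its induced connected subgraphs, and store each one with a total molecular weight. The Python sketch below illustrates those steps under several assumptions:

- The claim does not name its graph traversal algorithm, so a simple set-growing enumeration stands in for it.
- The "database" is a weight-sorted list, and the record layout `(weight, lg_vertices, lg_edges)` is hypothetical.
- A substructure's weight is the plain sum of the weights of the MCUs its bonds touch. Any correction for atoms gained or lost when bonds are cleaved is ignored.
- Adjacency uses the standard line-graph construction, in which two LG vertices are adjacent when their bonds share an MCU.

`line_graph`, `connected_induced_subgraphs` and `ics_records` are illustrative names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Bond = Tuple[str, str]  # an MCU graph edge: the two MCUs a bond joins
# Hypothetical ICS record: (total weight, LG vertices, LG edges).
Record = Tuple[float, Tuple[int, ...], Tuple[Tuple[int, int], ...]]


def line_graph(bonds: List[Bond]) -> Dict[int, Set[int]]:
    """LG vertex i stands for bonds[i]; LG vertices i, j are adjacent when
    bonds i and j share an MCU (the standard line-graph construction)."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(bonds))}
    for i, j in combinations(range(len(bonds)), 2):
        if set(bonds[i]) & set(bonds[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_induced_subgraphs(adj: Dict[int, Set[int]]) -> List[FrozenSet[int]]:
    """Enumerate every connected vertex subset of the LG; each subset
    induces exactly one ICS. Stand-in for the unnamed traversal algorithm."""
    seen: Set[FrozenSet[int]] = set()
    stack = [frozenset([v]) for v in adj]
    while stack:
        s = stack.pop()
        if s in seen:
            continue
        seen.add(s)
        # Grow s by one neighbouring LG vertex at a time; dedupe via `seen`.
        for u in set().union(*(adj[v] for v in s)) - s:
            stack.append(s | {u})
    return list(seen)


def ics_records(bonds: List[Bond], mcu_weight: Dict[str, float]) -> List[Record]:
    """Build one weight-sorted record per ICS (assumed weight model: sum of
    the weights of the MCUs touched by the ICS's bonds)."""
    adj = line_graph(bonds)
    records: List[Record] = []
    for ics in connected_induced_subgraphs(adj):
        vertices = tuple(sorted(ics))
        edges = tuple((i, j) for i, j in combinations(vertices, 2) if j in adj[i])
        mcus = {m for i in vertices for m in bonds[i]}
        records.append((sum(mcu_weight[m] for m in mcus), vertices, edges))
    records.sort()
    return records


# Usage: a three-MCU chain A-B-C with made-up weights.
print(ics_records([("A", "B"), ("B", "C")], {"A": 1, "B": 2, "C": 4}))
# [(3, (0,), ()), (6, (1,), ()), (7, (0, 1), ((0, 1),))]
```

Every ICS contains at least one bond, so a lone MCU never appears as a record in this sketch. The enumeration also grows exponentially with the number of bonds in the worst case. Running it separately on the two smaller components on either side of a cut vertex, as the claim does, is a plausible way to keep that cost in check.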
Search customisation based on user profiles and personalisation · CPC title
Indexing; Data structures therefor; Storage structures · CPC title
Presentation of query results · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title
ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks · CPC title