Methods and systems for aligning sequences in the presence of repeating elements
US-2015199474-A1 · Jul 16, 2015 · US
US12365933B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12365933-B2 |
| Application number | US-202318329431-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 5, 2023 |
| Priority date | Aug 24, 2015 |
| Publication date | Jul 22, 2025 |
| Grant date | Jul 22, 2025 |
The invention provides systems and methods for determining patterns of modification to a genome of a subject by representing the genome using a graph, such as a directed acyclic graph (DAG) with divergent paths for regions that are potentially subject to modification, profiling segments of the genome for evidence of epigenetic modification, and aligning the profiled segments to the DAG to determine locations and patterns of the epigenetic modification within the genome.
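The abstract and claims 1 and 5 describe a graph in which each cytosine of the first (reference) sequence has a parallel thymine node at the same position, and every node is stored as an object holding a list of pointers to other nodes. The Python code below is a minimal sketch of building such a graph. It assumes that Python object references stand in for the claimed pointers. The names `Node`, `successors` and `build_bisulfite_graph`, and the one-column-per-position layout, are hypothetical. The sketch models only forward-strand C/T branches, with no variant branches.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity comparison, like raw pointers
class Node:
    base: str           # nucleotide letter carried by this node
    position: int       # index of the reference base this node stands for
    in_reference: bool  # False only for the added thymine branch nodes
    successors: list[Node] = field(default_factory=list)  # "pointer list" to next nodes


def build_bisulfite_graph(reference: str) -> list[list[Node]]:
    """Build a DAG from a reference sequence, grouped into one column per position.

    Each reference base becomes a node; at every cytosine an extra thymine
    node is added, so paths diverge there and rejoin at the next position.
    """
    columns: list[list[Node]] = []
    for pos, base in enumerate(reference.upper()):
        column = [Node(base, pos, True)]
        if base == "C":
            column.append(Node("T", pos, False))  # bisulfite-converted alternative
        if columns:
            for prev in columns[-1]:
                prev.successors.extend(column)  # link every node of the previous column
        columns.append(column)
    return columns


if __name__ == "__main__":
    cols = build_bisulfite_graph("ACG")
    print([[n.base for n in col] for col in cols])  # [['A'], ['C', 'T'], ['G']]
```

Grouping nodes into columns is only a convenience for the sketch. The claims require only nodes linked through pointer lists, and this toy graph does not model the reverse strand.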
What is claimed is:

1. A method for determining epigenetic modifications in a first sequence of nucleotide bases representing at least a portion of a genome of a subject, the first sequence having been previously-obtained by sequencing nucleic acid from the subject, the method comprising: using at least one processor to perform: accessing a graph stored in at least one non-transitory memory, the graph representing, at each of a plurality of positions in the graph, a respective cytosine base of a plurality of cytosine bases in the first sequence and a respective thymine base of a plurality of thymine bases not in the first sequence, the graph comprising nodes and edges stored as objects in the at least one non-transitory memory, at least some of the objects including respective pointers to other objects representing other nodes, the nodes including: a first node representing a cytosine base of the plurality of cytosine bases at a position of the plurality of positions, wherein the first node is stored as a first object in the at least one non-transitory memory, the first object comprising a first list of one or more pointers stored in the at least one non-transitory memory, and a second node representing a thymine base of the plurality of thymine bases at the position, wherein the second node is stored as a second object in the at least one non-transitory memory, the second object comprising a second list of one or more pointers stored in the at least one non-transitory memory; and aligning a second sequence of nucleotide bases to the graph to determine a proportion of a number of methylated cytosine bases to a total number of cytosine bases in at least the portion of the subject's genome, the second sequence representing at least the portion of the subject's genome and having been previously-obtained by sequencing bisulfite-treated nucleic acid from the subject, and the aligning comprising aligning the second sequence of nucleotide bases to the graph using (i) the objects including the first object and the second object, and (ii) the pointers including the first list of one or more pointers and the second list of one or more pointers.

2. The method of claim 1, further comprising determining the total number of cytosine bases in at least the portion of the subject's genome, the determining comprising: determining a number of the plurality of cytosine bases in the first sequence.

3. The method of claim 2, wherein: the second sequence comprises thymine bases, aligning the second sequence to the graph comprises aligning each of at least some of the thymine bases in the second sequence to a respective thymine base of the plurality of thymine bases not in the first sequence, and the method further comprises determining the number of methylated cytosine bases in at least the portion of the subject's genome at least in part by determining a number of the at least some of the thymine bases in the second sequence.

4. The method of claim 1, further comprising: determining, based on the determined proportion of the number of methylated cytosine bases to the total number of cytosine bases in at least the portion of the subject's genome, whether transcription of a gene in the subject's genome has been regulated.

5. The method of claim 1, further comprising creating the graph in the at least one non-transitory memory using the first sequence, the creating comprising: creating a first subset of the nodes of the graph, the first subset of the nodes including the first node and representing the first sequence; and creating a second subset of the nodes of the graph, the second subset of the nodes including the second node and representing the plurality of thymine bases not included in the first sequence.

6. The method of claim 1, further comprising: identifying one or more variants in the first sequence of nucleotide bases.

7. The method of claim 1, wherein the portion of the subject's genome is at least 50% of a length of a chromosome of the subject's genome.

8. The method of claim 1, further comprising: treating the nucleic acid from the subject with bisulfite to obtain the bisulfite-treated nucleic acid; and sequencing the bisulfite-treated nucleic acid.

9. A system, comprising: at least one processor; and at least one non-transitory memory storing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform a method for determining epigenetic modifications in a first sequence of nucleotide bases representing at least a portion of a genome of a subject, the first sequence having been previously-obtained by sequencing nucleic acid from the subject, the method comprising: accessing a graph stored in the at least one non-transitory memory, the graph representing, at each of a plurality of positions in the graph, a respective cytosine base of a plurality of cytosine bases in the first sequence and a respective thymine base of a plurality of thymine bases not in the first sequence, the graph comprising nodes and edges stored as objects in the at least one non-transitory memory, at least some of the objects including respective pointers to other objects representing other nodes, the nodes including: a first node representing a cytosine base of the plurality of cytosine bases at a position of the plurality of positions, wherein the first node is stored as a first object in the at least one non-transitory memory, the first object comprising a first list of one or more pointers stored in the at least one non-transitory memory, and a second node representing a thymine base of the plurality of thymine bases at the position, wherein the second node is stored as a second object in the at least one non-transitory memory, the second object comprising a second list of one or more pointers stored in the at least one non-transitory memory; and aligning a second sequence of nucleotide bases to the graph to determine a proportion of a number of methylated cytosine bases to a total number of cytosine bases in at least the portion of the subject's genome, the second sequence representing at least the portion of the subject's genome and having been previously-obtained by sequencing bisulfite-treated nucleic acid from the subject, and the aligning comprising aligning the second sequence of nucleotide bases to the graph using (i) the objects including the first object and the second object, and (ii) the pointers including the first list of one or more pointers and the second list of one or more pointers.

10. The system of claim 9, further comprising: determining, based on the determined proportion of the number of methylated cytosine bases to the total number of cytosine bases in at least the portion of the subject's genome, whether transcription of a gene in the subject's genome has been regulated.

11. The system of claim 9, further comprising creating the graph in the at least one non-transitory memory using the first sequence, the creating comprising: creating a first subset of the nodes of the graph, the first subset of the nodes including the first node and representing the first sequence; and creating a second subset of the nodes of the graph, the second subset of the nodes including the second node and representing the plurality of thymine bases not included in the first sequence.

12. The system of claim 9, wherein the portion of the subject's genome is at least 50% of a length of a chromosome of the subject's genome.

13. The system of claim 9, further comprising: treating the nucleic acid from the subject with bisulfite to obtain the bisulfite-treated nucleic acid; and sequencing
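Claims 1 to 3 align a read from bisulfite-treated DNA to the C/T graph by following its node objects and pointer lists. The aligned reads then yield the fraction of methylated cytosines. Bisulfite treatment turns unmethylated cytosines into bases that sequence as thymine, while methylated cytosines still read as cytosine. A read passing through a thymine branch node therefore marks an unmethylated position. The Python code below is a minimal sketch of that idea and makes several assumptions:

- It uses a simple ungapped alignment chosen by scanning every start position. The claims above do not specify the alignment algorithm.
- It handles forward-strand reads only.
- It counts methylated cytosines as covered reference cytosines minus reads through thymine branches, following claim 3's "at least in part".
- All function and attribute names are hypothetical.

The block includes its own compact graph builder so it runs on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    base: str
    position: int
    in_reference: bool  # False for the added thymine branch nodes
    successors: list[Node] = field(default_factory=list)  # pointer list


def build_bisulfite_graph(reference: str) -> list[list[Node]]:
    """One column of nodes per reference position; extra T node at each C."""
    columns: list[list[Node]] = []
    for pos, base in enumerate(reference.upper()):
        column = [Node(base, pos, True)]
        if base == "C":
            column.append(Node("T", pos, False))
        for prev in (columns[-1] if columns else []):
            prev.successors.extend(column)
        columns.append(column)
    return columns


def align_ungapped(columns: list[list[Node]], read: str) -> tuple[int, list[Node]]:
    """Best ungapped placement of a read, walking the graph through its pointers.

    Every node in one column points to every node in the next, so picking the
    matching node column by column minimises mismatches for a given start.
    """
    read = read.upper()
    best: Optional[tuple[int, list[Node]]] = None
    for start in range(len(columns) - len(read) + 1):
        candidates = columns[start]
        path: list[Node] = []
        mismatches = 0
        for base in read:
            # Prefer a node whose base matches; otherwise fall back to the reference node.
            node = next((n for n in candidates if n.base == base), candidates[0])
            mismatches += int(node.base != base)
            path.append(node)
            candidates = node.successors  # follow the node's pointer list
        if best is None or mismatches < best[0]:
            best = (mismatches, path)
    if best is None:
        raise ValueError("read is longer than the reference")
    return best


def methylation_fraction(path: list[Node], read: str) -> Optional[float]:
    """Methylated / total cytosines over the reference C positions the read covers."""
    total = converted = 0
    for node, base in zip(path, read.upper()):
        if node.base != base:
            continue  # ignore mismatched positions
        if not node.in_reference:  # read went through the T branch: unmethylated C
            total += 1
            converted += 1
        elif node.base == "C":  # read kept the C: methylated (protected) C
            total += 1
    return None if total == 0 else (total - converted) / total


if __name__ == "__main__":
    graph = build_bisulfite_graph("ACGTACGA")
    mismatches, path = align_ungapped(graph, "ATGTACGA")
    print(mismatches, methylation_fraction(path, "ATGTACGA"))  # 0 0.5
```

In the usage example, the read carries T at the first CpG cytosine and C at the second. That gives one converted and one protected cytosine, so the fraction is 0.5. A production aligner would need gapped alignment, both strands and aggregation over many reads, none of which this sketch attempts.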
Sequence alignment; Homology search · CPC title
Methods for sequencing · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
involving nucleic acid arrays, e.g. sequencing by hybridisation · CPC title
Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay (C12Q1/6804 takes precedence) · CPC title