Methods and systems for aligning sequences in the presence of repeating elements
US-2015199474-A1 · Jul 16, 2015 · US
US11649495B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11649495-B2 |
| Application number | US-202016798759-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 24, 2020 |
| Priority date | Sep 1, 2015 |
| Publication date | May 16, 2023 |
| Grant date | May 16, 2023 |
Official abstract text for this publication.
The invention provides methods of analyzing an individual's mtDNA by transforming available reference sequences into a directed graph that compactly represents all the information without duplication and comparing sequence reads from the mtDNA to the graph to identify the individual or describe their mtDNA. A directed graph can represent all of the genetic variation found among the mitochondrial genomes across all of a number of reference organisms while providing a single article to which sequence reads can be aligned or compared. Thus any sequence read or other sequence fragment can be compared, in a single operation, to the article that represents all of the reference mitochondrial sequences.
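To make the "single article without duplication" idea concrete, here is a minimal sketch in Python. It assumes two made-up reference fragments (`ACGTTA` and `ACGCTA`) that differ at one position; the `Node` class, the `paths` helper, and the sequences are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """One graph object: a symbol string plus pointers to downstream objects."""
    seq: str
    next: List["Node"] = field(default_factory=list)


def paths(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield every sequence spelled by walking from `node` to a node with no successors."""
    prefix += node.seq
    if not node.next:
        yield prefix
    for nxt in node.next:
        yield from paths(nxt, prefix)


# Hypothetical references ACGTTA and ACGCTA share "ACG" and "TA";
# only the differing position gets two alternative nodes.
tail = Node("TA")
var_t = Node("T", [tail])
var_c = Node("C", [tail])
head = Node("ACG", [var_t, var_c])

represented = sorted(paths(head))                               # both references are recoverable
stored_symbols = sum(len(n.seq) for n in (head, var_t, var_c, tail))  # shared segments stored once
```

In this toy case the graph stores 7 symbols where two separate references would need 12, and both references remain readable as paths through the graph.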
Opening claim text (preview).
What is claimed is:

1. A method for analyzing a mitochondrial genome comprising a plurality of mitochondrial sequences, the method comprising: using at least one processor to perform: creating, in at least one non-transitory storage medium, a mitochondrial DNA (mtDNA) reference directed acyclic graph (DAG) representing at least some of the plurality of mitochondrial sequences, the mtDNA reference DAG comprising nodes and edges connecting the nodes, the nodes including a first node and a second node, wherein: the first node is stored as a first object in the at least one non-transitory storage medium, the first object comprising a first symbol string representing a first sequence of one or more nucleotides of the mitochondrial genome, the second node is stored as a second object in the at least one non-transitory storage medium, the second object comprising a second symbol string representing a second sequence of one or more nucleotides of the mitochondrial genome, and the first object further comprises a first list of one or more pointers to one or more adjacent objects stored in the at least one non-transitory storage medium, the first list of one or more pointers being stored in the at least one non-transitory storage medium and including a pointer to the second object; obtaining a plurality of sequence reads from a biological sample previously obtained from a subject; aligning one or more sequence reads of the plurality of sequence reads to the mtDNA reference DAG in the at least one non-transitory storage medium, at least in part by determining alignment scores between the one or more sequence reads and symbol strings associated with the nodes in the mtDNA reference DAG, wherein determining an alignment score for the second node comprises, for a first symbol in the second symbol string associated with the second node, determining the alignment score for the second node based on an alignment score associated with the first node, and wherein aligning a sequence read of the one or more sequence reads against the mtDNA reference DAG comprises: determining a first alignment score between a portion of the sequence read and a portion of the mtDNA reference DAG preceding and including a symbol in the second symbol string; and providing a report that identifies one or more of the plurality of mitochondrial sequences to which the one or more sequence reads of the plurality of sequence reads aligned.

2. The method of claim 1, wherein the nodes of the mtDNA reference DAG further comprise a third node, wherein: the third node is stored as a third object in the at least one non-transitory storage medium, the third object comprising a third symbol string representing a third sequence of one or more nucleotides of the mitochondrial genome; and the third object further comprises a second list of one or more pointers to one or more adjacent objects, the second list of one or more pointers including a pointer to the second object.

3. The method of claim 2, wherein the first symbol string associated with the first node and the third symbol string associated with the third node represent different sequences of one or more nucleotides for a common set of one or more positions in the mitochondrial genome.

4. The method of claim 3, wherein determining the alignment score for the second node further comprises, for the first symbol in the second symbol string associated with the second node, determining the alignment score for the second node based on an alignment score associated with the third node.

5. The method of claim 1, wherein the report identifies mitochondrial heteroplasmy in the subject.

6. The method of claim 1, wherein the subject is an unknown subject and the report provides the identity of the subject.

7. The method of claim 1, wherein the plurality of mitochondrial sequences are obtained from relatives of the subject.

8. The method of claim 1, wherein the report describes a mutation in the mitochondrial genome of the subject.

9. The method of claim 1, wherein the sequence reads correspond to at least a portion of a D-loop of the mitochondria of the subject.

10. The method of claim 1, wherein aligning the one or more sequence reads to the mitochondrial sequences represented by the mtDNA reference DAG comprises a multi-dimensional look-back operation to find a trace through a multi-dimensional matrix based on a score for the trace.

11. The method of claim 1, wherein creating the mtDNA reference DAG further comprises: obtaining each of the plurality of mitochondrial sequences; using the at least one processor to find portions of the mitochondrial sequences that match one another; creating, using the at least one processor, objects to represent the portions; and storing each of the objects in the at least one non-transitory storage medium.

12. The method of claim 1, wherein each of the plurality of mitochondrial sequences represents at least 80% of the mitochondrial genome.

13. A method of detecting mitochondrial heteroplasmy in a mitochondrial genome of a subject, wherein the mitochondrial genome comprises a plurality of mitochondrial sequences, the method comprising: using at least one processor to perform: creating, in at least one non-transitory storage medium, a mitochondrial DNA (mtDNA) reference directed acyclic graph (DAG) representing at least some of the plurality of mitochondrial sequences, the mtDNA reference DAG comprising nodes and edges connecting the nodes, the nodes including a first node and a second node, wherein: the first node is stored as a first object in the at least one non-transitory storage medium, the first object comprising a first symbol string representing a first sequence of one or more nucleotides of the mitochondrial genome, the second node is stored as a second object in the at least one non-transitory storage medium, the second object comprising a second symbol string representing a second sequence of one or more nucleotides of the mitochondrial genome, and the first object further comprises a first list of one or more pointers to one or more adjacent objects stored in the at least one non-transitory storage medium, the first list of one or more pointers being stored in the at least one non-transitory storage medium and including a pointer to the second object; obtaining a plurality of sequence reads from a biological sample previously obtained from a subject; aligning one or more sequence reads of the plurality of sequence reads to the mtDNA reference DAG in the at least one non-transitory storage medium, at least in part by determining alignment scores between the one or more sequence reads and symbol strings associated with the nodes in the mtDNA reference DAG, wherein determining an alignment score for the second node comprises, for a first symbol in the second symbol string associated with the second node, determining the alignment score for the second node based on an alignment score associated with the first node, and wherein aligning a sequence read of the one or more sequence reads against the mtDNA reference DAG comprises: determining a first alignment score between a portion of the sequence read and a portion of the mtDNA reference DAG preceding and including a symbol in the second symbol string; and identifying, based on the aligned one or more sequence reads, at least one position in the mtDNA reference DAG in which the aligned one or more sequence reads align to different mitochondrial sequences.

14. The method of claim 13, further comprising providing a report identifying mitochondrial heteroplasmy in the subject based on the identified at least one p
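The scoring step recited in claims 1, 4, and 10, where the score for the first symbol of a node depends on scores already computed for its predecessor nodes, can be illustrated with a small Smith-Waterman-style sketch in Python. This is an assumption-laden illustration, not the patented implementation: the function name `align_to_dag`, the toy graph, and the match, mismatch, and gap values are all hypothetical, and the code only returns the best local score rather than a full trace.

```python
from typing import Dict, List


def align_to_dag(read: str, nodes: Dict[str, str], preds: Dict[str, List[str]],
                 match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of `read` against a DAG.

    `nodes` maps node name -> non-empty symbol string and must be listed in
    topological order; `preds` maps node name -> names of predecessor nodes.
    """
    m = len(read)
    last_row: Dict[str, List[int]] = {}  # final DP row of each finished node
    best = 0
    for name, seq in nodes.items():
        # The first symbol of a node looks back at the last row of every predecessor.
        prev_rows = [last_row[p] for p in preds.get(name, [])] or [[0] * (m + 1)]
        for sym in seq:
            row = [0] * (m + 1)
            for j in range(1, m + 1):
                s = match if sym == read[j - 1] else mismatch
                diag = max(r[j - 1] for r in prev_rows) + s   # match or mismatch
                up = max(r[j] for r in prev_rows) + gap       # graph symbol unmatched by the read
                left = row[j - 1] + gap                       # read symbol unmatched by the graph
                row[j] = max(0, diag, up, left)
                best = max(best, row[j])
            prev_rows = [row]  # later symbols in the same node have one predecessor row
        last_row[name] = prev_rows[0]
    return best


# Toy graph: "ACG" branches to "T" or "C", and both branches rejoin at "TA".
nodes = {"head": "ACG", "var_t": "T", "var_c": "C", "tail": "TA"}
preds = {"var_t": ["head"], "var_c": ["head"], "tail": ["var_t", "var_c"]}

score_c = align_to_dag("CGCTA", nodes, preds)  # read carrying the C variant
score_t = align_to_dag("CGTTA", nodes, preds)  # read carrying the T variant
```

Because the `tail` node takes the maximum over both variant nodes, reads carrying either variant align end to end in a single pass over the graph. Recording which branch supplied that maximum for each read would be one way to spot positions where reads from the same sample follow different branches, as in claim 13.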
involving nucleic acid arrays, e.g. sequencing by hybridisation · CPC title
Sequence alignment; Homology search · CPC title
Polymorphic or mutational markers · CPC title
ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title
for detection or identification of organisms · CPC title