Systems and Methods for Correcting for Noise and Systemic Variations in Sequencing Data
US-2024404627-A1 · Dec 5, 2024 · US
US2022215901A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022215901-A1 |
| Application number | US-202017613805-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 26, 2020 |
| Priority date | May 28, 2019 |
| Publication date | Jul 7, 2022 |
| Grant date | — |
The present disclosure describes a sequencing system configured to identify structural variants in mitochondrial DNA. Variant callers configured to identify variants in linear genomes (e.g., those found in chromosomes) can fail to properly identify structural variants in mitochondrial DNA. The systems and methods can identify structural variants in next-generation sequencing data collected from circular mitochondrial DNA.
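The abstract's key point is that mitochondrial DNA is circular, so reference coordinates wrap around at the end of the molecule. The short Python sketch that follows is an illustration only, not the patent's implementation. It shows why plain linear subtraction breaks for a junction that spans the origin. It assumes 0-based positions and the 16,569 bp length of the human revised Cambridge Reference Sequence (rCRS). The names `wrap` and `gap_length` are hypothetical.

```python
# Illustrative helpers for coordinates on a circular genome (0-based positions).
# Hypothetical names; not taken from the patent.

RCRS_LENGTH = 16_569  # length in bp of the human revised Cambridge Reference Sequence (rCRS)


def wrap(pos: int, genome_len: int = RCRS_LENGTH) -> int:
    """Map any integer position onto the circle [0, genome_len)."""
    return pos % genome_len


def gap_length(left_end: int, right_start: int, genome_len: int = RCRS_LENGTH) -> int:
    """Reference bases skipped when walking forward from the last base of one
    aligned segment (left_end) to the first base of the next (right_start).
    Modular arithmetic keeps the result non-negative even across the origin."""
    return (right_start - left_end - 1) % genome_len


# A junction inside the molecule behaves as it would on a linear genome...
print(gap_length(100, 200))    # 99
# ...but one spanning the origin would give -16451 with plain subtraction.
print(gap_length(16_500, 50))  # 118
```

Because every backwards jump is read as a forward walk around the circle, this helper cannot by itself tell an origin-spanning deletion from a duplication; a real caller needs an explicit rule for that case, which is left open here.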
What is claimed:

1. A method to identify variants in mitochondrial sequencing data, comprising: receiving a plurality of sequence reads comprising an indication of sequenced DNA samples; identifying a subset of the plurality of sequence reads, wherein each of the subsets of the plurality of sequence reads partially mapped to a target mitochondrial DNA sample; generating a plurality of query sequences based on each of the subsets of the plurality of sequence reads that are partially mapped to the target mitochondrial DNA sample; calculating a score for each of the subsets of the plurality of sequence reads based on an alignment of the plurality of query sequences with the target mitochondrial DNA sample; selecting a plurality of test reads, wherein the plurality of test reads comprises the subset of the plurality of sequence reads having a score below a predetermined threshold; identifying a breakpoint for each of the plurality of test reads; determining a count of the plurality of test reads having the breakpoint at a predetermined location; and identifying a sequence variant based on the count of the plurality of test reads having the breakpoint at the predetermined location.
2. The method of claim 1, wherein the DNA samples are pair-end sequenced.
3. The method of claim 1, wherein the sequence variant is one of a deletion, insertion, duplication, or inversion.
4. The method of claim 1, further comprising generating a plurality of sequence words from the plurality of query sequences.
5. The method of claim 4, further comprising aligning each of the plurality of sequence words from the plurality of query sequences to the target mitochondrial DNA sample.
6. The method of claim 1, wherein the score is an e-value indicating a probability of the alignment of each plurality of query sequences occurring by chance.
7. The method of claim 1, wherein identifying the breakpoint for one of the plurality of test reads comprises: determining a distance between a first sequence in the one of the plurality of test reads and a second sequence in one of the plurality of test reads is one nucleotide; and determining that a length of a deletion between a first location of the first sequence in the target mitochondrial DNA and a second location of the second sequence in the target mitochondrial DNA is greater than one nucleotide.
8. The method of claim 1, wherein identifying the breakpoints for one of the plurality of test reads comprises: determining a distance between a first location of a first sequence in the target mitochondrial DNA and a second location of a second sequence in the target mitochondrial DNA is one nucleotide; and determining a length of an insertion between the first sequence in the one of the plurality of test reads and the second sequence in the one of the plurality of test reads.
9. The method of claim 1, wherein identifying the breakpoints for one of the plurality of test reads comprises: determining a distance between a first sequence in the one of the plurality of test reads and a second sequence in the one of the plurality of test reads is one nucleotide; and determining that a length of a duplication between an end location of the first sequence in the target mitochondrial DNA and a start location of the second sequence in the target mitochondrial DNA is greater than one nucleotide.
10. The method of claim 1, wherein identifying the breakpoints for one of the plurality of test reads comprises: determining that a location of a sequence in the one of the plurality of test reads overlaps with a location of an inverted sequence in the one of the plurality of test reads; and determining that a location of the sequence in the target mitochondrial DNA does not overlap with a location of the inverted sequence in the target mitochondrial DNA.
11. The method of claim 1, further comprising validating the sequence variant against a database comprising known mitochondrial DNA variants.
12. A system to identify variants in mitochondrial sequencing data, comprising one or more processors to: receive a plurality of sequence reads comprising an indication of sequenced DNA samples; identify a subset of the plurality of sequence reads, wherein each of the subsets of the plurality of sequence reads partially mapped to a target mitochondrial DNA sample; generate a plurality of query sequences based on each of the subsets of the plurality of sequence reads that are partially mapped to the target mitochondrial DNA sample; calculate a score for each of the subsets of the plurality of sequence reads based on an alignment of the plurality of query sequences with the target mitochondrial DNA sample; select a plurality of test reads, wherein the plurality of test reads comprises the subset of the plurality of sequence reads having a score below a predetermined threshold; identify a breakpoint for each of the plurality of test reads; determine a count of the plurality of test reads having the breakpoint at a predetermined location; and identify a sequence variant based on the count of the plurality of test reads having the breakpoint at the predetermined location.
13. The system of claim 12, wherein the DNA samples are pair-end sequenced.
14. The system of claim 12, wherein the sequence variant is one of a deletion, insertion, duplication, or inversion.
15. The system of claim 12, further comprising the one or more processors to generate a plurality of sequence words from the plurality of query sequences.
16. The system of claim 15, further comprising the one or more processors to align each of the plurality of sequence words from the plurality of query sequences to the target mitochondrial DNA sample.
17. The system of claim 12, wherein the score is an e-value indicating a probability of the alignment of each plurality of query sequences occurring by chance.
18. The system of claim 15, further comprising the one or more processors to: determine a distance between a first sequence in the one of the plurality of test reads and a second sequence in one of the plurality of test reads is one nucleotide; determine that a length of a deletion between a first location of the first sequence in the target mitochondrial DNA and a second location of the second sequence in the target mitochondrial DNA is greater than one nucleotide; and calculate the breakpoint for the one of the plurality of test reads based on the distance and the length of the deletion.
19. The system of claim 15, further comprising the one or more processors to: determine a distance between a first location of a first sequence in the target mitochondrial DNA and a second location of a second sequence in the target mitochondrial DNA is one nucleotide; determine a length of an insertion between the first sequence in the one of the plurality of test reads and the second sequence in the one of the plurality of test reads; and calculate the breakpoint for the one of the plurality of test reads based on the distance and the length of the insertion.
20. The system of claim 15, further comprising the one or more processors to: determine a distance between a first sequence in the one of the plurality of test reads and a second sequence in the one of the plurality of test reads is one nucleotide; and determine that a length of a duplication between an end location of the first sequence in the target mitochondrial DNA and a start location of the second sequence in the tar
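Claims 4 and 5 (and their system counterparts, claims 15 and 16) generate "sequence words" from the query sequences and align those words to the mitochondrial reference. The claims define neither the word length nor the matching procedure. The Python sketch that follows therefore assumes BLAST-style seeding: overlapping fixed-length words (length `w`, with an assumed default of 11) looked up in an exact-match index of the reference. To respect the circular genome, the index also covers words that span the origin. All function names are hypothetical.

```python
from collections import defaultdict


def sequence_words(query: str, w: int = 11) -> list[tuple[int, str]]:
    """Every overlapping word of length w in the query, paired with its offset."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def index_reference(ref: str, w: int = 11, circular: bool = True) -> dict[str, list[int]]:
    """Map each length-w word of the reference to its 0-based start positions.
    For a circular reference, the first w-1 bases are appended so that words
    spanning the origin are also indexed (start positions stay < len(ref))."""
    seq = ref + ref[: w - 1] if circular else ref
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return dict(index)


def word_hits(query: str, ref_index: dict[str, list[int]], w: int = 11) -> list[tuple[int, int]]:
    """(query offset, reference position) for every exact word match."""
    return [(q_off, r_pos)
            for q_off, word in sequence_words(query, w)
            for r_pos in ref_index.get(word, [])]


ref = "ACGTACGGTT"  # toy reference; a real one would be the full mitochondrial genome
idx = index_reference(ref, w=4)
print(word_hits("GGTTAC", idx, w=4))  # [(0, 6), (1, 7), (2, 8)]; the last word spans the origin
```

Word hits like these would only be seeds; extending them into scored local alignments is a separate step that the claims do not spell out.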
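Claims 6 and 17 define the score as an e-value, and claim 1 keeps the reads whose score falls below a predetermined threshold. One standard way to compute an e-value is the Karlin-Altschul formula used by BLAST, E = K·m·n·exp(-λS). Here S is the raw alignment score, m the query length, n the reference length, and K and λ depend on the scoring system. E is the expected number of chance alignments scoring at least S; for small values it approximates the probability the claims refer to. The claims do not say that this exact formula is used. The values of K and λ and the 1e-5 threshold in the Python sketch that follows are placeholders.

```python
import math

# Placeholder Karlin-Altschul parameters. Real lambda and K depend on the
# match/mismatch/gap scoring scheme and are not given in the claims.
LAMBDA = 1.0
K = 0.1


def evalue(raw_score: float, query_len: int, ref_len: int,
           lam: float = LAMBDA, k: float = K) -> float:
    """Expected number of chance alignments scoring >= raw_score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * ref_len * math.exp(-lam * raw_score)


def select_test_reads(alignment_scores: dict[str, tuple[float, int]],
                      ref_len: int, threshold: float = 1e-5) -> list[str]:
    """Return IDs of reads whose alignment e-value is below the threshold.

    alignment_scores maps read_id -> (raw alignment score, query length).
    """
    return [read_id
            for read_id, (score, qlen) in alignment_scores.items()
            if evalue(score, qlen, ref_len) < threshold]


reads = {"r1": (40.0, 150), "r2": (12.0, 150)}
print(select_test_reads(reads, ref_len=16_569))  # ['r1']
```

Lower e-values mean alignments that are less likely to be chance matches, so "below the threshold" keeps the well-supported split alignments.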
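Claims 7 to 10 give per-read rules for distinguishing a deletion, insertion, duplication, or inversion. Each rule compares where two aligned pieces of a split read sit in the read and where they sit on the reference. The Python sketch that follows interprets "distance ... is one nucleotide" as "the second piece starts at the position immediately after the first piece ends." It uses 1-based inclusive coordinates on a linear reference and ignores wrap-around at the origin. It also simplifies the inversion test to the two pieces aligning to opposite strands, which is looser than the overlap condition in claim 10. The `Segment` type and `classify` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One locally aligned piece of a split read (1-based, inclusive coordinates)."""
    q_start: int        # start within the read
    q_end: int          # end within the read
    r_start: int        # start on the mitochondrial reference
    r_end: int          # end on the mitochondrial reference
    strand: str = "+"   # "+" or "-" relative to the reference


def classify(first: Segment, second: Segment) -> tuple[str, int, int]:
    """Classify the junction between two pieces of one read.

    Returns (variant_type, breakpoint, length), with the breakpoint taken as the
    last reference base of the first piece. Simplified reading of claims 7-10.
    """
    if first.strand != second.strand:
        # Simplification: opposite-strand pieces are treated as an inversion.
        return ("inversion", first.r_end, abs(second.r_end - second.r_start) + 1)
    read_adjacent = second.q_start == first.q_end + 1   # contiguous in the read
    ref_adjacent = second.r_start == first.r_end + 1    # contiguous on the reference
    if read_adjacent and second.r_start > first.r_end + 1:
        # Reference bases skipped between the pieces: deletion.
        return ("deletion", first.r_end, second.r_start - first.r_end - 1)
    if ref_adjacent and second.q_start > first.q_end + 1:
        # Extra read bases with no reference counterpart: insertion.
        return ("insertion", first.r_end, second.q_start - first.q_end - 1)
    if read_adjacent and second.r_start <= first.r_end:
        # Second piece re-covers reference already covered: duplication.
        return ("duplication", first.r_end, first.r_end - second.r_start + 1)
    return ("unclassified", first.r_end, 0)


a = Segment(1, 50, 1001, 1050)
print(classify(a, Segment(51, 100, 2001, 2050)))  # ('deletion', 1050, 950)
print(classify(a, Segment(56, 100, 1051, 1095)))  # ('insertion', 1050, 5)
print(classify(a, Segment(51, 100, 1031, 1080)))  # ('duplication', 1050, 20)
```

Checking the deletion rule before the duplication rule matters only for readability here, because their reference conditions are mutually exclusive.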
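The final steps of claim 1 count how many test reads share a breakpoint at a given location and then call a variant from that count. Claim 11 adds a check against a database of known mitochondrial variants. The claims give neither a count cutoff nor a positional tolerance, so `min_support=3` and `tolerance` in the Python sketch that follows are assumptions. The "database" is stood in for by a plain Python set, and per-read calls are assumed to be `(variant_type, breakpoint, length)` tuples. All function names are hypothetical.

```python
from collections import Counter

# Per-read call: (variant_type, breakpoint position, length). Assumed format.
Call = tuple[str, int, int]


def support_at(calls: list[Call], location: int, tolerance: int = 0) -> int:
    """Count test reads whose breakpoint lies within `tolerance` bases of `location`."""
    return sum(1 for _, bp, _ in calls if abs(bp - location) <= tolerance)


def call_variants(calls: list[Call], min_support: int = 3) -> dict[Call, int]:
    """Report each distinct call seen in at least `min_support` test reads."""
    counts = Counter(calls)
    return {call: n for call, n in counts.items() if n >= min_support}


def flag_known(variants: dict[Call, int], known: set[Call]) -> dict[Call, bool]:
    """Mark which called variants already appear in a stand-in known-variant set."""
    return {call: call in known for call in variants}


calls = [("deletion", 1050, 950)] * 4 + [("deletion", 1052, 948)] + [("insertion", 300, 5)] * 2
print(support_at(calls, 1050))               # 4
print(support_at(calls, 1050, tolerance=2))  # 5
print(call_variants(calls))                  # {('deletion', 1050, 950): 4}
```

A nonzero tolerance lets reads whose breakpoints differ by a few bases (for example, because of repeats at the junction) support the same event, at the cost of merging genuinely distinct nearby events.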
Sequence alignment; Homology search · CPC title
Methods for sequencing · CPC title
Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title