Methods of lowering the error rate of massively parallel DNA sequencing using duplex consensus sequencing
US-9752188-B2 · Sep 5, 2017 · US
US11549144B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11549144-B2 |
| Application number | US-202117392185-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 2, 2021 |
| Priority date | Mar 20, 2012 |
| Publication date | Jan 10, 2023 |
| Grant date | Jan 10, 2023 |
Official abstract text for this publication.
Next Generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of approximately 1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when “deep sequencing” genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, a method, Duplex Consensus Sequencing (DCS), is provided. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors will result in errors in only one strand. This method uniquely capitalizes on the redundant information stored in double-stranded DNA, thus overcoming technical limitations of prior methods utilizing data from only one of the two strands.
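The strand-comparison idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `reverse_complement` and `duplex_consensus` are hypothetical, and the sketch assumes each strand has already been reduced to a single sequence of equal length, read 5'→3' on its own strand. A base is kept only when both strands agree; any disagreement is masked as `N`.

```python
# Illustrative sketch only; names and the "N" masking convention are assumptions.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def duplex_consensus(top: str, bottom: str) -> str:
    """Compare the two strands of one duplex, position by position.

    `top` and `bottom` are each written 5'->3' on their own strand, so the
    bottom strand is reverse-complemented before comparison.
    """
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strand sequences must have the same length")
    return "".join(
        a if a == b and a != "N" else "N"
        for a, b in zip(top, bottom_as_top)
    )


# Reference (for orientation): ACGTACGT
# True mutation G->T at index 2 appears on both strands.
# A strand-specific error G->A at index 6 appears on the top strand only.
top_strand = "ACTTACAT"
bottom_strand = "ACGTAAGT"  # reverse complement of ACTTACGT
print(duplex_consensus(top_strand, bottom_strand))  # ACTTACNT
```

In this toy input the mutation at index 2 survives because both strands carry it, while the single-strand error at index 6 is masked.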
Opening claim text (preview).
What is claimed is:
1. A method for detecting genomic variants in nucleic acid material from a subject, comprising:
(a) providing fragmented nucleic acid material obtained from a bodily sample of the subject;
(b) attaching tags comprising barcodes selected from a plurality of distinct barcode sequences to said nucleic acid fragments obtained from said bodily sample of the subject, to generate tagged nucleic acid molecules, wherein each tagged nucleic acid molecule is identifiable with respect to other tagged nucleic acid molecules from the bodily sample;
(c) amplifying at least a portion of the tagged nucleic acid molecules to produce tagged nucleic acid amplicons;
(d) sequencing a plurality of tagged nucleic acid amplicons to produce a plurality of sequence reads from the tagged nucleic acid molecules, wherein each sequence read comprises a barcode sequence and a sequence derived from a nucleic acid fragment;
(e) aligning sequence reads from the tagged nucleic acid molecules to a reference sequence;
(f) grouping sequence reads that align to the reference sequence at the same coordinates and which have the same barcode sequence into families, whereby each family comprises sequence reads of tagged nucleic acid molecules amplified from an original tagged nucleic acid molecule; and
(g) within one or more families: distinguishing between sequence reads derived from a first strand of the original tagged nucleic acid molecule and sequence reads derived from a second strand of the same original tagged nucleic acid molecule; comparing a first strand sequence read with a second strand sequence read to identify nucleic acid base pairs that are in agreement; and comparing said nucleic acid base pairs to the reference sequence to identify genomic variants.
2. The method of claim 1, wherein within one or more families the method further comprises: collapsing a plurality of sequence reads derived from a first strand to provide the first strand sequence read; and collapsing a plurality of sequence reads derived from second strand to provide the second strand sequence read.
3. The method of claim 1, wherein nucleic acid base pairs that are not in agreement between the first and second strand sequence reads are considered artifacts.
4. The method of claim 1, wherein comparing a first strand sequence read with a second strand sequence read comprises generating a consensus sequence of the nucleic acid fragment from the bodily sample, and wherein each of the consensus sequences corresponds to a unique nucleic acid fragment among the nucleic acid fragments.
5. The method of claim 4, wherein the reference sequence comprises one or more loci, and wherein the method further comprises: (h) identifying consensus sequences that map to a given locus of said one or more loci; and (i) calculating a number of consensus sequences that map to the given locus that include a cancer-associated genomic variant thereby quantifying variant cancer biomarkers in said nucleic acid fragments from said subject.
6. The method of claim 5, further comprising calculating a ratio of (1) a number of consensus sequences that map to the given locus that include the cancer-associated genomic variant to (2) a total number of consensus sequences that map to the given locus, thereby quantifying genomic variant cancer biomarkers in said plurality of nucleic acid fragments from said subject.
7. The method of claim 4, further comprising determining copy number variation by mapping the consensus sequences to the reference sequence and counting consensus sequences corresponding to the reference sequence.
8. The method of claim 4, wherein an error rate of the consensus sequence is lower than about 1×10⁻⁶ or is as low as about 1.2×10⁻⁹.
9. The method of claim 4, further comprising assessing a tumor cell diversity within a tumor cell population and/or detecting a genetic variant conferring a cancer drug resistance by mapping the consensus sequences to the reference sequence and identifying particular genomic variant cancer biomarkers within the identified consensus sequences.
10. The method of claim 1, wherein the plurality of nucleic acid fragments includes a genetic variant having a variant frequency lower than about 1% or lower than about 0.01%.
11. The method of claim 1, wherein the plurality of nucleic acid fragments includes a genetic variant having a variant frequency as low as about 0.01% or as low as about 0.03%.
12. The method of claim 1, further comprising filtering out sequence reads or individual nucleotide base calls that fail to meet a set quality threshold.
13. The method of claim 1, further comprising detecting a cancer in the subject at an early stage by detecting a cancer-associated nucleic acid-based serum biomarker in the subject.
14. The method of claim 1, further comprising monitoring a cancer's response to a therapy by detecting a cancer-associated nucleic acid-based serum biomarker in the subject.
15. The method of claim 1, wherein the bodily sample comprises a blood sample, and wherein the fragmented nucleic acid material is obtained from plasma or serum.
16. The method of claim 1, wherein the attaching comprises blunt-end ligation or sticky end ligation.
17. The method of claim 1, wherein the nucleic acid material from the subject comprises fragments of a desired size.
18. The method of claim 1, wherein step (a) further comprises shearing the nucleic acid material to generate the nucleic acid fragments.
19. The method of claim 1, wherein step (a) further comprises enzymatically cutting the nucleic acid material to generate the nucleic acid fragments.
20. The method of claim 1, wherein the distinct barcode sequences are contained within a library generated from oligonucleotides comprising defined sequences.
21. The method of claim 1, wherein the plurality of distinct barcode sequences comprises random degenerate or semi-degenerate barcode sequences.
22. The method of claim 1, wherein the plurality of distinct barcode sequences comprises nonrandom barcode sequences.
23. The method of claim 1, wherein the nucleic acid fragments are tagged with a double-stranded DNA barcode.
24. The method of claim 1, wherein the nucleic acid fragments are tagged with a single-stranded nucleic acid barcode.
25. The method of claim 1, selectively enriching at least one of tagged nucleic acid molecules and tagged nucleic acid amplicons for a subset of tagged molecules that map to one or more genetic loci in the reference sequence.
26. The method of claim 1, wherein the grouping is based on i) the barcode sequence and ii) at least one of: sequence information at a beginning of the sequence derived from the nucleic acid fragment and sequence information at an end of the sequence derived from the nucleic acid fragment.
27. The method of claim 1, wherein said tagged nucleic acid molecule can be differentiated from other tagged nucleic acid molecules using a combination of at least a first barcode sequence at a first end of said nucleic acid fragment and a second barcode sequence at a second end of said nucleic acid fragment.
28. The method of claim 1, wherein the reference sequence is from a non-cancerous control sample.
29. The method of claim 1, wherein the reference sequence comprises a sequence from a human reference genome.
30. The
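The computational steps recited in claims 1(f), 1(g), 2, and 6 can be sketched roughly as follows. This is a hedged illustration, not the patentee's software. The `Read` record, its `strand` label, the 0.7 majority threshold in `collapse`, and the choice to exclude masked `N` calls from the ratio are all assumptions made for the example. Reads are assumed to be already aligned, equal in length within a family, and oriented to the reference.

```python
# Illustrative sketch only; data model and thresholds are assumptions.
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    start: int    # alignment start coordinate on the reference
    barcode: str  # barcode sequence read from the tag
    strand: str   # "first" or "second" (hypothetical strand label)
    seq: str      # read sequence, oriented to the reference


def collapse(seqs, min_fraction=0.7):
    """Collapse reads from one strand into a single sequence (claim 2)."""
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def build_duplex_consensuses(reads):
    """Group reads into families and compare the two strands (claim 1 f, g)."""
    families = defaultdict(lambda: {"first": [], "second": []})
    for read in reads:
        families[(read.start, read.barcode)][read.strand].append(read.seq)
    duplex = {}
    for key, strands in families.items():
        if not strands["first"] or not strands["second"]:
            continue  # both strands are needed for a duplex comparison
        first = collapse(strands["first"])
        second = collapse(strands["second"])
        duplex[key] = "".join(a if a == b else "N" for a, b in zip(first, second))
    return duplex


def variant_fraction(duplex, pos, alt):
    """Ratio of consensus sequences carrying `alt` at `pos` (claim 6)."""
    calls = [
        seq[pos - start]
        for (start, _), seq in duplex.items()
        if start <= pos < start + len(seq)
    ]
    calls = [c for c in calls if c != "N"]  # assumption: skip masked calls
    return calls.count(alt) / len(calls) if calls else 0.0


# Toy data against the reference ACGTACGT starting at coordinate 100.
reads = [
    Read(100, "AACC", "first", "ACTTACGT"),
    Read(100, "AACC", "first", "ACTTACGT"),
    Read(100, "AACC", "second", "ACTTACGT"),
    Read(100, "GGTT", "first", "ACGTACAT"),   # error on the first strand only
    Read(100, "GGTT", "first", "ACGTACAT"),
    Read(100, "GGTT", "second", "ACGTACGT"),
    Read(100, "CCAA", "first", "ACGTACGT"),   # no second strand: skipped
]
duplex = build_duplex_consensuses(reads)
print(duplex)
print(variant_fraction(duplex, pos=102, alt="T"))
```

With this toy input, the `AACC` family keeps the G→T variant at coordinate 102 because both strands carry it, the `GGTT` family has its single-strand discrepancy masked, and the `CCAA` family is dropped for lack of a second strand.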
Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes · CPC title
Methods for sequencing · CPC title
Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay (C12Q1/6804 takes precedence) · CPC title
incorporating an adaptor · CPC title
characterised by the use of the arrayed oligonucleotides as identifier tags, e.g. universal addressable array, anti-tag or tag complement array · CPC title