Methods of lowering the error rate of massively parallel DNA sequencing using duplex consensus sequencing

US12006545B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12006545-B2
Application number: US-202117392180-A
Country: US
Kind code: B2
Filing date: Aug 2, 2021
Priority date: Mar 20, 2012
Publication date: Jun 11, 2024
Grant date: Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Next Generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of approximately 1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when “deep sequencing” genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, a method termed Duplex Consensus Sequencing (DCS) is provided. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors will result in errors in only one strand. This method uniquely capitalizes on the redundant information stored in double-stranded DNA, thus overcoming technical limitations of prior methods utilizing data from only one of the two strands.
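
The strand-comparison idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the use of `N` for discordant positions, and the toy sequences are all assumptions made for illustration. The sketch takes one read per strand, converts the second-strand read into first-strand orientation, and keeps a base only where the two strands agree.

```python
from typing import List, Tuple

# Watson-Crick pairing; "N" stands for an unknown base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_consensus(top: str, bottom: str) -> Tuple[str, List[int]]:
    """Compare one read from each strand of the same duplex.

    `top` and `bottom` are both written 5'->3' (hypothetical inputs).
    A base is kept only if both strands support it; otherwise the
    position is masked with "N" and reported as discordant.
    """
    bottom_as_top = reverse_complement(bottom)
    if len(top) != len(bottom_as_top):
        raise ValueError("strand reads must cover the same positions")

    consensus = []
    discordant = []
    for i, (a, b) in enumerate(zip(top, bottom_as_top)):
        if a == b and a != "N":
            consensus.append(a)      # supported by both strands
        else:
            consensus.append("N")    # seen on one strand only
            discordant.append(i)
    return "".join(consensus), discordant


# Toy data: the duplex carries a true A->T change at index 4
# (reference ACGTACGT), so both strands show it. The bottom-strand
# read also carries a single-strand error in its first base.
top_read = "ACGTTCGT"
bottom_read = "GCGAACGT"
result = duplex_consensus(top_read, bottom_read)
print(result)
```

In this toy case the change at index 4 survives because both strands carry it, while the single-strand error is masked at index 7 of the first-strand coordinates.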

First claim

Opening claim text (preview).

What is claimed is:

1. A method for sequencing a double-stranded nucleic acid molecule, comprising:
(a) preparing a sequencing library, wherein preparing a sequencing library comprises: providing a set of hairpin adapters having a double-stranded region and a linker region; ligating the hairpin adapters to a plurality of double-stranded nucleic acid molecules to generate a sequencing library comprising a plurality of adapter-nucleic acid molecule complexes; and transitioning the adapter-nucleic acid molecule complexes from a double-stranded form to a linear single-stranded form, wherein each linear single-stranded adapter-nucleic acid molecule complex comprises at a first strand of a double-stranded nucleic acid molecule and a second strand of the same double-stranded nucleic acid molecule, separated by an adapter sequence;
(b) sequencing at least a portion of the linear single-stranded adapter-nucleic acid molecule complexes to obtain a plurality of sequence reads, wherein sequencing comprises cluster amplifying the portion of the linear single-stranded adapter-nucleic acid molecule complexes on a sequencing substrate;
(c) grouping the plurality of sequence reads into a plurality of families based at least in part by cluster on the sequencing substrate; and
(d) comparing sequence reads within a family to generate a consensus sequence for that family.

2. The method of claim 1, wherein ligating the hairpin adapters to a plurality of double-stranded nucleic acid molecules comprises ligating hairpin adapters to both ends of the double-stranded nucleic acid molecules, and wherein at least one of the hairpin adapters on at least a portion of the adapter-nucleic acid molecule complexes comprises a cleavage site.

3. The method of claim 2, wherein the cleavage site is an endonuclease target sequence, and wherein transitioning the adapter-nucleic acid molecule complexes from a double-stranded form to a linear single-stranded form comprises cleaving the endonuclease target sequence with an endonuclease.

4. The method of claim 1, wherein preparing the sequencing library further comprises providing a set of adapters comprising a Y-shape, and wherein the ligating step further comprises ligating a mix of adapters comprising the Y-shape and the hairpin adapters to the plurality of double-stranded nucleic acid molecules to generate the sequencing library.

5. The method of claim 1, wherein at least a portion of the adapter-nucleic acid molecule complexes comprises the nucleic acid molecules having an adapter comprising a Y-shape on a first end and a hairpin adapter on a second end.

6. The method of claim 1, wherein one or more nucleotides in the adapter sequence is an RNA nucleotide, a uracil nucleotide, a modified nucleotide or a non-natural nucleotide.

7. The method of claim 6, the RNA nucleotide, the uracil nucleotide, the modified nucleotide or the non-natural nucleotide provides a cleavage site, and wherein the method further comprises cleaving the adapter sequence at the cleavage site using a nucleotide-specific nuclease.

8. The method of claim 1, further comprising amplifying the linear single-stranded adapter-nucleic acid molecule complexes prior to sequencing.

9. The method of claim 1, further comprising modifying the plurality of double-stranded nucleic acid molecules by performing an end-repairing procedure.

10. The method of claim 9, further comprising performing an A-tailing or T-tailing procedure prior to ligating the hairpin adapters to the double-stranded nucleic acid molecules.

11. The method of claim 9, further comprising generating a ligatable end on the double-stranded nucleic acid molecules, wherein the ligatable end comprises a T-overhang, an A-overhang, a CG overhang, a blunt end or a single-stranded sequence complementary to an adapter ligation domain.

12. The method of claim 1, wherein the hairpin adapters further comprise one or more amplification primer binding sites.

13. The method of claim 1, wherein the hairpin adapters further comprise one or more sequencing primer binding sites.

14. The method of claim 1, wherein the consensus sequence comprises a sequence of nucleotide bases, and wherein each nucleotide base is identified at a given position in the consensus sequence when a specific nucleotide is complementary between at least one sequence read of the first strand and at least one sequence read of the second strand.

15. The method of claim 14, wherein generating a consensus sequence for each of the families further comprises identifying nucleotide positions where the compared sequence read of the first strand and the sequence read of the second strand are non-complementary and scoring the identified non-complementary nucleotide positions as potential artifacts.

16. The method of claim 1, further comprising loading at least a portion of the sequencing library into a sequencing flow cell and generating a plurality of sequencing clusters on the flow cell, wherein each of the sequencing clusters comprises the first and second strands of an original double-stranded nucleic acid molecule.

17. The method of claim 1, wherein at least a portion of the hairpin adapters comprise a single molecule identifier (SMI) sequence, and wherein step (c) comprises grouping the plurality of sequence reads into a plurality of families based at least in part on the SMI sequences.

18. The method of claim 1, wherein prior to step (b), the method further comprises generating amplicons of the adapter-nucleic acid molecule complexes.

19. The method of claim 1, wherein at least a portion of the hairpin adapters comprise a single molecule identifier (SMI) sequence, and wherein step (c) comprises grouping the plurality of sequence reads into a plurality of families based at least in part on the SMI sequences, and wherein one or more of the families comprise sequence reads from multiple clusters.
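
The grouping and consensus steps in the claims (families keyed on an SMI, a per-family consensus, and non-complementary positions scored as potential artifacts) can be sketched in Python under some loud assumptions. The read layout used here (first strand, then a linker of fixed flanks around a 4-nt SMI, then second strand), the flank sequences `GGAT` and `ATCC`, the majority-vote rule, and all function names are hypothetical and chosen only for illustration; the patent text shown on this page does not specify them. The sketch also groups on SMI alone, whereas claim 1 groups at least in part by cluster.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical linker layout: fixed flank + 4-nt SMI + fixed flank.
LINKER = re.compile(r"GGAT([ACGT]{4})ATCC")
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def split_hairpin_read(read: str) -> Optional[Tuple[str, str, str]]:
    """Split a linearized hairpin read into (SMI, strand 1, strand 2)."""
    match = LINKER.search(read)
    if match is None:
        return None  # no recognizable linker; skip this read
    return match.group(1), read[:match.start()], read[match.end():]


def group_by_smi(reads: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (strand 1, strand 2) pairs into families keyed on the SMI."""
    families: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for read in reads:
        parsed = split_hairpin_read(read)
        if parsed is not None:
            smi, first, second = parsed
            families[smi].append((first, second))
    return dict(families)


def column_consensus(seqs: List[str]) -> str:
    """Strict-majority base per position; "N" when no base exceeds half."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if 2 * count > len(column) else "N")
    return "".join(out)


def family_consensus(members: List[Tuple[str, str]]) -> Tuple[str, List[int]]:
    """Consensus for one family plus positions flagged as possible artifacts."""
    first = column_consensus([s1 for s1, _ in members])
    second = reverse_complement(column_consensus([s2 for _, s2 in members]))
    artifacts = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
    consensus = "".join(a if a == b else "N" for a, b in zip(first, second))
    return consensus, artifacts


reads = [
    "ACGTTC" + "GGATAAAAATCC" + "GAACGT",  # family AAAA, clean
    "ACGTTC" + "GGATAAAAATCC" + "GAACGT",  # family AAAA, clean
    "ACGATC" + "GGATAAAAATCC" + "GAACGT",  # family AAAA, strand 1 error
    "ACGTTC" + "GGATCCCCATCC" + "GAACGA",  # family CCCC, strand 2 error
]
families = group_by_smi(reads)
for smi in sorted(families):
    print(smi, family_consensus(families[smi]))
```

With these toy reads, the error in the three-member family is outvoted within the first strand, while the single-member family cannot resolve its strand disagreement and reports that position as a potential artifact.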

Assignees

Univ Washington Through Its Center For Commercialization

Inventors

Classifications

  • C12Q1/6869 (Primary)

    Methods for sequencing · CPC title

  • Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay (C12Q1/6804 takes precedence) · CPC title

  • C12Q1/6876 (Primary)

    Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes · CPC title

  • incorporating arbitrary or random nucleotide sequences · CPC title

  • incorporating bases where the precise position of the bases in the nucleic acid string is important · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12006545B2 cover?
Next Generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, the error rate of approximately 1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become ext…
Who is the assignee on this patent?
Univ Washington Through Its Center For Commercialization
What technology area does this patent fall under?
Primary CPC classification C12Q1/6869. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date Jun 11, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).