Error detection in sequence tag directed subassemblies of short sequencing reads

US10577601B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10577601-B2
Application number  US-201715594476-A
Country             US
Kind code           B2
Filing date         May 12, 2017
Priority date       Sep 12, 2008
Publication date    Mar 3, 2020
Grant date          Mar 3, 2020

Abstract

Official abstract text for this publication.

The invention provides methods for preparing DNA sequencing libraries by assembling short read sequencing data into longer contiguous sequences for genome assembly, full length cDNA sequencing, metagenomics, and the analysis of repetitive sequences of assembled genomes.

First claim

Opening claim text (preview).

The embodiments of the invention in which an exclusive property or privilege is claimed are defined as follows:

1. A method for detecting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
  (a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first tag sequence;
  (b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified DNA molecules comprise a sequence identical to or complementary to the first tag sequence and a sequence identical to or complementary to at least a portion of the at least one member of the target library;
  (c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library and comprising a sequence identical to or complementary to the first tag sequence;
  (d) grouping the plurality of sequencing reads that correspond to the same at least one member of the target library based solely on the commonality of having the first tag sequence or a complement thereof to produce a plurality of grouped sequencing reads; and
  (e) detecting whether an error exists at a nucleotide position, wherein an error exists when a variation of nucleotide identity exists among the plurality of grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library.

2. The method of claim 1, wherein the method further comprises determining the correct identity of a nucleotide at the position where the variation of nucleotide identity is detected, wherein the correct identity is determined based on a consensus of individual base calls in the plurality of grouped sequencing reads.

3. The method of claim 2, wherein the consensus of individual base calls is the most common base call at the nucleotide position in the plurality of grouped sequencing reads.

4. The method of claim 1, wherein the method further comprises eliminating from further analysis the identity of the nucleotide at the position in a sequencing read where an error is detected.

5. The method of claim 1, wherein the method further comprises eliminating from further analysis a sequencing read determined to comprise a sequencing error.

6. The method of claim 5, wherein the sequencing read is determined to comprise a sequencing error when it comprises a nucleotide base call that differs from the consensus nucleotide base call provided by the plurality of grouped sequencing reads.

7. The method of claim 1, wherein the first tag sequence comprises a unique nucleotide sequence that distinguishes the at least one member of the target library from other members of the target library.

8. The method of claim 1, further comprising fragmenting at least a portion of the first plurality of amplified DNA molecules in the input library from step (b) to produce a plurality of linear DNA fragments having a first end and a second end.

9. The method of claim 8, further comprising attaching at least one second nucleic acid adaptor molecule to one or both ends of at least one of the plurality of linear DNA fragments, wherein the second adaptor molecule comprises a defined sequence.

10. The method of claim 9, further comprising amplifying the plurality of linear DNA fragments to produce a second plurality of amplified DNA molecules, wherein at least one of the second plurality of amplified DNA molecules comprises a sequence identical to or complementary to the first tag sequence, a sequence identical to or complementary to at least a portion of the second adaptor molecule, and a sequence identical to or complementary to at least a portion of a member of the target library.

11. The method of claim 10, wherein at least a portion of the second plurality of amplified DNA molecules is sequenced in step (c) of the method to produce a plurality of associated sequence reads for each sequenced DNA molecule corresponding to the at least one member of the target library.

12. The method of claim 11, wherein the associated sequence reads comprise a first sequence read and a second sequence read, wherein the first sequence read comprises the first tag sequence of the first adaptor that uniquely identifies a single nucleic acid member of the target library, and wherein the second sequence read comprises a sequence adjacent to the defined sequence of the second adaptor and represents the sequence adjacent to a fragment breakpoint from the fragmented input library.

13. The method of claim 12, wherein a plurality of second sequence reads that are each associated with a first sequence read are grouped in step (d) of the method, wherein the first sequence read contains the first tag sequence identifying a single nucleic acid member of the target library.

14. The method of claim 1, wherein the grouping step (d) comprises generating an alignment of the plurality of sequencing reads.

15. The method of claim 1, wherein the incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library in step (a) results in at least one circular nucleic acid molecule comprising the first nucleic acid adaptor and the member of a target library.

16. A method for correcting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising:
  (a) incorporating at least one first nucleic acid adaptor molecule into at least one member of a target library comprising a plurality of nucleic acid molecules, wherein the first adaptor molecule comprises a first tag sequence;
  (b) amplifying the plurality of nucleic acid molecules to produce an input library comprising a first plurality of amplified DNA molecules, wherein the amplified DNA molecules comprise a sequence identical to or complementary to the first tag sequence and a sequence identical to or complementary to at least a portion of the at least one member of the target library;
  (c) sequencing at least a portion of the plurality of amplified DNA molecules to produce a plurality of sequencing reads corresponding to the at least one member of the target library and comprising a sequence identical to or complementary to the first tag sequence;
  (d) grouping the plurality of sequencing reads that correspond to the at least one member of the target library based solely on the commonality of having the first tag sequence or a complement thereof to produce a plurality of grouped sequencing reads;
  (e) detecting whether an error exists at a nucleotide position, wherein an error exists when a variation of nucleotide identity exists among the plurality of grouped sequencing reads at a position corresponding to a nucleotide in the at least one member of the target library; and
  (f) determining a correct identity of the nucleotide at the position where the variation of nucleotide identity is detected, wherein the correct identity is determined based on a consensus of individual base calls in the plurality of grouped sequencing reads.

17. The method of claim 16, wherein the consensus of individual base calls is the most common base call at the nucleotide position in the plurality of grouped sequencing reads.

18. A method of detecting an error occurring in the preparation and/or sequencing of a DNA sequencing library, the method comprising: (a) grouping a plurality of nucleic acid
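
The computational part of claims 1 to 6 (group reads by shared tag, flag positions where grouped reads disagree, take the most common base call as the consensus, and set aside reads that differ from it) can be illustrated with a short Python sketch. It is an illustration under simplifying assumptions, not the patent's implementation: each read is assumed to arrive as a (tag, sequence) pair with the tag already extracted, all reads in a group are assumed to be pre-aligned and of equal length, and every function name is hypothetical.

```python
from collections import Counter, defaultdict


def group_reads_by_tag(reads):
    """Group sequences solely by their tag, in the spirit of step (d).

    `reads` is an iterable of (tag, sequence) pairs (an assumed input format).
    """
    groups = defaultdict(list)
    for tag, seq in reads:
        groups[tag].append(seq)
    return dict(groups)


def detect_and_correct(seqs):
    """Return (consensus, error_positions) for one tag-defined group.

    A position is flagged when the grouped reads disagree there (step (e)).
    The consensus is the most common base call at each position (claims 2-3).
    Ties fall to the base seen first, which is an arbitrary choice here.
    Assumes equal-length, pre-aligned reads.
    """
    consensus = []
    error_positions = []
    for pos, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1:
            error_positions.append(pos)
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus), error_positions


def reads_with_errors(seqs, consensus):
    """Reads that differ from the consensus anywhere (candidates to drop, claims 5-6)."""
    return [s for s in seqs if s != consensus]


# Toy data: two tags, one read under tag "AAGT" carries a miscall at position 2.
example_reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACCTAC"),
    ("CCTA", "TTGACA"),
    ("CCTA", "TTGACA"),
]

for tag, seqs in group_reads_by_tag(example_reads).items():
    consensus, errors = detect_and_correct(seqs)
    print(tag, consensus, errors, reads_with_errors(seqs, consensus))
```

On the toy data, the group tagged "AAGT" yields the consensus ACGTAC with a disagreement at zero-based position 2, and the read ACCTAC is singled out. Real data would need an alignment step first (claim 14) and a policy for ties and low-coverage groups, neither of which this sketch addresses.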

Classifications

  • Biochemical methods, e.g. using enzymes or whole viable microorganisms

  • General methods of preparing gene libraries, not provided for in other subgroups

  • Methods for sequencing

  • involving nucleic acid arrays, e.g. sequencing by hybridisation

  • General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10577601B2 cover?
The invention provides methods for preparing DNA sequencing libraries by assembling short read sequencing data into longer contiguous sequences for genome assembly, full length cDNA sequencing, metagenomics, and the analysis of repetitive sequences of assembled genomes.
Who is the assignee on this patent?
Univ Washington
What technology area does this patent fall under?
Primary CPC classification C12N15/1065. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date: Mar 3, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).