Error suppression in sequenced DNA fragments using redundant reads with unique molecular indices (UMIs)

US10844428B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10844428-B2
Application number  US-201615130668-A
Country             US
Kind code           B2
Filing date         Apr 15, 2016
Priority date       Apr 28, 2015
Publication date    Nov 24, 2020
Grant date          Nov 24, 2020

Abstract

Official abstract text for this publication.

The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index (UMI) sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the UMIs include both physical UMIs and virtual UMIs. In some implementations, the unique molecular index sequences include non-random sequences. System, apparatus, and computer program products are also provided for determining a sequence of interest implementing the methods disclosed.
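The abstract notes that the UMI sequences may be non-random, and claim 5 of this patent requires every nonrandom UMI to differ from every other by at least two nucleotides at corresponding positions. With that spacing, a single substitution error cannot turn one valid UMI into another. The following is a minimal sketch of how such a set could be checked; the function names and the example UMI strings are hypothetical and do not come from the patent.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length UMIs differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(umis: list[str]) -> int:
    """Smallest Hamming distance over all pairs in the UMI set."""
    return min(hamming(a, b) for a, b in combinations(umis, 2))


def satisfies_two_mismatch_rule(umis: list[str]) -> bool:
    """True if every UMI differs from every other at >= 2 positions."""
    return min_pairwise_distance(umis) >= 2


# Hypothetical 4-nt UMI sets, chosen only for illustration.
well_spaced = ["AACC", "CCAA", "GGTT"]
too_close = ["AACC", "AACG"]  # differ at a single position
```

Calling `satisfies_two_mismatch_rule(well_spaced)` returns `True`, while the same call on `too_close` returns `False` because the two UMIs differ at only one position.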

First claim

Opening claim text (preview).

What is claimed is:

1. A method for sequencing nucleic acid molecules from a sample using unique molecular indices (UMIs), wherein each unique molecular index (UMI) is an oligonucleotide sequence that can be used to identify an individual molecule of a double-stranded DNA fragment in the sample, comprising
  (a) applying adapters to both ends of a plurality of double-stranded DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter comprises a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and a physical UMI on one strand or each strand of the adapter, the physical UMI being selected from a plurality of physical UMIs, each double-stranded DNA fragment in the sample comprises a virtual UMI on one strand or each strand of the double-stranded DNA fragment, the virtual UMI is a sequence of nucleotides shorter than the double-stranded DNA fragment, the position of the virtual UMI is defined at or with respect to an end of the double-stranded DNA fragment, and the plurality of double-stranded DNA fragments is not obtained by restriction endonuclease digestion;
  (b) amplifying both strands of the DNA-adapter products to obtain a plurality of amplified polynucleotides;
  (c) sequencing, using a nucleic acid sequencer, the plurality of amplified polynucleotides, thereby obtaining a plurality of reads each comprising a physical UMI corresponding to a physical UMI on an adapter and a virtual UMI corresponding to a virtual UMI on a double-stranded DNA fragment in the sample;
  (d) identifying a plurality of physical UMI sequences for the plurality of reads;
  (e) identifying a plurality of virtual UMI sequences for the plurality of reads; and
  (f) determining sequences of the plurality of double-stranded DNA fragments in the sample by:
    (i) grouping the plurality of reads based at least on the plurality of virtual UMI sequences to obtain a plurality of groups of reads,
    (ii) determining a plurality of consensus nucleotide sequences using the plurality of groups of reads, and
    (iii) determining the sequences of the plurality of double-stranded DNA fragments using the plurality of consensus nucleotide sequences.

2. The method of claim 1, wherein (f)(i) comprises: grouping the plurality of reads based at least on the plurality of virtual UMI sequences and the plurality of physical UMI sequences in the reads to obtain the plurality of groups of reads, each group having a unique combination of a virtual UMI sequence and a physical UMI sequence.

3. The method of claim 1, wherein the plurality of physical UMIs comprises random UMIs.

4. The method of claim 1, wherein the plurality of physical UMIs comprises nonrandom UMIs.

5. The method of claim 4, wherein every nonrandom UMI differs from every other nonrandom UMI of the adapters by at least two nucleotides at corresponding sequence positions of the nonrandom UMIs.

6. The method of claim 5, wherein the plurality of physical UMIs includes no more than about 10,000 unique nonrandom UMIs.

7. The method of claim 6, wherein the plurality of physical UMIs includes no more than about 1,000 unique nonrandom UMIs.

8. The method of claim 7, wherein the plurality of physical UMIs includes no more than about 500 unique nonrandom UMIs.

9. The method of claim 8, wherein the plurality of physical UMIs includes no more than about 100 unique nonrandom UMIs.

10. The method of claim 9, wherein the plurality of physical UMIs includes about 96 unique nonrandom UMIs.

11. The method of claim 1, wherein applying adapters to both ends of double-stranded DNA fragments comprises ligating the adapters to both ends of the double-stranded DNA fragments.

12. The method of claim 1, wherein the plurality of physical UMIs includes fewer than 12 nucleotides.

13. The method of claim 12, wherein the plurality of physical UMIs includes no more than 6 nucleotides.

14. The method of claim 12, wherein the plurality of physical UMIs includes no more than 4 nucleotides.

15. The method of claim 1, wherein the adapters each comprise a physical UMI on each strand of the adapters in the double-stranded hybridized region.

16. The method of claim 15, wherein the physical UMI is at or near an end of the double-stranded hybridized region, said end of the double-stranded hybridized region being opposite from the 3′ arm or the 5′ arm.

17. The method of claim 16, wherein the physical UMI is at said end of the double-stranded hybridized region, or is one nucleotide away from said end of the double-stranded hybridized region.

18. The method of claim 17, wherein the adapters each comprise a 5′-TGG-3′ trinucleotide or a 3′-ACC-5′ trinucleotide on the double-stranded hybridized region adjacent to a physical UMI.

19. The method of claim 18, wherein the adapters each comprise a read primer sequence on each strand of the double-stranded hybridized region.

20. The method of claim 1, wherein the adapters each comprise a physical UMI on only one strand of the adapters on the single-stranded 5′ arm or the single-stranded 3′ arm.

21. The method of claim 20, wherein (f) comprises:
  (i) collapsing reads having a same first physical UMI sequence into a first group to obtain a first consensus nucleotide sequence;
  (ii) collapsing reads having a same second physical UMI sequence into a second group to obtain a second consensus nucleotide sequence; and
  (iii) determining, using the first and second consensus nucleotide sequences, a sequence of one of the double-stranded DNA fragments in the sample.

22. The method of claim 21, wherein (iii) comprises:
  (1) obtaining, using localization information and sequence information of the first and second consensus nucleotide sequences, a third consensus nucleotide sequence, and
  (2) determining, using the third consensus nucleotide sequence, the sequence of one of the double-stranded DNA fragments.

23. The method of claim 20, wherein (e) comprises identifying the plurality of virtual UMI sequences, while the adapters each comprise the physical UMI on only the single-stranded 5′ arm or the single-stranded 3′ arm.

24. The method of claim 23, wherein (f) comprises:
  (i) combining reads having a first physical UMI sequence and at least one virtual UMI sequence in a read direction and reads having a second physical UMI sequence and the at least one virtual UMI sequence in the read direction to determine a consensus nucleotide sequence; and
  (ii) determining a sequence of one of the double-stranded DNA fragments in the sample using the consensus nucleotide sequence.

25. The method of claim 1, wherein the adapters each comprise a physical UMI on each strand of the adapters in a double-stranded region of the adapters, wherein the physical UMI on one strand is complementary to the physical UMI on the other strand.

26. The method of claim 25, wherein (f) comprises:
  (i) combining reads having a first physical UMI sequence, at least one virtual UMI sequence, and a second physical UMI sequence in the 5′ to 3′ direction and reads having the second physical UMI sequence, the at least one virtual UMI sequence, and the first physical UMI sequence in the 5′ to 3′ direction to determine a consensus nucleotide sequence; and
  (ii) determining a sequence of one of the double-stranded DNA fragments in the sample using the consensus nucleotide sequence.

27. The method of claim 1, wherein the adapters eac
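
Steps (f)(i) and (f)(ii) of claim 1, together with the grouping by unique combination in claim 2, can be illustrated with a short sketch. It is not the patented implementation. It assumes each read has already been split into a physical UMI and an insert sequence, that the virtual UMI is simply the first few bases of the insert, that reads in a group are the same length and already aligned, and that the consensus is a per-position majority vote. All names, the 4-nt virtual UMI length, and the read data are hypothetical.

```python
from collections import Counter, defaultdict


def consensus(seqs: list[str]) -> str:
    """Majority base at each position; assumes equal-length, aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def collapse_reads(reads: list[tuple[str, str]], virtual_len: int = 4) -> dict:
    """Group reads by (physical UMI, virtual UMI) and collapse each group.

    `reads` holds (physical_umi, insert_sequence) pairs. The virtual UMI is
    taken as the first `virtual_len` bases of the insert, which is an
    assumption made for this sketch.
    """
    groups = defaultdict(list)
    for physical_umi, insert in reads:
        virtual_umi = insert[:virtual_len]
        groups[(physical_umi, virtual_umi)].append(insert)
    return {key: consensus(seqs) for key, seqs in groups.items()}


# Hypothetical reads: three copies from one molecule (one carrying an error
# at position 5) and one read from a second molecule with a different
# physical UMI but the same virtual UMI.
reads = [
    ("ACG", "TTGACCA"),
    ("ACG", "TTGACCA"),
    ("ACG", "TTGATCA"),
    ("GGT", "TTGACCA"),
]
collapsed = collapse_reads(reads)
```

In this example the two molecules share the virtual UMI `TTGA` and are separated only by their physical UMIs, and the single discordant base in the first group is outvoted by the two redundant reads.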

Assignees

Illumina Inc

Inventors

Classifications

  • Preparation or screening of tagged libraries, e.g. tagged microorganisms by STM-mutagenesis, tagged polynucleotides, gene tags

  • Sequence alignment; Homology search

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids

  • C12Q1/6869 (Primary): Methods for sequencing

  • Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay (C12Q1/6804 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10844428B2 cover?
The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index (UMI) sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the UMIs include both physical UMIs and virtual UMIs…
Who is the assignee on this patent?
Illumina Inc
What technology area does this patent fall under?
Primary CPC classification C12N15/1065. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date: Nov 24, 2020 (B2). Legal status and post-grant events are not shown on this page.