Methods of lowering the error rate of massively parallel DNA sequencing using duplex consensus sequencing
US-2018363053-A1 · Dec 20, 2018 · US
US11761035B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11761035-B2 |
| Application number | US-202017073074-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 16, 2020 |
| Priority date | Jan 18, 2017 |
| Publication date | Sep 19, 2023 |
| Grant date | Sep 19, 2023 |
Official abstract text for this publication.
The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the unique molecular index sequences include variable-length nonrandom sequences. In some implementations, the unique molecular index sequences are associated with the individual polynucleotide fragments based on alignment scores indicating similarity between the unique molecular index sequences and subsequences of sequence reads obtained from the individual polynucleotide fragments. System, apparatus, and computer program products are also provided for determining a sequence of interest implementing the methods disclosed.
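
The alignment-score association described in the abstract can be illustrated with a small sketch. It assumes a simple scoring scheme (+1 match, -1 mismatch, -1 gap) that is not taken from the patent, and the function names `prefix_alignment_score` and `assign_read` are hypothetical. Following the idea in claims 5 and 6 that the start of the comparison is penalized while the end is left free, the score is taken as the best value in the last row or last column of a global-alignment table.

```python
def prefix_alignment_score(umi, subseq, match=1, mismatch=-1, gap=-1):
    """Score a UMI against a read subsequence with a free end.

    Scoring values are illustrative assumptions, not values from the patent.
    """
    rows, cols = len(umi) + 1, len(subseq) + 1
    dp = [[0] * cols for _ in range(rows)]
    # Leading gaps are charged, so errors at the start cost something.
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if umi[i - 1] == subseq[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + step,
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Last row: whole UMI vs. every non-empty prefix of the subsequence.
    last_row = dp[-1][1:]
    # Last column: every non-empty prefix of the UMI vs. the whole subsequence.
    last_col = [row[-1] for row in dp[1:]]
    return max(last_row + last_col)


def assign_read(read, umis):
    """Return the best-scoring UMI(s) for a read, plus that score."""
    # Compare only the leading window, as long as the longest UMI (claim 7).
    window = read[:max(len(u) for u in umis)]
    scores = {u: prefix_alignment_score(u, window) for u in umis}
    best = max(scores.values())
    return sorted(u for u, s in scores.items() if s == best), best


umis = ["ACGTAC", "TTGACCA"]           # hypothetical 6- and 7-nt indices
print(assign_read("ACGTACGGCTTAGC", umis))
```

Ties are returned as a list rather than resolved, which mirrors the case in claims 10 and 11 where more than one candidate is kept for a read and one is selected later.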
Opening claim text (preview).
What is claimed is:
1. A method for sequencing nucleic acid molecules from a sample, comprising (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indices of the adapters have at least two different molecular lengths and form a set of variable-length, nonrandom unique molecular indices (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vNRUMIs; (d) identifying, among the plurality of reads, reads associated with a same variable-length, nonrandom unique molecular index (vNRUMI); and (e) determining a sequence of a DNA fragment in the sample using the reads associated with the same vNRUMI.
2. The method of claim 1, wherein identifying the reads associated with the same vNRUMI comprises obtaining, for each read of the plurality of reads, alignment scores with respect to the set of vNRUMIs, each alignment score indicating similarity between a subsequence of a read and a vNRUMI, wherein the subsequence is in a region of the read in which nucleotides derived from the vNRUMI are likely located.
3. The method of claim 2, wherein the alignment scores are based on matches of nucleotides and edits of nucleotides between the subsequence of the read and the vNRUMI.
4. The method of claim 3, wherein the edits of nucleotides comprise substitutions, additions, and deletions of nucleotides.
5. The method of claim 3, wherein each alignment score penalizes mismatches at the beginning of a sequence but does not penalize mismatches at the end of the sequence.
6. The method of claim 5, wherein obtaining an alignment score between a read and a vNRUMI comprises: (a) calculating an alignment score between the vNRUMI and each one of all possible prefix sequences of the subsequence of the read; (b) calculating an alignment score between the subsequence of the read and each one of all possible prefix sequences of the vNRUMI; and (c) obtaining a largest alignment score among the alignment scores calculated in (a) and (b) as the alignment score between the read and the vNRUMI.
7. The method of claim 2, wherein the subsequence has a length that equals to a length of the longest vNRUMI in the set of vNRUMIs.
8. The method of claim 2, wherein identifying the reads associated with the same vNRUMI in (d) further comprises: selecting, for each read of the plurality of reads, at least one vNRUMI from the set of vNRUMIs based on the alignment scores; and associating each read of the plurality of reads with the at least one vNRUMI selected for the read.
9. The method of claim 8, wherein selecting the at least one vNRUMI from the set of vNRUMIs comprises selecting a vNRUMI having a highest alignment score among the set of vNRUMIs.
10. The method of claim 8, wherein the at least one vNRUMI comprises two or more vNRUMIs.
11. The method of claim 10, further comprises selecting one of the two or more vNRUMI as the same vNRUMI of (d) and (e).
12. The method of claim 1, wherein the adapters applied in (a) are obtained by: (i) providing a set of oligonucleotide sequences having at least two different molecular lengths; (ii) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting the threshold value, the subset of oligonucleotide sequences forming the set of vNRUMIs; and (iii) synthesizing the adapters each comprising a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs.
13. The method of claim 1, wherein the threshold value is 3.
14. The method of claim 1, wherein the set of vNRUMIs comprise vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides.
15. The method of claim 1, wherein (e) comprises collapsing reads associated with the same vNRUMI into a group to obtain a consensus nucleotide sequence for the sequence of the DNA fragment in the sample.
16. The method of claim 15, the consensus nucleotide sequence is obtained based partly on quality scores of the reads.
17. The method of claim 1, wherein (e) comprises: identifying, among the reads associated with the same vNRUMI, reads having a same read position or similar read positions in a reference sequence, and determining the sequence of the DNA fragment using reads that (i) are associated with the same vNRUMI and (ii) have the same read position or similar read positions in the reference sequence.
18. The method of claim 1, wherein the set of vNRUMIs includes no more than about 10,000 different vNRUMIs.
19. A computer program product comprising a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method for sequencing nucleic acid molecules from a sample, said program code comprising: (a) code for obtaining a plurality of reads of a plurality of amplified polynucleotides, each polynucleotide of the plurality of amplified polynucleotides comprising an adapter attached to a DNA fragment, wherein the adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indexes of the adapters have at least two different molecular lengths, forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) code for identifying, among the plurality of reads, reads associated with a same vNRUMIs; and (c) code for determining, using the reads associated with the same vNRUMI, a sequence of a DNA fragment in the sample.
20. A computer system, comprising: one or more processors; system memory; and one or more computer-readable storage media having stored thereon computer-executable instructions that causes the computer system to implement a method for determine sequence information of a sequence of interest in a sample, the instructions comprising: (a) obtaining a plurality of reads of a plurality of amplified polynucleotides, each polynucleotide of the plurality of amplified polynucleotides comprising an adapter attached to a DNA fragment, wherein the adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indexes of the adapters have at least two different molecular lengths, forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) identifying, among the plurality of reads, reads associated with a same vNRUMIs; and (c) determining, using the reads associated with the same vNRUMI, a sequence of a DNA fragment in the sample.
21. The computer program product of claim 19, wh
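
The edit-distance constraint on the index set (claims 1 and 12 to 14) can be illustrated with a short sketch. It is an assumption that a greedy first-fit pass over candidates is used; the claims only require that all pairwise edit distances in the selected subset meet the threshold, and they do not say how the subset is found. The names `edit_distance` and `select_umi_set` and the `limit` parameter are hypothetical.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance: substitutions, additions, and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion from a
                            curr[j - 1] + 1,            # addition to a
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def select_umi_set(lengths=(6, 7), threshold=3, limit=None):
    """Greedy first-fit selection (an assumed strategy, not the patent's).

    A candidate is kept only if it is at least `threshold` edits away
    from every sequence already kept.
    """
    candidates = ("".join(p) for n in lengths
                  for p in product("ACGT", repeat=n))
    chosen = []
    for cand in candidates:
        if all(edit_distance(cand, kept) >= threshold for kept in chosen):
            chosen.append(cand)
            if limit is not None and len(chosen) >= limit:
                break
    return chosen


# Small demo with shorter sequences so it runs quickly; the claims
# mention lengths 6 and 7 with a threshold of 3.
demo = select_umi_set(lengths=(4, 5), threshold=3)
print(len(demo), demo[:5])
```

A first-fit pass does not guarantee that both lengths survive in the final set or that the set is as large as possible, so a real design would check the length mix and might use a different search.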
specific length of the oligonucleotides · CPC title
incorporating an adaptor · CPC title
Massive parallel sequencing · CPC title
Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation · CPC title
Sequence alignment; Homology search · CPC title