What technology area does this patent fall under?

Primary CPC classification: G16B30/10 (Sequence alignment; Homology search). Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 19, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths

US11761035B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11761035-B2
Application number	US-202017073074-A
Country	US
Kind code	B2
Filing date	Oct 16, 2020
Priority date	Jan 18, 2017
Publication date	Sep 19, 2023
Grant date	Sep 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the unique molecular index sequences include variable-length nonrandom sequences. In some implementations, the unique molecular index sequences are associated with the individual polynucleotide fragments based on alignment scores indicating similarity between the unique molecular index sequences and subsequences of sequence reads obtained from the individual polynucleotide fragments. System, apparatus, and computer program products are also provided for determining a sequence of interest implementing the methods disclosed.
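The alignment-based association mentioned in the abstract can be illustrated with a minimal Python sketch. It loosely follows the prefix-based scoring outlined in claims 5 and 6 further down this page: a global alignment table is filled once, and the best score is taken over the full index against every prefix of the read subsequence and over the full subsequence against every prefix of the index. The scoring values (+1 match, -1 mismatch, -1 gap), the function names `alignment_score` and `assign_read`, and the example sequences are assumptions made for illustration and are not taken from the patent.

```python
from typing import List, Tuple


def alignment_score(sub: str, umi: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Best global-alignment score where one sequence may be cut to a prefix.

    Scoring values are illustrative assumptions. Both inputs must be non-empty.
    """
    rows, cols = len(umi) + 1, len(sub) + 1
    # dp[i][j] = best score aligning umi[:i] with sub[:j]
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if umi[i - 1] == sub[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + step,  # match or substitution
                           dp[i - 1][j] + gap,       # base missing from read
                           dp[i][j - 1] + gap)       # extra base in read
    # Last row: full index vs. every non-empty prefix of the subsequence.
    full_umi = dp[-1][1:]
    # Last column: full subsequence vs. every non-empty prefix of the index.
    full_sub = [row[-1] for row in dp[1:]]
    return max(max(full_umi), max(full_sub))


def assign_read(read: str, umis: List[str]) -> Tuple[str, int]:
    """Return the best-scoring index for a read, with its score."""
    # The subsequence is as long as the longest index in the set (claim 7).
    window = max(len(u) for u in umis)
    sub = read[:window]
    scored = [(u, alignment_score(sub, u)) for u in umis]
    # On ties, max() keeps the first index in list order.
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical index set with two lengths (6 and 7 nucleotides).
    example_umis = ["ACGTAC", "TTGCAGA"]
    print(assign_read("ACGTACGGGTTTAACC", example_umis))
```

Because the trailing bases of the window may belong to the fragment rather than to a shorter index, taking the maximum over prefixes leaves mismatches at the end of the window unpenalized, which is the behavior claim 5 describes.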

First claim

Opening claim text (preview).

What is claimed is:

1. A method for sequencing nucleic acid molecules from a sample, comprising (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indices of the adapters have at least two different molecular lengths and form a set of variable-length, nonrandom unique molecular indices (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vNRUMIs; (d) identifying, among the plurality of reads, reads associated with a same variable-length, nonrandom unique molecular index (vNRUMI); and (e) determining a sequence of a DNA fragment in the sample using the reads associated with the same vNRUMI.

2. The method of claim 1, wherein identifying the reads associated with the same vNRUMI comprises obtaining, for each read of the plurality of reads, alignment scores with respect to the set of vNRUMIs, each alignment score indicating similarity between a subsequence of a read and a vNRUMI, wherein the subsequence is in a region of the read in which nucleotides derived from the vNRUMI are likely located.

3. The method of claim 2, wherein the alignment scores are based on matches of nucleotides and edits of nucleotides between the subsequence of the read and the vNRUMI.

4. The method of claim 3, wherein the edits of nucleotides comprise substitutions, additions, and deletions of nucleotides.

5. The method of claim 3, wherein each alignment score penalizes mismatches at the beginning of a sequence but does not penalize mismatches at the end of the sequence.

6. The method of claim 5, wherein obtaining an alignment score between a read and a vNRUMI comprises: (a) calculating an alignment score between the vNRUMI and each one of all possible prefix sequences of the subsequence of the read; (b) calculating an alignment score between the subsequence of the read and each one of all possible prefix sequences of the vNRUMI; and (c) obtaining a largest alignment score among the alignment scores calculated in (a) and (b) as the alignment score between the read and the vNRUMI.

7. The method of claim 2, wherein the subsequence has a length that equals to a length of the longest vNRUMI in the set of vNRUMIs.

8. The method of claim 2, wherein identifying the reads associated with the same vNRUMI in (d) further comprises: selecting, for each read of the plurality of reads, at least one vNRUMI from the set of vNRUMIs based on the alignment scores; and associating each read of the plurality of reads with the at least one vNRUMI selected for the read.

9. The method of claim 8, wherein selecting the at least one vNRUMI from the set of vNRUMIs comprises selecting a vNRUMI having a highest alignment score among the set of vNRUMIs.

10. The method of claim 8, wherein the at least one vNRUMI comprises two or more vNRUMIs.

11. The method of claim 10, further comprises selecting one of the two or more vNRUMI as the same vNRUMI of (d) and (e).

12. The method of claim 1, wherein the adapters applied in (a) are obtained by: (i) providing a set of oligonucleotide sequences having at least two different molecular lengths; (ii) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting the threshold value, the subset of oligonucleotide sequences forming the set of vNRUMIs; and (iii) synthesizing the adapters each comprising a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs.

13. The method of claim 1, wherein the threshold value is 3.

14. The method of claim 1, wherein the set of vNRUMIs comprise vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides.

15. The method of claim 1, wherein (e) comprises collapsing reads associated with the same vNRUMI into a group to obtain a consensus nucleotide sequence for the sequence of the DNA fragment in the sample.

16. The method of claim 15, the consensus nucleotide sequence is obtained based partly on quality scores of the reads.

17. The method of claim 1, wherein (e) comprises: identifying, among the reads associated with the same vNRUMI, reads having a same read position or similar read positions in a reference sequence, and determining the sequence of the DNA fragment using reads that (i) are associated with the same vNRUMI and (ii) have the same read position or similar read positions in the reference sequence.

18. The method of claim 1, wherein the set of vNRUMIs includes no more than about 10,000 different vNRUMIs.

19. A computer program product comprising a non-transitory machine readable medium storing program code that, when executed by one or more processors of a computer system, causes the computer system to implement a method for sequencing nucleic acid molecules from a sample, said program code comprising: (a) code for obtaining a plurality of reads of a plurality of amplified polynucleotides, each polynucleotide of the plurality of amplified polynucleotides comprising an adapter attached to a DNA fragment, wherein the adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indexes of the adapters have at least two different molecular lengths, forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) code for identifying, among the plurality of reads, reads associated with a same vNRUMIs; and (c) code for determining, using the reads associated with the same vNRUMI, a sequence of a DNA fragment in the sample.

20. A computer system, comprising: one or more processors; system memory; and one or more computer-readable storage media having stored thereon computer-executable instructions that causes the computer system to implement a method for determine sequence information of a sequence of interest in a sample, the instructions comprising: (a) obtaining a plurality of reads of a plurality of amplified polynucleotides, each polynucleotide of the plurality of amplified polynucleotides comprising an adapter attached to a DNA fragment, wherein the adapter comprises a nonrandom unique molecular index, wherein nonrandom unique molecular indexes of the adapters have at least two different molecular lengths, forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs), and wherein an edit distance between each vNRUMI in the set of vNRUMIs is at least a threshold value, wherein the edit distances are based on edits of nucleotides comprising substitutions, additions, and deletions; (b) identifying, among the plurality of reads, reads associated with a same vNRUMIs; and (c) determining, using the reads associated with the same vNRUMI, a sequence of a DNA fragment in the sample.

21. The computer program product of claim 19, wh
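As a rough illustration of the index-set constraint in claims 1 and 12 to 14, the Python sketch below picks a subset of candidate oligonucleotide sequences whose pairwise edit distances (substitutions, additions, and deletions) all meet a threshold of 3. The greedy first-fit strategy, the function names `edit_distance` and `select_umis`, and the candidate sequences are assumptions for illustration only; the claims require that the distances meet the threshold but do not prescribe this particular selection procedure.

```python
from itertools import combinations
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def select_umis(candidates: Iterable[str], threshold: int = 3) -> List[str]:
    """Greedy first-fit: keep a candidate only if it is far enough from all kept ones."""
    chosen: List[str] = []
    for cand in candidates:
        if all(edit_distance(cand, kept) >= threshold for kept in chosen):
            chosen.append(cand)
    return chosen


if __name__ == "__main__":
    # Hypothetical candidate pool mixing 6- and 7-nucleotide sequences.
    pool = ["AAAAAA", "AAAAAT", "CCCCCCC", "AACCAA", "GGGGGG", "GGGGGGT"]
    umi_set = select_umis(pool, threshold=3)
    print(umi_set)
    # Every pair in the selected set respects the threshold.
    print(all(edit_distance(x, y) >= 3 for x, y in combinations(umi_set, 2)))
```

A first-fit pass depends on candidate order and does not guarantee the largest possible set; it is used here only because it makes the pairwise-distance requirement easy to see.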

Assignees

Illumina Inc

Inventors

Classifications

C12Q2525/204
specific length of the oligonucleotides · CPC title
C12Q2525/191
incorporating an adaptor · CPC title
C12Q2535/122
Massive parallel sequencing · CPC title
G16B25/20
Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation · CPC title
G16B30/10 (Primary)
Sequence alignment; Homology search · CPC title

Patent family

Related publications grouped by family.

View patent family 61054549

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11761035B2 cover?: The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the unique molecular index sequences include variable-leng…
Who is the assignee on this patent?: Illumina Inc