Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths

US2018201992A1 · US · A1

Patent metadata
Publication number: US-2018201992-A1
Application number: US-201815863737-A
Country: US
Kind code: A1
Filing date: Jan 5, 2018
Priority date: Jan 18, 2017
Publication date: Jul 19, 2018
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract


The disclosed embodiments concern methods, apparatus, systems and computer program products for determining sequences of interest using unique molecular index sequences that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies and long sequence length. In some implementations, the unique molecular index sequences include variable-length nonrandom sequences. In some implementations, the unique molecular index sequences are associated with the individual polynucleotide fragments based on alignment scores indicating similarity between the unique molecular index sequences and subsequences of sequence reads obtained from the individual polynucleotide fragments. System, apparatus, and computer program products are also provided for determining a sequence of interest implementing the methods disclosed.
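The alignment-based association described in the abstract (and spelled out in claims 2 to 9) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the linear scores (+1 match, -1 mismatch, -1 gap), the assumption that UMI-derived bases start at the first base of each read, and the function names `prefix_tolerant_score` and `assign_umi` are all hypothetical. The design choice worth noting is that one global (Needleman-Wunsch) table already contains every prefix score that claim 6 asks for: its last row scores the whole UMI against each prefix of the read window, and its last column scores the whole window against each prefix of the UMI. Taking the best entry over both penalizes differences at the start while leaving unaligned trailing bases free, which is what lets a 6-nt and a 7-nt index be compared against the same 7-base window.

```python
from typing import List, Tuple

# Hypothetical linear scoring scheme; the patent text here gives no values.
MATCH, MISMATCH, GAP = 1, -1, -1


def prefix_tolerant_score(umi: str, sub: str) -> int:
    """Alignment score that penalizes differences at the start of the
    sequences but not unaligned trailing bases on either side.

    One global-alignment (Needleman-Wunsch) table is filled. Its last row
    scores the whole UMI against every prefix of `sub`; its last column
    scores the whole of `sub` against every prefix of the UMI. The best
    entry over both mirrors steps (a) to (c) of claim 6.
    """
    m, n = len(umi), len(sub)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * GAP  # leading gaps are penalized
    for j in range(1, n + 1):
        dp[0][j] = j * GAP
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = dp[i - 1][j - 1] + (MATCH if umi[i - 1] == sub[j - 1] else MISMATCH)
            dp[i][j] = max(diag, dp[i - 1][j] + GAP, dp[i][j - 1] + GAP)
    umi_vs_sub_prefixes = dp[m]
    sub_vs_umi_prefixes = [dp[i][n] for i in range(m + 1)]
    return max(max(umi_vs_sub_prefixes), max(sub_vs_umi_prefixes))


def assign_umi(read: str, umis: List[str]) -> Tuple[List[str], int]:
    """Return every UMI tied for the best score, plus that score.

    The window length equals the longest UMI (as in claim 7). Ties are
    all returned, since claim 10 allows two or more candidate vNRUMIs.
    Assumes the UMI-derived bases start at the first base of the read.
    """
    window = read[: max(len(u) for u in umis)]
    scores = {u: prefix_tolerant_score(u, window) for u in umis}
    best = max(scores.values())
    return [u for u in umis if scores[u] == best], best


umis = ["ACGTAC", "TTGCAGA"]  # hypothetical 6-nt and 7-nt vNRUMIs
print(prefix_tolerant_score("ACGTAC", "ACGTACG"))  # 6: the trailing base is free
print(prefix_tolerant_score("ACGTAC", "TCGTACG"))  # 4: the leading mismatch costs
print(assign_umi("ACGTACGGATTC", umis))           # (['ACGTAC'], 6)
```

A plain global alignment of `ACGTAC` against the 7-base window would score 5 because of the unmatched final base; the prefix maximum removes that length bias between the 6-nt and 7-nt indices.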

Claims


1. A method for sequencing nucleic acid molecules from a sample, comprising (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter comprises a nonrandom unique molecular index, and wherein nonrandom unique molecular indices of the adapters have at least two different molecular lengths and form a set of variable-length, nonrandom unique molecular indices (vNRUMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vNRUMIs; (d) identifying, among the plurality of reads, reads associated with a same variable-length, nonrandom unique molecular index (vNRUMI); and (e) determining a sequence of a DNA fragment in the sample using the reads associated with the same vNRUMI.

2. The method of claim 1, wherein identifying the reads associated with the same vNRUMI comprises obtaining, for each read of the plurality of reads, alignment scores with respect to the set of vNRUMIs, each alignment score indicating similarity between a subsequence of a read and a vNRUMI, wherein the subsequence is in a region of the read in which nucleotides derived from the vNRUMI are likely located.

3. The method of claim 2, wherein the alignment scores are based on matches of nucleotides and edits of nucleotides between the subsequence of the read and the vNRUMI.

4. The method of claim 3, wherein the edits of nucleotides comprise substitutions, additions, and deletions of nucleotides.

5. The method of claim 3, wherein each alignment score penalizes mismatches at the beginning of a sequence but does not penalize mismatches at the end of the sequence.

6. The method of claim 5, wherein obtaining an alignment score between a read and a vNRUMI comprises: (a) calculating an alignment score between the vNRUMI and each one of all possible prefix sequences of the subsequence of the read; (b) calculating an alignment score between the subsequence of the read and each one of all possible prefix sequences of the vNRUMI; and (c) obtaining a largest alignment score among the alignment scores calculated in (a) and (b) as the alignment score between the read and the vNRUMI.

7. The method of claim 2, wherein the subsequence has a length that equals to a length of the longest vNRUMI in the set of vNRUMIs.

8. The method of claim 2, wherein identifying the reads associated with the same vNRUMI in (d) further comprises: selecting, for each read of the plurality of reads, at least one vNRUMI from the set of vNRUMIs based on the alignment scores; and associating each read of the plurality of reads with the at least one vNRUMI selected for the read.

9. The method of claim 8, wherein selecting the at least one vNRUMI from the set of vNRUMIs comprises selecting a vNRUMI having a highest alignment score among the set of vNRUMIs.

10. The method of claim 8, wherein the at least one vNRUMI comprises two or more vNRUMIs.

11. The method of claim 10, further comprises selecting one of the two or more vNRUMI as the same vNRUMI of (d) and (e).

12. The method of claim 1, wherein the adapters applied in (a) are obtained by: (i) providing a set of oligonucleotide sequences having at least two different molecular lengths; (ii) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting a threshold value, the subset of oligonucleotide sequences forming the set of vNRUMIs; and (iii) synthesizing the adapters each comprising a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs.

13. The method of claim 12, wherein the threshold value is 3.

14. The method of claim 1, wherein the set of vNRUMIs comprise vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides.

15. The method of claim 1, wherein (e) comprises collapsing reads associated with the same vNRUMI into a group to obtain a consensus nucleotide sequence for the sequence of the DNA fragment in the sample.

16. The method of claim 15, the consensus nucleotide sequence is obtained based partly on quality scores of the reads.

17. The method of claim 1, wherein (e) comprises: identifying, among the reads associated with the same vNRUMI, reads having a same read position or similar read positions in a reference sequence, and determining the sequence of the DNA fragment using reads that (i) are associated with the same vNRUMI and (ii) have the same read position or similar read positions in the reference sequence.

18-21. (canceled)

22. A method for preparing sequencing adapters, comprising: (a) providing a set of oligonucleotide sequences having at least two different molecular lengths; (b) selecting a subset of oligonucleotide sequences from the set of oligonucleotide sequences, all edit distances between oligonucleotide sequences of the subset of oligonucleotide sequences meeting a threshold value, the subset of oligonucleotide sequences forming a set of variable-length, nonrandom unique molecular indexes (vNRUMIs); and (c) synthesizing a plurality of sequencing adapters, wherein each sequencing adapter comprises a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one vNRUMI of the set of vNRUMIs.

23-37. (canceled)

38. A method for sequencing nucleic acid molecules from a sample, comprising (a) applying adapters to DNA fragments in the sample to obtain DNA-adapter products, wherein each adapter comprises a unique molecular index (UMI), and wherein unique molecular indices (UMIs) of the adapters have at least two different molecular lengths and form a set of variable-length unique molecular indices (vUMIs); (b) amplifying the DNA-adapter products to obtain a plurality of amplified polynucleotides; (c) sequencing the plurality of amplified polynucleotides, thereby obtaining a plurality of reads associated with the set of vUMIs; and (d) identifying, among the plurality of reads, reads associated with a same variable-length unique molecular index (vUMI).

39. The method of claim 38, further comprising determining a sequence of a DNA fragment in the sample using the reads associated with the same vUMI.

40-46. (canceled)
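Step (ii) of claim 12 and step (b) of claim 22 select, from candidates of at least two lengths, a subset whose pairwise edit distances all meet a threshold (3 in claim 13). The sketch below is one way to do that, under assumptions not stated in the claims: "meeting" is read as "at least", distance is plain Levenshtein distance, and candidates are screened greedily in the order given. Greedy first-fit guarantees the threshold but does not maximize the size of the set. The names `edit_distance` and `select_vnrumis` and the candidate sequences are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions, deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def select_vnrumis(candidates: Iterable[str], threshold: int = 3) -> List[str]:
    """Greedily keep a candidate only if it is at least `threshold` edits
    from every sequence already kept, so every pair in the result meets
    the threshold."""
    chosen: List[str] = []
    for cand in candidates:
        if all(edit_distance(cand, kept) >= threshold for kept in chosen):
            chosen.append(cand)
    return chosen


# Hypothetical mix of 6-nt and 7-nt candidates.
candidates = ["ACGTAC", "ACGTACG", "ACGAAC", "TTGCAGA", "GGATCC"]
vnrumis = select_vnrumis(candidates, threshold=3)
print(vnrumis)  # ['ACGTAC', 'TTGCAGA', 'GGATCC']
assert all(edit_distance(a, b) >= 3 for a, b in combinations(vnrumis, 2))
```

Because the candidates differ in length, a short index can be an exact prefix of a longer one (`ACGTAC` inside `ACGTACG`). Edit distance counts the extra base as a single edit, so such pairs fall below the threshold and are screened out, as are single-substitution neighbours such as `ACGAAC`.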
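Claims 15 to 17 collapse reads that share a vNRUMI (and the same or a similar reference position) into a group and derive a consensus sequence, based partly on quality scores. A minimal sketch under explicit assumptions: reads are grouped only by exact vNRUMI and exact mapped position (the "similar positions" case is not modeled), and each base is called by summing Phred qualities per nucleotide, which is one plausible quality-aware vote rather than the method the patent necessarily uses. `Read`, `group_reads`, and `consensus` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Read(NamedTuple):
    umi: str          # vNRUMI assigned to the read
    position: int     # mapped start position on the reference
    bases: str        # insert bases, UMI already trimmed
    quals: List[int]  # per-base Phred quality scores


def group_reads(reads: List[Read]) -> Dict[Tuple[str, int], List[Read]]:
    """Collapse reads that share both a vNRUMI and a mapped position.
    Exact positions only; 'similar' positions are not modeled here."""
    groups: Dict[Tuple[str, int], List[Read]] = defaultdict(list)
    for r in reads:
        groups[(r.umi, r.position)].append(r)
    return dict(groups)


def consensus(group: List[Read]) -> str:
    """At each position, call the base whose summed quality is highest.
    Ties go to the base seen first; bases past the shortest read are ignored."""
    length = min(len(r.bases) for r in group)
    called = []
    for i in range(length):
        weight: Dict[str, int] = defaultdict(int)
        for r in group:
            weight[r.bases[i]] += r.quals[i]
        called.append(max(weight, key=weight.__getitem__))
    return "".join(called)


reads = [
    Read("ACGTAC", 100, "GATTACA", [30] * 7),
    Read("ACGTAC", 100, "GATTACA", [30] * 7),
    Read("ACGTAC", 100, "GATGACA", [30, 30, 30, 10, 30, 30, 30]),  # low-quality error
    Read("TTGCAGA", 100, "CCCCCCC", [30] * 7),  # different molecule, same locus
]
groups = group_reads(reads)
print(len(groups))                         # 2
print(consensus(groups[("ACGTAC", 100)]))  # GATTACA
```

The last read maps to the same position but carries a different vNRUMI, so it stays in its own group; this is how the index separates distinct source molecules that happen to overlap the same locus.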

Assignees

Illumina Inc

Inventors

Classifications

  • C12Q1/6869 (primary): Methods for sequencing

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids

  • C12Q1/6855 (primary): Ligating adaptors

  • Specific length of the oligonucleotides

  • Incorporating an adaptor


Frequently asked questions


Who is the assignee on this patent?
Illumina Inc
What technology area does this patent fall under?
Primary CPC classification C12Q1/6869. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Published July 19, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).