Universal short adapters with variable length non-random unique molecular identifiers

US11447818B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11447818-B2
Application number: US-201816129099-A
Country: US
Kind code: B2
Filing date: Sep 12, 2018
Priority date: Sep 15, 2017
Publication date: Sep 20, 2022
Grant date: Sep 20, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments concern methods, systems and computer program products for determining sequences of interest using unique molecular indexes (UMIs) that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies or long sequence length. In some implementations, the UMIs include variable-length nonrandom UMIs (vNRUMIs). Methods and systems for making and using sequencing adapters comprising vNRUMIs are also provided.

First claim

Opening claim text (preview).

What is claimed is:

1. A set of sequencing adapters comprising a plurality of double-stranded polynucleotides, wherein:
  each double-stranded polynucleotide comprises a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one variable-length, nonrandom unique molecular index (vNRUMI);
  variable-length, nonrandom unique molecular indices (vNRUMIs) of the set of sequencing adapters form a set of vNRUMIs configured to identify individual nucleic acid molecules in a sample for multiplex massively parallel sequencing;
  the set of vNRUMIs comprises sequences having two or more molecular lengths; and
  an edit distance between any two vNRUMIs of the set of vNRUMIs is not less than a first criterion value, wherein the first criterion value is at least two,
  wherein the set of vNRUMIs is provided by:
    (i) selecting an oligonucleotide sequence from a set of oligonucleotide sequences having two or more molecular lengths;
    (ii) adding the selected oligonucleotide to an expanding set of oligonucleotide sequences and removing the selected oligonucleotide from the set of oligonucleotide sequences to obtain a reduced set of oligonucleotide sequences;
    (iii) selecting an instant oligonucleotide sequence from the reduced set that maximizes a distance function, wherein the distance function is a minimal edit distance between an instant oligonucleotide sequence and any oligonucleotide sequences in the expanding set, and wherein the distance function meets the first criterion value;
    (iv) adding the instant oligonucleotide to the expanding set and removing the instant oligonucleotide from the reduced set;
    (v) repeating (iii) and (iv) one or more times; and
    (vi) providing the expanding set as the set of vNRUMIs.

2. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs comprises vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides.

3. The set of sequencing adapters of claim 1, wherein the first criterion value is at least three.

4. The set of sequencing adapters of claim 1, wherein the double-stranded hybridized region comprises a sequence of SEQ ID NO: 1 (AGATGTGTATAAGAGACAG).

5. The set of sequencing adapters of claim 1, wherein the double-stranded hybridized region comprises a sequence of SEQ ID NO: 2 (CTGTCTCTTATACACATCT).

6. The set of sequencing adapters of claim 1, wherein the single-stranded 5′ arm comprises a first primer binding sequence.

7. The set of sequencing adapters of claim 6, wherein the first primer binding sequence is a sequence of SEQ ID NO: 3 (TCGTCGGCAGCGTC).

8. The set of sequencing adapters of claim 6, wherein the single-stranded 5′ arm consists essentially of the first primer binding sequence.

9. The set of sequencing adapters of claim 1, wherein the single-stranded 3′ arm comprises a second primer binding sequence.

10. The set of sequencing adapters of claim 9, wherein the second primer binding sequence is a sequence of SEQ ID NO: 5 (CCGAGCCCACGAGAC).

11. The set of sequencing adapters of claim 9, wherein the single-stranded 3′ arm consists essentially of the second primer binding sequence.

12. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on one strand of the double-stranded hybridized region and a reverse complement of the vNRUMI on another strand of the double-stranded hybridized region.

13. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on the single-stranded 5′ arm.

14. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on the single-stranded 3′ arm.

15. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs includes no more than about 1,000 different vNRUMIs.

16. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs includes no more than about 200 different vNRUMIs.

17. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs comprises: CACATGA, GGTTAC, TTGCCAG, AACCGC, ATGGTG, CTAGAAC, AGAATAG, TCAACTC, GTTCGGA, AAGACA, ACATTC, ACCAAG, CAGTAG, CCACCA, CTTGGC, GCCTGA, TGAGGA, TGTCCG, TAGCGTA, AGTCGAC, GTACACG, CCTATTG, TCGGAGA, GCTGTCA, TCCTTGC, GTGAGTC, TAATGCG, AGGCTCA, AACTAAC, GATGAAG, ATAACCA, TATGTTC, GGATTGA, GGCCATA, AACGTA, AATGAG, ACAGCG, ACGCAC, ACTAGA, AGAAGC, AGACTG, AGTGCA, ATTACG, CAACAC, CAGGTC, CATTGA, CCGATA, CCTAAC, CCTGTG, CGAACG, CGCAGA, CGCTTC, CTCCAG, GAAGTG, GACAAC, GAGCTA, GCACAG, GCGTTG, GGCATG, GTAACA, GTATGC, GTCCTC, GTGGAC, GTTGTA, TACCTG, TACTCA, TCAATG, TCACGC, TCGGCA, TGATAG, TGCCAC, TGTGTC, TCAGAAG, TTGTGAC, GATAGGC, TGAGCTG, ACGTTAC, TTGAACA, TATGGCA, TGTATAC, CACCTAC, ACGAGCA, GCGAATG, GCATACA, TCCTACG, TGTCATG, AGTGGTA, CGGTAAG, CCATAGC, CTTCCTG, GTTAGCG, CTCGATG, TTCGAGC, AAGTCCA, CTAAGGA, ATAAGTG, CTTGAGA, CCTCATA, TGCACCA, AGAGACG, GAACCTC, ATTGTCG, GAACGAG, ATAGCAG, CTAGTTA, TCGTGTG, AGGATTC, GTGCAAC, TACATAG, CTACTGC, GCAGTTC, TAGACGC, TTACCGA, CGGTGTA, CAATTAG, ACCGTTG, AAGGATG, GAGTCAG, ATGTAGC, and ATTCACA.

18. The set of sequencing adapters of claim 1, wherein the edit distance is Levenshtein distance.

19. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having three or more consecutive identical bases.

20. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a combined number of guanine and cytosine bases smaller than 2 and sequences having a combined number of guanine and cytosine bases larger than 4.

21. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a same base at the last two positions.

22. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a thymine base at the last position.

23. A set of sequencing adapters comprising a plurality of double-stranded polynucleotides, wherein:
  each double-stranded polynucleotide comprises at least one variable-length, nonrandom unique molecular index (vNRUMI);
  variable-length, nonrandom unique molecular indices (vNRUMIs) of the set of sequencing adapters form a set of vNRUMIs configured to identify individual nucleic acid molecules in a sample for multiplex massively parallel sequencing;
  the set of vNRUMIs comprises sequences having two or more molecular lengths; and
  an edit distance between any two vNRUMIs of the set of vNRUMIs is not less than a first criterion value, wherein the first criterion value is at least two,
  wherein the set of vNRUMIs is provided by:
    (i) selecting an oligonucleotide sequence from a set of oligonucleotide sequences having two or more molecular lengths;
    (ii) adding the selected oligonucleotide to an expanding set of oligonucleotide sequences and removing the selected oligonucleotide from the set of oligonucleotide sequences to obtain a reduced set of oligonucleotide sequences;
    (iii) selecting an instant oligonucleotide sequence from the reduced set that maximizes a distance function, wherein the distance function is a minimal edit distance between an instant oligonucleotide sequence and any oligonucleotide sequences in the expanding set, and wherein the distance function meets the first criterion value;
    (iv) adding the instant oligonucleotide to the expanding set and removing the instant oligonucleotide from the reduced set;
    (v) repeating (iii) and (iv) one or more times; and
    (vi) providing the expanding set as the set of vNRUMIs.

24. The s
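
The selection procedure recited in steps (i) to (vi) of claim 1 is a greedy loop: keep adding whichever remaining candidate has the largest minimal edit distance to the sequences already chosen, and stop once no candidate meets the first criterion value. The Python sketch below illustrates that idea under several assumptions that are not in the claims: the function names (`levenshtein`, `passes_filters`, `candidate_pool`, `greedy_select`) and parameters are hypothetical, the first sequence is simply the first candidate (or a caller-supplied `seed`), ties are broken by pool order, and the exclusions of dependent claims 19 to 22 are applied together even though each is claimed separately. Levenshtein distance is used as the edit distance, as in claim 18. It is an illustration only, not the patentee's implementation.

```python
from itertools import product
import re


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def passes_filters(seq: str) -> bool:
    """Combined exclusions of claims 19-22 (combining them is an assumption)."""
    if re.search(r"(.)\1\1", seq):           # three or more identical bases in a row
        return False
    gc = sum(base in "GC" for base in seq)
    if gc < 2 or gc > 4:                      # combined G+C count outside 2..4
        return False
    if seq[-1] == seq[-2]:                    # same base at the last two positions
        return False
    if seq[-1] == "T":                        # thymine at the last position
        return False
    return True


def candidate_pool(lengths=(6, 7)):
    """All A/C/G/T sequences of the given lengths that pass the filters."""
    return ["".join(p) for n in lengths
            for p in product("ACGT", repeat=n)
            if passes_filters("".join(p))]


def greedy_select(pool, min_dist=3, max_size=120, seed=None):
    """Greedy max-min-distance selection, loosely following steps (i)-(vi)."""
    remaining = list(dict.fromkeys(pool))     # de-duplicate, keep order
    if not remaining:
        return []
    first = seed if seed is not None else remaining[0]   # step (i), assumed rule
    if first in remaining:
        remaining.remove(first)               # step (ii)
    chosen = [first]
    # dist[s] = minimal edit distance from s to any sequence chosen so far
    dist = {s: levenshtein(s, first) for s in remaining}
    while remaining and len(chosen) < max_size:
        best = max(remaining, key=dist.__getitem__)      # step (iii)
        if dist[best] < min_dist:             # criterion no longer met: stop
            break
        chosen.append(best)                   # step (iv)
        remaining.remove(best)
        del dist[best]
        for s in remaining:                   # update distances incrementally
            dist[s] = min(dist[s], levenshtein(s, best))
    return chosen                             # step (vi)


if __name__ == "__main__":
    # Small demo pool (lengths 4 and 5) so the pure-Python loop finishes quickly.
    demo = greedy_select(candidate_pool((4, 5)), min_dist=2, max_size=10)
    print(demo)
```

The demo uses short lengths only to keep the run time low; passing `candidate_pool((6, 7))` with `min_dist=3` matches the lengths of claim 2 and the criterion of claim 3 but is noticeably slower in pure Python. Nothing here is expected to reproduce the specific sequences listed in claim 17, since the starting sequence and tie-breaking rule are assumed.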

Assignees

Illumina Inc

Inventors

Classifications

  • ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title

  • Polymerase chain reaction [PCR] · CPC title

  • Sequence alignment; Homology search · CPC title

  • C12Q1/6855 (Primary)

    Ligating adaptors · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11447818B2 cover?
The disclosed embodiments concern methods, systems and computer program products for determining sequences of interest using unique molecular indexes (UMIs) that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies or long sequence length. In some implementations, the UMIs include variable-length nonrandom UMIs (vNRUMIs). Methods and …
Who is the assignee on this patent?
Illumina Inc
What technology area does this patent fall under?
Primary CPC classification C12Q1/6855. Mapped technology areas include Chemistry & Metallurgy.
When was this patent published?
Publication date: Sep 20, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).