Methods and systems for generation and error-correction of unique molecular index sets with heterogeneous molecular lengths
US-2021079462-A1 · Mar 18, 2021 · US
US11447818B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11447818-B2 |
| Application number | US-201816129099-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 12, 2018 |
| Priority date | Sep 15, 2017 |
| Publication date | Sep 20, 2022 |
| Grant date | Sep 20, 2022 |
Official abstract text for this publication.
The disclosed embodiments concern methods, systems and computer program products for determining sequences of interest using unique molecular indexes (UMIs) that are uniquely associable with individual polynucleotide fragments, including sequences with low allele frequencies or long sequence length. In some implementations, the UMIs include variable-length nonrandom UMIs (vNRUMIs). Methods and systems for making and using sequencing adapters comprising vNRUMIs are also provided.
Opening claim text (preview).
What is claimed is:
1. A set of sequencing adapters comprising a plurality of double-stranded polynucleotides, wherein: each double-stranded polynucleotide comprises a double-stranded hybridized region, a single-stranded 5′ arm, a single-stranded 3′ arm, and at least one variable-length, nonrandom unique molecular index (vNRUMI); variable-length, nonrandom unique molecular indices (vNRUMIs) of the set of sequencing adapters form a set of vNRUMIs configured to identify individual nucleic acid molecules in a sample for multiplex massively parallel sequencing; the set of vNRUMIs comprises sequences having two or more molecular lengths; and an edit distance between any two vNRUMIs of the set of vNRUMIs is not less than a first criterion value, wherein the first criterion value is at least two, wherein the set of vNRUMIs is provided by: (i) selecting an oligonucleotide sequence from a set of oligonucleotide sequences having two or more molecular lengths; (ii) adding the selected oligonucleotide to an expanding set of oligonucleotide sequences and removing the selected oligonucleotide from the set of oligonucleotide sequences to obtain a reduced set of oligonucleotide sequences; (iii) selecting an instant oligonucleotide sequence from the reduced set that maximizes a distance function, wherein the distance function is a minimal edit distance between an instant oligonucleotide sequence and any oligonucleotide sequences in the expanding set, and wherein the distance function meets the first criterion value; (iv) adding the instant oligonucleotide to the expanding set and removing the instant oligonucleotide from the reduced set; (v) repeating (iii) and (iv) one or more times; and (vi) providing the expanding set as the set of vNRUMIs.
2. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs comprises vNRUMIs of 6 nucleotides and vNRUMIs of 7 nucleotides.
3. The set of sequencing adapters of claim 1, wherein the first criterion value is at least three.
4. The set of sequencing adapters of claim 1, wherein the double-stranded hybridized region comprises a sequence of SEQ ID NO: 1 (AGATGTGTATAAGAGACAG).
5. The set of sequencing adapters of claim 1, wherein the double-stranded hybridized region comprises a sequence of SEQ ID NO: 2 (CTGTCTCTTATACACATCT).
6. The set of sequencing adapters of claim 1, wherein the single-stranded 5′ arm comprises a first primer binding sequence.
7. The set of sequencing adapters of claim 6, wherein the first primer binding sequence is a sequence of SEQ ID NO: 3 (TCGTCGGCAGCGTC).
8. The set of sequencing adapters of claim 6, wherein the single-stranded 5′ arm consists essentially of the first primer binding sequence.
9. The set of sequencing adapters of claim 1, wherein the single-stranded 3′ arm comprises a second primer binding sequence.
10. The set of sequencing adapters of claim 9, wherein the second primer binding sequence is a sequence of SEQ ID NO: 5 (CCGAGCCCACGAGAC).
11. The set of sequencing adapters of claim 9, wherein the single-stranded 3′ arm consists essentially of the second primer binding sequence.
12. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on one strand of the double-stranded hybridized region and a reverse complement of the vNRUMI on another strand of the double-stranded hybridized region.
13. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on the single-stranded 5′ arm.
14. The set of sequencing adapters of claim 1, wherein the double-stranded polynucleotide comprises a vNRUMI on the single-stranded 3′ arm.
15. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs includes no more than about 1,000 different vNRUMIs.
16. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs includes no more than about 200 different vNRUMIs.
17. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs comprises: CACATGA, GGTTAC, TTGCCAG, AACCGC, ATGGTG, CTAGAAC, AGAATAG, TCAACTC, GTTCGGA, AAGACA, ACATTC, ACCAAG, CAGTAG, CCACCA, CTTGGC, GCCTGA, TGAGGA, TGTCCG, TAGCGTA, AGTCGAC, GTACACG, CCTATTG, TCGGAGA, GCTGTCA, TCCTTGC, GTGAGTC, TAATGCG, AGGCTCA, AACTAAC, GATGAAG, ATAACCA, TATGTTC, GGATTGA, GGCCATA, AACGTA, AATGAG, ACAGCG, ACGCAC, ACTAGA, AGAAGC, AGACTG, AGTGCA, ATTACG, CAACAC, CAGGTC, CATTGA, CCGATA, CCTAAC, CCTGTG, CGAACG, CGCAGA, CGCTTC, CTCCAG, GAAGTG, GACAAC, GAGCTA, GCACAG, GCGTTG, GGCATG, GTAACA, GTATGC, GTCCTC, GTGGAC, GTTGTA, TACCTG, TACTCA, TCAATG, TCACGC, TCGGCA, TGATAG, TGCCAC, TGTGTC, TCAGAAG, TTGTGAC, GATAGGC, TGAGCTG, ACGTTAC, TTGAACA, TATGGCA, TGTATAC, CACCTAC, ACGAGCA, GCGAATG, GCATACA, TCCTACG, TGTCATG, AGTGGTA, CGGTAAG, CCATAGC, CTTCCTG, GTTAGCG, CTCGATG, TTCGAGC, AAGTCCA, CTAAGGA, ATAAGTG, CTTGAGA, CCTCATA, TGCACCA, AGAGACG, GAACCTC, ATTGTCG, GAACGAG, ATAGCAG, CTAGTTA, TCGTGTG, AGGATTC, GTGCAAC, TACATAG, CTACTGC, GCAGTTC, TAGACGC, TTACCGA, CGGTGTA, CAATTAG, ACCGTTG, AAGGATG, GAGTCAG, ATGTAGC, and ATTCACA.
18. The set of sequencing adapters of claim 1, wherein the edit distance is Levenshtein distance.
19. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having three or more consecutive identical bases.
20. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a combined number of guanine and cytosine bases smaller than 2 and sequences having a combined number of guanine and cytosine bases larger than 4.
21. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a same base at the last two positions.
22. The set of sequencing adapters of claim 1, wherein the set of vNRUMIs excludes sequences having a thymine base at the last position.
23. A set of sequencing adapters comprising a plurality of double-stranded polynucleotides, wherein: each double-stranded polynucleotide comprises at least one variable-length, nonrandom unique molecular index (vNRUMI); variable-length, nonrandom unique molecular indices (vNRUMIs) of the set of sequencing adapters form a set of vNRUMIs configured to identify individual nucleic acid molecules in a sample for multiplex massively parallel sequencing; the set of vNRUMIs comprises sequences having two or more molecular lengths; and an edit distance between any two vNRUMIs of the set of vNRUMIs is not less than a first criterion value, wherein the first criterion value is at least two, wherein the set of vNRUMIs is provided by: (i) selecting an oligonucleotide sequence from a set of oligonucleotide sequences having two or more molecular lengths; (ii) adding the selected oligonucleotide to an expanding set of oligonucleotide sequences and removing the selected oligonucleotide from the set of oligonucleotide sequences to obtain a reduced set of oligonucleotide sequences; (iii) selecting an instant oligonucleotide sequence from the reduced set that maximizes a distance function, wherein the distance function is a minimal edit distance between an instant oligonucleotide sequence and any oligonucleotide sequences in the expanding set, and wherein the distance function meets the first criterion value; (iv) adding the instant oligonucleotide to the expanding set and removing the instant oligonucleotide from the reduced set; (v) repeating (iii) and (iv) one or more times; and (vi) providing the expanding set as the set of vNRUMIs.
24. The s
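The set construction recited in steps (i) to (vi) of claims 1 and 23, together with the Levenshtein distance of claim 18 and the exclusions of claims 19 to 22, can be illustrated with a minimal Python sketch. It is not the patentee's implementation. The function names (`levenshtein`, `passes_filters`, `candidate_pool`, `greedy_select`), the choice of the first pool element as the starting sequence, the tie-breaking rule, and the `max_size` cutoff are all assumptions made for illustration. The demo uses lengths 4 and 5 only so that it runs quickly; claim 2 recites lengths 6 and 7.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Levenshtein edit distance (claim 18), row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def passes_filters(seq: str) -> bool:
    """Exclusions modeled on claims 19 to 22."""
    if any(base * 3 in seq for base in "ACGT"):    # three or more identical bases in a row
        return False
    gc = seq.count("G") + seq.count("C")
    if gc < 2 or gc > 4:                           # combined G+C count outside 2..4
        return False
    if len(seq) >= 2 and seq[-1] == seq[-2]:       # same base at the last two positions
        return False
    if seq.endswith("T"):                          # thymine at the last position
        return False
    return True


def candidate_pool(lengths=(6, 7)):
    """All sequences of the given lengths that survive the filters."""
    return [s
            for n in lengths
            for s in map("".join, product("ACGT", repeat=n))
            if passes_filters(s)]


def greedy_select(pool, min_distance=3, max_size=None, seed=None):
    """Greedy max-min selection following steps (i) to (vi).

    Assumptions: the starting sequence is `seed` or the first pool element,
    and ties are broken by pool order.
    """
    remaining = list(dict.fromkeys(pool))          # de-duplicate, keep order
    first = seed if seed is not None else remaining[0]
    if first in remaining:
        remaining.remove(first)                    # steps (i) and (ii)
    selected = [first]
    # nearest[s]: minimal edit distance from s to any selected sequence
    nearest = {s: levenshtein(s, first) for s in remaining}
    while remaining and (max_size is None or len(selected) < max_size):
        pick = max(remaining, key=nearest.__getitem__)   # step (iii)
        if nearest[pick] < min_distance:           # criterion can no longer be met
            break
        selected.append(pick)                      # step (iv)
        remaining.remove(pick)
        del nearest[pick]
        for s in remaining:                        # update distances for step (v)
            nearest[s] = min(nearest[s], levenshtein(s, pick))
    return selected                                # step (vi)


# Small demo with short lengths so the pure-Python loops stay fast.
demo = greedy_select(candidate_pool(lengths=(4, 5)), min_distance=2, max_size=8)
print(demo)
```

Caching the minimal distance per candidate avoids recomputing every pairwise distance on each iteration. A full pool of 6-mers and 7-mers would still be slow in pure Python, so a compiled edit-distance routine would likely be needed at that scale.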
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
Polymerase chain reaction [PCR] · CPC title
Sequence alignment; Homology search · CPC title
Ligating adaptors · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title