Portable and low-error DNA-based data storage

US10370246B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10370246-B1
Application number: US-201715789519-A
Country: US
Kind code: B1
Filing date: Oct 20, 2017
Priority date: Oct 20, 2016
Publication date: Aug 6, 2019
Grant date: Aug 6, 2019

Abstract

Official abstract text for this publication.

The present disclosure provides DNA-based storage system demonstrated through experimental and theoretical verification that such a platform can easily be implemented in practice using portable, nanopore-based sequencers. The gist of the approach is to design an integrated pipeline that encodes data to avoid synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable nanopore sequencing via new anchored iterative alignment and insertion/deletion error-correcting codes. The embodiments herein represent the only known random access DNA-based data storage system that uses error-prone portable, nanopore-based sequencers and produces low-error readouts with the highest reported information rate and density.
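
One encoding constraint recited in the claims on this page is that each fixed-length substring of a data sequence is 50% guanine and cytosine and 50% adenine and thymine, with claim 5 giving a substring length of 8 nucleotides. The following is a minimal sketch of checking that constraint, not the patent's encoder. The function names are hypothetical, and treating the "series of fixed-length substrings" as consecutive non-overlapping blocks is an assumption.

```python
def is_gc_balanced(substring: str) -> bool:
    """Return True if exactly half of the nucleotides are G or C."""
    gc_count = sum(1 for nt in substring if nt in "GC")
    return 2 * gc_count == len(substring)


def all_substrings_balanced(data: str, k: int = 8) -> bool:
    """Check every consecutive length-k block of a data sequence.

    Assumption: blocks do not overlap and the data length is a
    multiple of k; the page does not state either detail.
    """
    if len(data) % k != 0:
        return False
    return all(is_gc_balanced(data[i:i + k]) for i in range(0, len(data), k))
```

A check like this is also what lets a decoder notice a suspect readout: a consensus block that is not balanced cannot be a valid stored block.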

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for enabling portable readout of synthetic nucleotide sequences, the method comprising: reading, from a nanopore-based storage device, a plurality of nucleotide sequence blocks, wherein each of the nucleotide sequence blocks as stored in the nanopore-based storage device contains respective address sequences followed by data sequences, wherein each of the data sequences as stored is identical to that of a target nucleotide data sequence, wherein each of the data sequences contains a series of fixed-length substrings and each of the fixed-length substrings is 50% guanine and cytosine and 50% adenine and thymine, and wherein the reading introduces deletion, insertion, or substitution errors into the nucleotide sequence blocks; selecting, by a computing device, a first group of the nucleotide sequence blocks read from the nanopore-based storage device, each having address sequences without any deletion, insertion, or substitution errors; aligning, by the computing device, the data sequences from the first group of nucleotide sequence blocks with one another; and performing, by the computing device, a first consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks, wherein the first consensus procedure produces a first output nucleotide data sequence.

2. The method of claim 1, wherein the first output nucleotide data sequence matches the target nucleotide data sequence.

3. The method of claim 1, further comprising: determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine; in response to determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine, selecting a second group of the nucleotide sequence blocks read from the nanopore-based storage device, each having address sequences with exactly one deletion, insertion, or substitution error; aligning the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks with one another; and performing a second consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks, wherein the second consensus procedure produces a second output nucleotide data sequence.

4. The method of claim 3, wherein the second output nucleotide data sequence matches the target nucleotide data sequence.

5. The method of claim 1, wherein each of the fixed-length substrings consists of 8 nucleotides.

6. The method of claim 1, wherein each of the fixed-length substrings contains run length values for runs of one or more consecutive nucleotides therein.

7. The method of claim 6, wherein the consensus procedure determines deletion, insertion, or substitution errors in the fixed-length substrings of the data sequences based on inconsistences between a number of consecutive nucleotides and an associated run length value.

8. The method of claim 1, wherein the consensus procedure determines the first output nucleotide data sequence from the data sequences based on a per-nucleotide majority-rule protocol that operates such that the fixed-length substrings of the first output nucleotide data sequence have 50% guanine and cytosine and 50% adenine and thymine.

9. The method of claim 1, wherein each of the address sequences is 8-32 nucleotides in length, and wherein each of the data sequences is 512-2048 nucleotides in length.

10. The method of claim 1, wherein each of the address sequences is p nucleotides in length, and wherein a particular address sequence of the address sequences does not appear as a non-address substring in any of the nucleotide sequence blocks.

11. The method of claim 1, wherein each of the address sequences is p nucleotides in length, and wherein each of the address sequences is a Hamming distance of at least p/2 from one another.

12. The method of claim 1, wherein each of the address sequences is 50% guanine and cytosine and 50% adenine and thymine.

13. A system comprising: a nanopore-based storage device storing a plurality of nucleotide sequence blocks, wherein each of the nucleotide sequence blocks contains respective address sequences followed by data sequences, wherein each of the data sequences as stored is identical to that of a target nucleotide data sequence, wherein each of the data sequences contains a series of fixed-length substrings and each of the fixed-length substrings is 50% guanine and cytosine and 50% adenine and thymine, and wherein reading from the nanopore-based storage device introduces deletion, insertion, or substitution errors into the nucleotide sequence blocks; and a computing device including a memory storing program instructions that, upon execution by a processor, cause the computing device to perform operations comprising: obtaining the plurality of nucleotide sequence blocks read from the nanopore-based storage device; selecting a first group of the nucleotide sequence blocks, each having address sequences without any deletion, insertion, or substitution errors; aligning the data sequences from the first group of nucleotide sequence blocks with one another; and performing a first consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks, wherein the first consensus procedure produces a first output nucleotide data sequence.

14. The system of claim 13, wherein the first output nucleotide data sequence matches the target nucleotide data sequence.

15. The system of claim 13, the operations further comprising: determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine; in response to determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine, selecting a second group of the nucleotide sequence blocks, each having address sequences with exactly one deletion, insertion, or substitution error; aligning the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks with one another; and performing a second consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks, wherein the second consensus procedure produces a second output nucleotide data sequence.

16. The system of claim 15, wherein the second output nucleotide data sequence matches the target nucleotide data sequence.

17. The system of claim 13, wherein each of the fixed-length substrings contains run length values for runs of one or more consecutive nucleotides therein.

18. The system of claim 17, wherein the consensus procedure determines deletion, insertion, or substitution errors in the fixed-length substrings of the data sequences based on inconsistences between a number of consecutive nucleotides and an associated run length value.

19. The system of claim 13, wherein the consensus procedure determines deletion, insertion, or substitution errors in the data sequences based on a per-nucleoti
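
The readout steps in claims 1, 3, and 8 (select reads whose address is error-free, align their data sequences, take a per-nucleotide majority, then test the result for GC balance) can be sketched roughly as follows. This is an illustrative simplification, not the patented procedure: all function names are hypothetical, the address is assumed to be a fixed-length prefix of each read, and the data sequences are assumed to be already aligned to equal length, so the alignment step itself is not shown.

```python
from collections import Counter


def select_exact_address(reads, address):
    """Keep the data part of reads whose address prefix has no errors."""
    p = len(address)
    return [read[p:] for read in reads if read[:p] == address]


def majority_consensus(aligned):
    """Per-position majority vote over equal-length aligned sequences."""
    if not aligned:
        raise ValueError("no sequences to vote over")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def needs_second_pass(consensus, k=8):
    """True if any length-k block is not exactly 50% G/C.

    Per claim 3, this is the condition that triggers a second consensus
    round that also uses reads with exactly one address error.
    """
    blocks = [consensus[i:i + k] for i in range(0, len(consensus), k)]
    return any(2 * sum(nt in "GC" for nt in b) != len(b) for b in blocks)


reads = [
    "ACGT" + "ACGTACGT",
    "ACGT" + "ACGTACGA",   # substitution in the last data position
    "ACGT" + "ACCTACGT",   # substitution in the third data position
    "ACTT" + "TTTTTTTT",   # address error: excluded from the first group
]
first_group = select_exact_address(reads, "ACGT")
consensus = majority_consensus(first_group)  # "ACGTACGT"
```

In this toy input each error occurs in only one of the three selected reads, so the majority vote recovers the stored block and no second pass is needed.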

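Claims 10 to 12 place three constraints on the address sequences: an address does not appear as a non-address substring, any two addresses of length p are at Hamming distance at least p/2, and each address is 50% G/C. A minimal validation sketch under stated assumptions follows. The function names are hypothetical, and the substring test here only scans the data sequences, which is a simplification of claim 10 (it ignores substrings that straddle the address/data boundary).

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def addresses_valid(addresses, data_sequences) -> bool:
    """Check the three address constraints described in claims 10-12."""
    if not addresses:
        return False
    p = len(addresses[0])
    if any(len(a) != p for a in addresses):
        return False
    # Claim 12: each address is exactly half G/C.
    balanced = all(2 * sum(nt in "GC" for nt in a) == p for a in addresses)
    # Claim 11: pairwise Hamming distance of at least p/2.
    distant = all(2 * hamming(a, b) >= p for a, b in combinations(addresses, 2))
    # Claim 10 (simplified): no address occurs inside any data sequence.
    absent = all(a not in d for a in addresses for d in data_sequences)
    return balanced and distant and absent
```

Keeping addresses far apart and absent from the data is what makes the address usable as a selection key even when reads contain errors.
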
Assignees

Univ Illinois

Inventors

Classifications

  CPC titles:

  • Nanobiotechnology or nanomedicine, e.g. protein engineering or drug delivery

  • DNA computing

  • Sequence alignment; Homology search

  • Compression of genetic data

  • ICT programming tools or database systems specially adapted for bioinformatics

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Univ Illinois
What technology area does this patent fall under?
Primary CPC classification G11C13/0019. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 6, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).