Methods of storing information using nucleic acids
US-9996778-B2 · Jun 12, 2018 · US
US10370246B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10370246-B1 |
| Application number | US-201715789519-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 20, 2017 |
| Priority date | Oct 20, 2016 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Official abstract text for this publication.
The present disclosure provides a DNA-based storage system and demonstrates, through experimental and theoretical verification, that such a platform can easily be implemented in practice using portable, nanopore-based sequencers. The gist of the approach is to design an integrated pipeline that encodes data to avoid synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable nanopore sequencing via new anchored iterative alignment and insertion/deletion error-correcting codes. The embodiments herein represent the only known random access DNA-based data storage system that uses error-prone, portable, nanopore-based sequencers and produces low-error readouts with the highest reported information rate and density.
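The addressing constraints that the claims attach to this pipeline (address sequences of a common length p, 50% guanine and cytosine content, pairwise Hamming distance of at least p/2, and no address appearing as a non-address substring of a block) can be checked mechanically. The following is a minimal sketch under stated assumptions, not the patent's implementation: the function names `gc_balanced`, `hamming`, and `check_blocks` are hypothetical, blocks are modeled as plain `(address, data)` string pairs, and the substring rule is applied to every address even though claim 10 states it for a particular address.

```python
from itertools import combinations


def gc_balanced(seq: str) -> bool:
    """Return True if exactly half of the nucleotides are G or C."""
    gc = sum(base in "GC" for base in seq)
    return 2 * gc == len(seq)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def check_blocks(blocks):
    """Return a list of constraint violations (empty if none).

    blocks: list of (address, data) string pairs (hypothetical layout).
    """
    problems = []
    addresses = [addr for addr, _ in blocks]
    p = len(addresses[0])
    for addr in addresses:
        if len(addr) != p:
            problems.append(f"{addr}: length differs from p={p}")
        if not gc_balanced(addr):
            problems.append(f"{addr}: not 50% GC")
        for block_addr, data in blocks:
            block = block_addr + data
            # Skip index 0 so a block's own leading address is not flagged.
            if addr in block[1:]:
                problems.append(f"{addr}: appears as a non-address substring")
    for a, b in combinations(addresses, 2):
        if len(a) == len(b) and 2 * hamming(a, b) < p:
            problems.append(f"{a}/{b}: Hamming distance below p/2")
    return problems


if __name__ == "__main__":
    example = [("ACGTACGT", "AACCGGTT"), ("TGCATGCA", "GGTTAACC")]
    print(check_blocks(example))
```

The check is quadratic in the number of addresses, which is acceptable for a design-time validation of a small address set; it says nothing about how such addresses are constructed.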
Opening claim text (preview).
The invention claimed is:

1. A method for enabling portable readout of synthetic nucleotide sequences, the method comprising: reading, from a nanopore-based storage device, a plurality of nucleotide sequence blocks, wherein each of the nucleotide sequence blocks as stored in the nanopore-based storage device contains respective address sequences followed by data sequences, wherein each of the data sequences as stored is identical to that of a target nucleotide data sequence, wherein each of the data sequences contains a series of fixed-length substrings and each of the fixed-length substrings is 50% guanine and cytosine and 50% adenine and thymine, and wherein the reading introduces deletion, insertion, or substitution errors into the nucleotide sequence blocks; selecting, by a computing device, a first group of the nucleotide sequence blocks read from the nanopore-based storage device, each having address sequences without any deletion, insertion, or substitution errors; aligning, by the computing device, the data sequences from the first group of nucleotide sequence blocks with one another; and performing, by the computing device, a first consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks, wherein the first consensus procedure produces a first output nucleotide data sequence.

2. The method of claim 1, wherein the first output nucleotide data sequence matches the target nucleotide data sequence.

3. The method of claim 1, further comprising: determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine; in response to determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine, selecting a second group of the nucleotide sequence blocks read from the nanopore-based storage device, each having address sequences with exactly one deletion, insertion, or substitution error; aligning the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks with one another; and performing a second consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks, wherein the second consensus procedure produces a second output nucleotide data sequence.

4. The method of claim 3, wherein the second output nucleotide data sequence matches the target nucleotide data sequence.

5. The method of claim 1, wherein each of the fixed-length substrings consists of 8 nucleotides.

6. The method of claim 1, wherein each of the fixed-length substrings contains run length values for runs of one or more consecutive nucleotides therein.

7. The method of claim 6, wherein the consensus procedure determines deletion, insertion, or substitution errors in the fixed-length substrings of the data sequences based on inconsistences between a number of consecutive nucleotides and an associated run length value.

8. The method of claim 1, wherein the consensus procedure determines the first output nucleotide data sequence from the data sequences based on a per-nucleotide majority-rule protocol that operates such that the fixed-length substrings of the first output nucleotide data sequence have 50% guanine and cytosine and 50% adenine and thymine.

9. The method of claim 1, wherein each of the address sequences is 8-32 nucleotides in length, and wherein each of the data sequences is 512-2048 nucleotides in length.

10. The method of claim 1, wherein each of the address sequences is p nucleotides in length, and wherein a particular address sequence of the address sequences does not appear as a non-address substring in any of the nucleotide sequence blocks.

11. The method of claim 1, wherein each of the address sequences is p nucleotides in length, and wherein each of the address sequences is a Hamming distance of at least p/2 from one another.

12. The method of claim 1, wherein each of the address sequences is 50% guanine and cytosine and 50% adenine and thymine.

13. A system comprising: a nanopore-based storage device storing a plurality of nucleotide sequence blocks, wherein each of the nucleotide sequence blocks contains respective address sequences followed by data sequences, wherein each of the data sequences as stored is identical to that of a target nucleotide data sequence, wherein each of the data sequences contains a series of fixed-length substrings and each of the fixed-length substrings is 50% guanine and cytosine and 50% adenine and thymine, and wherein reading from the nanopore-based storage device introduces deletion, insertion, or substitution errors into the nucleotide sequence blocks; and a computing device including a memory storing program instructions that, upon execution by a processor, cause the computing device to perform operations comprising: obtaining the plurality of nucleotide sequence blocks read from the nanopore-based storage device; selecting a first group of the nucleotide sequence blocks, each having address sequences without any deletion, insertion, or substitution errors; aligning the data sequences from the first group of nucleotide sequence blocks with one another; and performing a first consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks, wherein the first consensus procedure produces a first output nucleotide data sequence.

14. The system of claim 13, wherein the first output nucleotide data sequence matches the target nucleotide data sequence.

15. The system of claim 13, the operations further comprising: determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine; in response to determining that at least one fixed-length substring of the first output nucleotide data sequence is not 50% guanine and cytosine and 50% adenine and thymine, selecting a second group of the nucleotide sequence blocks, each having address sequences with exactly one deletion, insertion, or substitution error; aligning the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks with one another; and performing a second consensus procedure over respective aligned nucleotides of the data sequences from the first group of nucleotide sequence blocks and the data sequences from the second group of nucleotide sequence blocks, wherein the second consensus procedure produces a second output nucleotide data sequence.

16. The system of claim 15, wherein the second output nucleotide data sequence matches the target nucleotide data sequence.

17. The system of claim 13, wherein each of the fixed-length substrings contains run length values for runs of one or more consecutive nucleotides therein.

18. The system of claim 17, wherein the consensus procedure determines deletion, insertion, or substitution errors in the fixed-length substrings of the data sequences based on inconsistences between a number of consecutive nucleotides and an associated run length value.

19. The system of claim 13, wherein the consensus procedure determines deletion, insertion, or substitution errors in the data sequences based on a per-nucleoti
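The two-stage readout recited in claims 1 and 3 (select reads whose address is error-free, take a per-position consensus, and fall back to reads with exactly one address error if a fixed-length substring of the output is not 50% guanine and cytosine) can be illustrated roughly as follows. This is a simplified sketch, not the patented procedure: it assumes each read has already been split into an `(address, data)` pair and that the data strings have already been aligned to equal length with `-` as a gap symbol, so the alignment step itself is not implemented. The names `edit_distance`, `majority_consensus`, `balanced_substrings`, and `read_out` are hypothetical, and a plain majority vote stands in for the consensus procedure.

```python
from collections import Counter

SUBSTRING_LEN = 8  # claim 5: fixed-length substrings of 8 nucleotides


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: deletions, insertions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def majority_consensus(aligned):
    """Per-column majority vote over pre-aligned reads.

    Columns won by the gap symbol '-' are dropped; ties go to the
    symbol seen first in that column.
    """
    if not aligned:
        raise ValueError("no reads to vote over")
    if len({len(read) for read in aligned}) != 1:
        raise ValueError("reads must be aligned to equal length")
    winners = (Counter(col).most_common(1)[0][0] for col in zip(*aligned))
    return "".join(base for base in winners if base != "-")


def balanced_substrings(seq: str, k: int = SUBSTRING_LEN) -> bool:
    """True if seq splits into k-length substrings that are each 50% GC."""
    if len(seq) % k:
        return False
    return all(
        2 * sum(base in "GC" for base in seq[i:i + k]) == k
        for i in range(0, len(seq), k)
    )


def read_out(reads, address: str) -> str:
    """Two-stage consensus over (address_read, aligned_data_read) pairs."""
    first = [data for addr, data in reads if addr == address]
    output = majority_consensus(first)
    if balanced_substrings(output):
        return output
    # Fallback: also use reads whose address has exactly one error.
    second = [data for addr, data in reads if edit_distance(addr, address) == 1]
    return majority_consensus(first + second)
```

The balance check here acts only as an error detector that triggers the second round; claim 8 describes a majority-rule protocol that itself operates so that the output substrings are balanced, which this sketch does not attempt.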
Nanobiotechnology or nanomedicine, e.g. protein engineering or drug delivery · CPC title
DNA computing · CPC title
Sequence alignment; Homology search · CPC title
Compression of genetic data · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title