US11630863B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11630863-B2 |
| Application number | US-202117403791-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 16, 2021 |
| Priority date | Jun 19, 2018 |
| Publication date | Apr 18, 2023 |
| Grant date | Apr 18, 2023 |
Official abstract text for this publication.
Devices, methods, and systems for encoding data as DNA are provided. An encoder device can include circuitry to encode a data file having a bit sequence encoding data and to generate a virtual DNA (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file, divide the VDNA sequence into a plurality of VDNA fragments, associate each VDNA fragment with an archive library sequence (Arc_SEQ), and generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ. A codeword sequence (Code_SEQ) is additionally generated for each VDNA fragment that includes a codename identifying the associated Arc_SEQ, the READ sequence associated with the VDNA fragment, and an index sequence (Idx_SEQ) including an index mapping of the VDNA fragment in the VDNA sequence.
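The flow in the abstract (fragment the VDNA sequence, associate each fragment with an Arc_SEQ, record the differences as a READ sequence, and bundle codename, READ, and index into a Code_SEQ) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the archive contents, the fixed fragment size, the fewest-differences matching rule, and the substitution-only READ format are all assumptions made for the example, and the names `ARCHIVE`, `CodeSeq`, `differences`, `encode`, `regenerate`, and `decode` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical archive library (codename -> Arc_SEQ); entries are invented.
ARCHIVE = {"arc0": "ACGTACGTAC", "arc1": "TTGACCGGTA"}
FRAGMENT_SIZE = 10  # assumed; must not exceed the Arc_SEQ length


@dataclass
class CodeSeq:
    codename: str  # identifies the associated Arc_SEQ
    stop: int      # READ: read stop site (here, the fragment length)
    subs: list     # READ: (position, base) substitutions
    index: int     # Idx_SEQ: fragment position in the VDNA sequence


def differences(fragment, arc_seq):
    """Positions where the fragment differs from the start of arc_seq."""
    return [(i, base) for i, base in enumerate(fragment) if arc_seq[i] != base]


def encode(vdna):
    """Split vdna into fragments and describe each relative to an Arc_SEQ."""
    codes = []
    for index, start in enumerate(range(0, len(vdna), FRAGMENT_SIZE)):
        fragment = vdna[start:start + FRAGMENT_SIZE]
        # Assumed matching rule: pick the archive entry with fewest differences.
        codename = min(
            ARCHIVE, key=lambda name: len(differences(fragment, ARCHIVE[name]))
        )
        subs = differences(fragment, ARCHIVE[codename])
        codes.append(CodeSeq(codename, len(fragment), subs, index))
    return codes


def regenerate(code):
    """Rebuild one fragment from its Arc_SEQ and READ instructions."""
    bases = list(ARCHIVE[code.codename][:code.stop])
    for position, base in code.subs:
        bases[position] = base
    return "".join(bases)


def decode(codes):
    """Reassemble the VDNA sequence using the index in each Code_SEQ."""
    ordered = sorted(codes, key=lambda code: code.index)
    return "".join(regenerate(code) for code in ordered)
```

In this sketch a fragment that closely matches an archive entry needs only a few substitution records, which is the intuition behind storing differences rather than the fragment itself. A fuller READ sequence could also carry insertions, deletions, read direction, or strand selection.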
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: circuitry to: receive instructions to encode a data file having a bit sequence of binary bits encoding data; generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a codeword sequence (Code_SEQ) for each VDNA fragment that includes: a codename to identify the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) that includes an index mapping of the VDNA fragment in the VDNA sequence.

2. The apparatus of claim 1, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, further comprises the circuitry to: divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, the Idx_SEQ to also include a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.

3. The apparatus of claim 2, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.

4. The apparatus of claim 1, wherein the READ sequence includes a read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, a sequence orientation, or a strand selection.

5. The apparatus of claim 1, wherein the Code_SEQ further comprises a data file reference identifying the data file.

6. The apparatus of claim 5, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.

7. The apparatus of claim 6, wherein the PCR primer site is specific for all of the plurality of VDNA fragments of the VDNA sequence of the data file.

8. The apparatus of claim 1, wherein the Code_SEQ is a physical DNA sequence.

9. The apparatus of claim 1, wherein each Vnb in the VDNA sequence consecutively encodes a bit-pair value of each successive pair of binary bits of the data file according to the bit sequence.

10. The apparatus of claim 9, wherein each Vnb is one of four Vnb-types including virtual adenine (VA), virtual cytosine (VC), virtual guanine (VG) and virtual thymine (VT), and wherein each of the four Vnb-types uniquely encodes for one of binary bit-pair values 00, 01, 10, or 11.

11. The apparatus of claim 1, wherein to generate the VDNA sequence of Vnbs, further comprises the circuitry to: partition the bit sequence of the data file into a plurality of byte-units; divide each of the plurality of byte-units into a plurality of single bit digits and a plurality of double bit digits according to a common pattern across the bit sequence; assign a specific Vnb-type to each double bit digit based on a corresponding value of each double bit digit; and assign a specific Vnb-type from a limited selection of available Vnb-types to each single bit digit based on a corresponding value of each single bit digit and limited by a Vnb-type assigned to an immediately preceding single bit digit.

12. The apparatus of claim 11, wherein the common pattern of single bit digits and double bit digits generate a VG to VC content of about 50% and allows a homopolymer of no more than 2 of the same Vnb in the VDNA sequence.

13. The apparatus of claim 1, the circuitry comprising a processor, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).

14. A system, comprising: encoder circuitry to: receive a data file having a bit sequence of binary bits encoding data; generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a physical DNA codeword sequence (Code_SEQ) for each VDNA fragment that includes: a codename to identify the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) that includes an index mapping of the VDNA fragment in the VDNA sequence; a deoxyribonucleic acid (DNA) synthesizer interface configured to communicatively couple to a DNA synthesizer; and a DNA synthesizer controller communicatively coupled to the DNA synthesizer interface and to the encoder circuitry, the DNA synthesizer to send instructions to the DNA synthesizer to generate the Code_SEQ as a DNA sequence.

15. The system of claim 14, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, further comprises the encoder circuitry to: divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, the Idx_SEQ to also include a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.

16. The system of claim 15, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.

17. The system of claim 14, wherein the READ sequence includes a read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, a sequence orientation, or a strand selection.

18. The system of claim 14, wherein the Code_SEQ further comprises a data file reference identifying the data file, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.

19. The system of claim 18, wherein the PCR primer site is specific for all of the plurality of VDNA fragments of the VDNA sequence of the data file.

20. The system of claim 14, the encoder circuitry comprising a processor, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).

21. A method comprising: generating a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that re
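Two of the claimed mechanisms lend themselves to a short illustration: the bit-pair encoding of claims 9 and 10, and the hierarchical fragmentation indexes of claims 2 and 3. The sketch below is only one possible reading. The particular assignment of 00, 01, 10, 11 to A, C, G, T is an assumption (the claims require only that each Vnb-type uniquely encode one bit-pair value), as are the per-level segment sizes and the function names `bytes_to_vdna`, `vdna_to_bytes`, `hierarchical_fragments`, and `reconstruct`.

```python
# Assumed assignment; the claims only require a one-to-one mapping.
BITS_TO_VNB = {"00": "A", "01": "C", "10": "G", "11": "T"}
VNB_TO_BITS = {vnb: bits for bits, vnb in BITS_TO_VNB.items()}


def bytes_to_vdna(data: bytes) -> str:
    """Encode each successive pair of bits as one virtual nucleotide base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_VNB[bits[i:i + 2]] for i in range(0, len(bits), 2))


def vdna_to_bytes(vdna: str) -> bytes:
    """Invert bytes_to_vdna, showing that the encoding is reversible."""
    bits = "".join(VNB_TO_BITS[vnb] for vnb in vdna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def hierarchical_fragments(vdna: str, sizes):
    """Split vdna level by level into successively smaller segments.

    sizes gives the segment length at each fragmentation level, e.g. (8, 4).
    Each result is (index, fragment), where index holds one position per
    level: the segment's position before that level's fragmentation.
    """
    segments = [((), vdna)]
    for size in sizes:
        segments = [
            (index + (position,), segment[start:start + size])
            for index, segment in segments
            for position, start in enumerate(range(0, len(segment), size))
        ]
    return segments


def reconstruct(fragments) -> str:
    """Rebuild the VDNA sequence from fragments in any order."""
    ordered = sorted(fragments, key=lambda item: item[0])
    return "".join(fragment for _, fragment in ordered)
```

For instance, `bytes_to_vdna(b"Hi")` yields an 8-base sequence under the assumed mapping, and `reconstruct(hierarchical_fragments(vdna, (8, 4)))` returns `vdna` regardless of fragment order, because the per-level positions sort lexicographically into the original order. The plain bit-pair mapping does not enforce the GC-content and homopolymer limits of claims 11 and 12; those rely on the mixed single-bit and double-bit scheme, which is not sketched here.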
Compression of genetic data · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title
Details of conversion of file system types or formats · CPC title
using directory or table look-up (use of a directory or look-up table in file systems G06F16/13) · CPC title