Lossless compression of DNA sequences
US-10902937-B2 · Jan 26, 2021 · US
US11093547B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11093547-B2 |
| Application number | US-201815929022-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 19, 2018 |
| Priority date | Jun 19, 2018 |
| Publication date | Aug 17, 2021 |
| Grant date | Aug 17, 2021 |
Official abstract text for this publication.
Devices, methods, and systems for encoding data as DNA are provided. An encoder device can include an encoder engine configured to encode a data file having a bit sequence encoding data and further configured to generate a virtual DNA (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file, divide the VDNA sequence into a plurality of VDNA fragments, associate each VDNA fragment with an archive library sequence (Arc_SEQ), and generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ. A codeword sequence (Code_SEQ) is additionally generated for each VDNA fragment comprising a codename identifying the associated Arc_SEQ, the READ sequence associated with the VDNA fragment, and an index sequence (Idx_SEQ) including an index mapping of the VDNA fragment in the VDNA sequence.
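The pipeline in the abstract can be illustrated with a minimal sketch. Several details here are assumptions made for illustration and are not taken from the patent: the specific bit-pair assignment (00 to A, 01 to C, 10 to G, 11 to T), fixed-size fragmentation, a substitution-only READ sequence, the choice of the Arc_SEQ with the fewest differences, and every function and field name.

```python
# Hypothetical bit-pair assignment; the source only requires a unique mapping.
BITPAIR_TO_VNB = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
VNB_TO_BITPAIR = {v: k for k, v in BITPAIR_TO_VNB.items()}


def encode_vdna(data: bytes) -> str:
    """Map each successive bit pair (most significant first) to one virtual base."""
    return "".join(
        BITPAIR_TO_VNB[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode_vdna(vdna: str) -> bytes:
    """Reverse encode_vdna: every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(vdna), 4):
        byte = 0
        for base in vdna[i:i + 4]:
            byte = (byte << 2) | VNB_TO_BITPAIR[base]
        out.append(byte)
    return bytes(out)


def fragment(vdna: str, size: int) -> list[tuple[int, str]]:
    """Split into fixed-size fragments, keeping each fragment's start offset."""
    return [(i, vdna[i:i + size]) for i in range(0, len(vdna), size)]


def read_instructions(frag: str, arc_seq: str) -> dict:
    """Substitution-only READ: a stop site plus the positions that differ."""
    assert len(frag) <= len(arc_seq), "Arc_SEQ must cover the fragment"
    subs = [(i, b) for i, b in enumerate(frag) if arc_seq[i] != b]
    return {"stop": len(frag), "subs": subs}


def regenerate(arc_seq: str, read: dict) -> str:
    """Rebuild a fragment by reading Arc_SEQ up to the stop site and substituting."""
    bases = list(arc_seq[:read["stop"]])
    for pos, base in read["subs"]:
        bases[pos] = base
    return "".join(bases)


def build_codewords(data: bytes, library: dict, size: int) -> list[dict]:
    """One codeword per fragment: codename, READ sequence, and index."""
    codewords = []
    for offset, frag in fragment(encode_vdna(data), size):
        # Assumed policy: pick the Arc_SEQ needing the fewest substitutions.
        name = min(
            library,
            key=lambda n: len(read_instructions(frag, library[n])["subs"]),
        )
        codewords.append({
            "codename": name,
            "read": read_instructions(frag, library[name]),
            "idx": offset,
        })
    return codewords


def rebuild(codewords: list[dict], library: dict) -> bytes:
    """Regenerate fragments in index order and decode back to the data file."""
    ordered = sorted(codewords, key=lambda c: c["idx"])
    vdna = "".join(regenerate(library[c["codename"]], c["read"]) for c in ordered)
    return decode_vdna(vdna)
```

With a toy library such as `{"arc0": "ACGTACGT", "arc1": "CAGACGGA"}` and a fragment size of 8, `rebuild(build_codewords(b"Hi!", library, 8), library)` returns the original bytes. A fuller READ sequence would also need to express insertions, deletions, read direction, and strand selection, which this sketch omits.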
Opening claim text (preview).
What is claimed is:
1. An encoder device, comprising: an encoder engine configured to receive instructions to encode a data file having a bit sequence of binary bits encoding data, the encoder engine further configured to: generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a codeword sequence (Code_SEQ) for each VDNA fragment comprising: a codename identifying the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) including an index mapping of the VDNA fragment in the VDNA sequence.
2. The device of claim 1, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, the encoder engine is further configured to: divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, and wherein the Idx_SEQ further comprises a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments generated by that fragmentation level, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and wherein the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.
3. The device of claim 2, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.
4. The device of claim 1, wherein the READ sequence includes instructions selected from the group consisting of read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, sequence orientation, strand selection, and combinations thereof.
5. The device of claim 1, wherein the Code_SEQ further comprises a data file reference identifying the data file.
6. The device of claim 5, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.
7. The device of claim 6, wherein the PCR primer site is specific for all of the plurality of VDNA fragments of the VDNA sequence of the data file.
8. The device of claim 1, wherein the Code_SEQ is a physical DNA sequence.
9. The device of claim 1, wherein each Vnb in the VDNA sequence consecutively encodes a bit-pair value of each successive pair of binary bits of the data file according to the bit sequence.
10. The device of claim 9, wherein each Vnb is one of four Vnb-types including virtual adenine (VA), virtual cytosine (VC), virtual guanine (VG) and virtual thymine (VT), and wherein each of the four Vnb-types uniquely encodes for one of binary bit-pair values 00, 01, 10, or 11.
11. The device of claim 1, wherein to generate the VDNA sequence of Vnbs, the encoder engine is further configured to: partition the bit sequence of the data file into a plurality of byte-units; divide each of the plurality of byte-units into a plurality of single bit digits and a plurality of double bit digits according to a common pattern across the bit sequence; assign a specific Vnb-type to each double bit digit based on a corresponding value of each double bit digit; and assign a specific Vnb-type from a limited selection of available Vnb-types to each single bit digit based on a corresponding value of each single bit digit and limited by a Vnb-type assigned to an immediately preceding single bit digit.
12. The device of claim 11, wherein the common pattern of single bit digits and double bit digits generate a VG to VC content of about 50% and allows a homopolymer of no more than 2 of the same Vnb in the VDNA sequence.
13. The device of claim 1, wherein the encoder engine includes a member selected from the group consisting of a processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and combinations thereof.
14. A data archival system, comprising: an encoder engine configured to receive a data file having a bit sequence of binary bits encoding data, the encoder engine further configured to: generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a physical DNA codeword sequence (Code_SEQ) for each VDNA fragment comprising: a codename identifying the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) including an index mapping of the VDNA fragment in the VDNA sequence; a deoxyribonucleic acid (DNA) synthesizer interface configured to communicatively couple to a DNA synthesizer; and a DNA synthesizer controller communicatively coupled to the DNA synthesizer interface and to the encoder engine, and configured to send instructions to the DNA synthesizer to generate the Code_SEQ as a DNA sequence.
15. The system of claim 14, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, the encoder engine is further configured to divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, and wherein the Idx_SEQ further comprises a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments generated by that fragmentation level, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and wherein the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.
16. The system of claim 15, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.
17. The system of claim 14, wherein the READ sequence includes instructions selected from the group consisting of read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, sequence orientation, strand selection, and combinations thereof.
18. The system of claim 14, wherein the Code_SEQ further comprises a data file reference identifying the data file, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.
19. The system of claim 18, wherein the PCR primer site is specific
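The hierarchical fragmentation index described in claims 2, 3, 15, and 16 can be pictured with a short sketch. This is one possible reading, not the patent's stated method: it assumes each fragmentation level splits every segment into a fixed number of contiguous pieces, and that a level index stores a piece's offset within its parent segment, so that the offsets summed over all levels give the original position. The function names and the `fanouts` parameter are hypothetical.

```python
def split(seq: str, parts: int) -> list[tuple[int, str]]:
    """Split seq into up to `parts` contiguous pieces; return (offset, piece) pairs."""
    step = max(1, -(-len(seq) // parts))  # ceiling division, never zero
    return [(i, seq[i:i + step]) for i in range(0, len(seq), step)]


def hierarchical_fragments(vdna: str, fanouts: tuple[int, ...]) -> list[tuple[tuple[int, ...], str]]:
    """Apply one split per level; each fragment carries one offset per level."""
    segments = [((), vdna)]
    for parts in fanouts:
        segments = [
            (path + (offset,), piece)
            for path, seg in segments
            for offset, piece in split(seg, parts)
        ]
    return segments


def original_position(path: tuple[int, ...]) -> int:
    """Offsets are relative to the parent segment, so their sum is absolute."""
    return sum(path)


def reconstruct(fragments: list[tuple[tuple[int, ...], str]]) -> str:
    """Reorder fragments by original position and concatenate them."""
    ordered = sorted(fragments, key=lambda f: original_position(f[0]))
    return "".join(piece for _, piece in ordered)
```

For an 18-base sequence and `fanouts=(3, 2)`, the first level yields three 6-base segments at offsets 0, 6, and 12, and the second level splits each into two 3-base fragments at offsets 0 and 3. A fragment with index path `(6, 3)` therefore starts at position 9, and `reconstruct` recovers the full sequence from the fragments in any order.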
Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title
Compression of genetic data · CPC title
employing the use of a dictionary, e.g. LZ78 · CPC title
Details of file format conversion · CPC title
Details of conversion of file system types or formats · CPC title