Data storage based on encoded DNA sequences

US11630863B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11630863-B2
Application number  US-202117403791-A
Country             US
Kind code           B2
Filing date         Aug 16, 2021
Priority date       Jun 19, 2018
Publication date    Apr 18, 2023
Grant date          Apr 18, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Devices, methods, and systems for encoding data as DNA are provided. An encoder device can include circuitry to encode a data file having a bit sequence encoding data and to generate a virtual DNA (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file, divide the VDNA sequence into a plurality of VDNA fragments, associate each VDNA fragment with an archive library sequence (Arc_SEQ), and generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ. A codeword sequence (Code_SEQ) is additionally generated for each VDNA fragment that includes a codename identifying the associated Arc_SEQ, the READ sequence associated with the VDNA fragment, and an index sequence (Idx_SEQ) including an index mapping of the VDNA fragment in the VDNA sequence.
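As a rough illustration of the pipeline the abstract describes, the Python sketch below turns bytes into a VDNA string, splits it into fragments, expresses each fragment as differences from an archive sequence, and bundles the result into a codeword. Several details are assumptions made for illustration and are not taken from the patent: the specific bit-pair assignment (00 to A, 01 to C, 10 to G, 11 to T), a READ format limited to a stop site plus substitutions, fixed-length fragmentation, the "fewest differences" rule for choosing an archive sequence, and every name used (`bytes_to_vdna`, `Read`, `CodeSeq`, `encode`, `decode`).

```python
from dataclasses import dataclass

# Assumed mapping: the claims only require that each base type uniquely
# encodes one bit-pair value; this particular assignment is illustrative.
PAIR_TO_VNB = {"00": "A", "01": "C", "10": "G", "11": "T"}
VNB_TO_PAIR = {base: pair for pair, base in PAIR_TO_VNB.items()}


def bytes_to_vdna(data: bytes) -> str:
    """Encode each successive pair of bits as one virtual base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(PAIR_TO_VNB[bits[i:i + 2]] for i in range(0, len(bits), 2))


def vdna_to_bytes(vdna: str) -> bytes:
    """Reverse of bytes_to_vdna."""
    bits = "".join(VNB_TO_PAIR[base] for base in vdna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


@dataclass
class Read:
    """Hypothetical READ format: a stop site plus substitutions only."""
    stop: int
    substitutions: list  # list of (position, base) pairs


@dataclass
class CodeSeq:
    """Hypothetical codeword: archive codename, READ, and fragment index."""
    codename: str
    read: Read
    index: int


def make_read(fragment: str, arc_seq: str) -> Read:
    """Record where the fragment differs from the archive sequence."""
    if len(fragment) > len(arc_seq):
        raise ValueError("archive sequence is shorter than the fragment")
    subs = [(i, base) for i, base in enumerate(fragment) if arc_seq[i] != base]
    return Read(stop=len(fragment), substitutions=subs)


def apply_read(arc_seq: str, read: Read) -> str:
    """Regenerate a fragment from its archive sequence and READ."""
    bases = list(arc_seq[:read.stop])
    for pos, base in read.substitutions:
        bases[pos] = base
    return "".join(bases)


def encode(data: bytes, library: dict, frag_len: int) -> list:
    """Build one codeword per fixed-length VDNA fragment."""
    vdna = bytes_to_vdna(data)
    codewords = []
    for index, start in enumerate(range(0, len(vdna), frag_len)):
        fragment = vdna[start:start + frag_len]
        reads = {name: make_read(fragment, arc) for name, arc in library.items()}
        # Assumed selection rule: pick the archive with the fewest differences.
        codename = min(reads, key=lambda name: len(reads[name].substitutions))
        codewords.append(CodeSeq(codename, reads[codename], index))
    return codewords


def decode(codewords: list, library: dict) -> bytes:
    """Reorder codewords by index, regenerate fragments, and decode."""
    ordered = sorted(codewords, key=lambda cw: cw.index)
    vdna = "".join(apply_read(library[cw.codename], cw.read) for cw in ordered)
    return vdna_to_bytes(vdna)
```

With this assumed mapping, `bytes_to_vdna(b"Hi")` yields `CAGACGGC`. Calling `encode(b"Hi", {"arc0": "CAGA", "arc1": "TTTT"}, 4)` produces two codewords that both reference `arc0`, and `decode` recovers the original bytes even if the codewords arrive out of order.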

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: circuitry to: receive instructions to encode a data file having a bit sequence of binary bits encoding data; generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a codeword sequence (Code_SEQ) for each VDNA fragment that includes: a codename to identify the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) that includes an index mapping of the VDNA fragment in the VDNA sequence.

2. The apparatus of claim 1, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, further comprises the circuitry to: divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, the Idx_SEQ to also include a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.

3. The apparatus of claim 2, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.

4. The apparatus of claim 1, wherein the READ sequence includes a read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, a sequence orientation, or a strand selection.

5. The apparatus of claim 1, wherein the Code_SEQ further comprises a data file reference identifying the data file.

6. The apparatus of claim 5, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.

7. The apparatus of claim 6, wherein the PCR primer site is specific for all of the plurality of VDNA fragments of the VDNA sequence of the data file.

8. The apparatus of claim 1, wherein the Code_SEQ is a physical DNA sequence.

9. The apparatus of claim 1, wherein each Vnb in the VDNA sequence consecutively encodes a bit-pair value of each successive pair of binary bits of the data file according to the bit sequence.

10. The apparatus of claim 9, wherein each Vnb is one of four Vnb-types including virtual adenine (VA), virtual cytosine (VC), virtual guanine (VG) and virtual thymine (VT), and wherein each of the four Vnb-types uniquely encodes for one of binary bit-pair values 00, 01, 10, or 11.

11. The apparatus of claim 1, wherein to generate the VDNA sequence of Vnbs, further comprises the circuitry to: partition the bit sequence of the data file into a plurality of byte-units; divide each of the plurality of byte-units into a plurality of single bit digits and a plurality of double bit digits according to a common pattern across the bit sequence; assign a specific Vnb-type to each double bit digit based on a corresponding value of each double bit digit; and assign a specific Vnb-type from a limited selection of available Vnb-types to each single bit digit based on a corresponding value of each single bit digit and limited by a Vnb-type assigned to an immediately preceding single bit digit.

12. The apparatus of claim 11, wherein the common pattern of single bit digits and double bit digits generate a VG to VC content of about 50% and allows a homopolymer of no more than 2 of the same Vnb in the VDNA sequence.

13. The apparatus of claim 1, the circuitry comprising a processor, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).

14. A system, comprising: encoder circuitry to: receive a data file having a bit sequence of binary bits encoding data; generate a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that reversibly encodes the bit sequence of the data file; divide the VDNA sequence into a plurality of VDNA fragments; associate each VDNA fragment with an archive library sequence (Arc_SEQ); generate a read instruction (READ) sequence of differences between each VDNA fragment and each associated Arc_SEQ including sufficient instruction to facilitate regeneration of each VDNA fragment from each associated Arc_SEQ; and generate a physical DNA codeword sequence (Code_SEQ) for each VDNA fragment that includes: a codename to identify the associated Arc_SEQ; the READ sequence associated with the VDNA fragment; and an index sequence (Idx_SEQ) that includes an index mapping of the VDNA fragment in the VDNA sequence; a deoxyribonucleic acid (DNA) synthesizer interface configured to communicatively couple to a DNA synthesizer; and a DNA synthesizer controller communicatively coupled to the DNA synthesizer interface and to the encoder circuitry, the DNA synthesizer to send instructions to the DNA synthesizer to generate the Code_SEQ as a DNA sequence.

15. The system of claim 14, wherein, to divide the VDNA sequence into the plurality of VDNA fragments, further comprises the encoder circuitry to: divide the VDNA sequence into pluralities of successively smaller VDNA segments according to a hierarchical series of fragmentation levels to generate the plurality of VDNA fragments, the idx_SEQ to also include a series of fragmentation level indexes corresponding to the hierarchical series of fragmentation levels, each fragmentation level index including a pre-fragmentation position for each of the plurality of VDNA segments, wherein the plurality of VDNA fragments is generated at a final fragmentation level, and the series of fragmentation level indexes provide an original position in the VDNA sequence for each of the plurality of VDNA fragments.

16. The system of claim 15, wherein the series of fragmentation level indexes include sufficient position information to reconstruct the VDNA sequence from the Idx_SEQs of the plurality of VDNA fragments.

17. The system of claim 14, wherein the READ sequence includes a read direction, read start sites, read stop sites, insertion locations, deletion locations, substitution locations, a sequence orientation, or a strand selection.

18. The system of claim 14, wherein the Code_SEQ further comprises a data file reference identifying the data file, wherein the data file reference further comprises a polymerase chain reaction (PCR) primer site associating the Code_SEQ to the data file.

19. The system of claim 18, wherein the PCR primer site is specific for all of the plurality of VDNA fragments of the VDNA sequence of the data file.

20. The system of claim 14, the encoder circuitry comprising a processor, a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).

21. A method comprising: generating a virtual deoxyribonucleic acid (VDNA) sequence of virtual nucleotide bases (Vnbs) that re
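
The hierarchical fragmentation described in claims 2, 3, 15, and 16 can be pictured with a small Python sketch. It assumes fixed-size splitting at each level and represents the series of fragmentation level indexes as a tuple holding one pre-fragmentation position per level; the function names `fragment_hierarchically` and `reconstruct`, and the tuple representation itself, are hypothetical and not drawn from the patent.

```python
def fragment_hierarchically(vdna: str, level_sizes: list) -> list:
    """Split a VDNA string into successively smaller segments.

    level_sizes[k] is the assumed segment length at fragmentation level k.
    Returns (index_path, fragment) pairs from the final level, where
    index_path holds the segment's position within its parent at each level.
    """
    segments = [((), vdna)]
    for size in level_sizes:
        next_segments = []
        for path, segment in segments:
            for pos, start in enumerate(range(0, len(segment), size)):
                piece = segment[start:start + size]
                next_segments.append((path + (pos,), piece))
        segments = next_segments
    return segments


def reconstruct(fragments: list) -> str:
    """Rebuild the VDNA string from fragments in any order.

    Index paths are tuples, so sorting them lexicographically restores
    the original left-to-right order of the fragments.
    """
    ordered = sorted(fragments, key=lambda item: item[0])
    return "".join(fragment for _, fragment in ordered)
```

For example, splitting the 12-base string `ACGTACGTACGT` with `level_sizes=[6, 2]` gives six 2-base fragments whose index paths run from `(0, 0)` to `(1, 2)`, and `reconstruct` returns the original string even when the fragments are supplied in reverse order.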

Assignees

Intel Corp

Inventors

Classifications

  • Compression of genetic data

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids

  • H03M7/30 (Primary)

    Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction

  • Details of conversion of file system types or formats

  • using directory or table look-up (use of a directory or look-up table in file systems G06F16/13)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11630863B2 cover?
Devices, methods, and systems for encoding data as DNA are provided. An encoder device can include circuitry to encode a data file having a bit sequence encoding data and to generate a virtual DNA (VDNA) sequence of virtual nucleotide bases (Vnb) that reversibly encodes the bit sequence of the data file, divide the VDNA sequence into a plurality of VDNA fragments, associate each VDNA fragment w…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification H03M7/30. Mapped technology areas include Electricity.
When was this patent published?
Publication date Apr 18, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).