Automatic identification, definition and management of data for DNA storage systems

US10963469B2 · US · B2

Patent metadata

Publication number: US-10963469-B2
Application number: US-201815876188-A
Country: US
Kind code: B2
Filing date: Jan 21, 2018
Priority date: Jan 21, 2018
Publication date: Mar 30, 2021
Grant date: Mar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments include facilitating DNA storage of digital data including a plurality of data assets in a network by building a causal graph of the network and the relationship of the data assets; computing a value of each data asset; computing, using the causal graph and data values, a radius of recovery for each data asset; classifying each data asset as appropriate DNA stored by assigning a numerical ranking of each data asset; defining manual constraints and a DNA storage configuration; and generating a ranked list of recommended data assets for storing in the DNA storage using the classification, manual constraints and DNA storage configuration.
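The last steps of the abstract (numerical ranking, manual constraints, DNA storage configuration, ranked list) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the patent: the `Asset` fields, the `score` formula, the use of an exclusion set as the "manual constraint", and the use of a capacity limit as the "DNA storage configuration". The radius-of-recovery values are treated as already computed inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    access_rate: float  # hypothetical unit: reads per year
    value: float        # data value, higher is more valuable
    ror: float          # radius of recovery, assumed precomputed
    size_gb: float


def score(asset: Asset) -> float:
    # Hypothetical scoring rule: valuable, high-ROR, rarely read assets rank higher.
    return (asset.value + asset.ror) / (1.0 + asset.access_rate)


def recommend(assets, excluded, capacity_gb):
    """Return asset names ranked by score, skipping excluded names and
    any asset that no longer fits into the remaining capacity."""
    ranked = sorted(
        (a for a in assets if a.name not in excluded),
        key=score,
        reverse=True,
    )
    picked, used = [], 0.0
    for a in ranked:
        if used + a.size_gb <= capacity_gb:
            picked.append(a.name)
            used += a.size_gb
    return picked


assets = [
    Asset("master_keys", 0.0, 9.0, 6.0, 1.0),
    Asset("web_cache", 500.0, 1.0, 0.0, 50.0),
    Asset("raw_ledger", 1.0, 8.0, 4.0, 20.0),
    Asset("hr_drafts", 0.0, 7.0, 0.0, 2.0),
]
print(recommend(assets, excluded={"hr_drafts"}, capacity_gb=25.0))
# ['master_keys', 'raw_ledger']
```

The frequently read `web_cache` scores close to zero and would also exceed the assumed capacity, so it never reaches the list; a real classifier would weigh the characteristics according to its own policy configuration.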

First claim

Opening claim text (preview).

What is claimed is:

1. A method of facilitating DNA storage of digital data including a plurality of data assets in a network for a data backup and recovery operation, the method comprising: providing DNA storage to store the digital data in base sequences of artificial DNA, wherein the DNA storage is characterized by high reliability and slow retrieval times, and wherein the data assets comprise critical but very low access rate data; defining characteristics of each data set and relationships among the data assets; classifying, in a data classifier, the data assets as Apocalypse Day Data (ADD) suitable for storage in the DNA storage based on the characteristics comprising: access rate, data value, data volume, radius of recovery, and data rawness; computing, using the relationships and data values, a radius of recovery (ROR) for each data asset, wherein the ROR is a measure of additional assets that can be retrieved from a respective data asset in the data backup and recovery operation; assigning a numerical ranking to each data asset based on the respective characteristics and the ROR; defining manual constraints and a DNA storage configuration; and generating a ranked list of recommended data assets for storing in the DNA storage using the classification, manual constraints and DNA storage configuration; and storing at least some of the recommended data assets in the DNA storage for later retrieval using DNA sequencing and decoding methods.

2. The method of claim 1 wherein the defining relationships step comprises building a causal graph representing the network and the relationships among the data assets.

3. The method of claim 2 wherein causal graph is a directed acyclical graph, and the ROR is computed by a breadth-first search (BFS) traversal in a time linear manner in the size of the graph with a sink node (l) initialized at ROR(l)=0.

4. The method of claim 1 further comprising matching the recommended data assets against a set of defined policy constraints input to the data classifier.

5. The method of claim 4 further comprising; displaying the recommended assets to a user for selection of selected data assets as DNA stored data; and sending an alert in the event a recommended data asset violates a defined policy constraint.

6. The method of claim 1 wherein with respect to access rate, a low access rate characterizes ADD data; with respect to data value, a high value characterizes ADD data; with respect to data volume, a limited volume per protection policy configuration characterizes ADD data; and with respect to radius of recovery, a high ROR value characterizes ADD data.

7. The method of claim 6 further comprising: assigning a numeric grade to each characteristic of a graded data asset; combining the assigned numeric grades to derive a score for the graded data asset; and using the score to rank the graded data asset in the ranked list relative to other graded data assets.

8. The method of claim 7 wherein the ROR for a data asset indicates how many additional existing data assets can be at least partially retrieved from the data asset, and depends on the data rawness, which comprises a minimal set of resources where no resource subsumes another resource within a same resource group.

9. The method of claim 1 wherein data classified as appropriate for DNA storage is created in batches on a periodic basis.

10. The method of claim 1 further comprising: adding a dedicated storage region to a storage device having maximum storage protection for non-DNA eligible storage data; storing a present batch of appropriate DNA data to the dedicated storage region until a next batch of appropriate DNA data is processed; and transmitting the present batch of appropriate DNA data to a DNA storage pipeline for storage on DNA media after a pre-defined time period.

11. A system of facilitating DNA storage of digital data including a plurality of data assets in a network for a data backup and recovery operation, comprising: a DNA storage configured to store the digital data in base sequences of artificial DNA, wherein the DNA storage is characterized by high reliability and slow retrieval times, and wherein the data assets comprise critical but very low access rate data; a first component defining characteristics of each data set and relationships among the data assets; a data valuation component computing a value of each data asset; a computer computing, using the relationships and data values, a radius of recovery (ROR) for each data asset, wherein the ROR is a measure of additional assets that can be retrieved from a respective data asset in the data backup and recovery operation; a classifier classifying each data asset as Apocalypse Day Data (ADD) suitable for storage in the DNA storage based on the characteristics comprising: access rate, data value, data volume, and data rawness by assigning a numerical ranking to each data asset based on the respective characteristics and ROR; a policy module defining manual constraints and a DNA storage configuration; an output interface generating a ranked list of recommended data assets for storing in the DNA storage using the classification, manual constraints and DNA storage configuration; and the DNA storage storing at least some of the recommended data assets in the DNA storage for later retrieval using DNA sequencing and decoding methods.

12. The system of claim 11 wherein the first component comprises a causal graph builder building a causal graph representing the network and the relationships among the data assets.

13. The system of claim 12 wherein causal graph is a directed acyclical graph, and the ROR is computed by a breadth-first search (BFS) traversal in a time linear manner in the size of the graph with a sink node (l) initialized at ROR(l)=0.

14. The system of claim 13 further comprising an automated component matching the recommended data assets against a set of defined policy constraints provided by the policy module.

15. The system of claim 14 wherein the output interface further displays the recommended assets to a user for selection of selected data assets as DNA stored data, and sends an alert in the event a recommended data asset violates a defined policy constraint.

16. The system of claim 11 wherein with respect to access rate, a low access rate characterizes ADD data; with respect to data value, a high value characterizes ADD data; with respect to data volume, a limited volume per protection policy configuration characterizes ADD data; and with respect to radius of recovery, a high ROR value characterizes ADD data.

17. The system of claim 16 wherein the computer further assigns a numeric grade to each characteristic of a graded data asset, combines the assigned numeric grades to derive a score for the graded data asset, and uses the score to rank the graded data asset in the ranked list relative to other graded data assets.

18. The system of claim 17 wherein the ROR for a data asset indicates how many additional existing data assets can be at least partially retrieved from the data asset, and depends on the data rawness, which comprises a minimal set of resources where no resource subsumes another resource within a same resource group, and further wherein data classified as appropriate for DNA storage is created in batches on a periodic basis.

19. The system of claim 11 further comprising: dedicated storage region added to a storage device having maximum storage protection for non-DNA eligible storage data; a storing component storing

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • User-Defined Types; Storage management thereof

  • using ranking

  • Graphs; Linked lists (G06F16/9027 takes precedence)

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10963469B2 cover?
Embodiments include facilitating DNA storage of digital data including a plurality of data assets in a network by building a causal graph of the network and the relationship of the data assets; computing a value of each data asset; computing, using the causal graph and data values, a radius of recovery for each data asset; classifying each data asset as appropriate DNA stored by assigning a num…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/24578. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 30, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).