Method and system for creation, analysis and navigation of virtual snapshots
US-9465518-B1 · Oct 11, 2016 · US
US10007445B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10007445-B2 |
| Application number | US-201514628041-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2015 |
| Priority date | Nov 4, 2014 |
| Publication date | Jun 26, 2018 |
| Grant date | Jun 26, 2018 |
Official abstract text for this publication.
Methods and systems for managing, storing, and serving data within a virtualized environment are described. In some embodiments, a data management system may manage the extraction and storage of virtual machine snapshots, provide near instantaneous restoration of a virtual machine or one or more files located on the virtual machine, and enable secondary workloads to directly use the data management system as a primary storage target to read or modify past versions of data. The data management system may allow a virtual machine snapshot of a virtual machine stored within the system to be directly mounted to enable substantially instantaneous virtual machine recovery of the virtual machine.
Opening claim text (preview).
What is claimed is:

1. A method for operating a data management system, comprising:
storing a first set of snapshots of a first virtual machine as a first set of files using a distributed file system, the distributed file system replicates the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine;
storing a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files using the distributed file system, the distributed file system replicates the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine;
determining a first job associated with the first virtual machine to be performed using a distributed job scheduler, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes;
determining that a first node of the plurality of nodes stores the first set of files; and
running the first job on the first node in response to determining that the first node stores the first set of files, the first job comprising:
generating a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image;
comparing the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the first virtual machine;
identifying the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated;
generating the dependent base file using the first base image for the first virtual machine and the second base image for the second virtual machine; and
storing the dependent base file for the first virtual machine using the distributed file system.

2. The method of claim 1, further comprising: determining that the first job has been completely executed subsequent to running the first job on the first node; and updating a state of the first job that is stored within a distributed metadata store in response to determining that the first job has been completely executed.

3. The method of claim 2, wherein: the first job comprises a series of tasks that are to be performed atomically, the determining that the first job has been completely executed includes detecting that each of the series of tasks has been performed without a failure being detected.

4. The method of claim 2, wherein: the distributed metadata store comprises a distributed database, the distributed database replicates the state of the first job among at least a subset of the plurality of nodes.

5. The method of claim 1, further comprising: determining that the first job has failed to be completely executed within a threshold period of time; and updating a state of the first job that is stored within a distributed metadata store in response to determining that the first job has failed to be completely executed within the threshold period of time.

6. The method of claim 1, wherein: the first set of files includes a first file that is stored as a plurality of chunks within the distributed file system, the first file comprises a full image-level backup of the first virtual machine.

7. The method of claim 1, further comprising: detecting that the first job has failed to be completely executed within a threshold period of time or that the first job has failed; and undoing one or more tasks performed by the first job in response to detecting that the first job has failed to be completely executed within the threshold period of time or that the first job has failed.

8. The method of claim 1, further comprising: detecting that the first node has failed while running the first job; and rolling back one or more tasks performed by the first job in response to detecting that that the first node has failed.

9. The method of claim 1, wherein: the dependent base file comprises data differences between the first base image for the first virtual machine and the second base image for the second virtual machine.

10. The method of claim 1, wherein: each data block within the first portion is separated by a fixed data length; and each data block within the second portion is separated by an increasing data length.

11. The method of claim 1, wherein: the determining the first job associated with the first virtual machine includes determining a snapshot consolidation frequency for the first virtual machine and determining the first job based on the snapshot consolidation frequency.

12. A data management system, comprising:
a distributed file system configured to store a first set of snapshots of a first virtual machine as a first set of files, the distributed file system configured to replicate the first set of files among a plurality of nodes within a cluster, the first set of snapshots includes a first base image for the first virtual machine, the distributed file system configured to store a second set of snapshots of a second virtual machine different from the first virtual machine as a second set of files, the distributed file system configured to replicate the second set of files among the plurality of nodes within the cluster, the second set of snapshots includes a second base image for the second virtual machine; and
a distributed job scheduler configured to determine a first job associated with the first virtual machine to be performed, the distributed job scheduler comprises a plurality of job scheduling processes running on the plurality of nodes, each node of the plurality of nodes runs one of the plurality of job scheduling processes, the distributed job scheduler configured to determine that a first node of the plurality of nodes stores the first set of files and configured to run the first job on the first node in response to the determination that the first node stores the first set of files, the first job configured to generate a plurality of hash values corresponding with a plurality of data blocks within the first base image for the first virtual machine, the plurality of data blocks is arranged such that data blocks within a first portion of the first base image are spaced at a fixed distance from each other and other data blocks within a second portion of the first base image are spaced at monotonically increasing distances from each other, the first portion of the first base image does not overlap with the second portion of the first base image, the first job configured to compare the plurality of hash values with another plurality of hash values corresponding with a plurality of other data blocks within the second base image for the second virtual machine different from the virtual machine and configured to identify the second base image for the second virtual machine as a candidate base image from which a dependent base file for the first virtual machine is generated, the first job configured to generate the dependent base file using the first base image for t
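
Illustrative sketch (not part of the patent text). The block sampling, hash comparison, and dependent base file recited in claims 1, 9, and 10 can be pictured roughly as follows. This is one possible reading under stated assumptions, not the patented implementation: the 4096-byte block size, the use of SHA-256, the doubling gap in the second portion, the position-wise match ratio, and every function and parameter name are hypothetical and do not appear in the claims.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the claims do not specify one


def sample_offsets(image_size, fixed_span, fixed_stride, first_gap, growth=2):
    """Pick block offsets to hash.

    First portion [0, fixed_span): blocks spaced at a fixed distance.
    Second portion [fixed_span, image_size): gaps between blocks grow
    monotonically (multiplied by `growth` each step, an assumption).
    """
    offsets = [o for o in range(0, fixed_span, fixed_stride)
               if o + BLOCK_SIZE <= image_size]
    offset, gap = fixed_span, first_gap
    while offset + BLOCK_SIZE <= image_size:
        offsets.append(offset)
        offset += gap
        gap *= growth
    return offsets


def fingerprint(image, offsets):
    """Hash the sampled blocks of a base image (SHA-256 is an assumption)."""
    return [hashlib.sha256(image[o:o + BLOCK_SIZE]).hexdigest() for o in offsets]


def similarity(fp_a, fp_b):
    """Fraction of sampled positions whose hashes match."""
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / max(len(fp_a), 1)


def dependent_base_file(image, candidate):
    """Keep only the blocks of `image` that differ from `candidate`.

    Returns a mapping of byte offset -> block bytes, i.e. the data
    differences between the two base images.
    """
    diff = {}
    for o in range(0, len(image), BLOCK_SIZE):
        block = image[o:o + BLOCK_SIZE]
        if block != candidate[o:o + BLOCK_SIZE]:
            diff[o] = block
    return diff
```

In this sketch, a job would fingerprint the first base image, compare it against stored fingerprints of other virtual machines' base images, treat a sufficiently similar one as the candidate base image, and then store only the differing blocks. How "sufficiently similar" is decided is not stated in the claim preview, so any threshold would be a further assumption.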
File access structures, e.g. distributed indices (arrangements of input from, or output to, record carriers G06F3/06) · CPC title
by selection of backup contents · CPC title
Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title
Mapping; Conversion · CPC title
Point-in-time backing up or restoration of persistent data · CPC title