Hash migration using a gold image library management system

US11797206B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11797206-B2
Application number: US-202117200506-A
Country: US
Kind code: B2
Filing date: Mar 12, 2021
Priority date: Dec 17, 2020
Publication date: Oct 24, 2023
Grant date: Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments for migrating hash values for backup data blocks in a network of data protection targets (DPTs) and a common data protection target implementing a Gold image library management system in which backups of Gold images used as templates for physical machines and virtual machines are stored on the CDPT. The CDPT and each DPT stores backup data split into chunks that are uniquely identified by a respective hash of its contents, and maintains data structures comprising the hash, chunk size, chunk data, and a list of DPT and CDPT identifiers. The hashes are partitioned into a set of buckets in the CDPT. A Bloom filter is generated for each bucket of hashes, and stored in each DPT so that each DPT stores Bloom filters for all CDPTs in the network. Each DPT checks its list of hashes against the Bloom filters in each of the DPTs to determine whether to keep or free chunks of data.
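The bucket-and-filter arrangement in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names (`BloomFilter`, `bucket_index`, `build_bucket_filters`, `possibly_on_cdpt`), the use of SHA-256 with double hashing, and the 1% default false-positive rate are all assumptions made for illustration. The bucket index is taken from the leading bits of the hash, which is one possible partitioning. The filter is sized with the standard Bloom filter formulas, so a lower false-positive rate yields a larger filter to ship to each DPT.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized from an expected item count and a target false-positive rate."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hash functions.
        self.num_bits = max(8, math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing (an assumption of this sketch): derive k positions from two 64-bit values.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def bucket_index(chunk_hash: bytes, m_bits: int) -> int:
    """Use the first m_bits of the hash as the bucket index (2**m_bits buckets in total)."""
    return int.from_bytes(chunk_hash, "big") >> (len(chunk_hash) * 8 - m_bits)


def build_bucket_filters(cdpt_hashes, m_bits: int, fp_rate: float = 0.01):
    """CDPT side: partition the hashes into buckets, then build one Bloom filter per bucket."""
    buckets = [[] for _ in range(2 ** m_bits)]
    for h in cdpt_hashes:
        buckets[bucket_index(h, m_bits)].append(h)
    filters = []
    for bucket in buckets:
        bf = BloomFilter(len(bucket), fp_rate)
        for h in bucket:
            bf.add(h)
        filters.append(bf)
    return filters


def possibly_on_cdpt(chunk_hash: bytes, filters, m_bits: int) -> bool:
    """DPT side: False is definitive; True may be a false positive and needs confirmation."""
    return filters[bucket_index(chunk_hash, m_bits)].might_contain(chunk_hash)
```

A DPT holding these filters can rule out most chunks locally and only contact the CDPT for hashes that the filter reports as possibly present.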

First claim

Opening claim text (preview).

What is claimed is:

1. A method of migrating hashes for backup data blocks in a network of data protection targets (DPTs), comprising: storing the hashes in a common data protection target (CDPT) of the network, and partitioning the hashes into a set of buckets in the CDPT; generating a Bloom filter for each bucket of hashes, wherein the Bloom filter comprises a probabilistic data structure that is tunable with respect to size to reduce or increase a possibility of false positives versus an amount of data traffic sent through the network; sending to each DPT the generated Bloom filters, so that each DPT stores Bloom filters for the CDPT; first checking in a DPT if a hash of the data is matched in a Bloom filter stored in the DPT; sending, if there is a match, the hash to the CDPT from the DPT indicating that there is a possibility that the hash exists in the CDPT; and second checking, in the CDPT, whether the hash exists, and wherein the network implements a Gold image library management system storing backups of Gold images used as templates for physical machines and virtual machines on the CDPT, and wherein the CDPT and each DPT stores backup data split into chunks that are uniquely identified by a respective hash, and maintains data structures comprising the hash, chunk size, chunk data, and a list of DPT and CDPT identifiers.

2. The method of claim 1 wherein the set of buckets comprises N buckets using a first M bits of the hash as an index into a respective bucket, such that 2^M = N.

3. The method of claim 1 further comprising, prior to the second checking: adding, in the CDPT, the DPT ID to an entry for the hash; sending, from the CDPT to the DPT, a response message indicating whether or not the hash was a match in the CDPT.

4. The method of claim 3 further comprising, after the second checking: processing the response message in the DPT; updating in the DPT, if there is a match, the entry for the hash with an identifier of the CDPT; freeing local data in the DPT corresponding to the hash; and keeping in the DPT, if there is not a match, the hash and the local data.

5. The method of claim 4 wherein the CDPT is configured to batch process hashes received from a plurality of DPTs, and wherein the first checking, sending, and second checking steps are performed for each DPT of the plurality of DPTs.

6. The method of claim 1 wherein the CDPT is provided as a separate storage target from the DPTs, and wherein the Gold image library management system: copies, during a backup operation for a client data source, the user content data from the client to the DPT target and copying the Gold images to the CDPT, and references the structural data in the DPT to prevent redundant storage of the Gold images in the DPT.

7. The method of claim 1 further comprising performing a load balancing operation among the DPTs by distributing newly added data among the DPTs for load balancing factors including available stream count, network latency and data throughput from a client data source and later freeing data that is already stored in the CDPT.

8. The method of claim 1 further comprising performing a point-to-point copy operation between a first DPT and a second DPT, or from the CDPT to the first DPT, by checking if data to be replicated or migrated in the network is already stored in the CDPT.

9. The method of claim 8 wherein the point-to-point copy operation comprises: checking if a chunk to be sent to the first DPT, is stored locally on the first DPT, and if so, writing a new copy of the chunk on the second DPT; if the chunk is not stored locally, writing a new entry to the second DPT with a pointer to the ID of the CDPT; and notifying the CDPT to add the second DPT ID to the hash for the chunk.

10. A method of migrating hashes for backup data blocks in a network of data protection targets (DPTs) and a common data protection target (CDPT) implementing a Gold image library management system, the method comprising: storing backups of Gold images used as templates for physical machines and virtual machines on the CDPT, and wherein the CDPT and each DPT: stores backup data split into chunks that are uniquely identified by a respective hash, and maintains data structures comprising the hash, chunk size, chunk data, and a list of DPT and CDPT identifiers; partitioning the hashes into a set of buckets in the CDPT; generating a Bloom filter for each bucket of hashes; storing in each DPT the generated Bloom filters for the CDPT, so that each DPT stores Bloom filters for a respective DPT and the CDPT; and checking, in response to a data storage request, a list of hashes in the CDPT against the Bloom filters in each of the DPTs to identify a Gold image among the stored backups of Gold images for shared use by the DPTs.

11. The method of claim 10 wherein the checking step further comprises: first checking in a DPT and in response to a backup request for data sent from the CDPT, if a hash of the data is matched in a Bloom filter stored in the DPT; sending, if there is a match, the hash to the CDPT from the DPT indicating that there is a possibility that the hash exists in the CDPT; and second checking, in the CDPT, whether the hash exists.

12. The method of claim 11 wherein each Bloom filter comprises a probabilistic data structure that is tunable with respect to size to reduce or increase a possibility of false positives versus an amount of data traffic sent through the network.

13. The method of claim 10 further comprising performing a load balancing operation among the DPTs newly added to the network by checking hashes of data chunks in the newly added DPTs against hashes stored in the CDPT, and distributing the newly added data chunks among the DPTs for load balancing factors including available stream count, network latency and data throughput from a client data source.

14. The method of claim 10 further comprising performing a point-to-point copy operation between a first DPT and a second DPT, or from the CDPT to the first DPT, by checking if data to be replicated or migrated in the network is already stored in the CDPT.

15. A system for of migrating hash values for backup data blocks in a network of data protection targets (DPTs) and a common data protection target (CDPT) implementing a Gold image library management system, the system comprising: a first CDPT component storing backups of Gold images used as templates for physical machines and virtual machines, and wherein the CDPT and each DPT: stores backup data split into chunks that are uniquely identified by a respective hash, and maintains data structures comprising the hash, chunk size, chunk data, and a list of DPT and CDPT identifiers; a second CDPT component partitioning the hashes into a set of buckets in the CDPT, and generating a Bloom filter for each bucket of hashes; and a DPT storing the generated Bloom filters locally, so that the DPT stores Bloom filters for a respective DPT and the CDPT, the DPT further checking, in response to a data storage request, a list of hashes in the CDPT against the Bloom filters in each of the DPTs to identify a Gold image among the stored backups of Gold images for shared use by the DPTs.

16. The system of claim 15 wherein the DPT further checks, in response to a backup request for data sent from the CDPT, if a hash of the data is matched in the Bloom filter, and sends if there is a match, the hash to the CDPT from the DPT indicating that there is a possibility that the hash exists in the CDPT, and wherein in response, checking, in the CDPT, whether the hash exists.
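The two-stage check and the keep-or-free decision described in claims 1, 3 and 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed method itself: the names `ChunkEntry`, `Cdpt`, `Dpt`, `store`, `check` and `migrate` are hypothetical, the Bloom filter lookup is represented by a caller-supplied function `maybe_on_cdpt`, messages are modeled as direct method calls, and the CDPT records the DPT ID only when the hash is actually found.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class ChunkEntry:
    """Per-chunk record: hash, chunk size, chunk data, and IDs of targets referencing it."""
    chunk_hash: bytes
    size: int
    data: Optional[bytes]
    target_ids: Set[str] = field(default_factory=set)


class Cdpt:
    """Common data protection target holding the authoritative hash table."""

    def __init__(self, cdpt_id: str):
        self.cdpt_id = cdpt_id
        self.entries: Dict[bytes, ChunkEntry] = {}

    def store(self, chunk_hash: bytes, data: bytes) -> None:
        self.entries[chunk_hash] = ChunkEntry(chunk_hash, len(data), data, {self.cdpt_id})

    def check(self, dpt_id: str, chunk_hash: bytes) -> bool:
        """Second check: exact lookup. On a hit, record the DPT as a referrer (sketch assumption)."""
        entry = self.entries.get(chunk_hash)
        if entry is None:
            return False
        entry.target_ids.add(dpt_id)
        return True


class Dpt:
    """Data protection target that frees chunks confirmed to exist on the CDPT."""

    def __init__(self, dpt_id: str, maybe_on_cdpt: Callable[[bytes], bool]):
        self.dpt_id = dpt_id
        self.maybe_on_cdpt = maybe_on_cdpt  # stand-in for the local Bloom filter lookup
        self.entries: Dict[bytes, ChunkEntry] = {}

    def store(self, chunk_hash: bytes, data: bytes) -> None:
        self.entries[chunk_hash] = ChunkEntry(chunk_hash, len(data), data, {self.dpt_id})

    def migrate(self, cdpt: Cdpt) -> int:
        """Run both checks for every local chunk; return the number of bytes freed."""
        freed = 0
        for chunk_hash, entry in self.entries.items():
            if entry.data is None:
                continue  # already freed in an earlier pass
            if not self.maybe_on_cdpt(chunk_hash):
                continue  # first check failed: definitely not on the CDPT, keep local data
            if cdpt.check(self.dpt_id, chunk_hash):
                entry.target_ids.add(cdpt.cdpt_id)  # point the entry at the CDPT
                freed += entry.size
                entry.data = None  # free the local copy
            # otherwise the filter gave a false positive: keep the hash and the data
        return freed
```

Because the first check can return false positives but never false negatives, only the exact lookup on the CDPT decides whether local data is freed.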

Assignees

Inventors

Classifications

  • G06F3/0641 · Primary

    De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Saving storage space on storage systems · CPC title

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • Migration mechanisms · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11797206B2 cover?
Embodiments for migrating hash values for backup data blocks in a network of data protection targets (DPTs) and a common data protection target implementing a Gold image library management system in which backups of Gold images used as templates for physical machines and virtual machines are stored on the CDPT. The CDPT and each DPT stores backup data split into chunks that are uniquely identif…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).