Metadata optimization for network replication using representative of metadata batch

US9934237B1 · US · B1

Patent metadata
Publication number: US-9934237-B1
Application number: US-201514963224-A
Country: US
Kind code: B1
Filing date: Dec 8, 2015
Priority date: Mar 5, 2013
Publication date: Apr 3, 2018
Grant date: Apr 3, 2018

Abstract

Official abstract text for this publication.

A target storage system receives a representative fingerprint and fingerprint representations from a source storage system. Each fingerprint representation contains only a portion of a corresponding fingerprint and the representative fingerprint is a full fingerprint. The fingerprints of the data chunks are missing at the target storage system are identified based on the fingerprint representative and the fingerprint representations. A bitmap is transmitted to the source storage system, each bit indicating whether one of the fingerprints is missing. One or more fingerprints are received from the source storage system that are missing at the target storage system based on the bitmask. One or more missing data chunks are identified based on at least the one or more fingerprints received from the source storage system. The missing data chunks are then received from the source storage system to be stored at the target storage system.
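
The exchange described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: SHA-1 fingerprints, a 4-byte prefix as the fingerprint representation, the list-of-lists container layout, and all function names are assumptions made for illustration only. A bit value of 1 is assumed to mean "missing at the target".

```python
import hashlib

PREFIX_LEN = 4  # assumed size of a fingerprint representation, in bytes


def fingerprint(chunk: bytes) -> bytes:
    """Full fingerprint of a data chunk (SHA-1 is an assumption)."""
    return hashlib.sha1(chunk).digest()


def represent(fp: bytes) -> bytes:
    """Fingerprint representation: only a leading portion of the fingerprint."""
    return fp[:PREFIX_LEN]


def build_bitmask(representative, representations, containers):
    """Target side: one bit per representation, 1 meaning 'missing'.

    `containers` is a hypothetical list of fingerprint lists, one per container.
    """
    # Use the full representative fingerprint to pick the container to search.
    candidates = next((c for c in containers if representative in c), [])
    known_prefixes = {represent(fp) for fp in candidates}
    return [0 if rep in known_prefixes else 1 for rep in representations]


def select_missing(fingerprints, bitmask):
    """Source side: keep only the full fingerprints flagged as missing."""
    return [fp for fp, bit in zip(fingerprints, bitmask) if bit]


# Source has four chunks; the target already stores "alpha" and "gamma".
source_fps = [fingerprint(c) for c in (b"alpha", b"beta", b"gamma", b"delta")]
target_containers = [[fingerprint(b"alpha"), fingerprint(b"gamma")]]

representative = source_fps[0]                       # sent in full
representations = [represent(fp) for fp in source_fps[1:]]  # sent as prefixes

bitmask = build_bitmask(representative, representations, target_containers)
missing = select_missing(source_fps[1:], bitmask)
```

Because only a prefix of each fingerprint crosses the network in the first round, a prefix match at the target is not proof that the full fingerprints agree. The hash confirmation recited in the dependent claims addresses that case.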

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for replicating data from a source storage system to a target storage system, the method comprising: receiving, at a target storage system, a representative fingerprint and a plurality of fingerprint representations from a source storage system, without receiving all fingerprints of a plurality of data chunks, wherein each fingerprint representation contains only a portion of a corresponding fingerprint and the representative fingerprint is a full fingerprint; identifying which of the fingerprints of the data chunks are missing at the target storage system based on the fingerprint representative and the fingerprint representations; transmitting a bitmask to the source storage system, the bitmask having a plurality of bits, each bit indicating whether one of the fingerprints is missing at the target storage system; receiving from the source storage system one or more fingerprints that are missing at the target storage system that are determined based on the bitmask; identifying one or more missing data chunks based on at least the one or more fingerprints received from the source storage system; and receiving the missing data chunks from the source storage system to be stored at the target storage system.

2. The method of claim 1, wherein the representative fingerprint was selected by the source storage system if the representative fingerprint matches a predetermined pattern.

3. The method of claim 1, wherein each fingerprint representation was generated by the source storage system based on a portion of the corresponding fingerprint.

4. The method of claim 1, further comprising: hashing one or more fingerprints indicated by the bitmask that have been stored at the target storage system to generate a first hash value; and transmitting the first hash value to the source storage system, wherein the source storage system is to hash one or more fingerprints indicated by the bitmask stored at the source storage system to generate a second hash value and to compare the first hash value with the second hash value to confirm that the target storage system indeed has the fingerprints indicated by the bitmask.

5. The method of claim 4, further comprising: receiving only the missing fingerprints from the source storage system if the first and second hash values match; and receiving entire full fingerprints from the source storage system if the first and second hash values do not match.

6. The method of claim 1, wherein the representative fingerprint and the fingerprint representations are utilized to reconstruct the fingerprints of the data chunks to determine whether any of the data chunks have been stored at the target storage system.

7. The method of claim 6, further comprising: identifying a container based on the representative fingerprint, wherein the container is one of a plurality of containers for storing data chunks, retrieving fingerprints from the identified container, and determining whether the retrieved fingerprints of the container include fingerprints represented by the fingerprint representations received from the source storage system.

8. The method of claim 7, further comprising: generating a hash value by hashing fingerprints that exist in the target storage system, and transmitting the hash value to the source storage system to allow the source storage system to confirm whether the target system has at least some of the fingerprints.

9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for replicating data from a source storage system to a target storage system, the operations comprising: receiving, at a target storage system, a representative fingerprint and a plurality of fingerprint representations from a source storage system, without receiving all fingerprints of a plurality of data chunks, wherein each fingerprint representation contains only a portion of a corresponding fingerprint and the representative fingerprint is a full fingerprint; identifying which of the fingerprints of the data chunks are missing at the target storage system based on the fingerprint representative and the fingerprint representations; transmitting a bitmask to the source storage system, the bitmask having a plurality of bits, each bit indicating whether one of the fingerprints is missing at the target storage system; receiving from the source storage system one or more fingerprints that are missing at the target storage system that are determined based on the bitmask; identifying one or more missing data chunks based on at least the one or more fingerprints received from the source storage system; and receiving the missing data chunks from the source storage system to be stored at the target storage system.

10. The non-transitory machine-readable medium of claim 9, wherein the representative fingerprint was selected by the source storage system if the representative fingerprint matches a predetermined pattern.

11. The non-transitory machine-readable medium of claim 9, wherein each fingerprint representation was generated by the source storage system based on a portion of the corresponding fingerprint.

12. The non-transitory machine-readable medium of claim 9, wherein the operations further comprise: hashing one or more fingerprints indicated by the bitmask that have been stored at the target storage system to generate a first hash value; and transmitting the first hash value to the source storage system, wherein the source storage system is to hash one or more fingerprints indicated by the bitmask stored at the source storage system to generate a second hash value and to compare the first hash value with the second hash value to confirm that the target storage system indeed has the fingerprints indicated by the bitmask.

13. The non-transitory machine-readable medium of claim 12, wherein the operations further comprise: receiving only the missing fingerprints from the source storage system if the first and second hash values match; and receiving entire full fingerprints from the source storage system if the first and second hash values do not match.

14. The non-transitory machine-readable medium of claim 9, wherein the representative fingerprint and the fingerprint representations are utilized to reconstruct the fingerprints of the data chunks to determine whether any of the data chunks have been stored at the target storage system.

15. The non-transitory machine-readable medium of claim 14, wherein the operations further comprise: identifying a container based on the representative fingerprint, wherein the container is one of a plurality of containers for storing data chunks, retrieving fingerprints from the identified container, and determining whether the retrieved fingerprints of the container include fingerprints represented by the fingerprint representations received from the source storage system.

16. The non-transitory machine-readable medium of claim 15, wherein the operations further comprise: generating a hash value by hashing fingerprints that exist in the target storage system, and transmitting the hash value to the source storage system to allow the source storage system to confirm whether the target system has at least some of the fingerprints.

17. A data processing system operating as a target storage system, comprising: a processor; a memory coupled to the processor storing instructions; and a replication engine coupled to the processor and memory to perfor
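
The hash confirmation recited in claims 4 and 5 (and mirrored in claims 12 and 13) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: SHA-256 as the confirmation hash, SHA-1 fingerprints, a bit value of 0 meaning "present at the target", and the function names `hash_present` and `fingerprints_to_send` are all hypothetical.

```python
import hashlib


def hash_present(fingerprints, bitmask):
    """Hash, in order, the fingerprints whose bit is 0 (assumed: present)."""
    digest = hashlib.sha256()  # choice of hash function is an assumption
    for fp, bit in zip(fingerprints, bitmask):
        if not bit:
            digest.update(fp)
    return digest.digest()


def fingerprints_to_send(source_fps, bitmask, first_hash):
    """Source side: compare hashes, then decide which fingerprints to send."""
    second_hash = hash_present(source_fps, bitmask)
    if first_hash == second_hash:
        # Target really holds the fingerprints it claimed: send only the rest.
        return [fp for fp, bit in zip(source_fps, bitmask) if bit]
    # Mismatch (for example, a partial-fingerprint collision): send everything.
    return list(source_fps)


source_fps = [hashlib.sha1(c).digest() for c in (b"alpha", b"beta", b"gamma")]
bitmask = [0, 1, 0]  # target reports the second fingerprint as missing

# Target's own copies of the fingerprints it believes it has (None = missing).
target_view = [source_fps[0], None, source_fps[2]]
first_hash = hash_present(target_view, bitmask)
to_send = fingerprints_to_send(source_fps, bitmask, first_hash)

# A target whose third fingerprint only shares a prefix with the source's copy.
colliding_view = [source_fps[0], None, source_fps[2][:4] + b"\x00" * 16]
fallback = fingerprints_to_send(
    source_fps, bitmask, hash_present(colliding_view, bitmask)
)
```

In the first case the hashes agree and only the missing fingerprint is sent. In the second the hashes differ, so the source falls back to sending the full fingerprints, as claim 5 describes.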

Assignees

EMC Corp, EMC IP Holding Co LLC

Inventors

Classifications

  • Physics · mapped topic

  • Redundant storage or storage space (G06F11/2056 takes precedence) · CPC title

  • using de-duplication of the data · CPC title

  • Physics · mapped topic

  • Management of the data involved in backup or backup restore · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9934237B1 cover?
A target storage system receives a representative fingerprint and fingerprint representations from a source storage system. Each fingerprint representation contains only a portion of a corresponding fingerprint and the representative fingerprint is a full fingerprint. The fingerprints of the data chunks are missing at the target storage system are identified based on the fingerprint representat…
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/2094. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 3, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).