Advanced object replication using reduced metadata in object storage environments

US11630735B2 · US · B2

Patent metadata
Publication number: US-11630735-B2
Application number: US-201615248995-A
Country: US
Kind code: B2
Filing date: Aug 26, 2016
Priority date: Aug 26, 2016
Publication date: Apr 18, 2023
Grant date: Apr 18, 2023

Abstract

Embodiments for, in an object storage environment, managing data replication between first and second sites of a distributed computing environment by one or more processors. A first pass metadata hash is calculated for each of the objects in an object-set that is subsequently transferred from the first to the second site. Responsive to the second site, a second pass metadata hash is calculated for remaining objects of the object-set that are identified by the second site at a sub-object level using a predetermined size.
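The two hashing passes summarized in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-256, the in-memory dictionaries, the 4-byte `CHUNK_SIZE`, and every function name here are hypothetical choices made for the example, since the abstract names neither a hash function nor a concrete size.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical "predetermined size", kept tiny for illustration


def digest(data: bytes) -> str:
    # SHA-256 is an assumption; the abstract does not name a hash function.
    return hashlib.sha256(data).hexdigest()


def first_pass_map(object_set: dict[str, bytes]) -> dict[str, str]:
    """First site: one hash per whole object in the requested object-set."""
    return {name: digest(data) for name, data in object_set.items()}


def remaining_objects(first_pass: dict[str, str], second_site_hashes: set[str]) -> list[str]:
    """Second site: objects whose hash is absent from its own metadata map."""
    return sorted(name for name, h in first_pass.items() if h not in second_site_hashes)


def second_pass_map(object_set: dict[str, bytes], remaining: list[str],
                    size: int = CHUNK_SIZE) -> dict[str, list[str]]:
    """First site: hash each fixed-size sub-object of the remaining objects."""
    return {
        name: [digest(object_set[name][i:i + size])
               for i in range(0, len(object_set[name]), size)]
        for name in remaining
    }


# Toy object-set: the second site already holds the content of b.txt.
object_set = {"a.txt": b"hello world", "b.txt": b"unchanged"}
second_site_hashes = {digest(b"unchanged")}

fp = first_pass_map(object_set)
remaining = remaining_objects(fp, second_site_hashes)
sp = second_pass_map(object_set, remaining)
```

In this toy run only `a.txt` reaches the second pass, so sub-object hashes are computed for one object rather than for the whole object-set, which is the reduction in metadata the title refers to.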

First claim

Opening claim text (preview).

The invention claimed is:

1. In an object storage environment, a method for managing data replication between first and second sites of a distributed computing environment by one or more processors, comprising:

receiving a replication request for replicating an object-set from the first site to a second site by a user or process, wherein the first site comprises a user's computer at a local site at a first location and the second site comprises a distributed storage system at a remote site at a second location physically remote to the first location;

calculating, at the first site by the user's computer, a first pass metadata hash for each of the objects in the object-set the user desires to transfer to the distributed storage system, wherein the first pass metadata hash is assembled into a first pass assembled metadata map, containing only hashes for the objects in the object-set specific to the replication request, that is subsequently transferred from the first to the second site, and wherein the calculating of the first pass metadata hash for each of the objects in the object-set is performed only after receiving the replication request at the first site and prior to commencing transfer of any of the objects in the object-set to the second site;

in response to the replication request and prior to replicating the objects of the object-set from the first site to the second site, transmitting, at a first time, only the first pass assembled metadata map containing the first pass metadata hash for each of the objects in the object-set to the second site;

responsive to the second site receiving only the first pass assembled metadata map containing the first pass metadata hash and prior to receiving any data of the object-set other than data of the first pass metadata hash of each object in the object-set, performing a first comparison, by the second site, of the first pass metadata hash for each of the objects to a local metadata map of a global metadata repository of the second site to identify remaining objects of the object-set that are missing at the second site, wherein the local metadata map of the global metadata repository of the second site is precomputed prior to receiving the replication request and contains all data structures globally accessible to the second site, and the first comparison compares the first pass metadata hash for each object in the object-set to all of the globally-accessible data structures of the second site, wherein the remaining objects are identified as portions of data of any portion of any data structure of any object not stored nor globally accessible to the second site, and wherein the remaining objects are to be replicated as new objects to the second site such that the remaining objects do not comprise difference data to be added to an existing object at the second site;

responsive to identifying the remaining objects, generating and compressing a list of the remaining objects by the second site, wherein the compressed list of remaining objects is transmitted to the first site;

responsive to receiving the compressed list of remaining objects, calculating, at the first site, a second pass metadata hash for the remaining objects of the object-set at a sub-object level using a predetermined size such that object metadata of the predetermined size is calculated for each sub-object of each object of the remaining objects, wherein the first site assembles the second pass metadata hash for each of the sub-objects of the remaining objects into a second pass assembled metadata map;

transmitting, by the first site at a second time, the second pass assembled metadata map containing the second pass metadata hash for each sub-object of each object of the remaining objects from the first site to the second site;

responsive to receiving the second pass metadata hash from the first site, performing a second comparison, by the second site, of the second pass metadata hash for each sub-object of the remaining objects to identify those sub-objects of the object set missing at the second site;

responsive to the second comparison, generating, by the second site, a missing sub-object list of those sub-objects of the object-set determined as missing from the second site;

compressing the missing sub-object list, by the second site, and transmitting the compressed missing sub-object list from the second site to the first site, wherein the first site receives the compressed missing sub-object list as an updated data transfer request;

responsive to receiving the updated data transfer request, compressing, by the first site, the missing sub-objects of the object-set;

transferring, at a third time, only the missing sub-objects of the object-set from the first site to the second site; and

receiving, by the second site, the missing sub-objects of the object set and storing the missing sub-objects at a location in the second site according to the second pass assembled metadata map, wherein, in conjunction with receiving the missing sub-objects, the second site incorporates, at the third time, the second pass assembled metadata map into the global metadata repository when storing the missing sub-objects to expand the global metadata repository to account for the received missing sub-objects, and wherein no additional metadata computations are performed with respect to replicating the missing sub-objects subsequent to the third time.

2. The method of claim 1, further including performing the second pass metadata hash calculation on the remaining objects obtained after accounting for the missing objects in the object set.

3. The method of claim 1, wherein the first and second comparisons proceed as a global process.

4. The method of claim 1, further including identifying the object-set for replication from the first to the second site.

5. In an object storage environment, a system for managing data replication between first and second sites of a distributed computing environment, comprising: one or more processors, integrated into a portion of the distributed computing environment, that:

receive a replication request for replicating an object-set from the first site to a second site by a user or process, wherein the first site comprises a user's computer at a local site at a first location and the second site comprises a distributed storage system at a remote site at a second location physically remote to the first location;

calculate, at the first site by the user's computer, a first pass metadata hash for each of the objects in the object-set the user desires to transfer to the distributed storage system, wherein the first pass metadata hash is assembled into a first pass assembled metadata map, containing only hashes for the objects in the object-set specific to the replication request, that is subsequently transferred from the first to the second site, and wherein the calculating of the first pass metadata hash for each of the objects in the object-set is performed only after receiving the replication request at the first site and prior to commencing transfer of any of the objects in the object-set to the second site,

in response to the replication request and prior to replicating the objects of the object-set from the first site to the second site, transmit, at a first time, only the first pass assembled metadata map containing the first pass metadata hash for each of the objects in the object-set to the second site,

responsive to the second site receiving only the first pass assembled metadata map containing the first pass metadata hash and prior to receiving any data of the object-set other than data of the first pass metadata hash of each object in the object-set, perform a first comparison, by the second site, of the first pass metadata hash for each of the objects to a local metadata map of a global m
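The later steps of claim 1 (second comparison, compressed missing sub-object list, transfer of only the missing sub-objects, and merging the second pass map into the repository) might look something like the sketch below. It is illustrative only: SHA-256, JSON plus zlib for the compressed list, the 4-byte `SIZE`, the in-memory `repository` and `chunk_store`, and all function names are assumptions that do not come from the claim text.

```python
import hashlib
import json
import zlib

SIZE = 4  # hypothetical predetermined sub-object size


def digest(data: bytes) -> str:
    # SHA-256 is an assumption; the claim does not name a hash function.
    return hashlib.sha256(data).hexdigest()


def missing_sub_object_request(second_pass_map, repository):
    """Second site: compare sub-object hashes with the repository and
    return a compressed list of [object name, sub-object index] pairs."""
    missing = [
        [name, index]
        for name, hashes in sorted(second_pass_map.items())
        for index, h in enumerate(hashes)
        if h not in repository
    ]
    return zlib.compress(json.dumps(missing).encode("utf-8"))


def build_payload(request, object_set, size=SIZE):
    """First site: decode the request and compress only the missing sub-objects."""
    wanted = json.loads(zlib.decompress(request))
    return {
        (name, index): zlib.compress(object_set[name][index * size:(index + 1) * size])
        for name, index in wanted
    }


def store_payload(payload, second_pass_map, repository, chunk_store):
    """Second site: store each sub-object under the hash already present in
    the second pass map (no rehashing), then merge the map into the repository."""
    for (name, index), blob in payload.items():
        chunk_store[second_pass_map[name][index]] = zlib.decompress(blob)
    for hashes in second_pass_map.values():
        repository.update(hashes)


object_set = {"a.bin": b"AAAABBBBCCCC"}
second_pass_map = {"a.bin": [digest(b"AAAA"), digest(b"BBBB"), digest(b"CCCC")]}
repository = {digest(b"BBBB")}  # the second site already holds this sub-object
chunk_store = {}

request = missing_sub_object_request(second_pass_map, repository)
payload = build_payload(request, object_set)
store_payload(payload, second_pass_map, repository, chunk_store)
```

Storing each sub-object under the hash it already has in the second pass map mirrors the claim's statement that no additional metadata computations are performed after the third transfer. A real system would presumably also record where each sub-object is stored, which this sketch omits.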

Assignees

IBM

Inventors

Classifications

  • for networked environments · CPC title

  • using de-duplication of the data · CPC title

  • Query processing · CPC title

  • Hash tables · CPC title

  • G06F16/51 (Primary)

    Indexing; Data structures therefor; Storage structures · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11630735B2 cover?
Embodiments for, in an object storage environment, managing data replication between first and second sites of a distributed computing environment by one or more processors. A first pass metadata hash is calculated for each of the objects in an object-set that is subsequently transferred from the first to the second site. Responsive to the second site, a second pass metadata hash is calculated …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/51. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 18, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).