Data synchronization using redundancy detection

US10284433B2 · US · B2

Patent metadata
Publication number: US-10284433-B2
Application number: US-201514750963-A
Country: US
Kind code: B2
Filing date: Jun 25, 2015
Priority date: Jun 25, 2015
Publication date: May 7, 2019
Grant date: May 7, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Managing data in a cloud computing environment, including data transfers. File level and block level similarities are identified, including for archive and nested archive files, residing on datacenters and regional repositories. A replication plan is generated based on receiving a replication instruction, and further based on similarity clusters by transferring unique data blocks and files from best available sources including regional repositories.
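The abstract says that file-level and block-level similarities are identified, including inside archive and nested archive files, but it does not say how. The Python sketch below shows one possible way to measure both. It rests on assumptions of our own, none taken from the patent: fixed 4096-byte blocks, SHA-256 fingerprints, ZIP as the archive format, and a similarity index defined as the share of a target's distinct fingerprints that a candidate source already holds. The function names `block_hashes`, `file_hashes` and `similarity_index` are hypothetical.

```python
import hashlib
import io
import zipfile

# Hypothetical block size; the abstract does not say how blocks are sized.
BLOCK_SIZE = 4096


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and fingerprint each one with SHA-256."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def file_hashes(data: bytes, prefix: str = "") -> dict[str, str]:
    """Map each file path inside a ZIP archive to its SHA-256 digest.

    Members that are themselves ZIP archives are opened recursively, so a file
    in a nested archive gets a path such as "inner.zip/file.txt".
    """
    result: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(member)):
                # Nested archive: descend and fingerprint its members instead.
                result.update(file_hashes(member, path + "/"))
            else:
                result[path] = hashlib.sha256(member).hexdigest()
    return result


def similarity_index(source: list[str], target: list[str]) -> float:
    """Assumed metric: share of the target's distinct fingerprints held by the source."""
    wanted = set(target)
    if not wanted:
        return 0.0
    return len(wanted & set(source)) / len(wanted)
```

Because both helpers produce plain fingerprint strings, the same `similarity_index` comparison can be applied at file level (the values of `file_hashes`) or at block level (the output of `block_hashes`).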

First claim

Opening claim text (preview).

What is claimed is:

1. A method for managing data on a plurality of servers, comprising: identifying one or more attributes of the plurality of servers; identifying shared attributes of the plurality of servers based on the identified attributes; assigning the plurality of servers to one or more server groups based on corresponding shared attributes of the plurality of servers, whereby each grouping of a set of servers among the plurality of servers defines a region, the region is associated with a datacenter, the datacenter includes an agent for tracking information of a local image library, wherein the local image library comprises a collection of images that are transferred by the agent between the plurality of servers; and replicating a virtual machine image onto a target server of the plurality of servers, wherein the virtual machine image comprises a plurality of data blocks, and wherein replicating the virtual machine image comprises: determining a first source image from a plurality of virtual machine images on the plurality of servers, wherein the first source virtual machine image comprises a first plurality of data blocks, wherein the first plurality of data blocks has a similarity index with respect to the virtual machine image is above a threshold value, and wherein the similarity index indicates that the first source virtual machine image is not identical to the virtual machine image; determining a first set of one or more data blocks of the virtual machine image contained in the first source image that match the virtual machine image; transferring the first set of one or more data blocks from the other server to the target server; determining a second set of one or more data blocks of the virtual machine image contained in other servers of the plurality of servers, wherein the second set of one or more data blocks does not contain any data blocks from the first set of the one or more data blocks; and transferring the second set of one or more data blocks from the other server to the target server.

2. The method of claim 1, wherein the one or more attributes comprise at least one of: a geographical region; a geographical proximity relative to one or more servers; a cost factor; an accessibility factor; and server bandwidth.

3. The method of claim 1, wherein the plurality of servers comprise one or more of a file transfer protocol (FTP) server, a hyper text transfer protocol (HTTP) server, and the datacenter.

4. The method of claim 1, wherein data stored on the plurality of servers comprises a software package.

5. The method of claim 4, wherein the software package is in archived format.

6. The method of claim 4, wherein data blocks of the software package are distributed among at least two of the plurality of servers, whereby none of the plurality of servers store all data blocks of the software package.

7. The method of claim 1, further comprising: receiving an instruction to replicate a designated data set having data blocks stored on one or more of the plurality of servers on a target server; and selecting one of the one or more servers as a source server, wherein the selecting is based on one or more of: the source server belonging to a same region as the target server; the source server belonging to a region having a defined minimum bandwidth available; the source server being the datacenter; and the source server having a trust factor meeting a threshold value, wherein the trust factor.

8. A computer system for managing data on a plurality of servers, comprising: one or more computer devices each having one or more processors and one or more tangible storage devices; and a program embodied on at least one of the one or more storage devices, the program having a plurality of program instructions for execution by the one or more processors, the program instructions comprising instructions for: identifying one or more attributes of the plurality of servers; identifying shared attributes of the plurality of servers based on the identified attributes; assigning the plurality of servers to one or more server groups based on corresponding shared attributes of the plurality of servers, whereby each grouping of a set of servers among the plurality of servers defines a region, the region is associated with a datacenter, the datacenter includes an agent for tracking information of a local image library, wherein the local image library comprises a collection of images that are transferred by the agent between the plurality of servers; and replicating a virtual machine image onto a target server of the plurality of servers, wherein the virtual machine image comprises a plurality of data blocks, and wherein replicating the virtual machine image comprises: determining a first source image from a plurality of virtual machine images on the plurality of servers, wherein the first source virtual machine image comprises a first plurality of data blocks, wherein the first plurality of data blocks has a similarity index with respect to the virtual machine image is above a threshold value, and wherein the similarity index indicates that the first source virtual machine image is not identical to the virtual machine image; determining a first set of one or more data blocks of the virtual machine image contained in the first source image that match the virtual machine image; transferring the first set of one or more data blocks from the other server to the target server; determining a second set of one or more data blocks of the virtual machine image contained in other servers of the plurality of servers; and transferring the second set of one or more data blocks from the other server to the target server.

9. The system of claim 8, wherein the one or more attributes comprise at least one of: a geographical region; a geographical proximity relative to one or more servers; a cost factor; an accessibility factor; and server bandwidth.

10. The system of claim 8, wherein the plurality of servers comprise one or more of a file transfer protocol (FTP) server, a hyper text transfer protocol (HTTP) server, and the datacenter.

11. The system of claim 8, wherein data stored on the plurality of servers comprises a software package, and wherein data blocks of the software package are distributed among at least two of the plurality of servers, whereby none of the plurality of servers store all data blocks of the software package.

12. The system of claim 8, wherein the program instructions further comprise instructions for: receiving an instruction to replicate a designated data set having data blocks stored on one or more of the plurality of servers on a target server; and selecting one of the one or more servers as a source server, wherein the selecting is based on one or more of: the source server belonging to a same region as the target server; the source server belonging to a region having a defined minimum bandwidth available; the source server being the datacenter; and the source server having a trust factor meeting a threshold value, wherein the trust factor.

13. A computer program product for managing data on a plurality of servers, comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: identifying one or more attributes of the plurality of servers, by the processor; identifying shared attributes of the plurality of servers, by the processor, based on the identified attributes; and assigning the plurality of servers, by the processor, to one or more server gr
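Claim 1 describes a two-phase replication: pick a source image whose similarity index to the requested virtual machine image is above a threshold but which is not identical to it, transfer the matching blocks from that source, then transfer the remaining blocks (never ones already in the first set) from other servers. Claims 2 and 7 add grouping servers into regions by shared attributes and preferring a source in the target's region. The Python sketch below strings those steps together for illustration only. The `Server` record, the containment-style similarity index, the 0.5 default threshold, the tie-breaking rule and all function names are hypothetical, and the sketch groups servers by region alone, ignoring the other attributes claim 2 lists (proximity, cost, accessibility, bandwidth) and the trust factor in claim 7.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    """Hypothetical server record: a name, a region, and the images it stores."""
    name: str
    region: str
    # image id -> ordered list of block fingerprints (e.g. SHA-256 hex digests)
    images: dict[str, list[str]] = field(default_factory=dict)


def group_by_region(servers: list[Server]) -> dict[str, list[str]]:
    """Group server names by their shared 'region' attribute."""
    groups: dict[str, list[str]] = {}
    for server in servers:
        groups.setdefault(server.region, []).append(server.name)
    return groups


def similarity_index(source: list[str], target: list[str]) -> float:
    """Assumed metric: share of the target's distinct blocks the source holds."""
    wanted = set(target)
    return len(wanted & set(source)) / len(wanted) if wanted else 0.0


def plan_replication(
    image_blocks: list[str],
    target: Server,
    servers: list[Server],
    threshold: float = 0.5,
) -> tuple[list[tuple[str, str]], list[str]]:
    """Return (transfers, missing); each transfer is a (block, source server name) pair."""
    wanted = list(dict.fromkeys(image_blocks))  # distinct blocks, original order
    others = [s for s in servers if s.name != target.name]

    # Phase 1: choose the most similar image that scores above the threshold
    # but is not identical to the requested one; on equal scores, prefer a
    # server in the target's region.
    best: Optional[tuple[Server, set[str]]] = None
    best_key: Optional[tuple[float, bool]] = None
    for server in others:
        for blocks in server.images.values():
            score = similarity_index(blocks, image_blocks)
            if score > threshold and blocks != image_blocks:
                key = (score, server.region == target.region)
                if best_key is None or key > best_key:
                    best, best_key = (server, set(blocks)), key

    transfers: list[tuple[str, str]] = []
    first_set: set[str] = set()
    if best is not None:
        source, source_blocks = best
        first_set = {b for b in wanted if b in source_blocks}
        transfers += [(b, source.name) for b in wanted if b in first_set]

    # Phase 2: fetch each remaining block (none from the first set) from any
    # other server that holds it, checking same-region servers first.
    ranked = sorted(others, key=lambda s: s.region != target.region)
    missing: list[str] = []
    for block in wanted:
        if block in first_set:
            continue
        holder = next(
            (s for s in ranked if any(block in b for b in s.images.values())), None
        )
        if holder is None:
            missing.append(block)
        else:
            transfers.append((block, holder.name))
    return transfers, missing
```

Blocks that no server holds are returned in `missing` rather than dropped, so a caller can decide how to handle them.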

Assignees

IBM

Classifications

  • specially adapted for file transfer, e.g. file transfer protocol [FTP] · CPC title

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title

  • based on web technology, e.g. hypertext transfer protocol [HTTP] · CPC title

  • H04L41/20 · Primary

    Network management software packages · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10284433B2 cover?
Managing data in a cloud computing environment, including data transfers. File level and block level similarities are identified, including for archive and nested archive files, residing on datacenters and regional repositories. A replication plan is generated based on receiving a replication instruction, and further based on similarity clusters by transferring unique data blocks and files from…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L67/1095. Mapped technology areas include Electricity.
When was this patent published?
Publication date May 7, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).