Data synchronization using redundancy detection
US-9910906-B2 · Mar 6, 2018 · US
US10284433B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10284433-B2 |
| Application number | US-201514750963-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 25, 2015 |
| Priority date | Jun 25, 2015 |
| Publication date | May 7, 2019 |
| Grant date | May 7, 2019 |
Abstract
Managing data in a cloud computing environment, including data transfers. File level and block level similarities are identified, including for archive and nested archive files, residing on datacenters and regional repositories. A replication plan is generated based on receiving a replication instruction, and further based on similarity clusters by transferring unique data blocks and files from best available sources including regional repositories.
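The abstract's idea of block-level similarity, combined with "transferring unique data blocks ... from best available sources", can be pictured with the minimal Python sketch below. It is an illustration under stated assumptions, not the patented method. It assumes fixed-size blocks identified by SHA-256 digests and uses the Jaccard index of two block-hash sets as the similarity measure. It also treats the insertion order of a `sources` dictionary as the preference order (for example, a regional repository before the datacenter). `BLOCK_SIZE`, `block_hashes`, `similarity_index`, and `replication_plan` are hypothetical names.

```python
from __future__ import annotations

import hashlib

# Hypothetical block size; kept tiny so the demo below is easy to follow.
BLOCK_SIZE = 4


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def similarity_index(a: list[str], b: list[str]) -> float:
    """Jaccard index of the distinct block hashes of two images (1.0 = same block set)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def replication_plan(target: list[str], sources: dict[str, list[str]]) -> dict[str, str | None]:
    """Map each unique target block to the first source, in dict order, that holds it.

    Blocks that no source holds map to None and would have to come from the origin.
    """
    source_sets = {name: set(blocks) for name, blocks in sources.items()}
    plan: dict[str, str | None] = {}
    for h in target:
        if h in plan:
            continue  # a repeated block is transferred only once
        plan[h] = next((name for name, held in source_sets.items() if h in held), None)
    return plan


if __name__ == "__main__":
    target = block_hashes(b"AAAABBBBCCCCDDDD")
    regional = block_hashes(b"AAAABBBBCCCCXXXX")
    datacenter = block_hashes(b"DDDDEEEE")
    print(similarity_index(target, regional))  # 3 shared of 5 distinct blocks -> 0.6
    plan = replication_plan(target, {"regional-repo": regional, "datacenter": datacenter})
    print(sorted(plan.values()))
```

Here three of the four target blocks come from the regional repository and only `DDDD` comes from the datacenter. Using dictionary order as a stand-in for "best available source" is a simplification. A real planner would more likely rank sources by attributes such as bandwidth, cost, or region (the attributes listed in claim 2).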
Claims (preview)
What is claimed is:

1. A method for managing data on a plurality of servers, comprising: identifying one or more attributes of the plurality of servers; identifying shared attributes of the plurality of servers based on the identified attributes; assigning the plurality of servers to one or more server groups based on corresponding shared attributes of the plurality of servers, whereby each grouping of a set of servers among the plurality of servers defines a region, the region is associated with a datacenter, the datacenter includes an agent for tracking information of a local image library, wherein the local image library comprises a collection of images that are transferred by the agent between the plurality of servers; and replicating a virtual machine image onto a target server of the plurality of servers, wherein the virtual machine image comprises a plurality of data blocks, and wherein replicating the virtual machine image comprises: determining a first source image from a plurality of virtual machine images on the plurality of servers, wherein the first source virtual machine image comprises a first plurality of data blocks, wherein the first plurality of data blocks has a similarity index with respect to the virtual machine image is above a threshold value, and wherein the similarity index indicates that the first source virtual machine image is not identical to the virtual machine image; determining a first set of one or more data blocks of the virtual machine image contained in the first source image that match the virtual machine image; transferring the first set of one or more data blocks from the other server to the target server; determining a second set of one or more data blocks of the virtual machine image contained in other servers of the plurality of servers, wherein the second set of one or more data blocks does not contain any data blocks from the first set of the one or more data blocks; and transferring the second set of one or more data blocks from the other server to the target server.
2. The method of claim 1, wherein the one or more attributes comprise at least one of: a geographical region; a geographical proximity relative to one or more servers; a cost factor; an accessibility factor; and server bandwidth.
3. The method of claim 1, wherein the plurality of servers comprise one or more of a file transfer protocol (FTP) server, a hyper text transfer protocol (HTTP) server, and the datacenter.
4. The method of claim 1, wherein data stored on the plurality of servers comprises a software package.
5. The method of claim 4, wherein the software package is in archived format.
6. The method of claim 4, wherein data blocks of the software package are distributed among at least two of the plurality of servers, whereby none of the plurality of servers store all data blocks of the software package.
7. The method of claim 1, further comprising: receiving an instruction to replicate a designated data set having data blocks stored on one or more of the plurality of servers on a target server; and selecting one of the one or more servers as a source server, wherein the selecting is based on one or more of: the source server belonging to a same region as the target server; the source server belonging to a region having a defined minimum bandwidth available; the source server being the datacenter; and the source server having a trust factor meeting a threshold value, wherein the trust factor.
8. A computer system for managing data on a plurality of servers, comprising: one or more computer devices each having one or more processors and one or more tangible storage devices; and a program embodied on at least one of the one or more storage devices, the program having a plurality of program instructions for execution by the one or more processors, the program instructions comprising instructions for: identifying one or more attributes of the plurality of servers; identifying shared attributes of the plurality of servers based on the identified attributes; assigning the plurality of servers to one or more server groups based on corresponding shared attributes of the plurality of servers, whereby each grouping of a set of servers among the plurality of servers defines a region, the region is associated with a datacenter, the datacenter includes an agent for tracking information of a local image library, wherein the local image library comprises a collection of images that are transferred by the agent between the plurality of servers; and replicating a virtual machine image onto a target server of the plurality of servers, wherein the virtual machine image comprises a plurality of data blocks, and wherein replicating the virtual machine image comprises: determining a first source image from a plurality of virtual machine images on the plurality of servers, wherein the first source virtual machine image comprises a first plurality of data blocks, wherein the first plurality of data blocks has a similarity index with respect to the virtual machine image is above a threshold value, and wherein the similarity index indicates that the first source virtual machine image is not identical to the virtual machine image; determining a first set of one or more data blocks of the virtual machine image contained in the first source image that match the virtual machine image; transferring the first set of one or more data blocks from the other server to the target server; determining a second set of one or more data blocks of the virtual machine image contained in other servers of the plurality of servers; and transferring the second set of one or more data blocks from the other server to the target server.
9. The system of claim 8, wherein the one or more attributes comprise at least one of: a geographical region; a geographical proximity relative to one or more servers; a cost factor; an accessibility factor; and server bandwidth.
10. The system of claim 8, wherein the plurality of servers comprise one or more of a file transfer protocol (FTP) server, a hyper text transfer protocol (HTTP) server, and the datacenter.
11. The system of claim 8, wherein data stored on the plurality of servers comprises a software package, and wherein data blocks of the software package are distributed among at least two of the plurality of servers, whereby none of the plurality of servers store all data blocks of the software package.
12. The system of claim 8, wherein the program instructions further comprise instructions for: receiving an instruction to replicate a designated data set having data blocks stored on one or more of the plurality of servers on a target server; and selecting one of the one or more servers as a source server, wherein the selecting is based on one or more of: the source server belonging to a same region as the target server; the source server belonging to a region having a defined minimum bandwidth available; the source server being the datacenter; and the source server having a trust factor meeting a threshold value, wherein the trust factor.
13. A computer program product for managing data on a plurality of servers, comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: identifying one or more attributes of the plurality of servers, by the processor; identifying shared attributes of the plurality of servers, by the processor, based on the identified attributes; and assigning the plurality of servers, by the processor, to one or more server gr
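Read as an algorithm, claim 1 describes a two-stage transfer. First, copy the blocks that a similar (but not identical) image already holds. Then fetch the remaining blocks from other servers. The Python sketch below is one hypothetical interpretation, not the claimed implementation. It assumes:

- a single grouping attribute, a `region` string (one of the attributes listed in claim 2);
- the Jaccard index of block-hash sets as the similarity index;
- a default threshold of 0.5;
- a same-region-first order for the second stage, loosely following one of the source-selection criteria in claim 7.

The datacenter agent and local image library from claim 1 are not modeled. `Server`, `group_by_region`, `jaccard`, and `plan_replication` are illustrative names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server record: a name, one grouping attribute, and stored images."""
    name: str
    region: str
    images: dict[str, set[str]] = field(default_factory=dict)  # image name -> block hashes


def group_by_region(servers: list[Server]) -> dict[str, list[Server]]:
    """Assign servers to groups keyed by their shared 'region' attribute."""
    groups: dict[str, list[Server]] = defaultdict(list)
    for s in servers:
        groups[s.region].append(s)
    return dict(groups)


def jaccard(a: set[str], b: set[str]) -> float:
    """Similarity index, assumed here to be the Jaccard index of two block-hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def plan_replication(
    target_blocks: set[str],
    target: Server,
    servers: list[Server],
    threshold: float = 0.5,
) -> tuple[str, str, set[str], dict[str, set[str]], set[str]] | None:
    """Return (source server, source image, first set, second sets, missing blocks),
    or None if no image is similar enough without being identical."""
    # Stage 1: candidate images above the threshold but not identical (index < 1.0).
    eligible = [
        (s, img, blocks)
        for s in servers
        if s is not target
        for img, blocks in s.images.items()
        if threshold < jaccard(blocks, target_blocks) < 1.0
    ]
    if not eligible:
        return None
    src, img, src_blocks = max(eligible, key=lambda e: jaccard(e[2], target_blocks))
    first_set = target_blocks & src_blocks

    # Stage 2: 'remaining' excludes the first set, so the second set is disjoint from it.
    # Servers other than the first source are visited, the target's own region first.
    remaining = target_blocks - first_set
    regions = group_by_region(servers)
    same = [s for s in regions.get(target.region, []) if s is not target and s is not src]
    other = [s for s in servers if s.region != target.region and s is not src]
    second_sets: dict[str, set[str]] = {}
    for s in same + other:
        held = set().union(*s.images.values()) & remaining
        if held:
            second_sets[s.name] = held
            remaining -= held  # each remaining block is fetched from one server only
    return src.name, img, first_set, second_sets, remaining


if __name__ == "__main__":
    target = Server("target", "eu")
    eu_repo = Server("eu-repo", "eu", {"img-v1": {"a", "b", "c", "x"}})
    us_dc = Server("us-dc", "us", {"img-v2": {"d"}})
    eu_ftp = Server("eu-ftp", "eu", {"base": {"d", "y"}})
    result = plan_replication({"a", "b", "c", "d"}, target, [target, eu_repo, us_dc, eu_ftp])
    print(result[0], result[3])
```

In this example `eu-repo` is chosen as the first source because its image has a similarity index of 0.6. The remaining block `d` comes from `eu-ftp` rather than `us-dc`, because same-region servers are visited first. A production planner would presumably weigh several attributes at once, such as bandwidth, cost, or the trust factor mentioned in claim 7, instead of a single region key.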
CPC classification titles:
- specially adapted for file transfer, e.g. file transfer protocol [FTP]
- for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
- Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
- based on web technology, e.g. hypertext transfer protocol [HTTP]
- Network management software packages