US9910906B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9910906-B2 |
| Application number | US-201514750944-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 25, 2015 |
| Priority date | Jun 25, 2015 |
| Publication date | Mar 6, 2018 |
| Grant date | Mar 6, 2018 |
Official abstract text for this publication.
Managing data in a cloud computing environment, including data transfers. File level and block level similarities are identified, including for archive and nested archive files, residing on datacenters and regional repositories. A replication plan is generated based on receiving a replication instruction, and further based on similarity clusters by transferring unique data blocks and files from best available sources including regional repositories.
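The abstract's idea of finding block-level similarity across nodes, including inside nested archives, can be illustrated with a minimal sketch. This is not the patented implementation: the fixed `BLOCK_SIZE`, the use of SHA-256 as the checksum, the helper names (`extract_files`, `block_hashes`, `build_similarity_clusters`), and the cluster dictionary layout are all assumptions made for illustration, and only tar-family and zip archives are handled.

```python
import hashlib
import io
import tarfile
import zipfile
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size; the source does not specify one


def extract_files(name, data):
    """Yield (path, bytes) for each regular file, recursing into nested tar/zip archives."""
    members = None
    try:
        # "r:*" lets tarfile detect plain, gzip, bz2, or xz compressed tarballs.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
            members = [(m.name, tf.extractfile(m).read())
                       for m in tf.getmembers() if m.isfile()]
    except (tarfile.TarError, EOFError, OSError):
        members = None  # not a readable tar archive
    if members is None and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            members = [(i.filename, zf.read(i))
                       for i in zf.infolist() if not i.is_dir()]
    if members is None:
        yield name, data  # ordinary file: nothing left to unpack
        return
    for member_name, payload in members:
        yield from extract_files(f"{name}/{member_name}", payload)


def block_hashes(data):
    """Checksum each fixed-size block of a file."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def build_similarity_clusters(nodes):
    """nodes: {node_id: {file_name: bytes}} -> list of cluster dicts."""
    holders = defaultdict(set)  # block hash -> node ids that store that block
    for node_id, files in nodes.items():
        for name, data in files.items():
            for _, payload in extract_files(name, data):
                for h in block_hashes(payload):
                    holders[h].add(node_id)
    grouped = defaultdict(list)  # exact set of holders -> hashes shared by that set
    for h, ids in holders.items():
        grouped[frozenset(ids)].append(h)
    ordered = sorted(grouped.items(), key=lambda kv: sorted(kv[0]))
    return [{"cluster_id": n, "nodes": sorted(ids), "hashes": sorted(hashes)}
            for n, (ids, hashes) in enumerate(ordered)]
```

Grouping hashes by the exact set of nodes that hold them means each cluster answers one question directly: which nodes could serve these blocks. Tar detection runs before zip detection because `zipfile.is_zipfile` can also match an uncompressed tarball whose last member is a zip file.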
Opening claim text (preview).
What is claimed is:
1. A method for transferring data on a plurality of computing nodes, comprising: receiving a request to transfer a first dataset from a source datacenter to a target datacenter; generating a plurality of similarity clusters, wherein each of the plurality of similarity clusters identifies a grouping of data blocks and comprises a list of hash codes of the data blocks and further comprises an image cluster identifier, and wherein the plurality of similarity clusters indicate a block-level similarity between data stored on a first computing node with the data stored on at least one other computing node among the plurality of computing nodes, wherein data stored on at least one computing node in the plurality of computing nodes comprises archived data, and wherein generating the plurality of similarity clusters comprises: extracting the archived data; comparing checksums of the extracted data; and generating the plurality of similarity clusters based on comparing the checksums.
2. The method of claim 1, wherein additional data stored on the at least one computing node or on another computing node in the plurality of computing nodes, or both, comprises virtual machine (VM) image data, and wherein generating the similarity clusters further comprises: comparing checksums of the identified files with additional checksums of the VM image data; and generating the plurality of similarity clusters based on comparing the checksums with the additional checksums.
3. The method of claim 1, further comprising: receiving an instruction to replicate a designated data set, stored on a source computing node, on a target computing node, wherein the source and target computing nodes are among the plurality of computing nodes; identifying a set of similarity clusters that are associated with the designated data set from among the plurality of similarity clusters; identifying a first subset of the set of similarity clusters, wherein data associated with the first subset of similarity clusters is stored only on the source computing node; identifying a second subset of the set of similarity clusters, wherein data associated with the second subset of similarity clusters is stored at least on the source computing node and on the target computing node; and identifying a third subset of the set of similarity clusters, wherein data associated with the third subset of similarity clusters is stored on the source computing node and a set of computing nodes other than the source computing node and other than the target computing node.
4. The method of claim 3, further comprising generating a data replication plan, wherein the generating comprises: identifying the source computing node as a source for replicating the data associated with the first subset of similarity clusters; identifying at least one computing node among the set of computing nodes other than the source computing node and other than the target computing node as a source for replicating the data associated with the third subset of similarity clusters; and generating the data transfer plan based on the identifying.
5. The method of claim 4, further comprising: generating an instruction to replicate the designated data set on the target computing node based on the data replication plan, whereby replication of the data associated with the second subset of similarity clusters on the target computing node is performed without transferring the data to the target computing node.
6. The method of claim 4, where generating the data transfer plan further comprises: identifying a set of data repositories associated with a region of the source computing node, a region of the at least one computing node, or both; wherein generating the data transfer plan is further based on identifying the set of data repositories.
7. The method of claim 5, further comprising: de-duplicating the un-archived data; generating the plurality of similarity clusters based on the de-duplicating.
8. The method of claim 1, wherein the un-archiving comprises: recursively un-archiving nested archived data.
9. The method of claim 1, wherein a format of the archived data is one of: tar.gz, tar.bz2, tar.xz, tgz, zip, tar, rar, rpm, and tcdriver.
10. A computer system for managing data on a plurality of computing nodes, comprising: a computer device having a processor and a tangible storage device; and a program embodied on the storage device for execution by the processor, the program having a plurality of program instructions for generating a plurality of similarity clusters, wherein each of the plurality of similarity clusters identifies a grouping of data blocks and comprises a list of hash codes of the data blocks and further comprises an image cluster identifier, and wherein the plurality of similarity clusters indicate a block-level similarity between data stored on a first computing node with the data stored on at least one other computing node among the plurality of computing nodes, wherein data stored on at least one computing node in the plurality of computing nodes comprises archived data, and wherein generating the plurality of similarity clusters comprises: extracting the archived data; comparing checksums of the extracted data; and generating the plurality of similarity clusters based on comparing the checksums.
11. The system of claim 10, wherein additional data stored on the at least one computing node or on another computing node in the plurality of computing nodes, or both, comprises virtual machine (VM) image data, and wherein generating the similarity clusters further comprises: comparing checksums of the identified files with additional checksums of the VM image data; and generating the plurality of similarity clusters based on comparing the checksums with the additional checksums.
12. The system of claim 10, wherein the program instructions further comprise instructions for: receiving an instruction to replicate a designated data set, stored on a source computing node, on a target computing node, wherein the source and target computing nodes are among the plurality of computing nodes; identifying a set of similarity clusters that are associated with the designated data set from among the plurality of similarity clusters; identifying a first subset of the set of similarity clusters, wherein data associated with the first subset of similarity clusters is stored only on the source computing node; identifying a second subset of the set of similarity clusters, wherein data associated with the second subset of similarity clusters is stored at least on the source computing node and on the target computing node; and identifying a third subset of the set of similarity clusters, wherein data associated with the third subset of similarity clusters is stored on the source computing node and a set of computing nodes other than the source computing node and other than the target computing node.
13. The system of claim 12, wherein the program instructions further comprise instructions for generating a data replication plan, wherein the generating comprises: identifying the source computing node as a source for replicating the data associated with the first subset of similarity clusters; identifying at least one computing node among the set of computing nodes other than the source computing node and other than the target computing node as a source for replicating the data associated with the third subset of similarity clusters; and generating the data transfer plan based on the identifying.
14. The system of claim 13, wherein the program instructions further
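The three-way split of similarity clusters and the resulting replication plan described in the claims can be sketched roughly as follows. This is an illustrative reading, not the claimed implementation: the function name `plan_replication`, the cluster dictionary layout, and the `cost` map used to choose among alternative source nodes are hypothetical, since the claims do not say how a source is ranked.

```python
def plan_replication(clusters, source, target, cost):
    """Split clusters into the three claimed subsets and pick a source for each transfer.

    clusters: list of {"cluster_id": int, "nodes": [node ids], "hashes": [...]}
    cost: hypothetical {node_id: transfer cost to the target}; lower is better.
    """
    plan = {"transfers": [], "already_on_target": []}
    for cluster in clusters:
        nodes = set(cluster["nodes"])
        if source not in nodes:
            continue  # cluster does not belong to the designated data set
        if target in nodes:
            # Second subset: data already on the target, so nothing is transferred.
            plan["already_on_target"].append(cluster["cluster_id"])
        elif nodes == {source}:
            # First subset: only the source holds the data.
            plan["transfers"].append((cluster["cluster_id"], source))
        else:
            # Third subset: other nodes also hold the data; take the cheapest one.
            best = min(nodes - {source}, key=lambda n: (cost[n], n))
            plan["transfers"].append((cluster["cluster_id"], best))
    return plan


# Usage with made-up node names and costs.
example_clusters = [
    {"cluster_id": 0, "nodes": ["dc-src"], "hashes": ["h0"]},
    {"cluster_id": 1, "nodes": ["dc-src", "dc-tgt"], "hashes": ["h1"]},
    {"cluster_id": 2, "nodes": ["dc-src", "repo-a", "repo-b"], "hashes": ["h2"]},
    {"cluster_id": 3, "nodes": ["repo-a"], "hashes": ["h3"]},
]
example_plan = plan_replication(
    example_clusters, "dc-src", "dc-tgt", cost={"repo-a": 5, "repo-b": 2}
)
```

In this sketch, a regional repository would simply be another node id with a low cost relative to the target.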
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Techniques for file synchronisation in file systems · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
Hypervisor-specific management and integration aspects · CPC title
Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title