US10496322B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10496322-B2 |
| Application number | US-201615321500-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 29, 2016 |
| Priority date | Mar 29, 2016 |
| Publication date | Dec 3, 2019 |
| Grant date | Dec 3, 2019 |
Official abstract text for this publication.
Techniques of backing up data stored on host computing devices involve selecting a backup server from among multiple servers on which to back up host data based on a measure of commonality between the host data and data stored in the backup servers. Prior to sending data for backup, a host sends a set of host data representations to a backup system. Each host data representation is based on a respective hash value computed from a respective block of the host data. The backup system compares the set of host data representations with server data representations for each backup server and computes a commonality score for each backup server. The backup system then selects a backup server on which to place the host data based at least in part on the commonality scores. Host data are then directed to the selected backup server for backup.
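The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes fixed-size blocks, SHA-256 as the hash function, and a commonality score equal to the number of distinct host hashes also present on a server. The names `block_hashes`, `commonality_scores`, `select_server`, and the `BLOCK_SIZE` value are hypothetical.

```python
import hashlib
from typing import Dict, Set

BLOCK_SIZE = 4096  # assumed block size; the source does not specify one


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> Set[str]:
    """Split data into fixed-size blocks and return the set of their SHA-256 digests."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def commonality_scores(host_reprs: Set[str],
                       server_reprs: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each server by how many host hashes also appear in its own hash set."""
    return {name: len(host_reprs & reprs) for name, reprs in server_reprs.items()}


def select_server(scores: Dict[str, int]) -> str:
    """Pick the server with the highest commonality score (ties go to the first seen)."""
    return max(scores, key=scores.get)
```

In this sketch the host would call `block_hashes` on its own data and send only the resulting set, and the backup system would call `commonality_scores` and `select_server` before any host data is transferred. Claim 1 additionally factors a load balancing result into the selection, which is omitted here.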
Opening claim text (preview).
What is claimed is:

1. A method of backing up data stored on host computing devices, the method comprising: receiving, by processing circuitry of a backup system, (i) a request to back up host data stored on a host computing device and (ii) a set of multiple host data representations, each host data representation based on a respective hash value computed from a respective block of the host data of the host computing device; computing multiple commonality scores, including one commonality score for each of multiple backup servers in the backup system, each commonality score for a respective backup server indicating a measure of commonality between the set of host data representations and a respective set of server data representations for the respective backup server, each backup server including a respective set of storage devices; and backing up the host data by (i) selecting one of the backup servers on which to back up the host data based at least in part on the commonality scores, and (ii) storing the host data in the set of storage devices of the selected backup server, wherein backing up the host data further includes performing a load balancing assessment, the load balancing assessment producing a load balancing result indicating relative loading of the backup servers, and wherein selecting one of the backup servers on which to back up the host data is also based in part on the load balancing result.

2. A method as in claim 1, wherein selecting the backup server on which to back up the host data includes (i) comparing the commonality scores to identify a backup server having the highest commonality score and (ii) identifying the selected backup server as the backup server having the highest commonality score.

3. A method as in claim 2, wherein receiving the set of multiple host data representations includes receiving, as each data representation, the respective hash value, wherein each hash value was computed by applying a cryptographic hash function to the respective block of host data, wherein each of the backup servers stores a respective set of data blocks, and wherein the method further comprises, for each backup server, generating the set of server data representations for the respective backup server by applying the cryptographic hash function to each of the set of data blocks of the respective backup server to produce, as the set of server data representations for the respective backup server, a set of hash values for the respective backup server.

4. A method as in claim 3, wherein computing the multiple commonality scores includes, for each of the backup servers, counting a number of matches between the hash values received from the host and the set of hash values for the respective backup server to produce a total for the respective backup server, wherein the selected backup server is the backup server for which the largest total is produced.

5. A method as in claim 4, wherein each hash value is M bits in length, and wherein applying the cryptographic function to each of the set of data blocks of each backup server includes (i) applying a hash function to generate a N-bit result and (ii) sampling the N-bit result to produce an M-bit result, wherein M is less than N.

6. A method as in claim 2, wherein the set of host data representations includes a bloom filter of blocks of the host data, the bloom filter including (i) a set of cryptographic hash functions and (ii) a bit string of a predetermined length, each of the set of cryptographic hash functions mapping a block of host data to a respective position in the bit string, the bit string having a set of mapped positions and a set of unmapped positions, each mapped position in the bit string having a first value, each unmapped position in the bit string having a second value; wherein each of the backup servers stores respective blocks of data; wherein the method further comprises, for each of the backup servers, applying each of the set of cryptographic hash functions to a block of data of the respective backup server to produce a bit position of that hash function for that block of data; wherein computing the multiple commonality scores includes, for each of the backup servers, (i) applying each of the cryptographic hash functions to a block of data of the respective backup server to produce a set of bit positions and (ii) reducing the commonality score for the respective backup server in response to at least one of the produced set of bit positions of the bit string of the bloom filter having the second value.

7. A method as in claim 1, wherein the host data representations received by the backup system pertain to a subset of all of the data blocks stored in the host computing device.

8. A method as in claim 7, wherein each set of server data representations for a backup server pertains to a subset of all of the data blocks stored on the respective backup server.

9. A method as in claim 1, wherein each set of server data representations for a particular backup server pertains to a subset of all of the data blocks stored on the particular backup server.

10. A computer program product including a set of non-transitory, computer-readable storage media storing executable instructions, which when executed by a computer, causes the computer to perform a method of backing up data stored on host computing devices, the method comprising: receiving (i) a request to back up host data stored on a host computing device and (ii) a set of multiple host data representations, each host data representation based on a respective hash value computed from a respective block of the host data of the host computing device; computing multiple commonality scores, including one commonality score for each of multiple backup servers in the backup system, each commonality score for a respective backup server indicating a measure of commonality between the set of host data representations and a respective set of server data representations for the respective backup server, each backup server including a respective set of storage devices; and backing up the host data by (i) selecting one of the backup servers on which to back up the host data based at least in part on the commonality scores, and (ii) storing the host data in the set of storage devices of the selected backup server, wherein selecting the backup server on which to back up the host data includes (i) comparing the commonality scores to identify a backup server having the highest commonality score and (ii) identifying the selected backup server as the backup server having the highest commonality score, wherein receiving the set of multiple host data representations includes receiving, as each data representation, the respective hash value, wherein each hash value was computed by applying a cryptographic hash function to the respective block of host data, wherein each of the backup servers stores a respective set of data blocks, wherein the method further comprises, for each backup server, generating the set of server data representations for the respective backup server by applying the cryptographic hash function to each of the set of data blocks of the respective backup server to produce, as the set of server data representations for the respective backup server, a set of hash values for the respective backup server, wherein computing the multiple commonality scores includes, for each of the backup servers, counting a number of matches between the hash values received from the host and the set of hash values for the respective backup server to produce a total for the respective backup server, and wherein the selected backup server is the backup server for which the largest total is produced.
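The Bloom filter variant recited in claim 6 can be illustrated with a short sketch. It is written under several assumptions that do not come from the claims: the filter length (`FILTER_BITS`), the number of hash functions (`NUM_HASHES`), the use of index-salted SHA-256 to derive each hash function, and a starting score equal to the number of server blocks examined. All function names are hypothetical. The host builds the bit string from its blocks; the backup system then tests each server block against it and lowers that server's score whenever at least one probed position is unset.

```python
import hashlib
from typing import Iterable, List

FILTER_BITS = 1024  # assumed bit-string length; the claims say only "predetermined"
NUM_HASHES = 3      # assumed number of hash functions in the filter


def bit_positions(block: bytes) -> List[int]:
    """Map a block to one bit position per hash function (index-salted SHA-256)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + block).digest(), "big") % FILTER_BITS
        for i in range(NUM_HASHES)
    ]


def build_filter(host_blocks: Iterable[bytes]) -> int:
    """Host side: set the mapped positions for every host block (bit string held as an int)."""
    bits = 0
    for block in host_blocks:
        for pos in bit_positions(block):
            bits |= 1 << pos
    return bits


def bloom_commonality(host_filter: int, server_blocks: List[bytes]) -> int:
    """Backup-system side: start from the block count and reduce the score for
    each server block that maps to at least one unset position in the filter."""
    score = len(server_blocks)
    for block in server_blocks:
        if any(((host_filter >> pos) & 1) == 0 for pos in bit_positions(block)):
            score -= 1
    return score
```

Because a Bloom filter can report false positives but never false negatives, a score computed this way may overstate commonality slightly but will not miss a block the host actually holds. The benefit is that the host sends a fixed-length bit string rather than one hash per block.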
Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title
Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence) · CPC title
Information retrieval; Database structures therefor; File system structures therefor · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
for networked environments · CPC title