Backup server selection based on data commonality

US10496322B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10496322-B2
Application number | US-201615321500-A
Country | US
Kind code | B2
Filing date | Mar 29, 2016
Priority date | Mar 29, 2016
Publication date | Dec 3, 2019
Grant date | Dec 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques of backing up data stored on host computing devices involve selecting a backup server from among multiple servers on which to back up host data based on a measure of commonality between the host data and data stored in the backup servers. Prior to sending data for backup, a host sends a set of host data representations to a backup system. Each host data representation is based on a respective hash value computed from a respective block of the host data. The backup system compares the set of host data representations with server data representations for each backup server and computes a commonality score for each backup server. The backup system then selects a backup server on which to place the host data based at least in part on the commonality scores. Host data are then directed to the selected backup server for backup.
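
The flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the block size, the choice of SHA-256, and the names `block_hashes`, `commonality_scores`, and `select_server` are all assumptions introduced here, and the score is simply a count of matching hash values.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the abstract does not specify one


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set[str]:
    """Hash each fixed-size block of data with SHA-256 (an assumed choice)."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def commonality_scores(
    host_reps: set[str], server_reps: dict[str, set[str]]
) -> dict[str, int]:
    """Count, per server, how many host hash values also appear on that server."""
    return {name: len(host_reps & reps) for name, reps in server_reps.items()}


def select_server(scores: dict[str, int]) -> str:
    """Pick the server with the highest commonality score (first one wins ties)."""
    return max(scores, key=scores.get)


# Hypothetical usage: the host shares two blocks with "s1" and one with "s2".
host = block_hashes(b"A" * 4096 + b"B" * 4096 + b"C" * 4096)
servers = {
    "s1": block_hashes(b"A" * 4096 + b"B" * 4096),
    "s2": block_hashes(b"C" * 4096),
}
scores = commonality_scores(host, servers)
chosen = select_server(scores)
```

In this sketch the host only ships hash values, not data, before the selection is made; the host data would then be directed to the chosen server.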

First claim

Opening claim text (preview).

What is claimed is:

1. A method of backing up data stored on host computing devices, the method comprising: receiving, by processing circuitry of a backup system, (i) a request to back up host data stored on a host computing device and (ii) a set of multiple host data representations, each host data representation based on a respective hash value computed from a respective block of the host data of the host computing device; computing multiple commonality scores, including one commonality score for each of multiple backup servers in the backup system, each commonality score for a respective backup server indicating a measure of commonality between the set of host data representations and a respective set of server data representations for the respective backup server, each backup server including a respective set of storage devices; and backing up the host data by (i) selecting one of the backup servers on which to back up the host data based at least in part on the commonality scores, and (ii) storing the host data in the set of storage devices of the selected backup server, wherein backing up the host data further includes performing a load balancing assessment, the load balancing assessment producing a load balancing result indicating relative loading of the backup servers, and wherein selecting one of the backup servers on which to back up the host data is also based in part on the load balancing result.

2. A method as in claim 1, wherein selecting the backup server on which to back up the host data includes (i) comparing the commonality scores to identify a backup server having the highest commonality score and (ii) identifying the selected backup server as the backup server having the highest commonality score.

3. A method as in claim 2, wherein receiving the set of multiple host data representations includes receiving, as each data representation, the respective hash value, wherein each hash value was computed by applying a cryptographic hash function to the respective block of host data, wherein each of the backup servers stores a respective set of data blocks, and wherein the method further comprises, for each backup server, generating the set of server data representations for the respective backup server by applying the cryptographic hash function to each of the set of data blocks of the respective backup server to produce, as the set of server data representations for the respective backup server, a set of hash values for the respective backup server.

4. A method as in claim 3, wherein computing the multiple commonality scores includes, for each of the backup servers, counting a number of matches between the hash values received from the host and the set of hash values for the respective backup server to produce a total for the respective backup server, wherein the selected backup server is the backup server for which the largest total is produced.

5. A method as in claim 4, wherein each hash value is M bits in length, and wherein applying the cryptographic function to each of the set of data blocks of each backup server includes (i) applying a hash function to generate a N-bit result and (ii) sampling the N-bit result to produce an M-bit result, wherein M is less than N.

6. A method as in claim 2, wherein the set of host data representations includes a bloom filter of blocks of the host data, the bloom filter including (i) a set of cryptographic hash functions and (ii) a bit string of a predetermined length, each of the set of cryptographic hash functions mapping a block of host data to a respective position in the bit string, the bit string having a set of mapped positions and a set of unmapped positions, each mapped position in the bit string having a first value, each unmapped position in the bit string having a second value; wherein each of the backup servers stores respective blocks of data; wherein the method further comprises, for each of the backup servers, applying each of the set of cryptographic hash functions to a block of data of the respective backup server to produce a bit position of that hash function for that block of data; wherein computing the multiple commonality scores includes, for each of the backup servers, (i) applying each of the cryptographic hash functions to a block of data of the respective backup server to produce a set of bit positions and (ii) reducing the commonality score for the respective backup server in response to at least one of the produced set of bit positions of the bit string of the bloom filter having the second value.

7. A method as in claim 1, wherein the host data representations received by the backup system pertain to a subset of all of the data blocks stored in the host computing device.

8. A method as in claim 7, wherein each set of server data representations for a backup server pertains to a subset of all of the data blocks stored on the respective backup server.

9. A method as in claim 1, wherein each set of server data representations for a particular backup server pertains to a subset of all of the data blocks stored on the particular backup server.

10. A computer program product including a set of non-transitory, computer-readable storage media storing executable instructions, which when executed by a computer, causes the computer to perform a method of backing up data stored on host computing devices, the method comprising: receiving (i) a request to back up host data stored on a host computing device and (ii) a set of multiple host data representations, each host data representation based on a respective hash value computed from a respective block of the host data of the host computing device; computing multiple commonality scores, including one commonality score for each of multiple backup servers in the backup system, each commonality score for a respective backup server indicating a measure of commonality between the set of host data representations and a respective set of server data representations for the respective backup server, each backup server including a respective set of storage devices; and backing up the host data by (i) selecting one of the backup servers on which to back up the host data based at least in part on the commonality scores, and (ii) storing the host data in the set of storage devices of the selected backup server, wherein selecting the backup server on which to back up the host data includes (i) comparing the commonality scores to identify a backup server having the highest commonality score and (ii) identifying the selected backup server as the backup server having the highest commonality score, wherein receiving the set of multiple host data representations includes receiving, as each data representation, the respective hash value, wherein each hash value was computed by applying a cryptographic hash function to the respective block of host data, wherein each of the backup servers stores a respective set of data blocks, wherein the method further comprises, for each backup server, generating the set of server data representations for the respective backup server by applying the cryptographic hash function to each of the set of data blocks of the respective backup server to produce, as the set of server data representations for the respective backup server, a set of hash values for the respective backup server, wherein computing the multiple commonality scores includes, for each of the backup servers, counting a number of matches between the hash values received from the host and the set of hash values for the respective backup server to produce a total for the respective backup server, and wherein the selected backup server is the backup server for which the largest total is produced.
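
To make the Bloom filter variant (claim 6) and the load balancing condition (claim 1) more concrete, here is one possible reading in Python. Everything specific is an assumption for illustration: the filter length, the number of hash functions, the salted SHA-256 construction, the names `bit_positions`, `build_filter`, `bloom_score`, and `select_server`, and especially the way load is subtracted from the score, which the claims do not prescribe.

```python
import hashlib

NUM_BITS = 1024  # assumed length of the bit string
NUM_HASHES = 3   # assumed number of hash functions in the filter


def bit_positions(block: bytes) -> list[int]:
    """Map a block to NUM_HASHES bit positions using salted SHA-256 digests."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + block).digest(), "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]


def build_filter(blocks: list[bytes]) -> list[bool]:
    """Host side: set the mapped positions (True) for every host block."""
    bits = [False] * NUM_BITS
    for block in blocks:
        for pos in bit_positions(block):
            bits[pos] = True
    return bits


def bloom_score(host_filter: list[bool], server_blocks: list[bytes]) -> int:
    """Start from the number of server blocks and reduce the score for each
    block that hits at least one unmapped (False) position in the filter."""
    score = len(server_blocks)
    for block in server_blocks:
        if not all(host_filter[pos] for pos in bit_positions(block)):
            score -= 1
    return score


def select_server(
    host_filter: list[bool],
    servers: dict[str, list[bytes]],
    loads: dict[str, float],
    load_weight: float = 1.0,
) -> str:
    """Rank servers by commonality minus a load penalty (assumed weighting)."""
    def rank(name: str) -> float:
        return bloom_score(host_filter, servers[name]) - load_weight * loads[name]
    return max(servers, key=rank)


# Hypothetical usage: "s1" shares two blocks with the host, "s2" shares none.
host_filter = build_filter([b"a", b"b", b"c"])
servers = {"s1": [b"a", b"b"], "s2": [b"z"]}
idle_choice = select_server(host_filter, servers, {"s1": 0.0, "s2": 0.0})
busy_choice = select_server(host_filter, servers, {"s1": 5.0, "s2": 0.0})
```

A Bloom filter never produces false negatives, so a server block that is also on the host always counts toward the score; false positives can inflate a score slightly. In the usage above, `s1` wins when both servers are idle, while a heavy load on `s1` shifts the choice to `s2`.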

Assignees

Inventors

Classifications

  • Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title

  • Comparing digital values (G06F7/06, {G06F7/22,} G06F7/38 take precedence) · CPC title

  • Information retrieval; Database structures therefor; File system structures therefor · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • for networked environments · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10496322B2 cover?
Techniques of backing up data stored on host computing devices involve selecting a backup server from among multiple servers on which to back up host data based on a measure of commonality between the host data and data stored in the backup servers. Prior to sending data for backup, a host sends a set of host data representations to a backup system. Each host data representation is based on a r…
Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/065. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).