Method and device for deduplication
US-10891261-B2 · Jan 12, 2021 · US
US11954118B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11954118-B2 |
| Application number | US-201816169399-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 24, 2018 |
| Priority date | Oct 27, 2017 |
| Publication date | Apr 9, 2024 |
| Grant date | Apr 9, 2024 |
Official abstract text for this publication.
Embodiments of the present disclosure relate to a method, a device, and a computer program product for data backup. The method comprises: in response to receiving from a backup server a data stream to be backed up, dividing the data stream into a plurality of data segments; distributing the plurality of data segments to at least one computing node; in response to receiving an index of a corresponding data segment from a first computing node of the at least one computing node, looking up the index in a global index cache, the index being generated by the first computing node to uniquely identify the data segment, the global index cache storing indexes of data in a backup storage device; in response to the index being missing from the global index cache, adding the index into the global index cache; and sending to the first computing node an indication to store the data segment in the backup storage device.
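The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: fixed-size chunking, SHA-256 as the index, and the names `split_stream`, `segment_index`, `GlobalIndexCache`, and `should_store` are all assumptions made for illustration, since the abstract does not specify a segmentation scheme or an index function.

```python
import hashlib
from typing import Iterator

SEGMENT_SIZE = 4  # hypothetical segment size in bytes, kept tiny for the demo


def split_stream(stream: bytes, size: int = SEGMENT_SIZE) -> Iterator[bytes]:
    """Divide the data stream into segments (fixed-size chunking is an assumption)."""
    for offset in range(0, len(stream), size):
        yield stream[offset:offset + size]


def segment_index(segment: bytes) -> str:
    """Index a computing node would generate; SHA-256 is an assumed choice."""
    return hashlib.sha256(segment).hexdigest()


class GlobalIndexCache:
    """Indexes of data already held in the backup storage device."""

    def __init__(self) -> None:
        self._indexes: set[str] = set()

    def should_store(self, index: str) -> bool:
        """On a miss, record the index and return True (store the segment).

        On a hit, return False; the abstract only describes the miss path.
        """
        if index in self._indexes:
            return False
        self._indexes.add(index)
        return True


cache = GlobalIndexCache()
decisions = [
    cache.should_store(segment_index(segment))
    for segment in split_stream(b"AAAABBBBAAAA")
]
```

With this input the stream splits into `AAAA`, `BBBB`, `AAAA`, so `decisions` is `[True, True, False]`: the first two segments are new and would be stored, and the repeated segment is already indexed.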
Opening claim text (preview).
We claim:

1. A method of backing up data, comprising: in response to receiving from a backup server a data stream to be backed up, dividing, by a management node, the data stream into a plurality of data segments; distributing, by the management node, the plurality of data segments to at least one computing node, the management node being different from each computing node, wherein distributing the plurality of data segments to the at least one computing node comprises: increasing a probability of deduplication of the plurality of data segments based on a pre-binding of the backup server to a first computing node from among the at least one computing node; determining, from the at least one computing node, the pre-binding of the backup server to the first computing node; and distributing the plurality of data segments to the first computing node, wherein in response to distributing the plurality of data segments to the first computing node, a primary deduplication task is performed by the first computing node, and wherein during performance of the primary deduplication task by the first computing node, a hash uniquely identifying a respective data segment from among the plurality of data segments is calculated, the hash is looked up in a local hash cache of the first computing node, and in response to the hash being missed in the local hash cache, the hash is sent to the management node; in response to receiving the hash of the respective data segment at the management node from the first computing node, performing a secondary deduplication task by the management node, the secondary deduplication task performed by the management node comprising: looking up the hash in a global hash cache of the management node, the global hash cache storing hashes of data in a backup storage device, wherein looking up the hash includes accessing the global hash cache from the management node; and in response to missing the hash in the global hash cache of the management node, adding the hash into the global hash cache, and sending to the first computing node an indication to store the respective data segment in the backup storage device.

2. The method of claim 1, further comprising: in response to (1) receiving, at the management node from a second computing node from among the at least one computing node, a hash of another data segment from among the plurality of data segments and (2) hitting the hash received from the second computing node in the global hash cache, sending, by the management node to the second computing node, an indication to discard the other data segment.

3. The method of claim 1, wherein distributing the plurality of data segments to the at least one computing node further comprises: determining validity of the first computing node; and in response to the first computing node being valid, sending the plurality of data segments to the first computing node.

4. The method of claim 1, further comprising: in response to receiving a second hash of a second data segment from a second computing node from among the at least one computing node, looking up the second hash in the global hash cache of the management node, the second hash uniquely identifying the second data segment being generated by the second computing node; in response to missing the second hash in the global hash cache of the management node, adding the second hash into the global hash cache, and sending to the second computing node a second indication to store the second data segment in the backup storage device.

5. The method of claim 4, further comprising: in response to receiving a third hash of a third data segment from the first computing node, looking up the third hash in the global hash cache of the management node, the third hash uniquely identifying the third data segment being generated by the first computing node; in response to hitting the third hash in the global hash cache of the management node, avoiding adding the third hash into the global hash cache; and sending to the first computing node a third indication to discard the third data segment.

6. The method of claim 1 wherein each of the backup server, the management node, the first computing node, and the backup storage device is included in a scale-out data backup architecture, and the method further comprises: adding, to the scale-out data backup architecture, a second computing node from among the at least one computing node and a second backup storage device.

7. The method of claim 6 wherein in response to distributing the plurality of data segments to at least one computing node, a second primary deduplication task is performed by the second computing node, and wherein during performance of the second primary deduplication task by the second computing node, a second hash uniquely identifying a second data segment from among the plurality of data segments is calculated, the second hash is looked up in a second local hash cache of the second computing node, and in response to the second hash being missed in the second local hash cache, the second hash is sent to the management node.

8. The method of claim 7 further comprising: in response to receiving the second hash of the second data segment at the management node from the second computing node, performing a second secondary deduplication task by the management node, the second secondary deduplication task performed by the management node comprising: looking up the second hash in the global hash cache of the management node; and in response to missing the second hash in the global hash cache of the management node, adding the second hash into the global hash cache, and sending to the second computing node an indication to store the second data segment in the second backup storage device.

9. A method of backing up data, comprising: in response to receiving, from a management node at a first computing node from among at least one computing node, a data segment from a data stream, performing a primary deduplication task by the first computing node, the data stream being received at the management node from a backup server, a probability of deduplication of the plurality of data segments being increased based on a pre-binding of the backup server to the first computing node, the pre-binding being determined by the management node of the backup server to the first computing node, the plurality of data segments being distributed to the first computing node, the primary deduplication task being performed by the first computing node in response to the plurality of data segments being distributed to the first computing node, and the primary deduplication task performed by the first computing node comprising: calculating a hash uniquely identifying the data segment; looking up the hash in a local hash cache of the first computing node, the local hash cache storing hashes of data in a local backup storage device; and in response to missing the hash in the local hash cache, sending the hash to the management node, wherein in response to the hash of the data segment being received at the management node from the first computing node, a secondary deduplication task is performed by the management node, and wherein during performance of the secondary deduplication task by the management node, the hash is looked up in a global hash cache of the management node, hashes of data in the backup storage device being stored in the global hash cache, and in response to the hash being missed in the global hash cache of the management node, the hash is added into the global hash cache, and an indication to store the data segment in the backup storage device is sent to the first computing node; in response to receiving, at the firs
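
The two-level lookup recited in claims 1 to 5 (a primary check against a computing node's local hash cache, then a secondary check against the management node's global hash cache) might be sketched as follows. This is a hedged illustration, not the claimed implementation. The class and method names, the `pre_binding` dictionary, the `"local-hit"` return value, raising an error for an invalid node, and updating the local cache only after a store indication are all assumptions; the claim text shown here does not state what happens on a local hit or when a node is invalid.

```python
import hashlib


class ManagementNode:
    """Holds the global hash cache and the backup-server pre-binding table."""

    def __init__(self, nodes, pre_binding):
        self.nodes = nodes              # hypothetical: node name -> ComputingNode
        self.pre_binding = pre_binding  # hypothetical: backup server -> node name
        self.global_hashes = set()

    def distribute(self, backup_server, segments):
        """Send all segments from one backup server to its pre-bound node."""
        node = self.nodes[self.pre_binding[backup_server]]
        if not node.valid:
            # Assumption: the claims only cover the valid-node case.
            raise RuntimeError("pre-bound computing node is not valid")
        return [node.primary_dedup(segment, self) for segment in segments]

    def secondary_dedup(self, digest):
        """Global lookup: a hit means discard, a miss is recorded and stored."""
        if digest in self.global_hashes:
            return "discard"
        self.global_hashes.add(digest)
        return "store"


class ComputingNode:
    """Performs the primary deduplication task against a local hash cache."""

    def __init__(self):
        self.valid = True
        self.local_hashes = set()
        self.stored = []  # stands in for the backup storage device

    def primary_dedup(self, segment, manager):
        digest = hashlib.sha256(segment).hexdigest()  # SHA-256 is assumed
        if digest in self.local_hashes:
            return "local-hit"  # assumption: no management-node round trip
        indication = manager.secondary_dedup(digest)
        if indication == "store":
            self.stored.append(segment)
            self.local_hashes.add(digest)
        return indication


n1, n2 = ComputingNode(), ComputingNode()
mgr = ManagementNode({"n1": n1, "n2": n2}, {"srvA": "n1", "srvB": "n2"})
first = mgr.distribute("srvA", [b"x", b"y", b"x"])
second = mgr.distribute("srvB", [b"x", b"z"])
```

Here `first` is `["store", "store", "local-hit"]` and `second` is `["discard", "store"]`. Because every segment from `srvA` goes to the same node, the repeated `b"x"` is caught locally, which is the effect the claims attribute to pre-binding; the copy of `b"x"` arriving via `n2` is caught only at the global cache.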
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
for networked environments · CPC title
Hash tables · CPC title
Management thereof · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title