Method, device and computer program product for data backup

US11954118B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11954118-B2
Application number	US-201816169399-A
Country	US
Kind code	B2
Filing date	Oct 24, 2018
Priority date	Oct 27, 2017
Publication date	Apr 9, 2024
Grant date	Apr 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to method, device and computer program product for data backup. The method comprises: in response to receiving from a backup server a data stream to be backed up, dividing the data stream into a plurality of data segments; distributing the plurality of data segments to at least one computing node; in response to receiving an index of a corresponding data segment from a first computing node of the at least one computing node, looking up the index in a global index cache, the index being generated by the first computing node to uniquely identify the data segment, the global index cache storing indexes of data in a backup storage device; in response to the missing index in the global index cache, adding the index into the global index cache; and sending to the first computing node an indication to store the data segment in the backup storage device.
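
The management-node flow in the abstract can be illustrated with a short Python sketch. It is not the patented implementation. The fixed segment size, the use of SHA-256 as the index, the in-memory set standing in for the global index cache, and the names `split_stream`, `segment_index`, and `GlobalIndexCache` are all assumptions made for this example. The "discard" answer for an index that is already present follows the behaviour described in claim 2 rather than the abstract itself.

```python
import hashlib
from typing import Iterator

SEGMENT_SIZE = 4  # hypothetical fixed segment size in bytes (tiny, for demonstration)


def split_stream(stream: bytes, size: int = SEGMENT_SIZE) -> Iterator[bytes]:
    """Divide a data stream into fixed-size segments (the last may be shorter)."""
    for offset in range(0, len(stream), size):
        yield stream[offset:offset + size]


def segment_index(segment: bytes) -> str:
    """Index identifying a segment; choosing SHA-256 is an assumption."""
    return hashlib.sha256(segment).hexdigest()


class GlobalIndexCache:
    """In-memory stand-in for the global index cache."""

    def __init__(self) -> None:
        self._indexes = set()  # indexes of data already in backup storage

    def check(self, index: str) -> str:
        """Return 'store' and record the index if it is new, else 'discard'."""
        if index in self._indexes:
            return "discard"
        self._indexes.add(index)
        return "store"


cache = GlobalIndexCache()
decisions = [cache.check(segment_index(seg))
             for seg in split_stream(b"AAAABBBBAAAACC")]
print(decisions)  # ['store', 'store', 'discard', 'store']
```

The third segment repeats the first, so its index is already in the cache and the sketch answers "discard" instead of "store".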

First claim

Opening claim text (preview).

We claim:
1. A method of backing up data, comprising: in response to receiving from a backup server a data stream to be backed up, dividing, by a management node, the data stream into a plurality of data segments; distributing, by the management node, the plurality of data segments to at least one computing node, the management node being different from each computing node, wherein distributing the plurality of data segments to the at least one computing node comprises: increasing a probability of deduplication of the plurality of data segments based on a pre-binding of the backup server to a first computing node from among the at least one computing node; determining, from the at least one computing node, the pre-binding of the backup server to the first computing node; and distributing the plurality of data segments to the first computing node, wherein in response to distributing the plurality of data segments to the first computing node, a primary deduplication task is performed by the first computing node, and wherein during performance of the primary deduplication task by the first computing node, a hash uniquely identifying a respective data segment from among the plurality of data segments is calculated, the hash is looked up in a local hash cache of the first computing node, and in response to the hash being missed in the local hash cache, the hash is sent to the management node; in response to receiving the hash of the respective data segment at the management node from the first computing node, performing a secondary deduplication task by the management node, the secondary deduplication task performed by the management node comprising: looking up the hash in a global hash cache of the management node, the global hash cache storing hashes of data in a backup storage device, wherein looking up the hash includes accessing the global hash cache from the management node; and in response to missing the hash in the global hash cache of the management node, adding the hash into the global hash cache, and sending to the first computing node an indication to store the respective data segment in the backup storage device.
2. The method of claim 1, further comprising: in response to (1) receiving, at the management node from a second computing node from among the at least one computing node, a hash of another data segment from among the plurality of data segments and (2) hitting the hash received from the second computing node in the global hash cache, sending, by the management node to the second computing node, an indication to discard the other data segment.
3. The method of claim 1, wherein distributing the plurality of data segments to the at least one computing node further comprises: determining validity of the first computing node; and in response to the first computing node being valid, sending the plurality of data segments to the first computing node.
4. The method of claim 1, further comprising: in response to receiving a second hash of a second data segment from a second computing node from among the at least one computing node, looking up the second hash in the global hash cache of the management node, the second hash uniquely identifying the second data segment being generated by the second computing node; in response to missing the second hash in the global hash cache of the management node, adding the second hash into the global hash cache, and sending to the second computing node a second indication to store the second data segment in the backup storage device.
5. The method of claim 4, further comprising: in response to receiving a third hash of a third data segment from the first computing node, looking up the third hash in the global hash cache of the management node, the third hash uniquely identifying the third data segment being generated by the first computing node; in response to hitting the third hash in the global hash cache of the management node, avoiding adding the third hash into the global hash cache; and sending to the first computing node a third indication to discard the third data segment.
6. The method of claim 1 wherein each of the backup server, the management node, the first computing node, and the backup storage device is included in a scale-out data backup architecture, and the method further comprises: adding, to the scale-out data backup architecture, a second computing node from among the at least one computing node and a second backup storage device.
7. The method of claim 6 wherein in response to distributing the plurality of data segments to at least one computing node, a second primary deduplication task is performed by the second computing node, and wherein during performance of the second primary deduplication task by the second computing node, a second hash uniquely identifying a second data segment from among the plurality of data segments is calculated, the second hash is looked up in a second local hash cache of the second computing node, and in response to the second hash being missed in the second local hash cache, the second hash is sent to the management node.
8. The method of claim 7 further comprising: in response to receiving the second hash of the second data segment at the management node from the second computing node, performing a second secondary deduplication task by the management node, the second secondary deduplication task performed by the management node comprising: looking up the second hash in the global hash cache of the management node; and in response to missing the second hash in the global hash cache of the management node, adding the second hash into the global hash cache, and sending to the second computing node an indication to store the second data segment in the second backup storage device.
9. A method of backing up data, comprising: in response to receiving, from a management node at a first computing node from among at least one computing node, a data segment from a data stream, performing a primary deduplication task by the first computing node, the data stream being received at the management node from a backup server, a probability of deduplication of the plurality of data segments being increased based on a pre-binding of the backup server to the first computing node, the pre-binding being determined by the management node of the backup server to the first computing node, the plurality of data segments being distributed to the first computing node, the primary deduplication task being performed by the first computing node in response to the plurality of data segments being distributed to the first computing node, and the primary deduplication task performed by the first computing node comprising: calculating a hash uniquely identifying the data segment; looking up the hash in a local hash cache of the first computing node, the local hash cache storing hashes of data in a local backup storage device; and in response to missing the hash in the local hash cache, sending the hash to the management node, wherein in response to the hash of the data segment being received at the management node from the first computing node, a secondary deduplication task is performed by the management node, and wherein during performance of the secondary deduplication task by the management node, the hash is looked up in a global hash cache of the management node, hashes of data in the backup storage device being stored in the global hash cache, and in response to the hash being missed in the global hash cache of the management node, the hash is added into the global hash cache, and an indication to store the data segment in the backup storage device is sent to the first computing node; in response to receiving, at the firs
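
The two-tier deduplication recited in the claims (a local hash cache on each computing node, a global hash cache on the management node, and a pre-binding of each backup server to one computing node) might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the claimed implementation. The class and method names, the dictionary used for the pre-binding, SHA-256 as the hash, and the in-memory sets and lists standing in for caches and storage are all hypothetical. The claims shown here do not say what happens on a local cache hit, so the sketch assumes the segment is treated as a duplicate and labels that outcome `local-hit`.

```python
import hashlib


class ManagementNode:
    """Holds the global hash cache and the server-to-node pre-binding."""

    def __init__(self, binding):
        self.global_hashes = set()  # stand-in for the global hash cache
        self.binding = binding      # hypothetical map: backup server id -> node

    def secondary_dedup(self, digest):
        """Global lookup: a miss records the hash and answers 'store'."""
        if digest in self.global_hashes:
            return "discard"
        self.global_hashes.add(digest)
        return "store"

    def distribute(self, server_id, segments):
        """Send every segment from one backup server to its pre-bound node."""
        node = self.binding[server_id]
        return [node.primary_dedup(seg, self) for seg in segments]


class ComputingNode:
    def __init__(self, name):
        self.name = name
        self.local_hashes = set()  # stand-in for the local hash cache
        self.stored = []           # stand-in for the backup storage device

    def primary_dedup(self, segment, manager):
        """Local lookup first; only a local miss reaches the management node."""
        digest = hashlib.sha256(segment).hexdigest()
        if digest in self.local_hashes:
            return "local-hit"  # assumption: duplicate, manager not contacted
        decision = manager.secondary_dedup(digest)
        if decision == "store":
            self.stored.append(segment)
            self.local_hashes.add(digest)  # local cache tracks locally stored data
        return decision


node_a, node_b = ComputingNode("a"), ComputingNode("b")
manager = ManagementNode({"server-1": node_a, "server-2": node_b})
first = manager.distribute("server-1", [b"x", b"y", b"x"])
second = manager.distribute("server-2", [b"y", b"z"])
print(first, second)  # ['store', 'store', 'local-hit'] ['discard', 'store']
```

In this sketch the repeated segment from `server-1` is caught by the local cache of its pre-bound node without a round trip to the management node, which is one plausible reading of how pre-binding could raise the probability of deduplication. The segment `b"y"` sent by `server-2` misses the local cache of `node_b` but hits the global cache, so it is discarded.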

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

Code	CPC title
G06F16/27 (Primary)	Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F11/1464	for networked environments
G06F16/2255	Hash tables
G06F16/2272	Management thereof
H04L67/1097 (Primary)	for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Patent family

Related publications grouped by family.

View patent family 66243915

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11954118B2 cover?: Embodiments of the present disclosure relate to method, device and computer program product for data backup. The method comprises: in response to receiving from a backup server a data stream to be backed up, dividing the data stream into a plurality of data segments; distributing the plurality of data segments to at least one computing node; in response to receiving an index of a corresponding …
Who is the assignee on this patent?: EMC IP Holding Co LLC
What technology area does this patent fall under?: Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Method and device for deduplication

Deduplication and compression of data segments in a data storage system

Performing block deduplication using block sequence classifications

Method for replicating data in a backup storage system using a cost function

Client-side deduplication with local chunk caching

Asynchronous backend global deduplication
