Deduplicating snapshots associated with a backup operation

US10162555B2 · US · B2

Patent metadata
Publication number: US-10162555-B2
Application number: US-201815971675-A
Country: US
Kind code: B2
Filing date: May 4, 2018
Priority date: Jun 13, 2014
Publication date: Dec 25, 2018
Grant date: Dec 25, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Deduplicating snapshot associated with a backup operation is disclosed, including: performing a backup operation including by generating a plurality of snapshots; maintaining, at a source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation; and using the deduplication data to deduplicate backup data across the plurality of snapshots.
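The abstract's core loop can be sketched in Python. This is a minimal, hypothetical illustration, not the patented implementation. It assumes that each block carries a caller-supplied identifier (claim 4 mentions a disk block number) and that the backup media is an in-memory dict of snapshots. The names `Block` and `backup`, and the `("data", …)` / `("ref", …)` entry tuples, are invented for this sketch.

```python
from dataclasses import dataclass


# Hypothetical block model (assumption, not from the patent): an identifier
# such as a disk block number (cf. claim 4), the block's bytes, and the path
# of the file it belongs to.
@dataclass(frozen=True)
class Block:
    block_id: int
    data: bytes
    path: str


def backup(snapshots: dict[str, list[Block]], media: dict) -> None:
    """Back up each snapshot into `media`, a toy stand-in for backup media.

    Each snapshot becomes a list of (entry, metadata_block) pairs, where an
    entry is ("data", bytes) or ("ref", (snapshot_name, index)).
    """
    # Deduplication data kept at the source for this operation only:
    # block identifier -> location where the block was first written.
    dedup: dict[int, tuple[str, int]] = {}
    for snap_name, blocks in snapshots.items():
        entries = []
        for blk in blocks:
            if blk.block_id in dedup:
                # Already written during this operation: store a reference.
                entry = ("ref", dedup[blk.block_id])
            else:
                # First occurrence: store the underlying data and remember where.
                entry = ("data", blk.data)
                dedup[blk.block_id] = (snap_name, len(entries))
            # Metadata block recording which file the block belongs to.
            meta = {"block_id": blk.block_id, "path": blk.path}
            entries.append((entry, meta))
        media[snap_name] = entries
    # The deduplication data is not persisted; drop it once the operation ends.
    dedup.clear()


media: dict = {}
backup(
    {
        "vol0": [Block(10, b"aaaa", "/a"), Block(11, b"bbbb", "/b")],
        "vol1": [Block(10, b"aaaa", "/a"), Block(12, b"cccc", "/c")],
    },
    media,
)
print(media["vol1"][0][0])  # ('ref', ('vol0', 0))
```

Block 10 appears in both snapshots, so only `vol0` stores its bytes and `vol1` records a reference to that earlier location. The lookup map exists only for the duration of the call, which matches claim 7's statement that the deduplication data may be deleted after the backup completes.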

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor configured to: receive an indication to perform a backup operation on a plurality of storage areas of a source system; in response to the indication, perform the backup operation including by generating a plurality of snapshots corresponding to respective ones of the plurality of storage areas associated with the backup operation; maintain, at the source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation, wherein the deduplication data comprises a plurality of identifiers corresponding to respective ones of data blocks that have already been written to the backup media; and use the deduplication data, at the source system, to deduplicate backup data across the plurality of snapshots associated with the backup operation, wherein to use the deduplication data comprises to compare, at the source system, an identifier associated with a data block to back up in a first snapshot included in the plurality of snapshots to the plurality of identifiers, wherein in response to a determination that a matching identifier is not found in the plurality of identifiers: determine, at the source system, that the data block has not already been written to the backup media; send, from the source system, to a backup storage underlying data of the data block to be stored as an entry associated with the data block in the first snapshot at the backup media at the backup storage; and send, from the source system, to the backup storage a metadata block corresponding to the data block, wherein the metadata block is to be stored in the first snapshot at the backup media at the backup storage, wherein the metadata block is configured to be used to determine to which file or directory, or both, the data block belongs; and a memory coupled to the processor and configured to store the deduplication data.

2. The system of claim 1, wherein the plurality of snapshots is configured to be stored at the backup media.

3. The system of claim 1, wherein a storage area included in the plurality of storage areas comprises a volume of storage.

4. The system of claim 1, wherein the identifier associated with the data block comprises a disk block number.

5. The system of claim 1, wherein the backup operation comprises a full backup.

6. The system of claim 1, wherein the backup operation comprises an incremental backup.

7. The system of claim 1, wherein the deduplication data is configured to be deleted subsequent to completion of the backup operation.

8. The system of claim 1, wherein the processor is further configured to restore the first snapshot, including by reading the entry associated with the data block included in the first snapshot: in response to a first determination that the entry includes underlying data of the data block: restore stored data associated with the entry associated with the data block to the source system; and use the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs; and in response to a second determination that the entry associated with the data block includes a representation of the data block: use the representation included in the entry to locate a location at the backup media to restore data stored at the location at the backup media to the source system; and use the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs.

9. The system of claim 1, wherein the determination comprises a first determination and wherein, in response to a second determination that the matching identifier is included in the deduplication data, the processor is further configured to: determine, at the source system, that the data block has already been written to the backup media; and send, from the source system, to the backup storage a representation of the data block to be stored as the entry associated with the data block in the first snapshot on the backup media at the backup storage, wherein the representation of the data block comprises associating data to a location at the backup media to which the data block was previously written, wherein the representation of the data block is determined based at least in part on information stored in the deduplication data, wherein the data block was previously written to the location at the backup media for a second snapshot of the plurality of snapshots.

10. The system of claim 9, wherein the representation of the data block comprises at least one of a hard link or a soft link.

11. The system of claim 1, wherein the metadata block comprises an inode.

12. A method, comprising: receiving an indication to perform a backup operation on a plurality of storage areas of a source system; in response to the indication, performing the backup operation including by generating a plurality of snapshots corresponding to respective ones of the plurality of storage areas associated with the backup operation; maintaining, at the source system, deduplication data corresponding to one or more data blocks that have already been written to backup media during the backup operation, wherein the deduplication data comprises a plurality of identifiers corresponding to respective ones of data blocks that have already been written to the backup media; and using the deduplication data, at the source system, to deduplicate backup data across the plurality of snapshots associated with the backup operation, wherein to use the deduplication data comprises to compare, at the source system, an identifier associated with a data block to back up in a first snapshot included in the plurality of snapshots to the plurality of identifiers, wherein in response to a determination that a matching identifier is not found in the plurality of identifiers: determining, at the source system, that the data block has not already been written to the backup media; sending, from the source system, to a backup storage underlying data of the data block to be stored as an entry associated with the data block in the first snapshot at the backup media at the backup storage; and sending, from the source system, to the backup storage a metadata block corresponding to the data block, wherein the metadata block is to be stored in the first snapshot at the backup media at the backup storage, wherein the metadata block is configured to be used to determine to which file or directory, or both, the data block belongs.

13. The method of claim 12, wherein the identifier associated with the data block comprises a disk block number.

14. The method of claim 12, wherein the deduplication data is configured to be deleted subsequent to completion of the backup operation.

15. The method of claim 12, further comprising restoring the first snapshot, including by reading the entry associated with the data block included in the first snapshot: in response to a first determination that the entry includes underlying data of the data block: restoring stored data associated with the entry associated with the data block to the source system; and using the stored metadata block corresponding to the data block to determine to which file, directory, or both the data block belongs; and in response to a second determination that the entry associated with the data block includes a representation of the data block: using the representation included in the entry to locate a location at the backup media to restore data stored at the location at the backup me…
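The restore path of claim 8, together with claim 9's reference to a location written for an earlier snapshot, can be sketched in the same way. This is again a hypothetical illustration. The on-media layout is an assumption: a dict of snapshots, each holding a list of `(entry, metadata_block)` pairs, where each entry is either `("data", bytes)` or `("ref", (snapshot, index))`. The metadata block is reduced to a dict holding the owning file path, loosely standing in for the inode of claim 11. `read_entry_data` and `restore_snapshot` are invented names.

```python
# Hypothetical on-media layout (assumption): each snapshot is a list of
# (entry, metadata_block) pairs. An entry is either ("data", bytes) holding
# the block's underlying data, or ("ref", (snapshot_name, index)) pointing at
# the location where an identical block was previously written.
Media = dict[str, list[tuple[tuple, dict]]]


def read_entry_data(media: Media, entry: tuple) -> bytes:
    kind, value = entry
    if kind == "data":
        # The entry holds the underlying data directly.
        return value
    if kind == "ref":
        # Follow the representation to the earlier location on the media.
        ref_snap, ref_idx = value
        ref_kind, ref_value = media[ref_snap][ref_idx][0]
        if ref_kind != "data":
            raise ValueError("reference does not point at stored data")
        return ref_value
    raise ValueError(f"unknown entry kind: {kind!r}")


def restore_snapshot(media: Media, snap_name: str) -> dict[str, bytes]:
    restored: dict[str, bytes] = {}
    for entry, meta in media[snap_name]:
        data = read_entry_data(media, entry)
        # The metadata block says which file the block belongs to; blocks of
        # the same file are concatenated in the order they were backed up.
        path = meta["path"]
        restored[path] = restored.get(path, b"") + data
    return restored


media: Media = {
    "vol0": [(("data", b"aaaa"), {"path": "/a"}),
             (("data", b"bbbb"), {"path": "/b"})],
    "vol1": [(("ref", ("vol0", 0)), {"path": "/a"}),
             (("data", b"cccc"), {"path": "/c"})],
}
print(restore_snapshot(media, "vol1"))  # {'/a': b'aaaa', '/c': b'cccc'}
```

In this sketch, a dict lookup resolves each reference. A file-based backup could instead store the reference as a hard or soft link (claim 10) and let the filesystem resolve it.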

Frequently asked questions

Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Published December 25, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.