Fast deduplication data verification

US9753955B2 · US · B2

Patent metadata
Publication number: US-9753955-B2
Application number: US-201414488139-A
Country: US
Kind code: B2
Filing date: Sep 16, 2014
Priority date: Sep 16, 2014
Publication date: Sep 5, 2017
Grant date: Sep 5, 2017

Abstract

An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
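
The abstract's central property is that a deduplicated block is verified once, no matter how many backup entries reference it. The minimal Python sketch below illustrates only that property and is not the patent's table-based mechanism. It assumes, hypothetically, that each reference carries a block ID and an expected SHA-256 digest, and that a `read_block` callable fetches the block's bytes from secondary storage.

```python
import hashlib
from typing import Callable, Dict, Iterable, Tuple


def verify_referenced_blocks(
    references: Iterable[Tuple[int, bytes]],
    read_block: Callable[[int], bytes],
) -> Dict[int, bool]:
    """Verify each distinct block once, however many references point to it.

    references: hypothetical (block_id, expected_sha256_digest) pairs, one per reference.
    read_block: hypothetical callable that returns a block's bytes from secondary storage.
    """
    verified: Dict[int, bool] = {}
    for block_id, expected_digest in references:
        if block_id in verified:
            continue  # already read and hashed; skip the duplicate reference
        data = read_block(block_id)
        verified[block_id] = hashlib.sha256(data).digest() == expected_digest
    return verified


if __name__ == "__main__":
    store = {1: b"alpha", 2: b"beta"}
    reads = []

    def read_block(block_id: int) -> bytes:
        reads.append(block_id)  # record each physical read
        return store[block_id]

    digest = {k: hashlib.sha256(v).digest() for k, v in store.items()}
    refs = [(1, digest[1]), (2, digest[2]), (1, digest[1])]  # block 1 referenced twice
    print(verify_referenced_blocks(refs, read_block))  # {1: True, 2: True}
    print(reads)  # [1, 2]: block 1 was read only once
```

The sketch assumes every reference to a block carries the same expected digest. Later references reuse the first result instead of re-reading the block, which is where the I/O and hashing savings come from.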

Claims (preview)

What is claimed is:

1. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising: a data storage computer comprising computer hardware configured to: retrieve, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.

2. The networked information management system of claim 1, wherein the computer hardware is configured to generate, for the first data chunk identified in the primary table, the first value by summing the primary identifications of the data blocks that are associated with the first data chunk.

3. The networked information management system of claim 1, wherein the deduplication database is stored in a deduplication database server.

4. The networked information management system of claim 1, wherein the computer hardware of the data storage computer is further configured to store the deduplication chunk table in the deduplication database.

5. The networked information management system of claim 1, wherein the computer hardware of the data storage computer is further configured to retrieve the primary table in response to a request to verify data in a backup.

6. The networked information management system of claim 5, wherein the computer hardware of the data storage computer is further configured to delete the deduplication chunk table in response to a notification that verification of the data in the backup is complete.

7. A computer-implemented method for verifying synchronization of deduplication information, the computer-implemented method comprising: retrieving, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generating, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generating, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; storing, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; storing, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; storing, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and comparing, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.

8. The computer-implemented method of claim 7, wherein generating, for the first data chunk identified in the primary table, the first value comprises summing the primary identifications of the data blocks that are associated with the first data chunk.

9. The computer-implemented method of claim 7, wherein the deduplication database is stored in a deduplication database server.

10. The computer-implemented method of claim 7, further comprising storing the deduplication chunk table in the deduplication database.

11. The computer-implemented method of claim 7, further comprising: receiving a request to verify data in a backup; and retrieving the primary table in response to receiving the request to verify the data in the backup.

12. The computer-implemented method of claim 11, further comprising deleting the deduplication chunk table in response to a notification that verification of the data in the backup is complete.

13. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising: a storage manager comprising computer hardware configured to receive a request to verify data in a backup; a deduplication database media agent comprising an electronically stored deduplication database and computer hardware configured to: retrieve, from the deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.

14. The networked information management system of claim 13, wherein the computer hardware is configured to generate, for the first data chunk identified in the primary table, the first value by summing the primary identifications of the data blo…
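
Claim 1, together with the summing variant in claims 2, 8 and 14, reduces each chunk to two numbers: the sum of its blocks' primary identifications and the sum of their squares. Those values are stored in a deduplication chunk table and checked against values derived from the chunk's instance file. The minimal Python sketch below uses the standard-library `sqlite3` module as a stand-in for the deduplication database. The table and column names (`primary_table`, `dedup_chunk`, `primary_id`, `chunk_id`) are hypothetical, as is the assumption that an instance file can be read as a list of primary identifications per chunk; the claims specify neither a schema nor an instance-file format. The final drop step loosely mirrors claims 6 and 12.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_chunk_table(db: sqlite3.Connection) -> None:
    """Derive the per-chunk first and second values from the primary table."""
    db.execute("DROP TABLE IF EXISTS dedup_chunk")
    db.execute(
        """
        CREATE TABLE dedup_chunk AS
        SELECT chunk_id,
               SUM(primary_id)              AS first_value,   -- sum of primary IDs
               SUM(primary_id * primary_id) AS second_value   -- sum of squared primary IDs
        FROM primary_table
        GROUP BY chunk_id
        """
    )


def values_from_instance_file(primary_ids: Iterable[int]) -> Tuple[int, int]:
    """Recompute both values from the primary IDs an instance file lists (assumed format)."""
    ids = list(primary_ids)
    return sum(ids), sum(i * i for i in ids)


def verify_chunks(
    db: sqlite3.Connection, instance_files: Dict[int, List[int]]
) -> Dict[int, bool]:
    """Compare stored values with instance-file values; True means the chunk is in sync."""
    results: Dict[int, bool] = {}
    rows = db.execute(
        "SELECT chunk_id, first_value, second_value FROM dedup_chunk ORDER BY chunk_id"
    )
    for chunk_id, first_value, second_value in rows:
        ids = instance_files.get(chunk_id)
        # A chunk with no instance file, or with mismatched values, is out of sync.
        results[chunk_id] = ids is not None and values_from_instance_file(ids) == (
            first_value,
            second_value,
        )
    return results


def finish_verification(db: sqlite3.Connection) -> None:
    """Discard the temporary chunk table once verification is complete."""
    db.execute("DROP TABLE IF EXISTS dedup_chunk")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE primary_table (primary_id INTEGER PRIMARY KEY, chunk_id INTEGER NOT NULL)"
    )
    db.executemany("INSERT INTO primary_table VALUES (?, ?)", [(3, 1), (5, 1), (7, 2)])
    build_chunk_table(db)
    print(verify_chunks(db, {1: [3, 5], 2: [7]}))  # {1: True, 2: True}
    print(verify_chunks(db, {1: [2, 6], 2: [7]}))  # {1: False, 2: True}
    finish_verification(db)
```

The second value catches compensating errors that a plain sum would miss. The ID sets {3, 5} and {2, 6} both sum to 8, but their squares sum to 34 and 40, so chunk 1 fails the second check. This sketch only checks chunks that appear in the primary table. A chunk present in storage but absent from the table would need a separate pass.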

Assignees

Commvault Systems Inc

Classifications

  • Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title

  • using de-duplication of the data · CPC title

  • Management of the backup or restore process · CPC title

  • Techniques for file synchronisation in file systems · CPC title

  • De-duplication techniques · CPC title

Frequently asked questions

Who is the assignee on this patent?
Commvault Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 5, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.