Fast deduplication data verification
US-2016306820-A1 · Oct 20, 2016 · US
US9753955B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9753955-B2 |
| Application number | US-201414488139-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 16, 2014 |
| Priority date | Sep 16, 2014 |
| Publication date | Sep 5, 2017 |
| Grant date | Sep 5, 2017 |
Official abstract text for this publication.
An information management system provides a data deduplication system that uses a primary table, a deduplication chunk table, and a chunk integrity table to ensure that a referenced deduplicated data block is only verified once during the data verification of a backup or other replication operation. The data deduplication system may reduce the computational and storage overhead associated with traditional data verification processes. The primary table, the deduplication chunk table, and the chunk integrity table, all of which are stored in a deduplication database, can also ensure synchronization between the deduplication database and secondary storage devices.
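The "verified once" property described in the abstract can be pictured with a small sketch. The abstract does not say how the chunk integrity table is laid out, so everything here is an assumption for illustration: the function name `verify_backup`, the `verify_block` callback, and the use of a dictionary as a stand-in for the chunk integrity table are all hypothetical.

```python
from typing import Callable, Dict, Iterable


def verify_backup(
    references: Iterable[int],
    verify_block: Callable[[int], bool],
) -> Dict[int, bool]:
    """Verify each referenced block id at most once and cache the outcome."""
    # Hypothetical stand-in for a chunk integrity table: block id -> result.
    integrity: Dict[int, bool] = {}
    for block_id in references:
        # A deduplicated block may be referenced many times in one backup;
        # only the first reference triggers the (expensive) verification.
        if block_id not in integrity:
            integrity[block_id] = verify_block(block_id)
    return integrity
```

In this sketch a backup that references the same deduplicated block many times pays the verification cost for that block only on its first reference; later references reuse the recorded result.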
Opening claim text (preview).
What is claimed is:
1. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising: a data storage computer comprising computer hardware configured to: retrieve, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
2. The networked information management system of claim 1, wherein the computer hardware is configured to generate, for the first data chunk identified in the primary table, the first value by summing the primary identifications of the data blocks that are associated with the first data chunk.
3. The networked information management system of claim 1, wherein the deduplication database is stored in a deduplication database server.
4. The networked information management system of claim 1, wherein the computer hardware of the data storage computer is further configured to store the deduplication chunk table in the deduplication database.
5. The networked information management system of claim 1, wherein the computer hardware of the data storage computer is further configured to retrieve the primary table in response to a request to verify data in a backup.
6. The networked information management system of claim 5, wherein the computer hardware of the data storage computer is further configured to delete the deduplication chunk table in response to a notification that verification of the data in the backup is complete.
7. A computer-implemented method for verifying synchronization of deduplication information, the computer-implemented method comprising: retrieving, from an electronically stored deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generating, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generating, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; storing, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; storing, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; storing, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and comparing, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
8. The computer-implemented method of claim 7, wherein generating, for the first data chunk identified in the primary table, the first value comprises summing the primary identifications of the data blocks that are associated with the first data chunk.
9. The computer-implemented method of claim 7, wherein the deduplication database is stored in a deduplication database server.
10. The computer-implemented method of claim 7, further comprising storing the deduplication chunk table in the deduplication database.
11. The computer-implemented method of claim 7, further comprising: receiving a request to verify data in a backup; and retrieving the primary table in response to receiving the request to verify the data in the backup.
12. The computer-implemented method of claim 11, further comprising deleting the deduplication chunk table in response to a notification that verification of the data in the backup is complete.
13. A networked information management system configured to verify synchronization of deduplication information, the networked information management system comprising: a storage manager comprising computer hardware configured to receive a request to verify data in a backup; a deduplication database media agent comprising an electronically stored deduplication database and computer hardware configured to: retrieve, from the deduplication database, a primary table, wherein the primary table identifies data blocks stored in a secondary storage device and data chunks associated with the data blocks, and wherein the primary table comprises a primary identification for each identified data block; generate, for a first data chunk of the data chunks identified in the primary table, a first value based on the primary identifications of the identified data blocks; generate, for the first data chunk identified in the primary table, a second value by squaring the primary identifications of the identified data blocks that are associated with the first data chunk and summing the squared primary identifications; store, for the first data chunk identified in the primary table, an identification of the first data chunk in a deduplication chunk table; store, for the first data chunk identified in the primary table, the first value associated with the first data chunk in the deduplication chunk table; store, for the first data chunk identified in the primary table, the second value associated with the first data chunk in the deduplication chunk table; and compare, for the first data chunk identified in the deduplication chunk table, the stored first value and the stored second value with values derived from an instance file corresponding to the first data chunk to verify that information stored in the primary table and information stored in the secondary storage device is synchronized.
14. The networked information management system of claim 13, wherein the computer hardware is configured to generate, for the first data chunk identified in the primary table, the first value by summing the primary identifications of the data blo
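The value generation and comparison recited in the independent claims, with the first value computed as a sum as in claims 2 and 8, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `(primary_id, chunk_id)` row layout, the function names `build_chunk_table` and `chunk_is_synchronized`, and the modelling of an instance file as a plain list of primary identifications are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical primary-table row: (primary identification, chunk identification).
PrimaryRow = Tuple[int, int]
ChunkTable = Dict[int, Tuple[int, int]]


def build_chunk_table(primary_rows: Iterable[PrimaryRow]) -> ChunkTable:
    """Map each chunk id to (sum of primary ids, sum of squared primary ids)."""
    totals: Dict[int, List[int]] = defaultdict(lambda: [0, 0])
    for primary_id, chunk_id in primary_rows:
        totals[chunk_id][0] += primary_id               # first value
        totals[chunk_id][1] += primary_id * primary_id  # second value
    return {chunk_id: (s, sq) for chunk_id, (s, sq) in totals.items()}


def chunk_is_synchronized(
    chunk_table: ChunkTable, chunk_id: int, instance_ids: Iterable[int]
) -> bool:
    """Compare stored values with values derived from an instance file.

    `instance_ids` stands in for the primary identifications read from the
    instance file of the chunk (an assumption made for this example).
    """
    stored = chunk_table.get(chunk_id)
    if stored is None:
        return False
    ids = list(instance_ids)
    derived = (sum(ids), sum(i * i for i in ids))
    return derived == stored
```

With rows `[(1, 10), (2, 10), (5, 11)]`, chunk 10 stores the pair (3, 5). An instance file listing blocks 2 and 1 derives the same pair and passes. An instance file listing only block 3 matches the first value (3) but yields 9 for the second value, so the comparison fails.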
Error detection; Error correction; Monitoring (error detection, correction or monitoring in information storage based on relative movement between record carrier and transducer G11B20/18; monitoring, i.e. supervising the progress of recording or reproducing G11B27/36; in static stores G11C29/00) · CPC title
using de-duplication of the data · CPC title
Management of the backup or restore process · CPC title
Techniques for file synchronisation in file systems · CPC title
De-duplication techniques · CPC title