Data de-duplication for information storage systems
US-8954399-B1 · Feb 10, 2015 · US
US9524104B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9524104-B2 |
| Application number | US-201514589218-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 5, 2015 |
| Priority date | Apr 18, 2011 |
| Publication date | Dec 20, 2016 |
| Grant date | Dec 20, 2016 |
Official abstract text for this publication.
Technologies for eliminating duplicate data within a storage system. De-duplication may be performed at the physical chunk level, where the data is not copied or moved to a different location. A logical mapping is modified using a thin de-duplication kernel module that resides between a distributed volume manager (DVM) and a logical disk (LD). De-duplication is achieved by changing pointers in the mapping to land at a physical location. De-duplication is performed as a post-process feature in which duplicates are identified and marked in the mapping table, thereby claiming free space through de-duplication. Block-level de-duplication in accordance with the above can co-exist with existing storage architectures for thin provisioning and snapshot management.
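The pointer-remapping idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dictionary-based mapping table, the SHA-256 fingerprint, and the names `deduplicate`, `mapping`, and `physical` are all assumptions made for illustration. The sketch only shows that duplicates can be eliminated by redirecting logical entries to one physical chunk, without copying or moving data.

```python
import hashlib


def deduplicate(mapping, physical):
    """Post-process de-duplication by pointer remapping (illustrative only).

    mapping:  dict of logical chunk number -> physical chunk number (hypothetical layout)
    physical: dict of physical chunk number -> bytes stored in that chunk
    Returns the set of physical chunks that no logical entry references afterwards.
    """
    referenced_before = set(mapping.values())
    canonical_for = {}  # fingerprint -> first physical chunk seen with that content

    for logical in sorted(mapping):
        phys = mapping[logical]
        # SHA-256 is an assumed fingerprint; the source does not name a hash.
        digest = hashlib.sha256(physical[phys]).digest()
        canonical = canonical_for.setdefault(digest, phys)
        # Byte-compare before remapping so a hash collision cannot merge different data.
        if canonical != phys and physical[canonical] == physical[phys]:
            mapping[logical] = canonical  # only the pointer changes; no data is moved

    return referenced_before - set(mapping.values())


if __name__ == "__main__":
    mapping = {0: 10, 1: 11, 2: 12, 3: 13}
    physical = {10: b"AAAA", 11: b"BBBB", 12: b"AAAA", 13: b"BBBB"}
    freed = deduplicate(mapping, physical)
    print(mapping, sorted(freed))
```

In this sketch the freed physical chunks are merely reported; how reclaimed space is returned to a thin-provisioning allocator is outside what the abstract describes.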
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for identifying candidates for data de-duplication in a data storage system, comprising: maintaining, using a volume management module, a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of the data storage system; maintaining, using the volume management module, a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and identifying, using a data de-duplication module, the candidates for data de-duplication based on the timestamps and the snapshot volume table maintained by the volume management module.
2. The computer-implemented method of claim 1, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.
3. The computer-implemented method of claim 2, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.
4. The computer-implemented method of claim 1, further comprising analyzing the candidates for data de-duplication to find duplicated chunks.
5. The computer-implemented method of claim 4, further comprising performing a data de-duplication operation on the duplicated chunks.
6. The computer-implemented method of claim 2, further comprising creating a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.
7. The computer-implemented method of claim 6, wherein the de-duplication chunk bitmap is encoded in a de-duplication table.
8. The computer-implemented method of claim 1, wherein each of the timestamps indicates a time at which a last write I/O operation was performed on a territory of the at least one volume.
9. A data storage system, comprising: a storage server; a physical storage device associated with the storage server; a processing unit associated with the storage server; a volume management module for execution on the processing unit, the volume management module being operable to: maintain a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of the data storage system, and maintain a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and a data de-duplication module for execution on the processing unit, the data de-duplication module being operable to identify the candidates for data de-duplication based on the timestamps and the snapshot volume table maintained by the volume management module.
10. The data storage system of claim 9, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.
11. The data storage system of claim 10, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.
12. The data storage system of claim 9, wherein the data de-duplication module is further operable to analyze the candidates for data de-duplication to find duplicated chunks.
13. The data storage system of claim 12, wherein the data de-duplication module is further operable to perform a data de-duplication operation on the duplicated chunks.
14. The data storage system of claim 10, wherein the data de-duplication module is further operable to create a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.
15. A non-transitory computer-readable storage medium having computer-executable instructions stored thereon for identifying candidates for data de-duplication which, when executed by a computer system, cause the computer system to: maintain a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of a data storage system; maintain a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and identifying the candidates for data de-duplication based on the timestamps and the snapshot volume table.
16. The non-transitory computer-readable storage medium of claim 15, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.
17. The non-transitory computer-readable storage medium of claim 16, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.
18. The non-transitory computer-readable storage medium of claim 15, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to analyze the candidates for data de-duplication to find duplicated chunks.
19. The non-transitory computer-readable storage medium of claim 18, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to perform a data de-duplication operation on the duplicated chunks.
20. The non-transitory computer-readable storage medium of claim 16, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to create a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.
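As a rough illustration of the candidate-identification step recited in the independent claims, the sketch below combines a per-territory last-write timestamp with a per-territory bitmap of provisions that differ between snapshots. The claims do not specify how the two sources are combined, so the rule used here (territory written after the last de-duplication run AND provision bit set) is an assumption, as are the `Territory` class, the integer bitmap encoding, and the `find_candidates` name.

```python
from dataclasses import dataclass


@dataclass
class Territory:
    last_write: float   # time of the most recent write I/O to this territory
    delta_bitmap: int   # bit i set => provision i differs from the previous snapshot


def find_candidates(territories, last_dedup_time, provisions_per_territory):
    """Return (territory index, provision index) pairs worth scanning for duplicates.

    Assumed rule: skip territories not written since the last de-duplication
    run, then keep only provisions flagged in the snapshot delta bitmap.
    """
    candidates = []
    for t_index, territory in enumerate(territories):
        if territory.last_write <= last_dedup_time:
            continue  # untouched since the last run, so nothing new to examine
        for p_index in range(provisions_per_territory):
            if (territory.delta_bitmap >> p_index) & 1:
                candidates.append((t_index, p_index))
    return candidates


if __name__ == "__main__":
    territories = [
        Territory(last_write=100.0, delta_bitmap=0b0101),
        Territory(last_write=50.0, delta_bitmap=0b1111),
        Territory(last_write=120.0, delta_bitmap=0b0000),
        Territory(last_write=130.0, delta_bitmap=0b1000),
    ]
    print(find_candidates(territories, last_dedup_time=90.0, provisions_per_territory=4))
```

Filtering by timestamp first keeps the scan cheap, since whole territories are skipped before any bitmap is read; the surviving pairs would then be handed to the duplicate-finding analysis described in claims 4, 12, and 18.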
De-duplication techniques · CPC title
Disk arrays, e.g. RAID, JBOD · CPC title
Resetting or repowering · CPC title
using de-duplication of the data · CPC title
at area level, e.g. provisioning of virtual or logical volumes · CPC title