Data de-duplication for information storage systems

US9524104B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9524104-B2
Application number  US-201514589218-A
Country             US
Kind code           B2
Filing date         Jan 5, 2015
Priority date       Apr 18, 2011
Publication date    Dec 20, 2016
Grant date          Dec 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies for eliminating duplicate data within a storage system. De-duplication may be performed at the physical chunk level, where the data is not copied or moved to a different location. A logical mapping is modified using a thin de-duplication kernel module that resides between a distributed volume manager (DVM) and a logical disk (LD). De-duplication is achieved by changing pointers in the mapping to land at a physical location. De-duplication is performed as a post-process feature in which duplicates are identified and marked in the mapping table, thereby claiming free space through de-duplication. Block-level de-duplication in accordance with the above can co-exist with existing storage architectures for thin provisioning and snapshot management.
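
The pointer-remapping idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `mapping` and `physical` dictionaries, the `deduplicate` function, and the use of SHA-256 to spot identical chunks are all assumptions made for illustration, since the abstract does not say how duplicates are detected.

```python
import hashlib


def deduplicate(mapping, physical):
    """Post-process pass that repoints duplicate logical chunks.

    mapping:  dict of logical chunk id -> physical chunk id
              (hypothetical stand-in for the mapping table)
    physical: dict of physical chunk id -> bytes
              (hypothetical stand-in for the logical disk)
    Returns the set of physical chunk ids released by the pass.
    """
    canonical = {}  # content digest -> physical chunk id kept for that content
    freed = set()
    for logical_id in sorted(mapping):
        phys_id = mapping[logical_id]
        digest = hashlib.sha256(physical[phys_id]).digest()
        keeper = canonical.setdefault(digest, phys_id)
        # Compare the bytes as well, so a digest collision never merges chunks.
        if keeper != phys_id and physical[keeper] == physical[phys_id]:
            mapping[logical_id] = keeper  # change the pointer; no data is moved
            freed.add(phys_id)
    # Release only chunks that no logical chunk points at any more.
    freed -= set(mapping.values())
    for phys_id in freed:
        del physical[phys_id]
    return freed


# Hypothetical example: logical chunks 0 and 2 hold identical data.
mapping = {0: 10, 1: 11, 2: 12}
physical = {10: b"AAAA", 11: b"BBBB", 12: b"AAAA"}
released = deduplicate(mapping, physical)
```

After the call, logical chunk 2 points at physical chunk 10 and physical chunk 12 is released. The data for chunk 10 is never copied, which mirrors the abstract's point that de-duplication changes pointers rather than moving data.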

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for identifying candidates for data de-duplication in a data storage system, comprising: maintaining, using a volume management module, a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of the data storage system; maintaining, using the volume management module, a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and identifying, using a data de-duplication module, the candidates for data de-duplication based on the timestamps and the snapshot volume table maintained by the volume management module.

2. The computer-implemented method of claim 1, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.

3. The computer-implemented method of claim 2, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.

4. The computer-implemented method of claim 1, further comprising analyzing the candidates for data de-duplication to find duplicated chunks.

5. The computer-implemented method of claim 4, further comprising performing a data de-duplication operation on the duplicated chunks.

6. The computer-implemented method of claim 2, further comprising creating a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.

7. The computer-implemented method of claim 6, wherein the de-duplication chunk bitmap is encoded in a de-duplication table.

8. The computer-implemented method of claim 1, wherein each of the timestamps indicates a time at which a last write I/O operation was performed on a territory of the at least one volume.

9. A data storage system, comprising: a storage server; a physical storage device associated with the storage server; a processing unit associated with the storage server; a volume management module for execution on the processing unit, the volume management module being operable to: maintain a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of the data storage system, and maintain a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and a data de-duplication module for execution on the processing unit, the data de-duplication module being operable to identify the candidates for data de-duplication based on the timestamps and the snapshot volume table maintained by the volume management module.

10. The data storage system of claim 9, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.

11. The data storage system of claim 10, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.

12. The data storage system of claim 9, wherein the data de-duplication module is further operable to analyze the candidates for data de-duplication to find duplicated chunks.

13. The data storage system of claim 12, wherein the data de-duplication module is further operable to perform a data de-duplication operation on the duplicated chunks.

14. The data storage system of claim 10, wherein the data de-duplication module is further operable to create a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.

15. A non-transitory computer-readable storage medium having computer-executable instructions stored thereon for identifying candidates for data de-duplication which, when executed by a computer system, cause the computer system to: maintain a timestamp for each of a plurality of write I/O operations, each of the write I/O operations being directed to a territory of at least one volume of a data storage system; maintain a snapshot volume table including a bitmap for tracking differences in data of the at least one volume between snapshots at provision-level granularity; and identifying the candidates for data de-duplication based on the timestamps and the snapshot volume table.

16. The non-transitory computer-readable storage medium of claim 15, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last data de-deduplication operation.

17. The non-transitory computer-readable storage medium of claim 16, wherein identifying the candidates for data de-duplication further comprises identifying one or more chunks that were modified after a last snapshot operation.

18. The non-transitory computer-readable storage medium of claim 15, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to analyze the candidates for data de-duplication to find duplicated chunks.

19. The non-transitory computer-readable storage medium of claim 18, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to perform a data de-duplication operation on the duplicated chunks.

20. The non-transitory computer-readable storage medium of claim 16, having further computer-executable instructions stored thereon which, when executed by a computer system, cause the computer system to create a de-duplication chunk bitmap for tracking the one or more chunks that were modified after the last data de-deduplication operation.
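
One possible reading of the candidate-identification step in the claims is sketched below. It assumes that the per-territory timestamp acts as a coarse filter (territories not written since the last de-duplication pass are skipped) and that the snapshot bitmap then narrows the result to individual provisions. The `Territory` class, the `find_candidates` function, and the dictionary layout of the resulting bitmap are hypothetical names chosen for illustration, not structures defined by the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Territory:
    last_write: float       # time of the last write I/O to this territory
    delta_bits: List[bool]  # one flag per provision: changed since last snapshot


def find_candidates(territories: List[Territory],
                    last_dedup_time: float) -> Dict[int, List[bool]]:
    """Build a candidate bitmap: territory index -> per-provision flags.

    A territory is considered only if it was written after the last
    de-duplication pass; within it, the snapshot delta flags mark the
    provisions to examine.
    """
    candidates: Dict[int, List[bool]] = {}
    for index, territory in enumerate(territories):
        if territory.last_write <= last_dedup_time:
            continue  # untouched since the last pass, nothing new to check
        if any(territory.delta_bits):
            candidates[index] = list(territory.delta_bits)
    return candidates


# Hypothetical example: the last de-duplication pass ran at time 200.
territories = [
    Territory(last_write=100.0, delta_bits=[True, False]),
    Territory(last_write=300.0, delta_bits=[False, True, True]),
    Territory(last_write=400.0, delta_bits=[False, False]),
]
bitmap = find_candidates(territories, last_dedup_time=200.0)
```

In this example only territory 1 is returned: territory 0 was last written before the pass, and territory 2 has no provision flagged as changed. Filtering on metadata the volume manager already maintains means the de-duplication module does not have to scan every chunk to decide what to analyze.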

Assignees

American Megatrends Inc

Inventors

Classifications

  • De-duplication techniques · CPC title

  • Disk arrays, e.g. RAID, JBOD · CPC title

  • Resetting or repowering · CPC title

  • using de-duplication of the data · CPC title

  • at area level, e.g. provisioning of virtual or logical volumes · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9524104B2 cover?
Technologies for eliminating duplicate data within a storage system. De-duplication may be performed at the physical chunk level, where the data is not copied or moved to a different location. A logical mapping is modified using a thin de-duplication kernel module that resides between a distributed volume manager (DVM) and a logical disk (LD). De-duplication is achieved by changing pointers in t…
Who is the assignee on this patent?
American Megatrends Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 20, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).