US9817865B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9817865-B2 |
| Application number | US-201514960982-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 7, 2015 |
| Priority date | Dec 7, 2015 |
| Publication date | Nov 14, 2017 |
| Grant date | Nov 14, 2017 |
Abstract
Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.
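The two-level lookup described in the abstract can be sketched in Python. This is a minimal, non-authoritative sketch under stated assumptions. Fingerprints are SHA-1 digests, although the text shown here does not name a hash function. Each metadata region's fingerprint index is an in-memory dict, and the central index maps a fingerprint to the id of the region that holds it. Swapping regions in and out of memory is not modeled, and `TwoLevelLookup`, `MetadataRegion`, `lookup`, and `write` are hypothetical names. Following claim 1, the region being written to is searched first, and the central index is consulted only on a miss.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-1 over the raw chunk; the patent text does not fix a hash.
    return hashlib.sha1(chunk).hexdigest()


@dataclass
class MetadataRegion:
    region_id: int
    # Fingerprint -> chunk location inside this region's user space (hypothetical layout).
    index: Dict[str, int] = field(default_factory=dict)


class TwoLevelLookup:
    """Hypothetical sketch: region-local index first, central index as fallback."""

    def __init__(self) -> None:
        self.regions: Dict[int, MetadataRegion] = {}
        # Central index: fingerprint -> id of the region that holds it.
        self.central: Dict[str, int] = {}

    def region(self, region_id: int) -> MetadataRegion:
        return self.regions.setdefault(region_id, MetadataRegion(region_id))

    def lookup(self, region_id: int, fp: str) -> Optional[Tuple[int, int]]:
        # 1) Search the index of the region being written to.
        local = self.region(region_id)
        if fp in local.index:
            return region_id, local.index[fp]
        # 2) Fall back to the central index, which names the owning region.
        owner = self.central.get(fp)
        if owner is not None:
            return owner, self.regions[owner].index[fp]
        return None

    def write(self, region_id: int, chunk: bytes) -> Tuple[bool, Tuple[int, int]]:
        """Return (was_duplicate, (region_id, location)) for one chunk write."""
        fp = fingerprint(chunk)
        hit = self.lookup(region_id, fp)
        if hit is not None:
            return True, hit  # duplicate: reference the existing chunk
        local = self.region(region_id)
        location = len(local.index)  # hypothetical: next free slot, no deletions
        local.index[fp] = location
        self.central[fp] = region_id
        return False, (region_id, location)
```

For example, writing `b"alpha"` to region 0 and then to region 1 returns `(False, (0, 0))` and then `(True, (0, 0))`; the second hit comes from the central index because region 1's own index is still empty.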
Claims (preview)
The invention claimed is:
1. A method for identifying data in a data deduplication system, by a processor device, comprising: identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and deduplicating the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
2. The method of claim 1, further including establishing an active owners list for each of the at least one of the plurality of metadata regions; wherein the active owners list comprises a list of each metadata region in which a reference has been created by way of the fingerprint matches.
3. The method of claim 2, further including, for new data writes to the at least one of the plurality of metadata regions, searching for the fingerprint matches within each metadata region established on the active owners list.
4. The method of claim 3, further including searching for the fingerprint matches in the central metadata fingerprint index if a match is not found within each metadata region established on the active owners list.
5. The method of claim 2, further including adjoining a metadata region to the active owners list based upon one of a predetermined memory consumption threshold and a central processing unit (CPU) consumption threshold the metadata region will occupy.
6. The method of claim 2, further including defining a fingerprint lookup threshold; wherein a quantity of unsuccessful attempts to locate the fingerprint matches in metadata regions contained on the owners list triggers a central fingerprint index lookup.
7. The method of claim 2, further including evicting a metadata region from the owners list based upon a predetermined threshold of unsuccessful fingerprint matches, pursuant to an eviction policy.
8. The method of claim 7, further including establishing one of a most frequently used, most recently used, least frequently used, and least recently used list developed pursuant to the eviction policy.
9. A system for identifying data in a data deduplication system, the system comprising: at least one processor device, wherein the processor device: identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
10. The system of claim 9, wherein the at least one processor device establishes an active owners list for each of the at least one of the plurality of metadata regions; wherein the active owners list comprises a list of each metadata region in which a reference has been created by way of the fingerprint matches.
11. The system of claim 10, wherein the at least one processor device, for new data writes to the at least one of the plurality of metadata regions, searches for the fingerprint matches within each metadata region established on the active owners list.
12. The system of claim 11, wherein the at least one processor device searches for the fingerprint matches in the central metadata fingerprint index if a match is not found within each metadata region established on the active owners list.
13. The system of claim 10, wherein the at least one processor device adjoins a metadata region to the active owners list based upon one of a predetermined memory consumption threshold and a central processing unit (CPU) consumption threshold the metadata region will occupy.
14. The system of claim 10, wherein the at least one processor device defines a fingerprint lookup threshold; wherein a quantity of unsuccessful attempts to locate the fingerprint matches in metadata regions contained on the owners list triggers a central fingerprint index lookup.
15. The system of claim 10, wherein the at least one processor device evicts a metadata region from the owners list based upon a predetermined threshold of unsuccessful fingerprint matches, pursuant to an eviction policy.
16. The system of claim 15, wherein the at least one processor device establishes one of a most frequently used, most recently used, least frequently used, and least recently used list developed pursuant to the eviction policy.
17. A computer program product for identifying data in a data deduplication system, by a processor device, the computer program product embodied on a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and an executable portion that deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.
18. The computer program product of claim 17, further including an executable portion that establishes an active owners list for each of the at least one of the plurality of metadata regions; wherein the active
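Claims 2 through 8 add an active owners list per metadata region. The Python below is one hedged reading of those claims, not the patentee's implementation, and every name in it is hypothetical. Assumptions: a region is modeled as a set of fingerprints; `capacity` is a count-based stand-in for the memory or CPU consumption threshold of claim 5; `lookup_threshold` caps how many owner regions are probed before the central index is used (claim 6); `evict_after_misses` drops an owner after that many consecutive misses (claim 7); and least recently used ordering, one of the four list types named in claim 8, decides probe order. The check of the writing region's own index from claim 1 is left out to keep the focus on the owners list.

```python
from collections import OrderedDict
from typing import Dict, Optional, Set


class ActiveOwnersList:
    """Hypothetical active owners list for one metadata region (claims 2-8)."""

    def __init__(self, capacity: int, lookup_threshold: int, evict_after_misses: int) -> None:
        self.capacity = capacity                      # assumed stand-in for the claim 5 admission threshold
        self.lookup_threshold = lookup_threshold      # owner probes allowed before the central lookup
        self.evict_after_misses = evict_after_misses  # consecutive misses that trigger eviction
        # Region id -> consecutive misses; insertion order doubles as LRU order.
        self.owners: "OrderedDict[int, int]" = OrderedDict()

    def admit(self, region_id: int) -> bool:
        """Add (or refresh) a region; refuse it if the budget is exhausted."""
        if region_id in self.owners:
            self.owners.move_to_end(region_id)  # mark most recently used
            return True
        if len(self.owners) >= self.capacity:
            return False
        self.owners[region_id] = 0
        return True

    def lookup(self, fp: str, regions: Dict[int, Set[str]], central: Dict[str, int]) -> Optional[int]:
        """Return the id of a region holding fp, or None if no region does."""
        probes = 0
        # Probe owners from most to least recently used; iterate over a copy
        # because evictions below mutate the OrderedDict.
        for rid in reversed(list(self.owners)):
            if probes >= self.lookup_threshold:
                break  # too many unsuccessful probes: go to the central index
            probes += 1
            if fp in regions[rid]:
                self.owners[rid] = 0
                self.owners.move_to_end(rid)
                return rid
            self.owners[rid] += 1
            if self.owners[rid] >= self.evict_after_misses:
                del self.owners[rid]  # eviction policy: drop a region that keeps missing
        # Fallback: the central index maps fingerprints to the region holding them.
        rid = central.get(fp)
        if rid is not None:
            self.admit(rid)  # a new reference into rid makes it an owner candidate
        return rid
```

Probing recently matched owners first keeps useful regions at the front, while the miss counter stops the list from filling with regions that no longer share data. A least or most frequently used policy, the other options claim 8 lists, would replace the ordering with per-region hit counts.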
Physics · mapped topic
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Aggregation; Duplicate elimination · CPC title