Direct lookup for identifying duplicate data in a data deduplication system

US9817865B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9817865-B2
Application number: US-201514960982-A
Country: US
Kind code: B2
Filing date: Dec 7, 2015
Priority date: Dec 7, 2015
Publication date: Nov 14, 2017
Grant date: Nov 14, 2017

Abstract

Official abstract text for this publication.

Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the direct inter-region fingerprint lookup supplementing a central fingerprint index.
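The lookup order the abstract describes, and which the first claim spells out, can be sketched as follows: a write to a metadata region first consults that region's own fingerprint index, and only on a miss falls back to the central fingerprint index, which records which region holds each fingerprint. This is a minimal illustration under stated assumptions, not the patented implementation. The names `fingerprint`, `MetadataRegion`, and `DedupStore`, the use of SHA-256, and the dictionary layouts are all hypothetical and do not come from the patent text.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(chunk: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint function is used.
    return hashlib.sha256(chunk).hexdigest()


@dataclass
class MetadataRegion:
    region_id: int
    # Hypothetical layout: fingerprint -> slot number inside this region.
    index: dict = field(default_factory=dict)


class DedupStore:
    def __init__(self, region_count: int):
        self.regions = {i: MetadataRegion(i) for i in range(region_count)}
        # Central index: fingerprint -> id of the region where it resides.
        self.central = {}

    def write(self, region_id: int, chunk: bytes):
        """Return (status, owning_region_id); status is 'local', 'central' or 'new'."""
        fp = fingerprint(chunk)
        region = self.regions[region_id]

        # Step 1: search the index inside the region being written to.
        if fp in region.index:
            return ("local", region_id)

        # Step 2: on a miss, consult the supplemental central index.
        owner = self.central.get(fp)
        if owner is not None:
            return ("central", owner)

        # No match anywhere: record the chunk as new data in both indexes.
        region.index[fp] = len(region.index)
        self.central[fp] = region_id
        return ("new", region_id)
```

In this sketch a repeated write to the same region is resolved without touching the central index, while a write of the same chunk to a different region is resolved by the central index, which points back to the region that first stored it.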

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for identifying data in a data deduplication system, by a processor device, comprising: identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and deduplicating the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.

2. The method of claim 1, further including establishing an active owners list for each of the at least one of the plurality of metadata regions; wherein the active owners list comprises a list of each metadata region in which a reference has been created by way of the fingerprint matches.

3. The method of claim 2, further including, for new data writes to the at least one of the plurality of metadata regions, searching for the fingerprint matches within each metadata region established on the active owners list.

4. The method of claim 3, further including searching for the fingerprint matches in the central metadata fingerprint index if a match is not found within each metadata region established on the active owners list.

5. The method of claim 2, further including adjoining a metadata region to the active owners list based upon one of a predetermined memory consumption threshold and a central processing unit (CPU) consumption threshold the metadata region will occupy.

6. The method of claim 2, further including defining a fingerprint lookup threshold; wherein a quantity of unsuccessful attempts to locate the fingerprint matches in metadata regions contained on the owners list triggers a central fingerprint index lookup.

7. The method of claim 2, further including evicting a metadata region from the owners list based upon a predetermined threshold of unsuccessful fingerprint matches, pursuant to an eviction policy.

8. The method of claim 7, further including establishing one of a most frequently used, most recently used, least frequently used, and least recently used list developed pursuant to the eviction policy.

9. A system for identifying data in a data deduplication system, the system comprising: at least one processor device, wherein the processor device: identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.

10. The system of claim 9, wherein the at least one processor device establishes an active owners list for each of the at least one of the plurality of metadata regions; wherein the active owners list comprises a list of each metadata region in which a reference has been created by way of the fingerprint matches.

11. The system of claim 10, wherein the at least one processor device, for new data writes to the at least one of the plurality of metadata regions, searches for the fingerprint matches within each metadata region established on the active owners list.

12. The system of claim 11, wherein the at least one processor device searches for the fingerprint matches in the central metadata fingerprint index if a match is not found within each metadata region established on the active owners list.

13. The system of claim 10, wherein the at least one processor device adjoins a metadata region to the active owners list based upon one of a predetermined memory consumption threshold and a central processing unit (CPU) consumption threshold the metadata region will occupy.

14. The system of claim 10, wherein the at least one processor device defines a fingerprint lookup threshold; wherein a quantity of unsuccessful attempts to locate the fingerprint matches in metadata regions contained on the owners list triggers a central fingerprint index lookup.

15. The system of claim 10, wherein the at least one processor device evicts a metadata region from the owners list based upon a predetermined threshold of unsuccessful fingerprint matches, pursuant to an eviction policy.

16. The system of claim 15, wherein the at least one processor device establishes one of a most frequently used, most recently used, least frequently used, and least recently used list developed pursuant to the eviction policy.

17. A computer program product for identifying data in a data deduplication system, by a processor device, the computer program product embodied on a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that identifies duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the metadata regions each comprising a certain area of user space swapped in and out of memory and containing fingerprints for all data chunks written to the certain area of user space, wherein, for data writes to a given one of the metadata regions, the direct inter-region fingerprint lookup first searches an index of the fingerprints within the given one of the metadata regions, and subsequently searches a separate yet supplemental central fingerprint index if the fingerprint matches are not found within the given one of the metadata regions, the central fingerprint index indicating in which of the plurality of metadata regions the fingerprints reside; and an executable portion that deduplicates the identified duplicate data using the identified fingerprint matches from at least one of the index within the given one of the metadata regions and the central fingerprint index.

18. The computer program product of claim 17, further including an executable portion that establishes an active owners list for each of the at least one of the plurality of metadata regions; wherein the active
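The dependent claims on the active owners list (claims 2 through 8) describe a per-region list of other regions that have supplied matches, a lookup threshold that triggers the central index, and eviction of regions that keep missing. One possible reading is sketched below. Everything named here is an assumption made for illustration: `OwnersList`, `lookup`, the `capacity` cap (a stand-in for the memory or CPU consumption thresholds of claim 5), the consecutive-miss counter, and the choice of a least-recently-used ordering from among the policies listed in claim 8.

```python
from collections import OrderedDict


class OwnersList:
    """Hypothetical per-region list of regions that previously supplied matches."""

    def __init__(self, capacity=4, miss_limit=3):
        self.capacity = capacity      # assumed stand-in for a resource threshold
        self.miss_limit = miss_limit  # assumed eviction threshold on misses
        self.misses = OrderedDict()   # region_id -> miss count; order = recency

    def add(self, region_id):
        if region_id in self.misses:
            self.misses.move_to_end(region_id)
            return
        if len(self.misses) >= self.capacity:
            self.misses.popitem(last=False)  # drop the least recently used entry
        self.misses[region_id] = 0

    def record_hit(self, region_id):
        self.misses[region_id] = 0
        self.misses.move_to_end(region_id)

    def record_miss(self, region_id):
        self.misses[region_id] += 1
        if self.misses[region_id] >= self.miss_limit:
            del self.misses[region_id]  # evict after too many failed lookups

    def candidates(self):
        # Copy of the region ids, most recently used first.
        return list(reversed(self.misses))


def lookup(fp, owners, region_indexes, central, lookup_threshold=2):
    """Return (owning_region_id or None, source of the answer)."""
    attempts = 0
    for rid in owners.candidates():
        if attempts >= lookup_threshold:
            break  # too many failed tries: go to the central index
        if fp in region_indexes[rid]:
            owners.record_hit(rid)
            return rid, "owners"
        owners.record_miss(rid)
        attempts += 1
    rid = central.get(fp)
    if rid is not None:
        owners.add(rid)  # a reference is created, so the region joins the list
        return rid, "central"
    return None, "none"
```

Here `region_indexes` maps a region id to the set of fingerprints in that region, and `central` maps a fingerprint to its owning region id. A region that answers a central lookup is added to the owners list, so later writes with locality to that region can be resolved without the central index.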

Assignees

IBM

Inventors

Classifications

  • Physics · mapped topic

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Aggregation; Duplicate elimination · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9817865B2 cover?
Various embodiments for identifying data in a data deduplication system, by a processor device, are provided. In one embodiment, a method comprises efficiently identifying duplicate data in the data deduplication system by identifying fingerprint matches using a direct inter-region fingerprint lookup to search for the fingerprint matches in at least one of a plurality of metadata regions, the d…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30489. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).