Optimization of data deduplication

US9965182B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9965182-B2
Application number: US-201514919204-A
Country: US
Kind code: B2
Filing date: Oct 21, 2015
Priority date: Oct 21, 2015
Publication date: May 8, 2018
Grant date: May 8, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments for optimizing deduplication in a computing storage environment by a processor. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.
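
The popularity statistic described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the class name `PopularityIndex`, the `half_life` parameter, and the choice of exponential half-life decay are all assumptions made for illustration. The source only says that the statistics count how often a region was a target for a potential link.

```python
class PopularityIndex:
    """Hypothetical in-memory index: per-region count of link attempts, aged over time."""

    def __init__(self, half_life: float = 3600.0):
        # half_life (seconds) is an assumed tuning knob, not a value from the patent.
        self.half_life = half_life
        self._score = {}   # region id -> score as of the last update
        self._stamp = {}   # region id -> time of the last update

    def popularity(self, region_id: str, now: float) -> float:
        """Return the score after applying decay for the time elapsed since the last update."""
        last = self._stamp.get(region_id, now)
        decay = 0.5 ** ((now - last) / self.half_life)
        return self._score.get(region_id, 0.0) * decay

    def record_target(self, region_id: str, now: float) -> None:
        """Count one more occasion on which region_id was a target for a potential link."""
        self._score[region_id] = self.popularity(region_id, now) + 1.0
        self._stamp[region_id] = now


index = PopularityIndex(half_life=10.0)
index.record_target("region-A", now=0.0)
index.record_target("region-A", now=0.0)
print(index.popularity("region-A", now=0.0))   # two attempts, no aging yet
print(index.popularity("region-A", now=10.0))  # same score after one half-life
```

Decay is applied lazily at read time, so regions that are never queried cost nothing to age. A periodic sweep over the whole index would be an equally plausible design.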

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for optimizing deduplication in a computing storage environment by a processor, comprising: intelligently forming links between data regions based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions; managing, by an owner, a popularity index incorporating the popularity statistics, in one of a memory-only location and persistent memory location, wherein, over time, the popularity index is subjected to an aging mechanism pursuant to a decaying algorithm; creating, by a potential referrer one of the data regions to an owner one of the data regions, one of the intelligently formed links by searching the popularity index; deciding, by the owner one of the data regions, to accept the one of the intelligently formed links based on at least one of a plurality of predetermined factors; wherein deciding based on the at least one of the plurality of predetermined factors includes at least one of: considering a popularity metric of the owner one of the data regions, and considering at least one self-data management characteristic of the owner one of the data regions; if the one of the intelligently formed links is accepted by the owner one of the data regions, creating, by the potential referrer one of the data regions, the one of the intelligently formed links; and if the one of the intelligently formed links is rejected by the owner one of the data regions, writing data.

2. The method of claim 1, further including initializing a system-wide parameter describing a minimum popularity value per region to indicate a popular owner of a corresponding data region, wherein the minimum popularity value is confirmed when testing the computing storage environment.

3. A system for optimizing deduplication in a computing storage environment, comprising: at least one processor, operational in the computing storage environment, wherein the at least one processor intelligently forms links between data regions based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions; manages, by an owner, a popularity index incorporating the popularity statistics, in one of a memory-only location and persistent memory location, wherein, over time, the popularity index is subjected to an aging mechanism pursuant to a decaying algorithm; creates, by a potential referrer one of the data regions to an owner one of the data regions, one of the intelligently formed links by searching the popularity index; decides, by the owner one of the data regions, to accept the one of the intelligently formed links based on at least one of a plurality of predetermined factors; wherein deciding based on the at least one of the plurality of predetermined factors includes at least one of: considering a popularity metric of the owner one of the data regions, and considering at least one self-data management characteristic of the owner one of the data regions; if the one of the intelligently formed links is accepted by the owner one of the data regions, creates, by the potential referrer one of the data regions, the one of the intelligently formed links; and if the one of the intelligently formed links is rejected by the owner one of the data regions, writes data.

4. The system of claim 3, wherein the at least one processor initializes a system-wide parameter describing a minimum popularity value per region to indicate a popular owner of a corresponding data region, further wherein the minimum popularity value is confirmed when testing the computing storage environment.

5. A computer program product for optimizing deduplication in a computing storage environment by a processor, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: a first executable portion that intelligently forms links between data regions based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions; a second executable portion that manages, by an owner, a popularity index incorporating the popularity statistics, in one of a memory-only location and persistent memory location, wherein, over time, the popularity index is subjected to an aging mechanism pursuant to a decaying algorithm; a third executable portion that creates, by a potential referrer one of the data regions to an owner one of the data regions, one of the intelligently formed links by searching the popularity index; a fourth executable portion that decides, by the owner one of the data regions, to accept the one of the intelligently formed links based on at least one of a plurality of predetermined factors; wherein deciding based on the at least one of the plurality of predetermined factors includes at least one of: considering a popularity metric of the owner one of the data regions, and considering at least one self-data management characteristic of the owner one of the data regions; a fifth executable portion that, if the one of the intelligently formed links is accepted by the owner one of the data regions, creates, by the potential referrer one of the data regions, the one of the intelligently formed links; and a sixth executable portion that, if the one of the intelligently formed links is rejected by the owner one of the data regions, writes data.

6. The computer program product of claim 5, further including a seventh executable portion that initializes a system-wide parameter describing a minimum popularity value per region to indicate a popular owner of a corresponding data region, further wherein the minimum popularity value is confirmed when testing the computing storage environment.
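
The accept-or-write flow recited in the independent claims can be sketched as follows. Everything named here is hypothetical: the `Region` fields, the `MIN_POPULARITY` value, and the rule that an owner accepts a link only once it counts as popular and is willing to take links. The claims require only that the owner considers at least one of a popularity metric and a self-data management characteristic; this sketch checks both for simplicity.

```python
from dataclasses import dataclass, field

# Assumed system-wide minimum popularity per region (illustrative value only).
MIN_POPULARITY = 3.0


@dataclass
class Region:
    name: str
    popularity: float = 0.0        # times this region was a target for a potential link
    accepting_links: bool = True   # stand-in for a self-data management characteristic
    links: list = field(default_factory=list)    # names of owners this region refers to
    written: list = field(default_factory=list)  # data stored locally instead of linked


def owner_accepts(owner: Region) -> bool:
    """Owner-side decision based on the assumed predetermined factors."""
    return owner.accepting_links and owner.popularity >= MIN_POPULARITY


def deduplicate(referrer: Region, owner: Region, data: bytes) -> str:
    """Referrer proposes a link to owner; on rejection the referrer writes the data."""
    owner.popularity += 1.0  # the owner was a target for a potential link
    if owner_accepts(owner):
        referrer.links.append(owner.name)
        return "linked"
    referrer.written.append(data)
    return "written"


owner = Region("owner")
referrer = Region("referrer")
for _ in range(3):
    print(deduplicate(referrer, owner, b"chunk"))
```

Under these assumptions the first two attempts fall back to writing data, and the third forms a link once the owner reaches the threshold.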

Assignees

Inventors

Classifications

  • based on file chunks · CPC title

  • Saving storage space on storage systems · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • De-duplication techniques · CPC title

  • G06F3/06 · Primary

    Digital input from, or digital output to, record carriers {, e.g. RAID, emulated record carriers or networked record carriers} · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9965182B2 cover?
Various embodiments for optimizing deduplication in a computing storage environment by a processor. Links between data regions are intelligently formed, based on up-to-date popularity statistics, including a number of times a particular one of the data regions was a target for a potential link with another one of the data regions.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/1752. Mapped technology areas include Physics.
When was this patent published?
Publication date May 8, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).