Selective deduplication

US2016267098A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016267098-A1
Application number  US-201615162496-A
Country             US
Kind code           A1
Filing date         May 23, 2016
Priority date       Oct 18, 2012
Publication date    Sep 15, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
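The decision described in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: the `DataObject` fields, the `TYPE_PRIOR` table, the size cutoff, the default threshold, and the choice of "greater than" as the "specified relationship" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size_bytes: int
    object_type: str  # hypothetical labels such as "vm_image" or "video"


# Hypothetical per-type likelihood that an object contains duplicate data.
TYPE_PRIOR = {"vm_image": 0.8, "document": 0.5, "video": 0.1}


def dedup_probability(obj: DataObject) -> float:
    """Estimate a deduplication probability from object characteristics."""
    prior = TYPE_PRIOR.get(obj.object_type, 0.3)
    # Assumption: very small objects are weighted down.
    size_factor = 1.0 if obj.size_bytes >= 4096 else 0.5
    return prior * size_factor


def choose_path(obj: DataObject, threshold: float = 0.4) -> str:
    """Deduplicate inline (before persistent storage) only above the threshold."""
    if dedup_probability(obj) > threshold:
        return "inline"
    return "post-process"


if __name__ == "__main__":
    for obj in (
        DataObject("golden.vmdk", 8192, "vm_image"),
        DataObject("clip.mp4", 8192, "video"),
    ):
        print(obj.name, choose_path(obj))
```

Objects routed to "inline" would be deduplicated before they reach persistent storage; the rest would be written first and handled later.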

First claim

Opening claim text (preview).

1. A method comprising: determining, by a storage server, a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit; determining a second deduplication priority for a second data object based upon a second projected likelihood that deduplication of the second data object will provide the storage space benefit; and responsive to the first deduplication priority exceeding the second deduplication priority, performing inline deduplication for the first data object but not the second data object.

2. The method of claim 1, comprising: responsive to the second deduplication priority exceeding the first deduplication priority, performing inline deduplication for the second data object but not the first data object.

3. The method of claim 1, comprising: responsive to the first deduplication priority exceeding the second deduplication priority, performing post-processing deduplication for the second data object.

4. The method of claim 2, comprising: responsive to the second deduplication priority exceeding the first deduplication priority, performing post-processing deduplication for the first data object.

5. The method of claim 1, wherein the performing inline deduplication comprises: storing the first data object within a temporary storage location; and performing the inline deduplication upon the first data object to create a deduplicated first data object that is stored into persistent storage.

6. The method of claim 1, wherein the performing inline deduplication comprises: responsive to the first deduplication priority exceeding a deduplication probability threshold, performing the inline deduplication for the first data object.

7. The method of claim 6, wherein the performing inline deduplication comprises: responsive to the first deduplication priority not exceeding the deduplication probability threshold, performing post-processing deduplication, and not the inline deduplication, for the first data object.

8. The method of claim 3, wherein the performing post-processing deduplication comprises: storing the second data object to persistent storage; and performing the post-processing deduplication upon the second data object while stored within the persistent storage.

9. The method of claim 3, wherein the performing post-processing deduplication comprises: responsive to determining that a current system load demand is below a threshold, performing the post-processing deduplication.

10. The method of claim 3, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication as a background operation.

11. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority before the first data object is stored within persistent storage.

12. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a size characteristic of the first data object.

13. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a data object type of the first data object.

14. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a last modified timestamp of the first data object.

15. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon an update frequency of the first data object.

16. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon statistic information derived from a previous deduplication operation.

17. The method of claim 6, comprising: determining the deduplication probability threshold based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

18. The method of claim 1, comprising: responsive to a number of data objects, not yet deduplicated, exceeding a threshold, triggering a comparison of deduplication priorities of data objects for selectively performing inline deduplication.

19. A computing device comprising: a memory containing machine-readable storage media having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: determine a first deduplication probability threshold and a second deduplication probability threshold based upon a performance characteristic; creating a multi-level deduplication probability comprising a hierarchical relationship between the first deduplication probability threshold and the second deduplication probability threshold; utilizing the multi-level deduplication probability to assign deduplication priorities to data objects; and selectively performing inline deduplication or post-processing deduplication for the data objects based upon the deduplication priorities.

20. A non-transitory machine-readable storage media having stored thereon instructions, for performing a method, which causes a computing device to: determine a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit; determine a second deduplication priority for a second data object based upon a second projected likelihood that deduplication of the second data object will provide the storage space benefit; responsive to the first deduplication priority exceeding the second deduplication priority, perform inline deduplication for the first data object and post-processing deduplication for the second data object; and responsive to the second deduplication priority exceeding the first deduplication priority, performing inline deduplication for the second data object and post-processing deduplication for the first data object.
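As a rough illustration of the pairwise comparison in claims 1 through 4 and 20, and of the two-threshold hierarchy in claim 19, the following sketch may help. It is not taken from the patent: the function names, the string labels, the integer level encoding, and the handling of equal priorities (which the claims do not address) are assumptions.

```python
from typing import Tuple


def assign_paths(first_priority: float, second_priority: float) -> Tuple[str, str]:
    """Return the deduplication path for (first object, second object)."""
    if first_priority > second_priority:
        return ("inline", "post-process")
    if second_priority > first_priority:
        return ("post-process", "inline")
    # Assumption: the claims are silent on ties, so both objects are deferred.
    return ("post-process", "post-process")


def priority_level(probability: float, lower: float, upper: float) -> int:
    """Map a deduplication probability onto a two-threshold hierarchy.

    Returns 2 above the upper threshold, 1 between the thresholds,
    and 0 at or below the lower threshold (hypothetical encoding).
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if probability > upper:
        return 2
    if probability > lower:
        return 1
    return 0


if __name__ == "__main__":
    print(assign_paths(0.7, 0.2))
    print(priority_level(0.5, lower=0.3, upper=0.6))
```

In a fuller system the two thresholds would themselves be derived from a performance characteristic such as throughput or response time, as claims 17 and 19 describe; here they are simply passed in as parameters.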

Assignees

Netapp Inc

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • De-duplication techniques · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016267098A1 cover?
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage sy…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 15, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).