Selective deduplication

US2020125536A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020125536-A1
Application number  US-201916716759-A
Country             US
Kind code           A1
Filing date         Dec 17, 2019
Priority date       Oct 18, 2012
Publication date    Apr 23, 2020
Grant date

Abstract

Official abstract text for this publication.

Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
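The selective in-line path described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the `DataObject` and `Store` classes, the per-type probability table, the 0.6 threshold, and the choice of "greater than or equal to" as the "specified relationship" are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    payload: bytes
    object_type: str  # hypothetical characteristic, e.g. "vm_image" or "log"


# Assumed per-type probabilities, hard-coded for illustration only.
ASSUMED_PROBABILITY_BY_TYPE = {"vm_image": 0.9, "document": 0.5, "log": 0.1}


def dedup_probability(obj: DataObject) -> float:
    """Estimate how likely deduplication is to save space for this object."""
    return ASSUMED_PROBABILITY_BY_TYPE.get(obj.object_type, 0.0)


@dataclass
class Store:
    threshold: float = 0.6                       # assumed value
    blocks: dict = field(default_factory=dict)   # storage key -> payload
    names: dict = field(default_factory=dict)    # object name -> storage key

    def write(self, obj: DataObject) -> bool:
        """Store obj; return True if an in-line duplicate was eliminated."""
        if dedup_probability(obj) >= self.threshold:
            # In-line path: fingerprint before the data reaches storage.
            key = hashlib.sha256(obj.payload).hexdigest()
            deduplicated = key in self.blocks
            self.blocks.setdefault(key, obj.payload)
        else:
            # Low probability: skip fingerprinting and store as-is.
            key = f"raw:{obj.name}"
            deduplicated = False
            self.blocks[key] = obj.payload
        self.names[obj.name] = key
        return deduplicated
```

In this sketch, two identical objects of a high-probability type share one stored block, while identical objects of a low-probability type are written separately and skip the fingerprinting cost.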

First claim

Opening claim text (preview).

1. A method, comprising: evaluating characteristics of a data object to determine a deduplication priority of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device; and performing post-processing deduplication for the data object based upon the deduplication priority being less than a deduplication probability threshold.

2. The method of claim 1, comprising: performing the post-processing deduplication subsequent to the data object being stored within the storage device.

3. The method of claim 1, comprising: determining the deduplication priority based upon whether aggregated statistical information indicates that deduplication of data objects, having the characteristics of the data object, previously provided deduplication benefits.

4. The method of claim 1, comprising: defining the deduplication probability threshold based upon predetermined performance metrics.

5. The method of claim 1, comprising: performing in-line deduplication for data objects before the data objects are stored within the storage device based upon the data objects having deduplication priorities greater than the deduplication probability threshold.

6. The method of claim 1, comprising: performing the post-processing deduplication based upon a determination that a current system load demand is below a threshold.

7. The method of claim 1, comprising: performing the post-processing deduplication as a background operation.

8. A computing device comprising: a memory having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: evaluate characteristics of a data object to determine a deduplication priority of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device; and determining a deduplication probability threshold based upon performance metrics; and perform post-processing deduplication for the data object based upon the deduplication priority being less than the deduplication probability.

9. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon a size characteristic of the data object.

10. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon a data object type of the data object.

11. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon a last modified timestamp of the data object.

12. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon an update frequency of the data object.

13. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon statistic information derived from a previous deduplication operation.

14. The computing device of claim 8, wherein the instructions cause the processor to: determine the deduplication priority based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

15. A non-transitory machine-readable storage media having stored thereon instructions, for performing a method, which causes a processor to: evaluate characteristics of a data object against aggregated statistical information of whether deduplication of data objects with the characteristics resulted in storage consumption savings to determine a deduplication priority of the data object based upon a projected likelihood that deduplication of the data object will reduce storage consumption within a storage device; and perform post-processing deduplication for the data object based upon the deduplication priority being less than a deduplication probability threshold.

16. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication priority based upon a data object type of the data object.

17. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication priority based upon a last modified timestamp of the data object.

18. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication priority based upon an update frequency of the data object.

19. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication priority based upon statistic information derived from a previous deduplication operation.

20. The non-transitory machine-readable storage media of claim 15, wherein the instructions cause the processor to: determine the deduplication priority based upon a size characteristic of the data object.
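One way to picture how the claims fit together is a priority derived from aggregated history (claims 3, 13, 15), compared against a probability threshold to choose in-line or post-processing deduplication (claims 1 and 5), with the post-processing queue drained only when system load is low (claims 6 and 7). The sketch below is a hedged illustration: the class names, the neutral 0.5 prior for unseen characteristics, both threshold values, and the handling of the "equal to threshold" case (which the claims leave open) are assumptions, not details from the patent.

```python
from collections import defaultdict, deque


class DedupStats:
    """Aggregated history of whether deduplication saved space, per characteristic."""

    def __init__(self):
        self._attempts = defaultdict(int)
        self._savings = defaultdict(int)

    def record(self, characteristic: str, saved_space: bool) -> None:
        self._attempts[characteristic] += 1
        if saved_space:
            self._savings[characteristic] += 1

    def priority(self, characteristic: str) -> float:
        attempts = self._attempts.get(characteristic, 0)
        if attempts == 0:
            return 0.5  # assumed neutral prior when there is no history
        return self._savings.get(characteristic, 0) / attempts


class Router:
    """Chooses in-line or post-processing deduplication for each object."""

    def __init__(self, stats, probability_threshold=0.6, load_threshold=0.7):
        self.stats = stats
        self.probability_threshold = probability_threshold  # assumed value
        self.load_threshold = load_threshold                # assumed value
        self.pending = deque()  # objects awaiting post-processing

    def route(self, name: str, characteristic: str) -> str:
        if self.stats.priority(characteristic) > self.probability_threshold:
            return "inline"
        # Assumption: a priority equal to the threshold is also deferred.
        self.pending.append(name)
        return "post-process"

    def drain(self, current_load: float) -> list:
        """Return queued objects for background dedup, only when load is low."""
        if current_load >= self.load_threshold:
            return []
        drained = list(self.pending)
        self.pending.clear()
        return drained
```

Here `record` would be called after each deduplication operation so that later priorities reflect observed savings, and `drain` would be polled by a background task with a load figure in the range 0 to 1.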

Assignees

Inventors

Classifications

  • De-duplication techniques · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020125536A1 cover?
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage sy…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 23, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).