Selective deduplication

US10565165B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10565165-B2
Application number: US-201715693685-A
Country: US
Kind code: B2
Filing date: Sep 1, 2017
Priority date: Oct 18, 2012
Publication date: Feb 18, 2020
Grant date: Feb 18, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses for performing selective deduplication in a storage system are introduced herein. Techniques are provided for determining a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device. Inline deduplication is performed for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold that is indicative of inline deduplication of the first data object having a threshold likelihood of achieving the storage space benefit from inline deduplication.
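
The routing decision described in the abstract can be illustrated with a minimal sketch. Everything named here (the `Store` class, the SHA-256 fingerprinting, the `raw:` key prefix, whole-object deduplication) is an assumption made for illustration and is not taken from the patent: an object whose priority exceeds the threshold is deduplicated before it is stored, and any other object is stored as-is and queued for a later pass.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for a storage system."""
    threshold: float                               # deduplication probability threshold
    blocks: dict = field(default_factory=dict)     # key -> stored bytes
    refs: dict = field(default_factory=dict)       # object name -> key in blocks
    pending: list = field(default_factory=list)    # names awaiting post-processing

    def write(self, name: str, data: bytes, priority: float) -> str:
        # Drop any stale un-deduplicated copy left by an earlier write.
        self.blocks.pop("raw:" + name, None)
        if priority > self.threshold:
            # Inline path: fingerprint before storing, keep one copy per fingerprint.
            fp = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(fp, data)
            self.refs[name] = fp
            return "inline"
        # Deferred path: store as-is now, deduplicate later.
        key = "raw:" + name
        self.blocks[key] = data
        self.refs[name] = key
        self.pending.append(name)
        return "post-process"

    def post_process(self) -> None:
        """Background pass that deduplicates objects stored without inline dedup."""
        for name in self.pending:
            key = self.refs[name]
            if not key.startswith("raw:"):
                continue  # already deduplicated by an earlier iteration or write
            data = self.blocks.pop(key)
            fp = hashlib.sha256(data).hexdigest()
            self.blocks.setdefault(fp, data)
            self.refs[name] = fp
        self.pending.clear()
```

In this sketch the inline path spends hashing time on the write path only for objects judged likely to pay off, which is the trade-off the abstract describes; real systems typically deduplicate at block or segment granularity rather than whole objects.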

First claim

Opening claim text (preview).

The invention claimed is:

1. A method comprising: determining, by a processor of a computing device, a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and performing inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.

2. The method of claim 1, comprising: determining a second deduplication priority for a second data object based upon a second projected likelihood that deduplication of the second data object will provide the storage space benefit.

3. The method of claim 1, comprising: performing inline deduplication for the second data object based upon the second deduplication priority exceeding the deduplication probability threshold.

4. The method of claim 3, comprising: performing post-processing deduplication for the second data object based upon the second deduplication priority being less than the deduplication probability threshold.

5. The method of claim 1, comprising: performing post-processing deduplication for the first data object based upon the first deduplication priority being less than the deduplication probability threshold.

6. The method of claim 1, wherein the performing inline deduplication comprises: performing the inline deduplication upon the first data object stored within a temporary storage location to create a deduplicated first data object that is subsequently stored into persistent storage.

7. The method of claim 4, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication upon the second data object while stored within persistent storage.

8. The method of claim 7, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication based upon a determination that a current system load demand is below a threshold.

9. The method of claim 4, wherein the performing post-processing deduplication comprises: performing the post-processing deduplication as a background operation.

10. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority before the first data object is stored within persistent storage.

11. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a size characteristic of the first data object.

12. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a data object type of the first data object.

13. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon a last modified timestamp of the first data object.

14. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon an update frequency of the first data object.

15. The method of claim 1, wherein the determining a first deduplication priority comprises: determining the first deduplication priority based upon statistic information derived from a previous deduplication operation.

16. The method of claim 1, comprising: determining the deduplication probability threshold based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

17. A computing device comprising: a memory having stored thereon instructions for performing a method; and a processor coupled to the memory, the processor configured to execute the instructions to cause the processor to: determine a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and perform inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.

18. The computing device of claim 17, wherein the instructions cause the processor to: determine the deduplication probability threshold based upon at least one a read throughput, a write throughput, a read response time, a write response time, network utilization, and a performance characteristic.

19. The computing device of claim 17, wherein the instructions cause the processor to: determine the first deduplication priority based upon an update frequency of the first data object.

20. A non-transitory machine-readable storage media having stored thereon instructions, for performing a method, which causes a computing device to: determine a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device, wherein the first projected likelihood is derived from aggregated statistical information indicating whether the first data object has characteristics that previously provided deduplication benefits; and perform inline deduplication for the first data object based upon the first deduplication priority exceeding a deduplication probability threshold corresponding to a likelihood of achieving the storage space benefit from inline deduplication, wherein the deduplication probability threshold is determined based upon predetermined performance metrics.
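
The independent claims tie the priority to aggregated statistics about object characteristics and tie the threshold to performance metrics. The sketch below shows one way those two inputs might be computed. It is an illustration under stated assumptions, not the claimed implementation: the `DedupStats` class, the 64 KiB size bucket, the neutral default of 0.5, and the latency-based scaling in `threshold_from_metrics` are all hypothetical choices.

```python
from collections import defaultdict


class DedupStats:
    """Aggregates past deduplication outcomes per characteristic.

    Assumption: characteristics are (object type, size bucket); the claims
    also mention last-modified timestamp and update frequency.
    """

    def __init__(self):
        self.attempts = defaultdict(int)  # characteristic -> dedup attempts seen
        self.hits = defaultdict(int)      # characteristic -> attempts that saved space

    @staticmethod
    def key(obj_type: str, size: int):
        bucket = "small" if size < 64 * 1024 else "large"  # hypothetical cut-off
        return (obj_type, bucket)

    def record(self, obj_type: str, size: int, saved_space: bool) -> None:
        k = self.key(obj_type, size)
        self.attempts[k] += 1
        if saved_space:
            self.hits[k] += 1

    def priority(self, obj_type: str, size: int, default: float = 0.5) -> float:
        """Projected likelihood of a space benefit: observed hit rate, or a default."""
        k = self.key(obj_type, size)
        n = self.attempts.get(k, 0)
        return self.hits.get(k, 0) / n if n else default


def threshold_from_metrics(write_response_ms: float, target_ms: float,
                           base: float = 0.5) -> float:
    """Raise the inline-dedup threshold as write latency exceeds its target."""
    overload = max(0.0, write_response_ms / target_ms - 1.0)
    return min(1.0, base + 0.5 * overload)
```

With three successful and one unsuccessful prior attempt for large `vmdk` objects, `priority("vmdk", 1 << 20)` evaluates to 0.75. A measured write response time of 15 ms against a 10 ms target raises the threshold from 0.5 to 0.75, so that object would no longer qualify for inline deduplication and would fall to post-processing.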

Assignees

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

  • De-duplication techniques · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10565165B2 cover?
Methods and apparatuses for performing selective deduplication in a storage system are introduced herein. Techniques are provided for determining a first deduplication priority for a first data object based upon a first projected likelihood that deduplication of the first data object will provide a storage space benefit to reduce storage consumption of a storage device. Inline deduplication is …
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 18, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).