Selective deduplication

US9348538B2 · US · B2

Patent metadata
Publication number: US-9348538-B2
Application number: US-201213655263-A
Country: US
Kind code: B2
Filing date: Oct 18, 2012
Priority date: Oct 18, 2012
Publication date: May 24, 2016
Grant date: May 24, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage system if the probability of deduplication for the data object has a specified relationship to a specified threshold.
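The decision described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `DataObject` type, the choice of file type as the characteristic, the probability table, and the reading of "specified relationship" as "greater than or equal to" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    file_type: str  # assumed characteristic; the abstract does not name one


# Hypothetical probabilities per file type, invented for this example.
DEDUP_PROBABILITY_BY_TYPE = {"vmdk": 0.85, "docx": 0.40, "jpg": 0.05}


def dedup_probability(obj: DataObject, default: float = 0.0) -> float:
    """Look up the assumed deduplication probability for the object's type."""
    return DEDUP_PROBABILITY_BY_TYPE.get(obj.file_type, default)


def should_dedup_before_store(obj: DataObject, threshold: float) -> bool:
    """Return True if the object should be deduplicated before it is persisted.

    Assumption: the "specified relationship" is probability >= threshold.
    """
    return dedup_probability(obj) >= threshold
```

With a threshold of 0.5, a `vmdk` object would be deduplicated before being written to persistent storage, while a `jpg` object would be written without that step.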

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining in a storage system a probability of deduplication for a data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication; determining in the storage system a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the storage system and adjusted based on availability of resources of the storage system and recent performance of the storage system relative to the performance metric; determining in the storage system whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and performing a deduplication operation on the data object in the storage system prior to the data object being stored in a persistent storage of the storage system in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.

2. The method of claim 1 wherein adjusting the deduplication probability threshold includes reducing the deduplication probability threshold in an event a performance projection for the storage system indicates that the storage system will continue to satisfy the performance metric using the reduced deduplication probability threshold.

3. The method of claim 2 wherein the performance projection is based on the recent performance of the storage system.

4. The method of claim 1 wherein determining the probability of deduplication for the data object based on the characteristic includes determining the probability of deduplication for the data object based on statistical information from prior deduplication operations on other data objects having the characteristic.

5. The method of claim 4 wherein the statistical information indicates a percentage of the other data objects having the characteristic that were deduplication candidates in the prior deduplication operations, wherein a deduplication candidate includes data blocks that are duplicates of other data blocks in the storage system.

6. The method of claim 5 wherein the prior deduplication operations include a plurality of post-processing deduplication operations performed on data objects already stored in the persistent storage of the storage system.

7. The method of claim 4 wherein determining the probability of deduplication for the data object further includes determining a second characteristic of the data object, and determining the probability of deduplication for the data object is further based on statistical information associated with the second characteristic.

8. The method of claim 1 wherein the performance metric is a service level objective (SLO).

9. The method of claim 1 wherein the performance metric includes a response time of the storage system, a throughput of the storage system, a utilization of one or more networks associated with the storage system, or a combination thereof.

10. The method of claim 1 wherein the characteristic is a size of the data object, a type of the data object, an owner of the data object, a last modified date of the data object, an update frequency of the data object, or a combination thereof.

11. The method of claim 1 further comprising performing a post-processing deduplication operation on the data object after the data object is stored in the persistent storage of the storage system in an event it is determined that the probability of deduplication for the data object does not satisfy the deduplication probability threshold.

12. The method of claim 1 wherein the storage system is operated in a Network Attached Storage (NAS) environment or in a Storage Area Network (SAN).

13. A data storage system comprising: a processor; and a memory coupled with the processor and including a storage manager that directs the processor to: determine a deduplication probability threshold, the deduplication probability threshold determined based on a performance metric of the data storage system and adjusted based on availability of resources of the data storage system and recent performance of the data storage system relative to the performance metric; determine, prior to a data object being stored in a persistent storage, a probability of deduplication for the data object, the probability of deduplication for the data object determined based on a characteristic of the data object, wherein the probability of deduplication for the data object is a statistical projection indicating a likelihood that the data object will provide a storage space benefit as a result of deduplication; determine whether the probability of deduplication for the data object satisfies the deduplication probability threshold; and perform a deduplication operation on the data object prior to the data object being stored in the persistent storage in an event it is determined that the probability of deduplication for the data object satisfies the deduplication probability threshold.

14. The data storage system of claim 13 wherein to adjust the deduplication probability threshold includes to reduce the deduplication probability threshold in an event a performance projection for the data storage system based on historical performance information indicates that the data storage system will continue to meet the performance metric using the reduced deduplication probability threshold.

15. The data storage system of claim 13 wherein to determine the probability of deduplication for the data object based on the characteristic includes to: determine the probability of deduplication for the data object based on statistical information associated with the characteristic, the statistical information aggregated from prior deduplication operations.

16. The data storage system of claim 15 wherein the statistical information associated with the characteristic indicates a percentage of other data objects having the characteristic which were found to include data blocks that were duplicates of other data blocks during the prior deduplication operations.

17. The data storage system of claim 15 wherein: to determine the probability of deduplication for the data object based on the characteristic further includes to determine a second characteristic of the data object; and to determine the probability of deduplication for the data object is further based on statistical information associated with the second characteristic.

18. The data storage system of claim 13 wherein the performance metric is a service level objective (SLO).

19. The data storage system of claim 13 wherein the performance metric includes a response time of the data storage system, a throughput of the data storage system, a utilization of a network associated with the data storage system, or a combination thereof.

20. The data storage system of claim 13 wherein the characteristic is a size of the data object, a type of the data object, an owner of the data object, a last modified date of the data object, an update frequency of the data object, or a combination thereof.

21. The data storage system of claim 13 wherein the storage manager further directs the processor to perform a post-processing deduplication operation on the data object after it is stored in the persistent storage in a
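As a rough illustration of how claims 1 to 5 and 11 fit together, the sketch below estimates a probability from statistics of prior deduplication operations, adjusts the threshold from recent performance and resource availability, and routes the object to inline or post-processing deduplication. All names (`DedupStats`, `adjust_threshold`, `route`), the latency-based metric, the fixed step size, and the rule for raising the threshold are assumptions introduced for this example; the claims themselves only require that the threshold be adjusted and, in claim 2, that it may be reduced when a performance projection allows.

```python
from collections import defaultdict


class DedupStats:
    """Tracks, per characteristic value, how often prior objects were
    deduplication candidates (hypothetical helper, in the spirit of claims 4-5)."""

    def __init__(self) -> None:
        self._seen = defaultdict(int)
        self._candidates = defaultdict(int)

    def record(self, characteristic: str, was_candidate: bool) -> None:
        self._seen[characteristic] += 1
        if was_candidate:
            self._candidates[characteristic] += 1

    def probability(self, characteristic: str) -> float:
        """Fraction of prior objects with this characteristic that were candidates."""
        seen = self._seen.get(characteristic, 0)
        if seen == 0:
            return 0.0  # assumption: no history means no expected benefit
        return self._candidates.get(characteristic, 0) / seen


def adjust_threshold(threshold: float, recent_latency_ms: float,
                     slo_latency_ms: float, cpu_idle_fraction: float,
                     step: float = 0.05, min_idle: float = 0.2) -> float:
    """Lower the threshold when the assumed latency SLO is met and CPU is
    available; otherwise raise it (the raise rule is an assumption)."""
    if recent_latency_ms <= slo_latency_ms and cpu_idle_fraction >= min_idle:
        threshold -= step
    else:
        threshold += step
    return round(min(1.0, max(0.0, threshold)), 4)


def route(probability: float, threshold: float) -> str:
    """Choose inline deduplication or deferral to post-processing."""
    return "inline" if probability >= threshold else "post-process"
```

A lower threshold sends more objects through inline deduplication, which costs write-path resources, so this sketch only lowers it while the assumed latency objective is being met.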

Assignees

Netapp Inc

Inventors

Classifications

  • Change logging, detection, and notification (replication G06F16/27) · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

  • De-duplication techniques · CPC title

  • G06F3/067 · Primary

    Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9348538B2 cover?
Methods and apparatuses for performing selective deduplication in a storage system are introduced here. Techniques are provided for determining a probability of deduplication for a data object based on a characteristic of the data object and performing a deduplication operation on the data object in the storage system prior to the data object being stored in persistent storage of the storage sy…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date May 24, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).