Elastic, ephemeral in-line deduplication service

US12353370B2 · US · B2

Patent metadata
Publication number: US-12353370-B2
Application number: US-202218071790-A
Country: US
Kind code: B2
Filing date: Nov 30, 2022
Priority date: Sep 25, 2015
Publication date: Jul 8, 2025
Grant date: Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deduplication service can be provided to a storage domain from a services framework that expands and contracts to both meet service demand and to conform to resource management of a compute domain. The deduplication service maintains a fingerprint database and reference count data in compute domain resources, but persists these into the storage domain for use in the case of a failure or interruption of the deduplication service in the compute domain. The deduplication service responds to service requests from the storage domain with indications of paths in a user namespace and whether or not a piece of data had a fingerprint match in the fingerprint database. The indication of a match guides the storage domain to either store the piece of data into the storage backend or to reference another piece of data. The deduplication service uses the fingerprints to define paths for corresponding pieces of data.
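
The request/response flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class `DedupService`, the helper `store`, the SHA-256 fingerprint, the `/dedup/<prefix>/<fingerprint>` path layout, the snapshot key, and the choice to bump a reference count on every request are all assumptions made for illustration.

```python
import hashlib
import json


class DedupService:
    """Toy stand-in for the service's fingerprint database and reference counts."""

    def __init__(self):
        self.fingerprints = {}  # fingerprint -> path derived from that fingerprint
        self.ref_counts = {}    # fingerprint -> number of references (assumed bookkeeping)

    def handle_request(self, data: bytes):
        """Return (path, matched): where the data lives and whether it was already known."""
        fp = hashlib.sha256(data).hexdigest()
        matched = fp in self.fingerprints
        if not matched:
            # Hypothetical layout: the fingerprint itself defines the path.
            self.fingerprints[fp] = f"/dedup/{fp[:2]}/{fp}"
        self.ref_counts[fp] = self.ref_counts.get(fp, 0) + 1
        return self.fingerprints[fp], matched

    def persist(self, backend: dict):
        """Copy service state into the storage domain so it survives a service failure."""
        state = {"fingerprints": self.fingerprints, "ref_counts": self.ref_counts}
        backend["/service/state.json"] = json.dumps(state).encode()


def store(backend: dict, service: DedupService, data: bytes) -> str:
    """Storage-domain side: write the data only when the service reports no match."""
    path, matched = service.handle_request(data)
    if not matched:
        backend[path] = data  # new content: store it in the backend
    return path               # on a match, the caller simply references this path
```

In this sketch, writing the same bytes twice yields the same path and a single stored copy, which is the behavior the match indication is meant to drive.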

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: in response to receiving a write request targeting a data unit, dividing the data unit into sub-units according to a sub-unit size; determining, by a redirector, a number of deduplicator instances to instantiate for deduplicating the sub-units based upon a deduplication service policy specifying a threshold amount of data that can be processed by a single executing deduplicator instance; creating a data unit manifest for the data unit with an indication of an order and count of the sub-units, wherein the data unit manifest is populated with paths to the sub-units according to a hierarchical namespace or a flat namespace where a path is a namespace identifier used to obtain data of a constituent sub-unit; and requesting deduplication for the sub-units by the number of deduplicator instances using the paths within the data unit manifest.

2. The method of claim 1, wherein each deduplicator instance is assign up to the threshold amount of data to deduplicate as specified by the deduplication service policy.

3. The method of claim 1, wherein determining the deduplicator instances to instantiate comprises: determining the number of deduplicator instances to instantiate and execute for deduplicating the sub-units based upon a size of the data unit.

4. The method of claim 1, wherein determining the deduplicator instances to instantiate comprises: determining the number of deduplicator instances to instantiate and execute for deduplicating the sub-units based upon a number of sub-units into which the data unit is divided.

5. The method of claim 1, comprising: obtaining, from a service dispatcher, location information for the deduplicator instances, wherein the location information corresponds to network addresses and ports of the deduplicator instances.

6. The method of claim 1, comprising: caching, by the redirector into a cache, location information retrieved from a service dispatcher for the deduplicator instances; and utilizing the location information within the cache for processing a subsequent deduplication request.

7. The method of claim 1, comprising: contacting, off a request path associated with processing the write request, a service dispatcher to refresh a cache used by the redirector to cache location information retrieved from the service dispatcher for the deduplicator instances.

8. The method of claim 1, comprising: in response to determining that a deduplication service is unavailable based upon insufficient resources in a compute domain, notifying the deduplicator instance, by a service dispatcher, that the deduplication service is unavailable.

9. The method of claim 1, comprising: hosting, by a deduplication service, a garbage collector to maintain reference count data for donor file data chunks managed within a service space in accordance with the reference count data.

10. The method of claim 1, comprising: scanning, by a garbage collector of a deduplication service, a user space within a storage backend based upon checkpoints to identify data unit manifests added to the user space since a last scan; and incrementing reference counts for a set of sub-units indicated by the data unit manifests.

11. A non-transitory machine readable medium comprising instructions for performing a method, which when executed by a machine, causes the machine to perform operations comprising: in response to receiving a write request targeting a data unit, dividing the data unit into sub-units according to a sub-unit size; determining, by a redirector, a number of deduplicator instances to instantiate for deduplicating the sub-units based upon a deduplication service policy specifying a threshold amount of data that can be processed by a single executing deduplicator instance; creating a data unit manifest for the data unit with an indication of an order and count of the sub-units, wherein the data unit manifest is populated with paths to the sub-units according to a hierarchical namespace or a flat namespace where a path is a namespace identifier used to obtain data of a constituent sub-unit; and requesting deduplication for the sub-units by the number of deduplicator instances using the paths within the data unit manifest.

12. The non-transitory machine readable medium of claim 11, wherein each deduplicator instance is assign up to the threshold amount of data to deduplicate as specified by the deduplication service policy.

13. The non-transitory machine readable medium of claim 11, wherein determining the deduplicator instances to instantiate comprises: determining the number of deduplicator instances to instantiate and execute for deduplicating the sub-units based upon a size of the data unit.

14. The non-transitory machine readable medium of claim 11, wherein determining the deduplicator instances to instantiate comprises: determining the number of deduplicator instances to instantiate and execute for deduplicating the sub-units based upon a number of sub-units into which the data unit is divided.

15. The non-transitory machine readable medium of claim 11, comprising: obtaining, from a service dispatcher, location information for the deduplicator instances, wherein the location information corresponds to network addresses and ports of the deduplicator instances.

16. The non-transitory machine readable medium of claim 11, comprising: caching, by the redirector into a cache, location information retrieved from a service dispatcher for the deduplicator instances; and utilizing the location information within the cache for processing a subsequent deduplication request.

17. The non-transitory machine readable medium of claim 11, comprising: contacting, off a request path associated with processing the write request, a service dispatcher to refresh a cache used by the redirector to cache location information retrieved from the service dispatcher for the deduplicator instances.

18. A computing device comprising: a memory comprising machine executable code for performing a method; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to: in response to receiving a write request targeting a data unit, divide the data unit into sub-units according to a sub-unit size; determine, by a redirector, a number of deduplicator instances to instantiate for deduplicating the sub-units based upon a deduplication service policy specifying a threshold amount of data that can be processed by a single executing deduplicator instance; create a data unit manifest for the data unit with an indication of an order and count of the sub-units, wherein the data unit manifest is populated with paths to the sub-units according to a hierarchical namespace or a flat namespace where a path is a namespace identifier used to obtain data of a constituent sub-unit; and request deduplication for the sub-units by the number of deduplicator instances using the paths within the data unit manifest.

19. The computing device of claim 18, wherein each deduplicator instance is assign up to the threshold amount of data to deduplicate as specified by the deduplication service policy.

20. The computing device of claim 18, wherein the machine executable code causes the processor to: cache, by the redirector into a cache, location information retrieved from a service dispatcher for the deduplicator instances; and utilize the location information within the cache for processing a subsequent dedupl
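
The steps of the first independent claim (divide, size the instance pool, build a manifest, hand out work by path) can be sketched roughly as follows. This is an illustrative reading only, not the claimed implementation: the names `Manifest` and `plan_write`, the positional path format, and the requirement that the policy threshold be a whole multiple of the sub-unit size are all assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Manifest:
    """Data unit manifest: list order is sub-unit order, list length is the count."""
    paths: list

    @property
    def count(self) -> int:
        return len(self.paths)


def plan_write(name: str, data: bytes, sub_unit_size: int,
               threshold_bytes: int, hierarchical: bool = True):
    """Return (manifest, n_instances, batches) for one write request.

    threshold_bytes plays the role of the policy limit on how much data a
    single deduplicator instance may process. Assumption: it is a whole
    multiple of sub_unit_size, so no batch exceeds the limit.
    """
    if sub_unit_size <= 0 or threshold_bytes <= 0 or threshold_bytes % sub_unit_size:
        raise ValueError("threshold must be a positive multiple of the sub-unit size")

    # Step 1: divide the data unit into fixed-size sub-units.
    sub_units = [data[i:i + sub_unit_size] for i in range(0, len(data), sub_unit_size)]

    # Step 2: number of instances implied by the policy threshold.
    n_instances = math.ceil(len(data) / threshold_bytes)

    # Step 3: manifest of paths, in a hierarchical ("/") or flat (".") namespace.
    sep = "/" if hierarchical else "."
    manifest = Manifest([f"{name}{sep}{i:06d}" for i in range(len(sub_units))])

    # Step 4: one batch of paths per instance, each at most threshold_bytes of data.
    per_instance = threshold_bytes // sub_unit_size
    batches = [manifest.paths[i:i + per_instance]
               for i in range(0, manifest.count, per_instance)]
    return manifest, n_instances, batches
```

For example, a 10-byte data unit with 2-byte sub-units and a 4-byte threshold yields five sub-unit paths split across three batches, one batch per deduplicator instance.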

Assignees

Netapp Inc

Inventors

Classifications

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641)

  • Distributed queries

  • G06F16/215 (Primary): Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12353370B2 cover?
A deduplication service can be provided to a storage domain from a services framework that expands and contracts to both meet service demand and to conform to resource management of a compute domain. The deduplication service maintains a fingerprint database and reference count data in compute domain resources, but persists these into the storage domain for use in the case of a failure or inter…
Who is the assignee on this patent?
Netapp Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).