What technology area does this patent fall under?

Primary CPC classification G06F3/0608. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Mar 05 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data deduplication with augmented cuckoo filters

US10222987B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10222987-B2
Application number	US-201615041938-A
Country	US
Kind code	B2
Filing date	Feb 11, 2016
Priority date	Feb 11, 2016
Publication date	Mar 5, 2019
Grant date	Mar 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data deduplication process maintains a data dictionary including a storage tablet and a secondary index containing data indicative of previously received data blocks. The tablet includes hashes of previous data blocks and the index includes one or more cuckoo filters storing fingerprints derived from block hashes of previous data blocks. When a new data block arrives, its block hash and fingerprint are generated. The storage tablet is queried with the block hash and the secondary index is queried with the fingerprint. If the dictionary contains no matching block hash or fingerprint, the new data block is stored in its entirety. If the dictionary contains a matching block hash or fingerprint, the new data block may be a duplicate data block that can be deduplicated by storing a reference to the previous data block instead of storing the new data block in its entirety.

First claim

Opening claim text (preview).

What is claimed is: 1. A data deduplication method, comprising: responsive to detecting a data block, generating a block hash for the data block; querying a data dictionary for data indicative of a previous occurrence of the block hash, the data dictionary comprising: an active storage tablet, comprising a plurality of records, each record including a previously received block hash and a corresponding storage location; and a secondary index including a plurality of augmented cuckoo filters (ACFs), wherein each ACF includes a plurality of ACF entries representing previously received block hashes, wherein each of the plurality of ACF entries includes: a fingerprint derived from a corresponding block hash; and a tablet index indicative of a particular storage tablet associated with the corresponding block hash; wherein the plurality of ACFs includes: a first layer ACF, representing N1 block hashes, wherein N1 is an integer greater than 1; and a second layer ACF, each second layer ACF representing N2 previously generated first layer ACFs; wherein querying the data dictionary includes: generating the fingerprint for the block hash; and querying the plurality of ACFs for a matching fingerprint; and responsive to a result of the querying indicating no previous occurrence of the block hash: storing the data block to a storage location in a storage medium; storing the block hash and the storage location as a record in the active storage tablet; inserting filter construction fields derived from the block hash as a record in a filter construction array; and subject to sufficient entries in the filter construction array, generating a new ACF from a plurality of records in the filter construction array. 2. The method of claim 1 wherein querying the plurality of ACFs includes: accessing a set of filter indices identifying a set of ACF entries within which the fingerprint may be located; and comparing the fingerprint with fingerprints stored in the set of ACF entries. 3. The method of claim 2 , wherein the filter construction fields include: a fingerprint field for the fingerprint; a tablet index field for the tablet index; and a set of filter index fields, each of the filter index fields for one of the set of filter indices. 4. The method of claim 3 , further comprising: generating the filter construction fields, said generating including hashing the block hash according to a fingerprint hashing algorithm to generate the fingerprint. 5. The method of claim 1 wherein generating a new ACF includes: generating a new first layer ACF every N1 filter construction array records and including the new first layer ACF in the secondary index; and generating a new second layer ACF every N2*N1 filter construction array records and replacing existing first layer ACFs in the secondary index with the new second layer ACF. 6. The method of claim 5 , wherein the secondary index includes at least one of: a first layer ACF; a second layer ACF; and a third layer ACF representing N3 previously generated second layer ACFs. 7. The method of claim 6 , wherein generating a new ACF includes: generating a new third layer ACF every N3*N2*N1 filter construction array records and replacing existing second layer ACFs in the secondary index with new third layer ACF. 8. The method of claim 1 , further comprising: responsive to detecting occupied storage tablet records exceeding a particular threshold: storing the active storage tablet to a storage tablet library; and creating a new active storage tablet, including associating a time stamp and a tablet index with the new active storage tablet. 9. The method of claim 8 , wherein the data dictionary further includes a storage tablet cache including: one or more active storage tablets into one of which newly ingested block hashes and corresponding storage locations are inserted; and one or more retrieved storage tablets comprising storage tablets retrieved from the storage tablet library. 10. The method of claim 9 , wherein the data dictionary resides in random access memory and the storage tablet library resides in persistent storage. 11. The method of claim 9 , further comprising: associating the data block with a corresponding data stream, wherein storing the block hash in the active storage table comprises storing the block hash in a particular active storage tablet associated with the data stream. 12. The method of claim 9 , further comprising: responsive to a result of the query indicating a previous occurrence of the block hash: determining whether the data block is a duplicate of a previous data block corresponding to the previous occurrence of the block hash; and responsive to determining that the data block is a duplicate of the previous data block, storing a reference to the previous data block, in lieu of storing the data block, at the storage location. 13. The method of claim 12 , wherein the previous occurrence of the block hash corresponds to an ACF entry in the secondary index and wherein the method includes: retrieving a storage tablet from a storage tablet library and storing the storage tablet retrieved as a retrieved storage tablet in the storage tablet cache. 14. The method of claim 1 , further comprising: querying a second data dictionary, associated with a second data storage device, with the block hash; and responsive to the query hitting in the data dictionary of the second data storage device, storing, in the active storage tablet, the block hash and the most recent storage location associated with the block hash. 15. A data deduplication method, comprising: responsive to detecting a data block, generating a block hash for the data block; querying a data dictionary for data indicative of a previous occurrence of the block hash, the data dictionary comprising: an active storage tablet, comprising a plurality of records, each record including a previously received block hash and a corresponding storage location; and a secondary index including a plurality of augmented cuckoo filters (ACFs), wherein each of the plurality of ACFs includes a plurality of ACF entries representing previously received block hashes, wherein each of the plurality of ACF entries includes: a fingerprint derived from a corresponding block hash; and a tablet index indicative of a particular storage tablet associated with the corresponding block hash; wherein querying the data dictionary includes: generating the fingerprint for the block hash; and querying the plurality of ACFs for a matching fingerprint, wherein querying the ACF includes: determining a set of filter indices identifying a corresponding set of ACF entries within which the fingerprint may be located; and comparing the fingerprint with a fingerprint stored in each of the set of ACF entries; responsive to a result of the querying of the data dictionary indicating no previous occurrence of the block hash: storing the data block to a storage location in a storage medium; storing the block hash and the storage location as a record in the active storage tablet; inserting filter construction fields derived from the block hash as a record in a filter construction array wherein the filter construction fields include: a fingerprint field for the fingerprint; a tablet index field for the tablet index; and a set of filter index fields, each of the filter index fields for one of the set of filter indices, wherein the set of filter indices includes a first filter index and a second filter index and wherein generating the filter construction fields includes: hashing the bl

Assignees

Dell Products Lp

Inventors

Brosch Ryan W

Classifications

G06F3/0685
Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title
G06F17/30097
Physics · mapped topic
G06F3/0641
De-duplication techniques · CPC title
G06F17/30156
Physics · mapped topic
G06F3/0608Primary
Saving storage space on storage systems · CPC title

Patent family

Related publications grouped by family.

View patent family 59562069

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10222987B2 cover?: A data deduplication process maintains a data dictionary including a storage tablet and a secondary index containing data indicative of previously received data blocks. The tablet includes hashes of previous data blocks and the index includes one or more cuckoo filters storing fingerprints derived from block hashes of previous data blocks. When a new data block arrives, its block hash and finge…
Who is the assignee on this patent?: Dell Products Lp
What technology area does this patent fall under?: Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Mar 05 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).