Workload optimized data deduplication using ghost fingerprints
US-2018060367-A1 · Mar 1, 2018 · US
US12259859B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12259859-B2 |
| Application number | US-202016911429-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 25, 2020 |
| Priority date | Aug 19, 2019 |
| Publication date | Mar 25, 2025 |
| Grant date | Mar 25, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A deduplication system includes a similarity searcher, a difference calculator, and a storage manager. The similarity searcher searches for a similar fingerprint in a database storing a plurality of local sensitive fingerprints, resembling a new fingerprint of a new block. The difference calculator computes a difference block between the input block and a similar block associated with the found similar fingerprint, and the storage manager updates the database with the new fingerprint and stores the difference block, if not empty, in a store. A method for deduplication includes searching in a database, storing a plurality of local sensitive fingerprints, a similar fingerprint, resembling a new fingerprint of a new block, calculating a difference block between the input block and a similar block associated with the similar fingerprint, if found, updating the database with the new fingerprint and storing the difference block, if it is not empty, in a storage unit.
Opening claim text (preview).
What is claimed is: 1. A deduplication system for a storage unit, the deduplication system comprising: an associative memory device to perform associative processing and comprising a memory array having columns divided into sections, of which a fingerprint section stores a plurality of fingerprints associated with blocks of data, each fingerprint being stored in a separate column of said fingerprint section; said associative memory device also comprising: a similarity searcher operating on said columns to receive an input fingerprint of an input block and to perform a search inside columns of said fingerprint section for a similar fingerprint whose distance to said input fingerprint is smaller than a predetermined threshold value; and a difference calculator operating on said columns to compute a difference block indicating relative changes between said input block and a similar block associated with said similar fingerprint, if found; and a difference block storage manager to, if said difference block is a non-empty difference block, associate said input fingerprint with said similar block and with said difference block, store said input fingerprint in one column of said fingerprint section, and store said non-empty difference block in said storage unit, wherein said fingerprint section is arranged in a multi-level structure wherein upper levels comprise centroids to clusters in lower levels, and a lowest level comprises fingerprints of blocks, said centroids calculated from said fingerprints. 2. The system of claim 1 wherein said storage manager to store said input block if said similar fingerprint was not found. 3. The system of claim 1 also comprising: a fingerprint creator to create said new fingerprint using a locality-sensitive hashing (LSH) algorithm. 4. The system of claim 1 wherein said storage manager to store fingerprints of an uppermost level in said columns and wherein said similarity searcher performs said search in said uppermost level and a search in lower levels is performed in a CPU. 5. The system of claim 1 and also comprising: a block splitter to split said input block to smaller sub-blocks; a collision resistance fingerprint creator to create a collision resistance fingerprint for each of said sub-blocks; and an exact searcher to search in said fingerprint section for identical fingerprints matching said collision resistance fingerprints, said storage manager to update said fingerprint section with said collision resistance fingerprints and to store sub-blocks for which identical fingerprints were not found. 6. A method for deduplication comprising: storing a plurality of fingerprints associated with blocks of data in a fingerprint section of an associative memory device comprising a memory array having columns divided into sections, each said fingerprint being stored in a separate column of said columns and said blocks of data being stored in a storage unit; receiving an input fingerprint of an input block; searching inside columns of said fingerprint section for a similar fingerprint whose distance to said input fingerprint is smaller than a predetermined threshold value; inside said columns of said fingerprint section, calculating a difference block of relative changes between said input block and a similar block associated with said similar fingerprint, if found; if said difference block is a non-empty difference block: associating said input fingerprint with said similar block and with said difference block, adding said input fingerprint in one column of said fingerprint section; and storing said non-empty difference block in said storage unit, wherein said fingerprint section is arranged in a multi-level structure wherein upper levels comprise centroids to clusters in lower levels, and a lowest level comprises fingerprints of blocks, said centroids calculated from said fingerprints. 7. The method of claim 6 wherein said storing also comprises storing said input block if said similar fingerprint was not found. 8. The method of claim 6 also comprising: creating said new fingerprint using a locality-sensitive hashing (LSH) algorithm. 9. The method of claim 6 and also comprising loading fingerprints of an uppermost level into said columns and wherein said searching comprises performing a search in said uppermost level by said associative memory device and performing a search in lower levels by a CPU. 10. The method of claim 6 also comprising: splitting said input block into smaller sub-blocks; creating a collision resistance fingerprint for each of said sub-blocks; performing an exact search to find an exact match to each of said collision resistance fingerprints in said fingerprint section; updating said fingerprint section with said collision resistance fingerprints; and storing sub-blocks for which identical fingerprints were not found. 11. The method of claim 6 also comprising: performing an exact search between said input fingerprint and a database of fingerprints created using a collision resistance algorithm; and perform said searching if an exact match is not found.
Hierarchical databases, e.g. IMS, LDAP data stores or Lotus Notes · CPC title
Aggregation; Duplicate elimination · CPC title
Asynchronous replication or reconciliation · CPC title
Column-oriented storage; Management thereof · CPC title
Databases characterised by their database models, e.g. relational or object models · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.