Data deduplication acceleration

US11314724B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11314724-B2
Application numberUS-202016865617-A
CountryUS
Kind codeB2
Filing dateMay 4, 2020
Priority dateOct 19, 2017
Publication dateApr 26, 2022
Grant dateApr 26, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for data deduplication may include: receiving write operations that write first data; partitioning the first data into a plurality of data portions; generating, using a first hash function, a plurality of data deduplication hash values for the plurality of data portions, wherein a first data deduplication hash value of the plurality of data deduplication hash values is produced by said generating for a first of the plurality of data portions; performing first processing using a Bloom filter to determine whether the first data deduplication hash value has a corresponding first entry in a data store of deduplication hash values; and responsive to the first processing determining the first data deduplication hash value does not have the corresponding first entry in the data store of deduplication hash values, performing second processing, said second processing including adding the corresponding first entry in the data store of deduplication hash values.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of performing data deduplication in a system comprising: receiving one or more write operations that write first data; partitioning the first data into a plurality of data portions; generating, using a first hash function, a plurality of data deduplication hash values for the plurality of data portions, wherein a first data portion is included in the plurality of data portions and a first data deduplication hash value of the plurality of data deduplication hash values is produced by said generating for the first data portion, wherein each of the plurality of data deduplication hash values generated using the first hash function uniquely represents a corresponding one of the plurality of data portions; performing first processing using a Bloom filter to determine whether the first data deduplication hash value has a corresponding first entry in a data store of deduplication hash values; and responsive to the first processing determining the first data deduplication hash value does not have the corresponding first entry in the data store of deduplication hash values, performing second processing, said second processing including adding the corresponding first entry in the data store of deduplication hash values, and wherein the system comprises a central processing unit (CPU) domain and a second domain, wherein the CPU domain includes one or more main processors and wherein the second domain includes a first device comprising a plurality of processors, wherein at least said generating is performed in the second domain to offload processing from the CPU domain, and wherein said generating includes the plurality of processors of the first device of the second domain executing a same first instruction stream in parallel, wherein the same first instruction stream includes code of the first hash function and each of the plurality of processors of the first device receives as input a different one of the plurality of data portions and generates a different one of the plurality of data deduplication hash values that corresponds to said different one of the plurality of data portions. 2. The method of claim 1 , wherein the first processing further includes: determining, in accordance with Bloom filter hash functions and the first data deduplication hash value, a first set of bit positions of the Bloom filter, wherein each bit position in the first set identifies a bit position in the Bloom filter; querying the Bloom filter to determine whether any bit position of the first set has a corresponding bit position in the Bloom filter that is set to zero; and responsive to determining that at least one bit position of the first set has a corresponding bit position in the Bloom filter that is set to zero, determining that the first data deduplication hash value does not have the corresponding first entry in the data store of data deduplication hash values. 3. The method of claim 2 , further comprising: responsive to determining that no bit position of the first set has a corresponding bit position in the Bloom filter that is set to zero, determining that the first data deduplication hash value may have the corresponding first entry in the data store of data deduplication hash values and performing additional processing to definitely determine whether the first data deduplication hash value has the corresponding first entry in the data store of data deduplication hash values. 4. The method of claim 3 , wherein said additional processing includes: querying the data store of data deduplication hash values to determine whether the first data deduplication hash value has the corresponding first entry in the data store of data deduplication hash values. 5. The method of claim 4 , further comprising: responsive to said querying the data store of data deduplication hash values determining the first data deduplication hash value has the corresponding first entry in the data store of data deduplication hash values, discarding the first data portion and determining that the first data portion is already stored in a data store of deduplicated data portions. 6. The method of claim 4 , further comprising: responsive to said querying the data store of data deduplication hash values determining the first data deduplication hash value does not have the corresponding first entry in the data store of data deduplication hash values, performing other processing including: adding the corresponding first entry in the data store of deduplication hash values; storing the first data portion in the data store of deduplication data portions; and mapping the corresponding first entry to the first data portion as stored in the data store of deduplication data portions. 7. The method of claim 6 , wherein the other processing includes updating the Bloom filter in accordance with the first set of bit positions, wherein each bit position in the first set identifies a bit position in the Bloom filter which is set to one by said updating. 8. The method of claim 1 , wherein the Bloom filter is a probabilistic data structure that provides a definitive indication of particular data deduplication hash values that do not have corresponding entries in the data store of data deduplication hash values, and provides an indefinite indication of particular data deduplication hash values that have corresponding entries in the data store of data deduplication hash values. 9. The method of claim 8 , wherein the indefinite indication is a probabilistic indication as to whether particular data deduplication hash values have corresponding entries in the data store of data deduplication hash values. 10. The method of claim 1 , wherein the second processing includes storing the first data portion in a data store of deduplicated data portions and updating the Bloom filter in accordance with the first data portion. 11. The method of claim 10 , wherein the data store of deduplication data portions includes only a single unique instance of each data portion processed in connection with data deduplication. 12. The method of claim 1 , wherein said first processing is performed in the second domain and includes executing second code, by one or more processors of the second domain, that uses the Bloom filter to determine whether the first data deduplication hash value has a corresponding first entry in a data store of deduplication hash values. 13. The method of claim 12 , further comprising: journaling write operations that add new entries to the data store of deduplication hash values. 14. A system comprising: one or more processors; and one or more memories comprising code stored thereon that, when executed, performs method of performing data deduplication comprising: receiving one or more write operations that write first data; partitioning the first data into a plurality of data portions; generating, using a first hash function, a plurality of data deduplication hash values for the plurality of data portions, wherein a first data portion is included in the plurality of data portions and a first data deduplication hash value of the plurality of data deduplication hash values is produced by said generating for the first data portion, wherein each of the plurality of data deduplication hash values generated using the first hash function uniquely represents a corresponding one of the plurality of data portions; performing first processing using a Bloom filter to determine whether the first data deduplication hash value has a corresponding first entry in a data store of deduplication hash values; and responsive to the first processing determining the first data de

Assignees

Inventors

Classifications

  • Ensuring data consistency and integrity · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • Search customisation based on user profiles and personalisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11314724B2 cover?
Techniques for data deduplication may include: receiving write operations that write first data; partitioning the first data into a plurality of data portions; generating, using a first hash function, a plurality of data deduplication hash values for the plurality of data portions, wherein a first data deduplication hash value of the plurality of data deduplication hash values is produced by sa…
Who is the assignee on this patent?
Emc Ip Holding Co Llc
What technology area does this patent fall under?
Primary CPC classification G06F16/2365. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 26 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).