Efficient method to optimize distributed segment processing mechanism in dedupe systems by leveraging the locality principle

US12032536B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12032536-B2
Application number  US-202117191403-A
Country             US
Kind code           B2
Filing date         Mar 3, 2021
Priority date       Mar 3, 2021
Publication date    Jul 9, 2024
Grant date          Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment, filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints, reading, at the dedupe system, from a container where copies of the non-unique fingerprints are stored, an additional set of non-unique fingerprints, sending, from the dedupe system to the client, a single response that comprises both the set of unique fingerprints and the additional set of non-unique fingerprints, and receiving from the client, at the dedupe system, data segments that respectively correspond to the unique fingerprints in the set of unique fingerprints, but no data segments corresponding to the non-unique fingerprints in the set of non-unique fingerprints are received by the dedupe system from the client.
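
The server-side flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class `DedupeSystem`, the methods `handle_request` and `store`, the `extra_limit` parameter, and the use of SHA-256 as the fingerprint function are all hypothetical names and choices introduced here. In particular, the sketch picks the additional fingerprints simply by taking neighbours stored in the same container, which is only one assumed way of exploiting locality.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(segment: bytes) -> str:
    """Hypothetical fingerprint function: SHA-256 hex digest of a segment."""
    return hashlib.sha256(segment).hexdigest()


@dataclass
class DedupeSystem:
    # container id -> fingerprints stored in that container, in write order
    containers: dict[int, list[str]] = field(default_factory=dict)
    # fingerprint -> container id (stands in for the index / fingerprint filter)
    index: dict[str, int] = field(default_factory=dict)
    # fingerprint -> segment bytes
    segments: dict[str, bytes] = field(default_factory=dict)

    def handle_request(self, fps: list[str], extra_limit: int) -> tuple[list[str], list[str]]:
        """Return (unique fingerprints, additional non-unique fingerprints)."""
        # Filter the request into fingerprints the system has not seen (unique)
        # and fingerprints it already stores (non-unique).
        unique = [fp for fp in dict.fromkeys(fps) if fp not in self.index]
        non_unique = [fp for fp in dict.fromkeys(fps) if fp in self.index]

        # Read further fingerprints from the containers that hold the
        # non-unique ones. Assumption: container neighbours are likely to be
        # asked about in a future request.
        requested = set(fps)
        additional: list[str] = []
        visited: set[int] = set()
        for fp in non_unique:
            cid = self.index[fp]
            if cid in visited:
                continue
            visited.add(cid)
            for neighbour in self.containers[cid]:
                if len(additional) >= extra_limit:
                    break
                if neighbour not in requested and neighbour not in additional:
                    additional.append(neighbour)

        # Both lists travel back to the client in one response.
        return unique, additional

    def store(self, new_segments: list[bytes], container_id: int) -> None:
        """Write segments for unique fingerprints and add them to the index."""
        for seg in new_segments:
            fp = fingerprint(seg)
            if fp in self.index:
                continue
            self.segments[fp] = seg
            self.index[fp] = container_id
            self.containers.setdefault(container_id, []).append(fp)
```

In this sketch the client would then upload only the segments whose fingerprints came back in the first list, and the system would call `store` on them.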

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment; filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints; determining a container in a set of containers in the dedupe system where copies of the non-unique fingerprints are stored; reading, at the dedupe system, from the container, an additional set of non-unique fingerprints, which exist in the container and are additional to the set of non-unique fingerprints, wherein the fingerprints in the additional set of non-unique fingerprints are selected based on a likelihood that those fingerprints would be requested in a future request by the client; sending, from the dedupe system to the client in response to the request comprising the set of fingerprints, a single response that comprises both the set of unique fingerprints and the additional set of non-unique fingerprints, wherein a number of fingerprints in the single response is predetermined; and receiving from the client, at the dedupe system, data segments that respectively correspond to the unique fingerprints in the set of unique fingerprints, but no data segments corresponding to the non-unique fingerprints in the set of non-unique fingerprints are received by the dedupe system from the client.

2. The method as recited in claim 1, wherein all of the fingerprints in the response are sent in response to a single request from the client.

3. The method as recited in claim 1, wherein a number of fingerprints in the additional set of non-unique fingerprints is specified in the request.

4. The method as recited in claim 1, wherein some of the non-unique fingerprints in the additional set of non-unique fingerprints match fingerprints stored at the client.

5. The method as recited in claim 1, further comprising writing, by the dedupe system, the data segments that respectively correspond to the unique fingerprints, and adding, by the dedupe system, the unique fingerprints to an index or database of the dedupe system.

6. The method as recited in claim 1, wherein filtering comprises comparing the fingerprints contained in the request against a fingerprint filter.

7. The method as recited in claim 1, wherein the fingerprints in the additional set of non-unique fingerprints do not necessitate an additional request from the client.

8. The method as recited in claim 1, further comprising receiving, by the dedupe system from the client, a segment tree or metadata mapping associated with the data segments that respectively correspond to the unique fingerprints.

9. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment; filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints; determining a container in a set of containers in the dedupe system where copies of the non-unique fingerprints are stored; reading, at the dedupe system, from the container, an additional set of non-unique fingerprints, which exist in the container and are additional to the set of non-unique fingerprints, wherein the fingerprints in the additional set of non-unique fingerprints are selected based on a likelihood that those fingerprints would be requested in a future request by the client; sending, from the dedupe system to the client in response to the request comprising the set of fingerprints, a single response that comprises both the set of unique fingerprints and the additional set of non-unique fingerprints, wherein a number of fingerprints in the single response is predetermined; and receiving from the client, at the dedupe system, data segments that respectively correspond to the unique fingerprints in the set of unique fingerprints, but no data segments corresponding to the non-unique fingerprints in the set of non-unique fingerprints are received by the dedupe system from the client.

10. The non-transitory storage medium as recited in claim 9, wherein all of the fingerprints in the response are sent in response to a single request from the client.

11. The non-transitory storage medium as recited in claim 9, wherein a number of fingerprints in the additional set of non-unique fingerprints is specified in the request.

12. The non-transitory storage medium as recited in claim 9, wherein some of the non-unique fingerprints in the additional set of non-unique fingerprints match fingerprints stored at the client.

13. The non-transitory storage medium as recited in claim 9, wherein the operations further comprise writing, by the dedupe system, the data segments that respectively correspond to the unique fingerprints, and adding, by the dedupe system, the unique fingerprints to an index or database of the dedupe system.

14. The non-transitory storage medium as recited in claim 9, wherein filtering comprises comparing the fingerprints contained in the request against a fingerprint filter.

15. The non-transitory storage medium as recited in claim 9, wherein the fingerprints in the additional set of non-unique fingerprints do not necessitate an additional request from the client.

16. The non-transitory storage medium as recited in claim 9, wherein the operations further comprise receiving, by the dedupe system from the client, a segment tree or metadata mapping associated with the data segments that respectively correspond to the unique fingerprints.
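
To illustrate how a client might take advantage of the single response recited in the claims, here is a small sketch of the client side. It rests on an assumption the claims do not spell out: that the client remembers the additional non-unique fingerprints locally and leaves them out of later requests. The class `Client`, its methods, the `known_on_server` set, and the SHA-256 fingerprint are hypothetical names chosen for this example only.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Hypothetical fingerprint function: SHA-256 hex digest of a segment."""
    return hashlib.sha256(segment).hexdigest()


class Client:
    def __init__(self) -> None:
        # Fingerprints the client has been told already exist on the dedupe system.
        self.known_on_server: set[str] = set()

    def build_request(self, segments: list[bytes]) -> list[str]:
        """Fingerprints that still need to be checked with the dedupe system."""
        request: list[str] = []
        for seg in segments:
            fp = fingerprint(seg)
            # Skip fingerprints learned from an earlier response (assumed behaviour).
            if fp not in self.known_on_server and fp not in request:
                request.append(fp)
        return request

    def segments_to_send(
        self,
        segments: list[bytes],
        request: list[str],
        unique_fps: list[str],
        additional_fps: list[str],
    ) -> list[bytes]:
        """Process one response; return only the segments that must be uploaded."""
        unique = set(unique_fps)
        # Requested fingerprints not reported as unique are already stored.
        self.known_on_server.update(fp for fp in request if fp not in unique)
        # The additional fingerprints arrived without a further request.
        self.known_on_server.update(additional_fps)
        # Upload data only for the unique fingerprints, once each.
        by_fp = {fingerprint(seg): seg for seg in segments}
        return [by_fp[fp] for fp in unique_fps if fp in by_fp]
```

Under these assumptions, a later backup whose segments were reported in `additional_fps` produces a shorter request, because those fingerprints no longer need a lookup on the dedupe system.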

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Updates performed during online database operations; commit processing

  • the solution involving signatures

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

  • using de-duplication of the data

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12032536B2 cover?
One example method includes receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment, filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints, reading, at the dedupe system, from a container where copies of the non…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).