Efficient method to optimize distributed segment processing mechanism in dedupe systems by leveraging the locality principle

US12222913B2 · US · B2

Patent metadata
Publication number: US-12222913-B2
Application number: US-202318525344-A
Country: US
Kind code: B2
Filing date: Nov 30, 2023
Priority date: Mar 3, 2021
Publication date: Feb 11, 2025
Grant date: Feb 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment, filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints, reading, at the dedupe system, from a container where copies of the non-unique fingerprints are stored, an additional set of non-unique fingerprints, sending, from the dedupe system to the client, a single response that comprises both the set of unique fingerprints and the additional set of non-unique fingerprints, and receiving from the client, at the dedupe system, data segments that respectively correspond to the unique fingerprints in the set of unique fingerprints, but no data segments corresponding to the non-unique fingerprints in the set of non-unique fingerprints are received by the dedupe system from the client.
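The server-side steps in the abstract (filter the request, read neighbouring fingerprints from the containers that hold the matches, answer once) can be sketched as below. This is a minimal illustration, not the patented implementation: the `Container` and `DedupeSystem` classes, the `index` mapping, and the `max_extra` cap are all hypothetical names introduced here, and reading "additional non-unique fingerprints" as "the other fingerprints stored in the same container" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical container: fingerprints of segments stored together."""
    fingerprints: list[str]


@dataclass
class DedupeSystem:
    # Hypothetical index: stored fingerprint -> id of the container holding it.
    index: dict[str, int] = field(default_factory=dict)
    containers: dict[int, Container] = field(default_factory=dict)

    def filter_request(self, fingerprints, max_extra=1000):
        """Return (unique, extra_non_unique) as one response to one request."""
        requested = set(fingerprints)
        unique, extra = [], []
        seen_extra, seen_containers = set(), set()
        for fp in fingerprints:
            cid = self.index.get(fp)
            if cid is None:
                unique.append(fp)  # not stored yet: client must send the segment
                continue
            if cid in seen_containers:
                continue  # this container was already read for an earlier match
            seen_containers.add(cid)
            # Locality assumption: neighbours of a matched fingerprint are
            # likely to appear in the client's upcoming requests.
            for neighbour in self.containers[cid].fingerprints:
                if len(extra) >= max_extra:
                    break
                if neighbour not in requested and neighbour not in seen_extra:
                    seen_extra.add(neighbour)
                    extra.append(neighbour)
        return unique, extra
```

With a single container holding `a`, `b`, and `c`, a request for `["a", "x"]` would report `x` as unique and hand back `b` and `c` as the additional non-unique fingerprints.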

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: sending, by a client to a dedupe system, a set of fingerprints for filtering by the dedupe system; receiving, by the client from the dedupe system, fingerprints of the set of fingerprints that were identified by the dedupe system as unique fingerprints, and additional non-unique fingerprints from a set of containers of the dedupe system; sending, by the client to the dedupe system, only unique segments for writing by the dedupe system, and the unique segments form a set; receiving, by the client system, data from a backup dataset, and segmenting and fingerprinting the data from the backup dataset to create a next set of fingerprints; filtering, by the client system, fingerprints in the next set of fingerprints, using additional existing fingerprints received earlier from the dedupe system, so as to identify unique segments of the next set; and sending, by the client system to the dedupe system, a segment tree of the unique segments of the next set.
2. The method as recited in claim 1, wherein any fingerprints identified, during the filtering, as matching fingerprints already present in a local in-memory list, are not sent by the client system to the dedupe system.
3. The method as recited in claim 2, wherein the local in-memory list includes one or more non-unique fingerprints previously sent by the client system to the dedupe system.
4. The method as recited in claim 1, wherein the segment tree is usable by the dedupe system to build file metadata for a file.
5. The method as recited in claim 1, wherein an amount of the unique fingerprints sent by the dedupe system to the client is specified by the client.
6. The method as recited in claim 1, wherein an amount of the unique fingerprints sent by the dedupe system to the client is a function of a value of an adaptive match rate that is determined by the client.
7. The method as recited in claim 1, wherein the unique fingerprints, and the additional non-unique fingerprints from the set of containers of the dedupe system, are both received by the client in response to a single RPC (remote procedure call) issued by the client to the dedupe system.
8. The method as recited in claim 1, the unique fingerprints, and the additional non-unique fingerprints received by the client from the dedupe system, are written by the client to an in-memory list locally maintained at the client.
9. The method as recited in claim 1, wherein the segment tree is sent at an end of a data ingest process at the dedupe system.
10. The method as recited in claim 1, wherein the unique segments are created by the client.
11. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: sending, by a client to a dedupe system, a set of fingerprints for filtering by the dedupe system; receiving, by the client from the dedupe system, fingerprints of the set of fingerprints that were identified by the dedupe system as unique fingerprints, and additional non-unique fingerprints from a set of containers of the dedupe system; sending, by the client to the dedupe system, only unique segments for writing by the dedupe system, and the unique segments form a set; receiving, by the client system, data from a backup dataset, and segmenting and fingerprinting the data from the backup dataset to create a next set of fingerprints; filtering, by the client system, fingerprints in the next set of fingerprints, using additional existing fingerprints received earlier from the dedupe system, so as to identify unique segments of the next set; and sending, by the client system to the dedupe system, a segment tree of the unique segments of the next set.
12. The non-transitory storage medium as recited in claim 11, wherein any fingerprints identified, during the filtering, as matching fingerprints already present in a local in-memory list, are not sent by the client system to the dedupe system.
13. The non-transitory storage medium as recited in claim 2, wherein the local in-memory list includes one or more non-unique fingerprints previously sent by the client system to the dedupe system.
14. The non-transitory storage medium as recited in claim 11, wherein the segment tree is usable by the dedupe system to build file metadata for a file.
15. The non-transitory storage medium as recited in claim 11, wherein an amount of the unique fingerprints sent by the dedupe system to the client is specified by the client.
16. The non-transitory storage medium as recited in claim 11, wherein an amount of the unique fingerprints sent by the dedupe system to the client is a function of a value of an adaptive match rate that is determined by the client.
17. The non-transitory storage medium as recited in claim 11, wherein the unique fingerprints, and the additional non-unique fingerprints from the set of containers of the dedupe system, are both received by the client in response to a single RPC (remote procedure call) issued by the client to the dedupe system.
18. The non-transitory storage medium as recited in claim 11, the unique fingerprints, and the additional non-unique fingerprints received by the client from the dedupe system, are written by the client to an in-memory list locally maintained at the client.
19. The non-transitory storage medium as recited in claim 11, wherein the segment tree is sent at an end of a data ingest process at the dedupe system.
20. The non-transitory storage medium as recited in claim 11, wherein the unique segments are created by the client.
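The client-side flow in the claims (query once, remember what the dedupe system returned, and filter the next batch locally) might look roughly like the sketch below. Everything here is illustrative: `Client`, `FakeServer`, `filter_request`, and `write_segments` are hypothetical names, SHA-256 is an assumed fingerprint function (the claims do not name one), and the segment-tree step of claim 1 is not modelled.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever fingerprint the system uses.
    return hashlib.sha256(segment).hexdigest()


class FakeServer:
    """In-memory stand-in for the dedupe system (hypothetical)."""

    def __init__(self, containers):
        self.containers = containers  # list of lists of fingerprints
        self.stored = {}
        self.rpc_count = 0

    def filter_request(self, fps):
        self.rpc_count += 1
        present = {fp for c in self.containers for fp in c}
        unique = [fp for fp in fps if fp not in present]
        extra = []
        for c in self.containers:
            if any(fp in c for fp in fps):
                # Return the neighbours of matched fingerprints as well.
                extra.extend(fp for fp in c if fp not in fps)
        return unique, extra

    def write_segments(self, segments):
        self.stored.update(segments)
        if segments:
            self.containers.append(list(segments))


class Client:
    def __init__(self, server):
        self.server = server
        self.known = set()  # local in-memory list of fingerprints known to exist

    def ingest(self, segments):
        """Process one batch; return how many fingerprints were sent."""
        by_fp = {fingerprint(s): s for s in segments}
        # Claim 2: fingerprints already in the local list are not sent.
        to_query = [fp for fp in by_fp if fp not in self.known]
        if not to_query:
            return 0
        # Claim 7: unique and additional fingerprints arrive in one response.
        unique, extra = self.server.filter_request(to_query)
        # Only segments whose fingerprints came back as unique are transferred.
        self.server.write_segments({fp: by_fp[fp] for fp in unique})
        # Claims 3 and 8: remember everything now known to be on the server.
        self.known.update(to_query)
        self.known.update(extra)
        return len(to_query)
```

In this sketch, a second batch made up entirely of segments whose fingerprints were prefetched triggers no call to the server at all, which is the saving the locality principle is meant to deliver.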

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • Updates performed during online database operations; commit processing · CPC title

  • the solution involving signatures · CPC title

  • G06F16/215 · Primary

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • using de-duplication of the data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12222913B2 cover?
One example method includes receiving at a dedupe system, from a client, a request that comprises a set of fingerprints, where each fingerprint in the set corresponds to a particular data segment, filtering, at the dedupe system, the set of fingerprints into a set of unique fingerprints and a set of non-unique fingerprints, reading, at the dedupe system, from a container where copies of the non…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 11, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).