Targeted deduplication using server generated group fingerprints for virtual synthesis

US12174708B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12174708-B2
Application number  US-202217873769-A
Country             US
Kind code           B2
Filing date         Jul 26, 2022
Priority date       Jul 26, 2022
Publication date    Dec 24, 2024
Grant date          Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of a targeted deduplication process that splits protected data into variable size segments, generates a fingerprint for each segment, and then combines fingerprints into groups to form group fingerprints. An embodiment auto-generates and persists the group fingerprints for the backups which are already on the storage server, thus enabling the backup client to fetch these fingerprints using an identifier and enforce synthesis for the new backup or replication copy against any previously written backup. For this embodiment, group fingerprints are generated on the storage server itself, rather than being generated on and pushed from the backup client for mere storage on the storage server, so that, as files are ingested, the storage server also auto-generates group fingerprints on its own.
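The pipeline the abstract describes has three steps: split data into variable-size segments, fingerprint each segment, and hash runs of segment fingerprints into group fingerprints. It can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions. The gear-style rolling hash, the size limits `MIN_SEG`, `MAX_SEG` and `CUT_MASK`, the use of SHA-256, and the content-defined grouping rule are all illustrative choices. `split_segments` and `group_fingerprints` are hypothetical names, not the patent's or Data Domain's actual method.

```python
import hashlib
import random

# Illustrative tuning values; the patent does not state segment sizes or grouping rules.
MIN_SEG = 2 * 1024         # smallest segment, in bytes
MAX_SEG = 16 * 1024        # hard cap on segment size, in bytes
CUT_MASK = (1 << 13) - 1   # roughly a 1-in-8192 chance of a cut per byte once past MIN_SEG
_rng = random.Random(42)   # fixed seed so every run builds the same table
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # per-byte values for the rolling hash


def split_segments(data: bytes) -> list[bytes]:
    """Split data into variable-size segments with a gear-style rolling hash."""
    segments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SEG and (h & CUT_MASK) == 0) or length >= MAX_SEG:
            segments.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        segments.append(data[start:])
    return segments


def group_fingerprints(data: bytes, divisor: int = 8, max_group: int = 32):
    """Return (group_fp, start, end) tuples covering data in order.

    Each segment gets a SHA-256 fingerprint; a run of consecutive fingerprints
    is concatenated and hashed again to form one group fingerprint."""
    groups, members, start, pos = [], [], 0, 0
    for seg in split_segments(data):
        fp = hashlib.sha256(seg).digest()   # per-segment fingerprint
        members.append(fp)
        pos += len(seg)
        # Assumed grouping rule: close the group on a fingerprint property or a size cap.
        if int.from_bytes(fp[:4], "big") % divisor == 0 or len(members) == max_group:
            groups.append((hashlib.sha256(b"".join(members)).hexdigest(), start, pos))
            members, start = [], pos
    if members:  # flush the trailing partial group
        groups.append((hashlib.sha256(b"".join(members)).hexdigest(), start, pos))
    return groups
```

In this sketch, groups close on a property of the fingerprints rather than after a fixed count. As a result, a small insertion early in a file should disturb only the segments and groups near the edit, which is what makes group-level matching against an older backup plausible.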

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising: sending, from the backup client to the storage server, an insight about an old backup for use in deduplicating a new backup and a request for a group fingerprint set; first generating, using the insight on the storage server, group fingerprints for the old backup; fetching the group fingerprints from the storage server to the backup client; second generating, on the backup client, group fingerprints of the new backup; comparing, on the backup client, the new backup group fingerprints to the old backup group fingerprints; sending, for any matching group fingerprints in the comparing step, from the backup client to the storage server, a virtual synthetic copy request; and creating the new backup in response to the virtual synthetic copy request, and wherein the deduplicated backup system comprises a Data Domain file system (DDFS) and a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for data ingests, and which translates application read and write request to DDBoost application program interfaces (APIs).

2. The method of claim 1 wherein protected data comprises a plurality of data segments, and the group fingerprints comprise a set of fingerprints, and wherein each fingerprint comprising a signature for a respective data segment of the plurality of data segments.

3. The method of claim 2 wherein the signature for each respective data segment is generated using a cryptographic hash function, and wherein the fingerprints are stored in a L0 to L6 layered directory tree.

4. The method of claim 2 wherein the fingerprints of each group fingerprints are grouped using a defined grouping algorithm, and wherein the insight comprises at least one of: a backup identifier, a filename and path of the previous backup, or other identifying information about the previous backup.

5. The method of claim 1 further comprising, if there is no match in the comparing step, copying data for the group fingerprints of the new backup from the backup client to the storage server, performing using per-segment deduplication in the backup client.

6. The method of claim 5 wherein the backup identifier comprises at least one of: a job number, backup location information, a filename and path of a previous backup, or other identifying information about one or more previous backups.

7. The method of claim 1 wherein each of the first generating step and second generating step further comprises: dividing the backup data into variable size segments; generating a fingerprint for each segment; combining generated fingerprints to form the group fingerprints; and storing the group fingerprints, segments, and fingerprints in one of the storage server and the backup client.

8. The method of claim 1 wherein the matching group fingerprints and the insight enable virtual synthetic backups for an application that does not have sufficient knowledge of change data blocks from the previous backup to use virtual synthetic backup operations by itself.

9. The method of claim 1 further comprising: adding, in the storage server, additional insights comprising information not provided but deducible by the storage server, to the client-sent insights to form combined insights; and re-generating, in the storage server any group fingerprints that do not exist for matching backups based on the combined insights.

10. The method of claim 9 further comprising, after the step of creating the new backup in response to the virtual synthetic copy request, releasing by the backup client, the requested group fingerprint handle, and unlocking, by the storage server, the matching backups.

11. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising a server-side process of: receiving, from the backup client to the storage server, an insight about an old backup and a request for a group fingerprint set; adding, in the storage server, an additional insight to the insight received from the backup client to form combined insights; locating, in the storage server, backups that maximize client-side deduplication for virtual synthetic backups; re-generating, in the storage server, any group fingerprints that do not exist for a matching set of backups based on the combined insights; sending, from the storage server to the backup client, a handle for the re-generated group fingerprints; locking the matching set of backups in the storage server, and further comprising a client-side process of: sending the insight request for a group fingerprint set to the storage server; fetching, in the backup client in response to the handle sent from the storage server, group fingerprints comprising the generated and re-generated group fingerprints; generating, in the backup client, new backup group fingerprints; comparing the new backup group fingerprints to the group fingerprints fetched from the storage server to find any matching group fingerprints; generating virtual synthetic copy instructions for data corresponding to any matching group fingerprints; performing, on the backup client, segment-level deduplication for data corresponding to any non-matching group fingerprints; generating, from the backup client, a backup operation that uses the virtual synthetic copy instructions; committing the backup operation to the storage server; and releasing, by the backup client, the group fingerprint handle allowing the storage server to release the group fingerprints generated by the storage server.

12. The method of claim 11 further comprising unlocking, by the storage server, the backups corresponding to the released group fingerprints.

13. The method of claim 11 wherein the fingerprints of each group fingerprints are grouped using a defined grouping algorithm, and wherein the insight comprises at least one of: a backup identifier, a filename and path of the previous backup, or other identifying information about the previous backup.

14. The method of claim 12 wherein the additional insight comprises any information not provided by the backup client, but that can be deduced by the storage server itself.

15. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising a client-side process of: sending, from the backup client to the storage server, an insight about an old backup and a request for a group fingerprint set; fetching, in the backup client in response to a handle sent from the storage server, group fingerprints comprising the generated and re-generated group fingerprints; generating, in the backup client, new backup group fingerprints; comparing the new backup group fingerprints to the group fingerprints fetched from the storage server to find any matching group fingerprints; generating virtual synthetic copy instructions for data corresponding to any matching group fingerprints; and performing, on the backup client, segment-level deduplication for data corresponding to any non-matching group fingerprints, and wherein the deduplicated backup system comprises a Data Domain file system (DDFS) and a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for
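The client/server exchange in claims 1, 10 and 11 can be modelled as a toy in-memory exchange. The client sends an insight, the server generates group fingerprints for the old backup, locks it and returns a handle, the client compares and sends copy instructions for matches, and the handle is then released. This is a hedged sketch, not the DDBoost API. `StorageServer`, `request_group_fps`, `fetch`, `commit`, `release`, `client_backup` and `GROUP` are hypothetical names. To stay short and self-contained, the sketch makes three simplifications. It uses fixed-size groups in place of variable-size segment groups. It treats the insight as a plain backup id. It sends raw bytes for non-matching groups where the claims call for segment-level deduplication.

```python
import collections
import hashlib
import itertools
from dataclasses import dataclass, field

GROUP = 4096  # hypothetical: fixed-size groups stand in for variable-size segment groups


def groups_of(data: bytes) -> list[tuple[str, int, int]]:
    """Return (group_fp, offset, length) for each consecutive GROUP-byte slice."""
    out = []
    for off in range(0, len(data), GROUP):
        chunk = data[off:off + GROUP]
        out.append((hashlib.sha256(chunk).hexdigest(), off, len(chunk)))
    return out


@dataclass
class StorageServer:
    backups: dict = field(default_factory=dict)   # backup id -> stored bytes
    handles: dict = field(default_factory=dict)   # handle -> (old backup id, fp index)
    locks: collections.Counter = field(default_factory=collections.Counter)
    _next: itertools.count = field(default_factory=itertools.count)

    def request_group_fps(self, insight: str) -> int:
        """Generate the old backup's group fingerprints on the server, lock it, return a handle."""
        index = {fp: (off, n) for fp, off, n in groups_of(self.backups[insight])}
        handle = next(self._next)
        self.handles[handle] = (insight, index)
        self.locks[insight] += 1
        return handle

    def fetch(self, handle: int) -> set[str]:
        return set(self.handles[handle][1])

    def commit(self, new_id: str, handle: int, recipe: list) -> None:
        """'copy' ops reuse bytes already on the server (virtual synthesis);
        'write' ops carry bytes the client had to send."""
        old_id, index = self.handles[handle]
        old, parts = self.backups[old_id], []
        for op, payload in recipe:
            if op == "copy":
                off, n = index[payload]
                parts.append(old[off:off + n])
            else:
                parts.append(payload)
        self.backups[new_id] = b"".join(parts)

    def release(self, handle: int) -> None:
        """Drop the server-generated fingerprints and unlock the old backup."""
        old_id, _ = self.handles.pop(handle)
        self.locks[old_id] -= 1
        if self.locks[old_id] <= 0:
            del self.locks[old_id]


def client_backup(server: StorageServer, new_id: str, data: bytes, insight: str) -> int:
    """Client side of the exchange; returns how many bytes had to be sent."""
    handle = server.request_group_fps(insight)    # send insight + request
    old_fps = server.fetch(handle)                # fetch server-generated group fps
    recipe, sent = [], 0
    for fp, off, n in groups_of(data):            # client-generated group fps
        if fp in old_fps:
            recipe.append(("copy", fp))           # virtual synthetic copy instruction
        else:
            # Simplification: raw bytes here, where the claims call for segment-level dedup.
            recipe.append(("write", data[off:off + n]))
            sent += n
    server.commit(new_id, handle, recipe)
    server.release(handle)                        # lets the server unlock the old backup
    return sent
```

For example, suppose a server holds backup `"b1"` and the client backs up a copy with one group overwritten. Calling `client_backup(server, "b2", new, insight="b1")` should then transfer only that one group's bytes, and the remaining groups are rebuilt on the server from `"b1"`.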

Assignees

Dell Products Lp

Classifications

  • for networked environments · CPC title

  • Using snapshots, i.e. a logical point-in-time copy of the data · CPC title

  • using de-duplication of the data · CPC title

Frequently asked questions

What does patent US12174708B2 cover?
Embodiments of a targeted deduplication process that splits protected data into variable size segments, generates a fingerprint for each segment, and then combines fingerprints into groups to form group fingerprints. An embodiment auto-generates and persists the group fingerprints for the backups which are already on the storage server, thus enabling the backup client to fetch these fingerprint…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F11/1464. Mapped technology areas include Physics.
When was this patent published?
Published December 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).