Targeted deduplication using server-side group fingerprints for virtual synthesis

US12174707B2 · US · B2

Patent metadata
Publication number: US-12174707-B2
Application number: US-202217730935-A
Country: US
Kind code: B2
Filing date: Apr 27, 2022
Priority date: Apr 27, 2022
Publication date: Dec 24, 2024
Grant date: Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of a targeted deduplication process that splits protected data into variable size segments, generates a fingerprint for each segment, and then combines fingerprints into groups to form group fingerprints. The group fingerprints are stored on and retrieved from a server by a client to identify duplicate data present on a server during the backup process on an “as needed” basis. The specific group fingerprints sent are based on knowledge of previous backups of the asset, either learned or provided as a hint from the backup application. Once it is known that a specific group fingerprint is present on the server, a virtual synthetic request can be generated instead of a traditional deduplication process. This enables virtual synthetic backups for applications that do not have sufficient knowledge of changed blocks from a previous backup to use the virtual synthetic operations on their own.
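The first three steps in the abstract (split data into variable size segments, fingerprint each segment, combine fingerprints into group fingerprints) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the gear-style rolling hash, SHA-256, the segment size limits, and the rule that closes a group after a fingerprint whose first byte is divisible by 8 (or at 16 members) are all illustrative choices, because the abstract names no chunking method, hash function, or grouping algorithm.

```python
import hashlib
import random

# Illustrative parameters only; the patent does not specify any of these values.
MIN_SEG = 1024                     # never cut a segment shorter than this
MAX_SEG = 16 * 1024                # always cut once a segment reaches this length
CUT_MASK = ((1 << 12) - 1) << 52   # top 12 hash bits; roughly 4 KiB expected past MIN_SEG
MAX_GROUP = 16                     # most segment fingerprints in one group
GROUP_DIVISOR = 8                  # a group closes after a fingerprint with fp[0] % 8 == 0
MASK64 = (1 << 64) - 1

# Gear table for a gear-style rolling hash, seeded so cut points are reproducible.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def split_segments(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut where the top hash bits are all zero."""
    segments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if length >= MAX_SEG or (length >= MIN_SEG and (h & CUT_MASK) == 0):
            segments.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments


def segment_fingerprint(segment: bytes) -> bytes:
    return hashlib.sha256(segment).digest()


def group_fingerprints(fps: list[bytes]) -> list[bytes]:
    """Hash runs of consecutive segment fingerprints into group fingerprints."""
    groups, current = [], []
    for f in fps:
        current.append(f)
        if f[0] % GROUP_DIVISOR == 0 or len(current) == MAX_GROUP:
            groups.append(hashlib.sha256(b"".join(current)).digest())
            current = []
    if current:
        groups.append(hashlib.sha256(b"".join(current)).digest())
    return groups


data = random.Random(1).randbytes(400_000)
segs = split_segments(data)
gfps = group_fingerprints([segment_fingerprint(s) for s in segs])

# Overwrite the last 1,000 bytes and recompute.
edited = data[:-1000] + b"x" * 1000
gfps2 = group_fingerprints([segment_fingerprint(s) for s in split_segments(edited)])
```

Because both segment cuts and group cuts depend only on bytes already read, editing the tail of the data leaves the leading group fingerprints unchanged (`gfps[0] == gfps2[0]` here); those unchanged groups are what a client could later find on the server. A fixed-count grouping would be simpler, but a single inserted segment would then shift the membership, and so the fingerprint, of every later group.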

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: dividing protected data into variable size segments; generating a fingerprint for each segment; combining generated fingerprints into group fingerprints; storing the group fingerprints, segments, and fingerprints on a deduplication backup server, wherein the protected data comprises part of a deduplication backup process executed by the deduplication backup server running a Data Domain file system (DDFS); generating, for new segments to be backed up, new group fingerprints; determining if any new group fingerprints match the stored group fingerprints; and making, if there is a match resulting in matching fingerprints, a virtual synthetic backup out of segments corresponding to the matching fingerprints, otherwise, making a backup using a per-segment deduplication process for segments corresponding to fingerprints that do not match, wherein the DDFS comprises a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for data ingests, and which translates application read and write request to DDBoost application program interfaces (APIs).

2. The method of claim 1 further comprising storing the new group fingerprints on the server for use in a subsequent comparison operation for a next backup.

3. The method of claim 2 further comprising: obtaining a hint from a deduplication client working together with the server to use the hint to identify a set of group fingerprints to use for comparison; sending the hint to the server; and fetching group fingerprints from the server based on the hint.

4. The method of claim 1 wherein the grouped fingerprints are grouped using a defined grouping algorithm, and wherein the hint constitutes an insight into workflow of the client and the server, and comprises at least one of: backup location information, a filename and path of a previous backup, or other identifying information about one or more previous backups.

5. A computer-implemented method comprising: obtaining a hint regarding one or more previous backups from a client coupled to a server backing up protected data from the client, wherein the server stores group fingerprints from the one or more previous backups, wherein the protected data comprises part of a deduplication backup process executed by the server running a Data Domain file system (DDFS); fetching group fingerprints from the older backups from the server based on the hint; sending new group fingerprints from the client to the server for a new backup session; comparing the new group fingerprints with the fetched group fingerprints to determine if any group fingerprints match; and sending, if there is a match, a virtual copy command to the backup server to copy data already stored in the server as part of the new backup session, wherein the DDFS comprises a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for data ingests, and which translates application read and write request to DDBoost application program interfaces (APIs).

6. The method of claim 5 further comprising, if there is no match, copying data for the new group fingerprints to the server using per-segment fingerprint deduplication.

7. The method of claim 6 further comprising: dividing the protected data into variable size segments; generating a fingerprint for each segment; combining generated fingerprints to form the group fingerprints; and storing the group fingerprints, segments, and fingerprints on a deduplication backup server.

8. The method of claim 7 further comprising generating, for new segments to be backed up, the new group fingerprints.

9. The method of claim 8 further comprising storing the new group fingerprints on the server for use in a subsequent comparison operation for a next backup.

10. The method of claim 5 wherein the hint comprises at least one of: backup location information, a filename and path of a previous backup, or other identifying information about one or more previous backups.

11. The method of claim 10 wherein the hint comprises a workflow insight from the client working together with the server to use to identify a set of group fingerprints to use for comparison between the previous backup and a current backup.

12. The method of claim 11 wherein the set of group fingerprints and the hint enable virtual synthetic backups for an application that does not have sufficient knowledge of change data blocks from the previous backup to use virtual synthetic backup operations by itself.

13. A computer-implemented method of enabling virtual synthetic backups for an application that does not have sufficient knowledge of what changed data blocks since a previous backup to itself use virtual synthetic operations, comprising: sending a hint from the client to a server working together with the client to identify a set of group fingerprints to use for comparison in a deduplication backup operation, wherein the server executes a deduplication backup process using a Data Domain file system (DDFS); providing group fingerprints from the server to a client, each group fingerprint comprising fingerprints of data blocks to be backed up from the client to the server and identified at least in part using the hint; comparing fingerprints from the previous backup to the fingerprints in the provided group fingerprints to generate matching fingerprints sending, from the client to the server, a virtual synthetic copy request for data represented by GFPs already present in the server; backing up, in a current backup, data for the matching fingerprints by combining old backup data segments from the previous backup with new data segments for the current backup; and sending the group fingerprints representing the data in the backup file to the server to be saved along with the new data segments, such that the group fingerprints are calculated by the server rather than by the client, wherein the DDFS comprises a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for data ingests, and which translates application read and write request to DDBoost application program interfaces (APIs).

14. The method of claim 13 further comprising: backing up data for any non-matching fingerprints using a per-segment deduplication backup process.

15. The method of claim 13 further comprising storing, on the server, the new group fingerprints for the new data segments for use as new previous backup data for a next backup operation.

16. The method of claim 13 the backup file is generated using a specific sequence of steps referred to as a ‘recipe’ that replication logic of the system will replay it to generate the same backup file on the client.

17. The method of claim 13 wherein the group fingerprints are grouped using a defined grouping algorithm, and wherein the hint comprises at least one of: backup location information, a filename and path of the previous backup, or other identifying information about the previous backup.
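The decision in claims 1, 5, and 6 (a virtual copy for matching group fingerprints, per-segment deduplication otherwise), together with the hint-based fetch in claims 3 and 5, might look roughly like the toy model below. Everything in it is hypothetical: `ToyServer`, `make_groups`, `plan_backup`, the operation tuples, and the use of the previous backup's path as the hint are illustrative names, not DDBoost or DDFS APIs, and fixed four-segment grouping is only one possible "defined grouping algorithm". For brevity the client computes the group fingerprints, as claim 5 implies; claim 13 instead has the server calculate them.

```python
import hashlib
from dataclasses import dataclass, field


def fp(data: bytes) -> bytes:
    """Fingerprint helper (SHA-256 is an assumption; the claims name no hash)."""
    return hashlib.sha256(data).digest()


@dataclass
class Group:
    gfp: bytes             # group fingerprint over the member segment fingerprints
    segments: list[bytes]  # raw segments covered by this group
    offset: int            # byte offset of the group within its backup file


def make_groups(segments: list[bytes], per_group: int) -> list[Group]:
    """Hypothetical fixed-count grouping of consecutive segments."""
    groups, offset = [], 0
    for i in range(0, len(segments), per_group):
        members = segments[i:i + per_group]
        groups.append(Group(fp(b"".join(fp(s) for s in members)), members, offset))
        offset += sum(len(s) for s in members)
    return groups


@dataclass
class ToyServer:
    """Hypothetical in-memory stand-in for the deduplication server."""
    segments: dict[bytes, bytes] = field(default_factory=dict)  # segment fp -> bytes
    gfp_index: dict[str, dict[bytes, tuple[int, int]]] = field(default_factory=dict)

    def fetch_gfps(self, hint: str) -> dict[bytes, tuple[int, int]]:
        # The hint (here, the previous backup's path) selects which GFPs to return.
        return self.gfp_index.get(hint, {})

    def store_backup(self, path: str, groups: list[Group]) -> None:
        # Keep segments plus this backup's GFPs for the next comparison (claims 2, 9, 15).
        self.gfp_index[path] = {}
        for g in groups:
            for s in g.segments:
                self.segments.setdefault(fp(s), s)
            self.gfp_index[path][g.gfp] = (g.offset, sum(len(s) for s in g.segments))


def plan_backup(server: ToyServer, hint: str, groups: list[Group]) -> list[tuple]:
    """Virtual copy for matching GFPs; per-segment deduplication for the rest."""
    previous = server.fetch_gfps(hint)
    ops: list[tuple] = []
    for g in groups:
        if g.gfp in previous:
            src_offset, length = previous[g.gfp]
            ops.append(("virtual_copy", hint, src_offset, length))
            continue
        for s in g.segments:
            # Only segments the server has never stored need their bytes sent.
            kind = "dedup_ref" if fp(s) in server.segments else "write"
            ops.append((kind, len(s)))
    return ops


server = ToyServer()
v1 = [bytes([i]) * 100 for i in range(8)]   # eight distinct 100-byte segments
server.store_backup("/backups/db-v1", make_groups(v1, per_group=4))

v2 = v1[:4] + [b"new" * 10] + v1[5:]        # segment 4 replaced by 30 new bytes
v2_groups = make_groups(v2, per_group=4)
ops = plan_backup(server, "/backups/db-v1", v2_groups)
server.store_backup("/backups/db-v2", v2_groups)
# ops == [("virtual_copy", "/backups/db-v1", 0, 400), ("write", 30),
#         ("dedup_ref", 100), ("dedup_ref", 100), ("dedup_ref", 100)]
```

Only the first group of the second backup matches, so it becomes a single virtual copy of bytes 0 to 399 of the previous backup; the changed group falls back to per-segment deduplication, where one new 30-byte segment is written and three unchanged segments are referenced. Storing the second backup's group fingerprints afterwards gives the next backup something to match against.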

Assignees

Dell Products Lp

Classifications

  • by selection of backup contents · CPC title

  • for networked environments · CPC title

  • Backup restoration techniques · CPC title

  • using de-duplication of the data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12174707B2 cover?
Embodiments of a targeted deduplication process that splits protected data into variable size segments, generates a fingerprint for each segment, and then combines fingerprints into groups to form group fingerprints. The group fingerprints are stored on and retrieved from a server by a client to identify duplicate data present on a server during the backup process on an “as needed” basis. The s…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06F11/1469. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).