Reducing bandwidth during synthetic restores from a deduplication file system
US-2023376385-A1 · Nov 23, 2023 · US
US12174708B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12174708-B2 |
| Application number | US-202217873769-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 26, 2022 |
| Priority date | Jul 26, 2022 |
| Publication date | Dec 24, 2024 |
| Grant date | Dec 24, 2024 |
Official abstract text for this publication.
Embodiments of a targeted deduplication process that splits protected data into variable size segments, generates a fingerprint for each segment, and then combines fingerprints into groups to form group fingerprints. An embodiment auto-generates and persists the group fingerprints for the backups which are already on the storage server, thus enabling the backup client to fetch these fingerprints using an identifier and enforce synthesis for the new backup or replication copy against any previously written backup. For this embodiment, group fingerprints are generated on the storage server itself, rather than being generated on and pushed from the backup client for mere storage on the storage server, so that, as files are ingested, the storage server also auto-generates group fingerprints on its own.
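The pipeline the abstract describes (variable-size segments, one fingerprint per segment, fingerprints combined into group fingerprints) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the toy rolling-sum boundary rule, the use of SHA-256, the fixed group size of four, and the function names `split_segments`, `fingerprint`, and `group_fingerprints`.

```python
import hashlib


def split_segments(data: bytes, mask: int = 0x3F,
                   min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Cut data into variable-size segments at content-defined boundaries.

    The boundary rule is a toy rolling value (an assumption, not the
    patent's segmenting method): cut when the low bits of the rolling
    value all equal 1, subject to minimum and maximum segment sizes.
    """
    segments = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= min_size and (rolling & mask) == mask
        if at_boundary or length >= max_size:
            segments.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        segments.append(data[start:])  # trailing partial segment
    return segments


def fingerprint(segment: bytes) -> bytes:
    """Per-segment fingerprint; SHA-256 is an assumed hash choice."""
    return hashlib.sha256(segment).digest()


def group_fingerprints(fps: list[bytes], group_size: int = 4) -> list[bytes]:
    """Combine consecutive fingerprints into group fingerprints.

    Fixed-count grouping is a simplification; the source only refers to
    "a defined grouping algorithm".
    """
    groups = []
    for i in range(0, len(fps), group_size):
        combined = b"".join(fps[i:i + group_size])
        groups.append(hashlib.sha256(combined).digest())
    return groups


if __name__ == "__main__":
    data = bytes(range(256)) * 8
    segs = split_segments(data)
    fps = [fingerprint(s) for s in segs]
    print(len(segs), "segments,", len(group_fingerprints(fps)), "group fingerprints")
```

In the embodiment described by the abstract, the storage server runs this kind of generation itself as files are ingested, so the client only has to fetch the stored group fingerprints by identifier rather than compute and push them.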
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising: sending, from the backup client to the storage server, an insight about an old backup for use in deduplicating a new backup and a request for a group fingerprint set; first generating, using the insight on the storage server, group fingerprints for the old backup; fetching the group fingerprints from the storage server to the backup client; second generating, on the backup client, group fingerprints of the new backup; comparing, on the backup client, the new backup group fingerprints to the old backup group fingerprints; sending, for any matching group fingerprints in the comparing step, from the backup client to the storage server, a virtual synthetic copy request; and creating the new backup in response to the virtual synthetic copy request, and wherein the deduplicated backup system comprises a Data Domain file system (DDFS) and a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for data ingests, and which translates application read and write request to DDBoost application program interfaces (APIs).
2. The method of claim 1 wherein protected data comprises a plurality of data segments, and the group fingerprints comprise a set of fingerprints, and wherein each fingerprint comprising a signature for a respective data segment of the plurality of data segments.
3. The method of claim 2 wherein the signature for each respective data segment is generated using a cryptographic hash function, and wherein the fingerprints are stored in a L0 to L6 layered directory tree.
4. The method of claim 2 wherein the fingerprints of each group fingerprints are grouped using a defined grouping algorithm, and wherein the insight comprises at least one of: a backup identifier, a filename and path of the previous backup, or other identifying information about the previous backup.
5. The method of claim 1 further comprising, if there is no match in the comparing step, copying data for the group fingerprints of the new backup from the backup client to the storage server, performing using per-segment deduplication in the backup client.
6. The method of claim 5 wherein the backup identifier comprises at least one of: a job number, backup location information, a filename and path of a previous backup, or other identifying information about one or more previous backups.
7. The method of claim 1 wherein each of the first generating step and second generating step further comprises: dividing the backup data into variable size segments; generating a fingerprint for each segment; combining generated fingerprints to form the group fingerprints; and storing the group fingerprints, segments, and fingerprints in one of the storage server and the backup client.
8. The method of claim 1 wherein the matching group fingerprints and the insight enable virtual synthetic backups for an application that does not have sufficient knowledge of change data blocks from the previous backup to use virtual synthetic backup operations by itself.
9. The method of claim 1 further comprising: adding, in the storage server, additional insights comprising information not provided but deducible by the storage server, to the client-sent insights to form combined insights; and re-generating, in the storage server any group fingerprints that do not exist for matching backups based on the combined insights.
10. The method of claim 9 further comprising, after the step of creating the new backup in response to the virtual synthetic copy request, releasing by the backup client, the requested group fingerprint handle, and unlocking, by the storage server, the matching backups.
11. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising a server-side process of: receiving, from the backup client to the storage server, an insight about an old backup and a request for a group fingerprint set; adding, in the storage server, an additional insight to the insight received from the backup client to form combined insights; locating, in the storage server, backups that maximize client-side deduplication for virtual synthetic backups; re-generating, in the storage server, any group fingerprints that do not exist for a matching set of backups based on the combined insights; sending, from the storage server to the backup client, a handle for the re-generated group fingerprints; locking the matching set of backups in the storage server, and further comprising a client-side process of: sending the insight request for a group fingerprint set to the storage server; fetching, in the backup client in response to the handle sent from the storage server, group fingerprints comprising the generated and re-generated group fingerprints; generating, in the backup client, new backup group fingerprints; comparing the new backup group fingerprints to the group fingerprints fetched from the storage server to find any matching group fingerprints; generating virtual synthetic copy instructions for data corresponding to any matching group fingerprints; performing, on the backup client, segment-level deduplication for data corresponding to any non-matching group fingerprints; generating, from the backup client, a backup operation that uses the virtual synthetic copy instructions; committing the backup operation to the storage server; and releasing, by the backup client, the group fingerprint handle allowing the storage server to release the group fingerprints generated by the storage server.
12. The method of claim 11 further comprising unlocking, by the storage server, the backups corresponding to the released group fingerprints.
13. The method of claim 11 wherein the fingerprints of each group fingerprints are grouped using a defined grouping algorithm, and wherein the insight comprises at least one of: a backup identifier, a filename and path of the previous backup, or other identifying information about the previous backup.
14. The method of claim 12 wherein the additional insight comprises any information not provided by the backup client, but that can be deduced by the storage server itself.
15. A computer-implemented method of making virtual synthetic backups of protected data from a backup client for storage in a storage server of a deduplicated backup system, comprising a client-side process of: sending, from the backup client to the storage server, an insight about an old backup and a request for a group fingerprint set; fetching, in the backup client in response to a handle sent from the storage server, group fingerprints comprising the generated and re-generated group fingerprints; generating, in the backup client, new backup group fingerprints; comparing the new backup group fingerprints to the group fingerprints fetched from the storage server to find any matching group fingerprints; generating virtual synthetic copy instructions for data corresponding to any matching group fingerprints; and performing, on the backup client, segment-level deduplication for data corresponding to any non-matching group fingerprints, and wherein the deduplicated backup system comprises a Data Domain file system (DDFS) and a Data Domain Bandwidth Optimized Open Storage Technology (DDBoost) library that links with the application to reduce bandwidth required for
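The client-side comparison step recited in the independent claims (match new group fingerprints against those fetched from the storage server, issue virtual synthetic copy instructions for matches, fall back to segment-level deduplication otherwise) might be sketched as follows. This is an illustration only: the `Extent` type, the `plan_backup` function, the idea of keying old group fingerprints to byte ranges, and the tuple layout of a copy instruction are all hypothetical and do not come from the claims or from any DDBoost API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical byte range (offset, length) within a backup file."""
    offset: int
    length: int


def plan_backup(old_groups: dict[bytes, Extent],
                new_groups: list[tuple[bytes, Extent]]):
    """Split the new backup into synthetic copies and data still to send.

    old_groups: group fingerprint -> extent in the old backup
                (assumed to have been fetched from the storage server).
    new_groups: (group fingerprint, extent in the new backup), in order.

    Returns (copies, pending), where each copy is an assumed instruction
    tuple (old_offset, new_offset, length) and pending lists the new
    extents that need segment-level deduplication instead.
    """
    copies = []
    pending = []
    for group_fp, new_extent in new_groups:
        old_extent = old_groups.get(group_fp)
        if old_extent is not None and old_extent.length == new_extent.length:
            # Match: the server can assemble this range from the old backup.
            copies.append((old_extent.offset, new_extent.offset, new_extent.length))
        else:
            # No match: this range goes through per-segment deduplication.
            pending.append(new_extent)
    return copies, pending


if __name__ == "__main__":
    old = {b"g1": Extent(0, 100), b"g2": Extent(100, 50)}
    new = [(b"g1", Extent(0, 100)),
           (b"gX", Extent(100, 70)),
           (b"g2", Extent(170, 50))]
    copies, pending = plan_backup(old, new)
    print("synthetic copies:", copies)
    print("needs segment-level dedup:", pending)
```

The point of the split is that matched ranges cost only a small copy instruction on the wire, while unmatched ranges still benefit from ordinary per-segment deduplication.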
for networked environments · CPC title
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
using de-duplication of the data · CPC title