File immutability using a deduplication file system in a public cloud using new filesystem redirection
US-2024103978-A1 · Mar 28, 2024 · US
US9766983B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9766983-B2 |
| Application number | US-13051408-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 30, 2008 |
| Priority date | Mar 5, 2008 |
| Publication date | Sep 19, 2017 |
| Grant date | Sep 19, 2017 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A computer implemented method and system obtains current signatures of data chunks and performs a proximity search of a library of previous signatures as a function of the likely location of corresponding data chunks. A full search of the library of previous signatures for those current signatures not found in the proximity search is also performed.
Opening claim text (preview).
The invention claimed is: 1. A method comprising: identifying a first plurality of signatures of data chunks in a first data backup of a plurality of data backups, wherein identifying the first plurality of signatures is based, at least in part, on the first data backup being a most recent of the plurality of data backups; performing a first search of the first plurality of signatures for each of a second plurality of signatures of data chunks for a second data backup, wherein the plurality of data backups precedes the second data backup, for each of the second plurality of signatures found in the first plurality of signatures, indicating that a corresponding data chunk is not to be stored for the second data backup; and after the first search, performing a second search of at least a third plurality of signatures of older data backups of the plurality of data backups, wherein performing the second search comprises, creating and updating an in-memory map with progressively older subsets of signatures of the third plurality of signatures that are incrementally loaded from a backup domain into memory, for each progressively older subset of signatures in the in-memory map, searching the in-memory map for the second plurality of signatures, and for each of the second plurality of signatures found in the in-memory map, removing a duplicate of a corresponding data chunk from either the second data backup or the older data backups. 2. The method of claim 1 wherein the second data backup corresponds to a volume on a machine. 3. The method of claim 1 , wherein indicating, for each of the second plurality of signatures found in the in-memory map, that the corresponding data chunk is not to be stored for the second backup comprises removing the signature found in the in-memory map from a list of the second plurality of signatures. 4. The method of claim 1 , wherein creating and updating the in-memory map comprises: after each searching of the in-memory map, removing a currently loaded one of the progressively older subsets of signatures from the in-memory map and inserting into the in-memory map a next most recent one of the progressively older subsets of signatures. 5. The method of claim 1 further comprising: in response to not finding a signature of the second plurality of signatures in the in-memory map, inserting into the in-memory map the signature that was not found; and in response to determining that a number of entries of the in-memory map reaches a threshold based on inserting the signature that was not found, removing the oldest signature from the in-memory map. 6. The method of claim 1 , wherein identifying the first plurality of signatures comprises: determining a source, destination, and backup method for the second data backup; identifying the first data backup as the last successful or incomplete backup from the source to the destination with the backup method; and accessing a file for the first data backup that includes the first plurality of signatures. 7. The method of claim 1 , wherein performing the second search is after completion of the second data backup. 8. The method of claim 1 wherein the in-memory map is a btree. 9. A method comprising: identifying a first plurality of signatures of data chunks of a first data backup of a plurality of data backups; loading a first subset of the first plurality of signatures from a backup domain into memory; creating an in-memory map with the first subset of signatures; searching the in-memory map for each of a second plurality of signatures of data chunks of a second data backup, wherein the first data backup precedes the second data backup and the first subset of signatures comprises most recent signatures of the first plurality of signatures; incrementally loading progressively older subsets of the first plurality of signatures from a backup domain into the memory; after each loading, removing at least one of the signatures from the in-memory map in order from oldest to youngest, wherein a number of the signatures removed from the in-memory map is based on a size boundary defined for the in-memory map; inserting currently loaded signatures from the memory into the in-memory map; for each of the second plurality of signatures found in the in-memory map, indicating that a corresponding data chunk is not to be stored for the second data backup; in response to not finding a signature of the second plurality of signatures in the in-memory map, inserting into the in-memory map the signature that was not found; and in response to determining that a number of entries of the in-memory map satisfies a threshold based on inserting the signature that was not found, removing the oldest of the signatures in the in-memory map. 10. The method of claim 9 wherein the second data backup corresponds to a volume on a machine. 11. The method of claim 9 wherein indicating, for each of the second plurality of signatures found in the in-memory map, that the corresponding data chunk is not to be stored for the second data backup comprises removing the signature found in the in-memory map from a list of the second plurality of signatures. 12. The method of claim 9 wherein the in-memory map is a btree. 13. A non-transitory computer readable storage medium comprising instructions for narrow duplicate searching of previous data backups, the instructions to: identify a first plurality of signatures of data chunks in a first data backup of a plurality of previous data backups, wherein identifying the first plurality of signatures is based, at least in part, on the first data backup being a most recent of the plurality of previous data backups; perform a first search of the first plurality of signatures for each of a second plurality of signatures of data chunks for a second data backup, wherein the first data backup precedes the second data backup and is a most recent of the plurality of previous data backups, for each of the second plurality of signatures found in the first plurality of signatures, indicate that a corresponding data chunk is not to be stored for the second data backup; and perform a second search of at least a third plurality of signatures of a third backup of the plurality of previous data backups that precedes the first backup, wherein the instructions to perform the second search comprise instructions to load a first subset of signatures of the third plurality of signatures from a backup domain into memory to create an in-memory map, search the in-memory map for the second plurality of signatures, and for each of the second plurality of signatures found in the in-memory map, indicate that a corresponding data chunk is not to be stored for the second data backup. 14. The non-transitory computer readable storage medium of claim 13 further comprising instructions to: load the subset of signatures from a backup domain into memory prior to searching. 15. The non-transitory computer readable storage medium of claim 13 , wherein the instructions to identify the first plurality of signatures comprise instructions to: determine a source, a destination, and a backup method for the second data backup; identify the first data backup as the last successful or incomplete backup from the source to the destination with the backup method; and access a file for the first data backup that includes the first plurality of signatures. 16. The non-transitory computer readable storage medium of claim 13 , wherein the instructions to perform the second search comprise instructions to: perform the second search after
using compression, e.g. sparse files · CPC title
using de-duplication of the data · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.