System and method for data deduplication of backup images
US-9098432-B1 · Aug 4, 2015 · US
US9384207B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9384207-B2 |
| Application number | US-201514627880-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 20, 2015 |
| Priority date | Nov 16, 2010 |
| Publication date | Jul 5, 2016 |
| Grant date | Jul 5, 2016 |
Abstract
Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments. The method also includes determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store.
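The abstract's flow can be sketched as follows: segment the first temporal state, hash the segments into a hierarchy whose higher levels aggregate lower-level signatures, then apply difference information and store only changed segments that are unique. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes fixed-size segments, SHA-256 signatures, a higher-level signature computed as the hash of its children's concatenated signatures, and difference information given as an in-place overwrite (`offset` plus `changed` bytes). `DedupStore`, `build_levels`, `first_state`, `apply_difference`, `SEGMENT_SIZE` and `FANOUT` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical parameters, chosen tiny so the example is easy to follow.
SEGMENT_SIZE = 4  # fixed-size content segments, in bytes (assumption)
FANOUT = 2        # lower-level signatures aggregated per higher-level structure (assumption)


def sig(data: bytes) -> str:
    """Hash signature of a byte string (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DedupStore:
    """Hypothetical content-addressed data store: signature -> segment bytes."""
    segments: dict = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        s = sig(data)
        if s not in self.segments:  # only unique content is stored
            self.segments[s] = data
        return s


def build_levels(leaf_sigs):
    """Aggregate signatures level by level until a single root remains.

    levels[0] holds the leaf (segment) signatures in object order and
    levels[-1] holds the root; each higher-level signature hashes the
    concatenation of up to FANOUT child signatures (assumption).
    """
    levels = [list(leaf_sigs)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        levels.append([sig("".join(below[i:i + FANOUT]).encode())
                       for i in range(0, len(below), FANOUT)])
    return levels


def first_state(store: DedupStore, data: bytes):
    """Segment the first temporal state, store its segments, build the hierarchy."""
    leaves = [store.put(data[i:i + SEGMENT_SIZE])
              for i in range(0, len(data), SEGMENT_SIZE)]
    return build_levels(leaves)


def apply_difference(store: DedupStore, data: bytes, levels, offset: int, changed: bytes):
    """Apply difference information: `changed` overwrites bytes at `offset`.

    Assumes an in-place overwrite that stays within the object. Only the
    segments overlapping the changed range are re-hashed and offered to the
    store, and their new signatures replace the old ones at the same leaf
    positions. For brevity the higher levels are rebuilt from all leaves.
    """
    new_data = data[:offset] + changed + data[offset + len(changed):]
    first = offset // SEGMENT_SIZE
    last = (offset + len(changed) - 1) // SEGMENT_SIZE
    leaves = list(levels[0])
    for idx in range(first, last + 1):
        start = idx * SEGMENT_SIZE
        leaves[idx] = store.put(new_data[start:start + SEGMENT_SIZE])
    return new_data, build_levels(leaves)
```

For example, storing `b"AAAABBBBCCCCDDDD"` and then overwriting bytes 4 to 7 with `b"XXXX"` adds exactly one new segment to the store. The leaf and higher-level signatures covering the untouched half of the object stay the same, while the root signature changes. Positioning the new leaf by the offset mirrors the claims' requirement that new structures be placed at the location indicated by the difference information.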
Claims (opening text, preview)
We claim:

1. A computing system for storing deduplicated images of a data object that changes over time in a deduplicating content store, the deduplicating content store having a local cache and a global cache, the computing system comprising: a processor; and a memory coupled to the processor and including computer-readable instructions that, when executed by the processor, cause the processor to: organize the content of the data object for a first temporal state of the data object as a plurality of content segments and storing the plurality of content segments in a data store; create a content structure representing content of the data object as a hierarchical arrangement of hash structures in the data store, wherein each hash structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein a higher-level hash structure in the hierarchical arrangement aggregates a set of lower-level hash structures, such that a logical organization of the content structure represents the organization of the content segments as they are represented within the data object; receive difference information for the data object, said difference information indicating changed content for the data object for a second temporal state of the data object relative to the first temporal state, and said difference information indicating a location of the changed content within the data object; receive the changed content for the data object at the deduplicating content store; form a hash signature for each of a set of changed lower-level hash structures associated with the changed content; form a hash signature for a changed higher-level hash structure aggregating a plurality of the set of changed lower-level hash structures; determine, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store before attempting to search for the hash signatures for each of the set of changed lower-level hash structures; store any changed content that is unique in the data store as content segments; modify the organized arrangement of hash structures to incorporate new structures for the content segment corresponding to at least one hash signature for the changed content; and incorporate the new structures in the organized arrangement of structures at a position corresponding to the location of the changed content within the data object as indicated within said difference information, thereby using the higher-level hash signature for the changed content without unnecessary searching for hash signatures for the lower-level hash structures.

2. The computing system of claim 1, wherein the processor is further caused to determine whether the changed content should be stored without checking the local cache of the deduplicating content store.

3. The computing system of claim 1, further comprising, if the hash signature for the changed higher-level hash structure is found within the global cache of the deduplicating content store, the processor is further caused to ignore hash signatures corresponding to the changed content, and not store content segments corresponding to the ignored hash signatures.

4. The computing system of claim 1, further comprising, if the hash signature for the changed higher-level hash structure is not found within the global cache of the deduplicating content store, the processor is further caused to search for any hash signatures in the local cache of the deduplicating content store corresponding to the changed content, and not store content segments corresponding to the hash signatures found in the local cache of the deduplicating content store.

5. The computing system of claim 1, wherein the processor is further caused to update any content structures in the organized arrangement of hash structures that reference the content structure corresponding to the changed higher-level hash structure.

6. A non-transitory computer readable medium having executable instructions operable to cause an apparatus to: organize the content of the data object for a first temporal state of the data object as a plurality of content segments and storing the plurality of content segments in a data store; create a content structure representing content of the data object as a hierarchical arrangement of hash structures in the data store, wherein each hash structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein a higher-level hash structure in the hierarchical arrangement aggregates a set of lower-level hash structures, such that a logical organization of the content structure represents the organization of the content segments as they are represented within the data object; receive difference information for the data object, said difference information indicating changed content for the data object for a second temporal state of the data object relative to the first temporal state, and said difference information indicating a location of the changed content within the data object; receive the changed content for the data object at the deduplicating content store; form a hash signature for each of a set of changed lower-level hash structures associated with the changed content; form a hash signature for a changed higher-level hash structure aggregating a plurality of the set of changed lower-level hash structures; determine, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store before attempting to search for the hash signatures for each of the set of changed lower-level hash structures; store any changed content that is unique in the data store as content segments; modify the organized arrangement of hash structures to incorporate new structures for the content segment corresponding to at least one hash signature for the changed content; and incorporate the new structures in the organized arrangement of structures at a position corresponding to the location of the changed content within the data object as indicated within said difference information, thereby using the higher-level hash signature for the changed content without unnecessary searching for hash signatures for the lower-level hash structures.

7. The non-transitory computer readable medium of claim 6, the instructions being further operable to determine whether the changed content should be stored without checking the local cache of the deduplicating content store.

8. The non-transitory computer readable medium of claim 6, further comprising, if the hash signature for the changed higher-level hash structure is found within the global cache of the deduplicating content store, the instructions being further operable to ignore hash signatures corresponding to the changed content, and not store content segments corresponding to the ignored hash signatures.

9. The non-transitory computer readable medium of claim 6, further comprising, if the hash signature for the changed higher-level hash structure is not found within the global cache of the deduplicating content store, the instructions being further operable to search for any hash signatures in the local cache of the deduplicating content store corresponding to the changed content, and not store content segments corresponding to the hash signatures found in the …
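Claims 1 to 4 fix an order of lookups. The higher-level signature for a group of changed segments is searched in the global cache first. On a hit, the lower-level signatures are ignored and nothing is stored (claim 3), so the decision is made without consulting the local cache (compare claim 2). On a miss, the lower-level signatures are searched in the local cache, and only segments not found there are stored (claim 4). The following is a minimal sketch of that decision. It assumes both caches are in-memory Python sets, the data store is a dict, SHA-256 is the signature function, a higher-level signature is the hash of its children's concatenated signatures, and a newly stored group's signature is added to the global cache afterward. None of these details come from the claims, and `dedup_changed` and `higher_sig` are hypothetical names.

```python
import hashlib


def sig(data: bytes) -> str:
    """Hash signature of a content segment (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def higher_sig(lower_sigs):
    """Higher-level signature aggregating lower-level signatures (assumed scheme)."""
    return sig("".join(lower_sigs).encode())


def dedup_changed(changed_segments, global_cache, local_cache, data_store):
    """Decide which changed segments to store, checking the global cache first.

    global_cache: set of higher-level signatures (hypothetical structure)
    local_cache:  set of lower-level segment signatures (hypothetical structure)
    data_store:   dict mapping signature -> segment bytes
    Returns the lower-level signatures whose segments were newly stored.
    """
    lower = [sig(seg) for seg in changed_segments]
    higher = higher_sig(lower)

    # One lookup of the higher-level signature in the global cache. On a hit the
    # whole changed group is already present: ignore the lower-level signatures,
    # store nothing, and never touch the local cache.
    if higher in global_cache:
        return []

    # On a miss, fall back to per-segment lookups in the local cache and skip
    # any segment whose signature is found there.
    stored = []
    for s, seg in zip(lower, changed_segments):
        if s in local_cache:
            continue
        data_store[s] = seg  # unique changed content is stored
        local_cache.add(s)
        stored.append(s)
    global_cache.add(higher)  # assumption: publish the new group signature
    return stored
```

The saving comes from the hit path: an unchanged group costs one set lookup instead of one per segment, which is what the claims describe as avoiding "unnecessary searching for hash signatures for the lower-level hash structures."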
Physics · mapped topic
Hash tables · CPC title