Stream locality delta compression
US-2015261779-A1 · Sep 17, 2015 · US
US9690802B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9690802-B2 |
| Application number | US-201514723196-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 27, 2015 |
| Priority date | Nov 14, 2008 |
| Publication date | Jun 27, 2017 |
| Grant date | Jun 27, 2017 |
Official abstract text for this publication.
Stream locality delta compression is disclosed. A previous stream indicated locale of data segments is selected. A first data segment is then determined to be similar to a data segment in the stream indicated locale.
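The lookup order described in the abstract and in claim 1 can be illustrated with a minimal sketch: check the sketches of segments in the stream indicated locale first (the "limited sketch index"), and fall back to the "master sketch index" only on a miss. Everything concrete here is an assumption made for illustration and is not taken from the patent: the max-of-CRC32 sketch function, the `WINDOW`, `NUM_FEATURES`, and `min_matches` values, and the function names `sketch`, `similar`, and `find_similar`.

```python
import zlib

WINDOW = 8        # bytes per sliding window (illustrative value, not from the patent)
NUM_FEATURES = 4  # features per sketch (illustrative value, not from the patent)


def sketch(segment: bytes) -> tuple:
    """Toy sketch: for each salt, the largest CRC32 over all byte windows.

    Segments that share most of their windows tend to share these maxima,
    which is the property claim 11 asks of a sketch function.
    """
    last_start = max(1, len(segment) - WINDOW + 1)
    windows = [segment[i:i + WINDOW] for i in range(last_start)]
    return tuple(
        max(zlib.crc32(w, salt) for w in windows)
        for salt in range(1, NUM_FEATURES + 1)
    )


def similar(a: tuple, b: tuple, min_matches: int = 2) -> bool:
    """Treat two sketches as similar when enough features agree (assumed rule)."""
    return sum(x == y for x, y in zip(a, b)) >= min_matches


def find_similar(segment: bytes, limited_index: dict, master_index: dict):
    """Search the locale's limited index first, then the master index."""
    s = sketch(segment)
    for name, index in (("limited", limited_index), ("master", master_index)):
        for seg_id, stored in index.items():
            if similar(s, stored):
                return name, seg_id
    return None  # no similar segment found in either index


# Hypothetical data: one segment in the current locale, one elsewhere.
base = b"the quick brown fox jumps over the lazy dog " * 20
limited_index = {"seg-17": sketch(base)}
master_index = {"seg-03": sketch(b"unrelated archived content " * 30)}
print(find_similar(base, limited_index, master_index))  # ('limited', 'seg-17')
```

The linear scan over each index keeps the example short; a real system would presumably index sketches for direct lookup, and the claims list other similarity measures (numeric difference, Hamming distance, locality-sensitive hashing, nearest-neighbor search) that could replace the feature-match rule assumed here.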
Opening claim text (preview).
What is claimed is:
1. A system for processing data, comprising: a storage system module configured to store a plurality of data segments having identifiable segment boundaries, and to receive one or more of a data stream and a data block to be processed for storage; a deduplication module configured to use one or more processors to determine a stream indicated locale, the locale including a selection of a plurality of previously stored data segments satisfying a locality criterion with respect to a portion of the one or more of the data stream and the data block being processed for storage; and a delta compression module configured to use one or more processors to determine, based at least in part on the selection of the plurality of stored data segments satisfying the locality criterion, that a first data segment sketch for a first data segment included in the portion of the one or more of the data stream and the data block being processed for storage is similar to one or more data segment sketches amongst data segment sketches of a limited sketch index corresponding to the data segments in the determined locale, wherein when it is determined the first data segment sketch is not similar to the one or more data segment sketches amongst data segment sketches of the limited sketch index, the delta compression module is configured to determine whether the first data segment sketch is similar to one or more data segment sketches amongst data segment sketches of a master sketch index.
2. The system of claim 1, wherein the identifiable segment boundaries of the plurality of data segments correspond to one or more of content-based segment boundaries, fixed-length segment boundaries, variable length segment boundaries, overlapping segment boundaries, non-overlapping segment boundaries.
3. The system of claim 1, wherein the deduplication module segments the one or more of the data stream and the data block into a plurality of data segments.
4. The system of claim 1, wherein the delta compression module determines the stream indicated locale based at least in part by selecting a set of data segments received or stored in proximity to the previously stored data segments.
5. The system of claim 1, wherein the delta compression modules is further configured to compute an encoding of the first data segment.
6. The system of claim 5, wherein the delta compression module is further configured to store the encoding of the first data segment.
7. The system of claim 5, wherein the delta compression module is further configured to transmit the encoding of the first data segment.
8. The system of claim 5, wherein the encoding of the first data segment is based at least in part on the data segment in the locale.
9. The system of claim 5, wherein the encoding of the first data segment comprises an indication of a set of data blocks in the first data segment not present in the data segment in the locale and an indication of a set of data blocks in the data segment in the locale.
10. The system of claim 5, wherein the delta compression module is further configured to determine whether the encoding is smaller than the first data segment.
11. The system of claim 1, wherein determining that the first data segment sketch for the first data segment is similar to one or more data segment sketches amongst data segment sketches of the limited sketch index is based on a sketch function that comprises one or more functions that can return a similar value for similar data segments.
12. The system of claim 11, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, and nearest-neighbor-search.
13. The system of claim 1, wherein the first data segment is similar to one or more other data segments in the previous stream indicated locale in addition to the data segment in the previous stream indicated locale.
14. The system of claim 13, wherein the delta compression module is further configured to compute an encoding of the first data segment.
15. The system of claim 14, wherein the encoding is based at least in part on the data segment in the previous stream indicated locale and the one or more other data segments.
16. The system of claim 13, wherein the one or more other data segments and the data segment in the previous stream indicated locale are identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, and frequency of selection for other compressed segments.
17. The system of claim 1, wherein the data segment sketches of the limited sketch index are stored in a cache in response to determining previously stored data segments satisfy the locality criterion with respect to a portion of the one or more of the data stream and the data block.
18. A method for processing data, comprising: using one or more processors to store a plurality of data segments having identifiable segment boundaries; using the one or more processors to receive one or more of a data stream and a data block to be processed for storage; using the one or more processors to determine a stream indicated locale, the locale including a selection of a plurality of previously stored data segments satisfying a locality criterion with respect to a portion of the one or more of the data stream and the data block being processed for storage; and using the one or more processors to determine, based at least in part on the selection of the plurality of stored data segments satisfying the locality criterion, that a first data segment sketch for a first data segment included in the portion of the one or more of the data stream the data block being processed for storage is similar to one or more data segment sketches amongst data segment sketches of a limited sketch index corresponding to the data segments in the determined locale, wherein when it is determined the first data segment sketch is not similar to the one or more data segment sketches amongst data segment sketches of the limited sketch index, the one or more processors determine whether the first data segment sketch is similar to one or more data segment sketches amongst data segment sketches of a master sketch index.
19. The method of claim 18, wherein the determining of the stream indicated locale comprises selecting a set of data segments received or stored in proximity to the previously stored data segment.
20. The method of claim 18, further comprising encoding the first segment based at least in part on the data segment in the locale.
21. The method of claim 20, wherein the encoding of the first data segment comprises an indication of a set of data blocks in the first data segment not present in the data segment in the locale and an indication of a set of data blocks in the data segment in the locale.
22. The method of claim 20, further comprising determining whether the encoding of the first data segment is smaller than the first data segment.
23. The method of claim 18, wherein determining that the first data segment sketch for first data segment is similar to the one or more data segment sketches amongst the data segments sketches of the limited sketch index is based on a sketch function that comprises one or more functions that can return a similar value for similar data segments.
24. A computer program product for processing data, the com
based on delta files · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
using compression, e.g. sparse files · CPC title
Indexing; Web crawling techniques · CPC title
Physics · mapped topic