Stream locality delta compression

US9690802B2 · US · B2

Patent metadata
Publication number: US-9690802-B2
Application number: US-201514723196-A
Country: US
Kind code: B2
Filing date: May 27, 2015
Priority date: Nov 14, 2008
Publication date: Jun 27, 2017
Grant date: Jun 27, 2017

Abstract

Official abstract text for this publication.

Stream locality delta compression is disclosed. A previous stream indicated locale of data segments is selected. A first data segment is then determined to be similar to a data segment in the stream indicated locale.
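
As a rough illustration of the lookup the abstract and first claim describe, the sketch below checks a small, locale-limited sketch index first and falls back to a master index only when no similar sketch is found there. Everything in it is an assumption made for illustration: the `sketch`, `similar`, and `find_base` names, the max-of-window-hashes sketch function, and the "at least one feature matches" similarity test are not taken from the patent, which leaves the sketch function and similarity method open.

```python
import hashlib
from typing import Dict, Optional, Tuple

Sketch = Tuple[int, ...]


def sketch(segment: bytes, features: int = 4, window: int = 8) -> Sketch:
    """Hypothetical sketch function: one feature per salt, each being the
    largest salted hash over all byte windows of the segment. Segments that
    share most of their content tend to share most of these maxima."""
    result = []
    for salt in range(features):
        best = 0
        for i in range(max(1, len(segment) - window + 1)):
            digest = hashlib.blake2b(
                bytes([salt]) + segment[i:i + window], digest_size=8
            ).digest()
            best = max(best, int.from_bytes(digest, "big"))
        result.append(best)
    return tuple(result)


def similar(a: Sketch, b: Sketch) -> bool:
    """Assumed similarity test: at least one feature is identical."""
    return any(x == y for x, y in zip(a, b))


def find_base(
    target: Sketch,
    limited_index: Dict[str, Sketch],
    master_index: Dict[str, Sketch],
) -> Optional[Tuple[str, str]]:
    """Search the locale-limited index first, then the master index.
    Returns (segment id, name of the index that matched) or None."""
    for name, index in (("limited", limited_index), ("master", master_index)):
        for segment_id, candidate in index.items():
            if similar(target, candidate):
                return segment_id, name
    return None
```

In this sketch the limited index would hold only the sketches of segments in the stream indicated locale, so most lookups for a stream that resembles an earlier one would be answered without touching the larger master index.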

First claim

Opening claim text (preview).

What is claimed is:

1. A system for processing data, comprising: a storage system module configured to store a plurality of data segments having identifiable segment boundaries, and to receive one or more of a data stream and a data block to be processed for storage; a deduplication module configured to use one or more processors to determine a stream indicated locale, the locale including a selection of a plurality of previously stored data segments satisfying a locality criterion with respect to a portion of the one or more of the data stream and the data block being processed for storage; and a delta compression module configured to use one or more processors to determine, based at least in part on the selection of the plurality of stored data segments satisfying the locality criterion, that a first data segment sketch for a first data segment included in the portion of the one or more of the data stream and the data block being processed for storage is similar to one or more data segment sketches amongst data segment sketches of a limited sketch index corresponding to the data segments in the determined locale, wherein when it is determined the first data segment sketch is not similar to the one or more data segment sketches amongst data segment sketches of the limited sketch index, the delta compression module is configured to determine whether the first data segment sketch is similar to one or more data segment sketches amongst data segment sketches of a master sketch index.

2. The system of claim 1, wherein the identifiable segment boundaries of the plurality of data segments correspond to one or more of content-based segment boundaries, fixed-length segment boundaries, variable length segment boundaries, overlapping segment boundaries, non-overlapping segment boundaries.

3. The system of claim 1, wherein the deduplication module segments the one or more of the data stream and the data block into a plurality of data segments.

4. The system of claim 1, wherein the delta compression module determines the stream indicated locale based at least in part by selecting a set of data segments received or stored in proximity to the previously stored data segments.

5. The system of claim 1, wherein the delta compression modules is further configured to compute an encoding of the first data segment.

6. The system of claim 5, wherein the delta compression module is further configured to store the encoding of the first data segment.

7. The system of claim 5, wherein the delta compression module is further configured to transmit the encoding of the first data segment.

8. The system of claim 5, wherein the encoding of the first data segment is based at least in part on the data segment in the locale.

9. The system of claim 5, wherein the encoding of the first data segment comprises an indication of a set of data blocks in the first data segment not present in the data segment in the locale and an indication of a set of data blocks in the data segment in the locale.

10. The system of claim 5, wherein the delta compression module is further configured to determine whether the encoding is smaller than the first data segment.

11. The system of claim 1, wherein determining that the first data segment sketch for the first data segment is similar to one or more data segment sketches amongst data segment sketches of the limited sketch index is based on a sketch function that comprises one or more functions that can return a similar value for similar data segments.

12. The system of claim 11, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, and nearest-neighbor-search.

13. The system of claim 1, wherein the first data segment is similar to one or more other data segments in the previous stream indicated locale in addition to the data segment in the previous stream indicated locale.

14. The system of claim 13, wherein the delta compression module is further configured to compute an encoding of the first data segment.

15. The system of claim 14, wherein the encoding is based at least in part on the data segment in the previous stream indicated locale and the one or more other data segments.

16. The system of claim 13, wherein the one or more other data segments and the data segment in the previous stream indicated locale are identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, and frequency of selection for other compressed segments.

17. The system of claim 1, wherein the data segment sketches of the limited sketch index are stored in a cache in response to determining previously stored data segments satisfy the locality criterion with respect to a portion of the one or more of the data stream and the data block.

18. A method for processing data, comprising: using one or more processors to store a plurality of data segments having identifiable segment boundaries; using the one or more processors to receive one or more of a data stream and a data block to be processed for storage; using the one or more processors to determine a stream indicated locale, the locale including a selection of a plurality of previously stored data segments satisfying a locality criterion with respect to a portion of the one or more of the data stream and the data block being processed for storage; and using the one or more processors to determine, based at least in part on the selection of the plurality of stored data segments satisfying the locality criterion, that a first data segment sketch for a first data segment included in the portion of the one or more of the data stream the data block being processed for storage is similar to one or more data segment sketches amongst data segment sketches of a limited sketch index corresponding to the data segments in the determined locale, wherein when it is determined the first data segment sketch is not similar to the one or more data segment sketches amongst data segment sketches of the limited sketch index, the one or more processors determine whether the first data segment sketch is similar to one or more data segment sketches amongst data segment sketches of a master sketch index.

19. The method of claim 18, wherein the determining of the stream indicated locale comprises selecting a set of data segments received or stored in proximity to the previously stored data segment.

20. The method of claim 18, further comprising encoding the first segment based at least in part on the data segment in the locale.

21. The method of claim 20, wherein the encoding of the first data segment comprises an indication of a set of data blocks in the first data segment not present in the data segment in the locale and an indication of a set of data blocks in the data segment in the locale.

22. The method of claim 20, further comprising determining whether the encoding of the first data segment is smaller than the first data segment.

23. The method of claim 18, wherein determining that the first data segment sketch for first data segment is similar to the one or more data segment sketches amongst the data segments sketches of the limited sketch index is based on a sketch function that comprises one or more functions that can return a similar value for similar data segments.

24. A computer program product for processing data, the com
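
The encoding described in claims 9, 10, 21, and 22 can be pictured with a minimal sketch: the new segment is expressed as references to blocks already present in a similar base segment plus the literal blocks that are new, and the result is kept only if it is smaller than the segment itself. The function names, the `("copy", offset, length)` and `("literal", data)` operation format, and the byte costs in `encoded_size` are all assumptions for illustration; the patent does not specify an encoding format, and Python's `difflib.SequenceMatcher` is used here only as a convenient stand-in for a real delta encoder.

```python
import difflib
from typing import List, Tuple, Union

Copy = Tuple[str, int, int]   # ("copy", offset in base, length)
Literal = Tuple[str, bytes]   # ("literal", new bytes)
Op = Union[Copy, Literal]


def delta_encode(base: bytes, target: bytes) -> List[Op]:
    """Describe target as copies from base plus literal data not in base."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("literal", target[j1:j2]))
        # "delete" means base bytes that target does not use: emit nothing.
    return ops


def delta_decode(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target segment from the base segment and the operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def encoded_size(ops: List[Op]) -> int:
    """Assumed cost model: 9 bytes per copy (tag + two 4-byte integers),
    5 bytes of overhead plus the payload per literal."""
    return sum(9 if op[0] == "copy" else 5 + len(op[1]) for op in ops)


def worth_storing(ops: List[Op], target: bytes) -> bool:
    """Keep the delta only when it is smaller than the raw segment."""
    return encoded_size(ops) < len(target)
```

A caller would run `delta_encode` against the base segment found through the sketch lookup, and store or transmit the operations only when `worth_storing` returns true; otherwise the segment would be stored without delta encoding.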

Assignees

EMC Corp, EMC IP Holding Co LLC

Inventors

Classifications

  • based on delta files · CPC title

  • De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title

  • using compression, e.g. sparse files · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
EMC Corp, EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/1756. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 27, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.