Data replication with delta compression

US9418133B2 · US · B2

Patent metadata
Field                 Value
Publication number    US-9418133-B2
Application number    US-201414466745-A
Country               US
Kind code             B2
Filing date           Aug 22, 2014
Priority date         Nov 14, 2008
Publication date      Aug 16, 2016
Grant date            Aug 16, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.
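
The idea in the abstract can be illustrated with a minimal sketch: the second segment is encoded as a difference against a first segment that both systems already hold, plus a reference to that segment. The patent text on this page does not specify an encoding format, so the copy/insert operation list, the function names `delta_encode` and `delta_decode`, and the segment id `"seg-1"` are all assumptions made for illustration. Python's standard `difflib` stands in for whatever differencing method a real system would use.

```python
from difflib import SequenceMatcher


def delta_encode(base: bytes, target: bytes, base_ref: str) -> dict:
    """Encode `target` as copy/insert operations relative to `base`.

    Hypothetical format: {"ref": id of the shared base segment, "ops": [...]}.
    """
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in the base: record offset and length only.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the target: carry them literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no operation: those base bytes are simply not copied.
    return {"ref": base_ref, "ops": ops}


def delta_decode(encoding: dict, lookup) -> bytes:
    """Rebuild the target from the encoding and the replica's copy of the base."""
    base = lookup(encoding["ref"])
    out = bytearray()
    for op in encoding["ops"]:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)


# The primary and the replica both hold the same first segment.
first = b"the quick brown fox jumps over the lazy dog"
second = b"the quick brown cat jumps over the lazy dog"
replica_store = {"seg-1": first}

encoding = delta_encode(first, second, "seg-1")
restored = delta_decode(encoding, replica_store.__getitem__)
```

Only the literal "insert" bytes and a few offsets need to cross the network; the replica resolves the reference against its own copy of the first segment.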

First claim

Opening claim text (preview).

What is claimed is:

1. A system for processing data, comprising:
    one or more processors configured to:
    store a first data stream or a first data block in a primary system using a first plurality of segments, wherein the first plurality of segments includes a first data segment;
    select a second data segment on the primary system for replication;
    determine that the first data segment is similar to the second data segment using a sketch function;
    cause storing of a second data stream or a second data block in a replica system using a second plurality of segments, wherein the second plurality of segments includes a first data segment copy of the first data segment;
    determine an encoding of the second data segment, wherein the determining of the encoding of the second data segment comprises determining a difference between the first data segment and the second data segment, wherein the encoded second data segment comprises the determined difference between the first data segment and the second data segment and a reference to the first data segment copy of the first data segment;
    compare a size of the encoding of the second data segment with an original size of the second data segment;
    in the event that a difference in the size of the encoding of the second data segment and the original size of the second data segment is greater than or equal to a threshold, transmit the encoding of the second data segment to the replica system from the primary system, wherein the encoding of the second data segment is decoded for storage in the replica system; and
    in the event that the difference in the size of the encoding of the second data segment and the original size of the second data segment is less than a threshold, transmit the second data segment to the replica system for storage from the primary system; and
    one or more memories coupled to the one or more processors and configured to provide the one or more processors with instructions.

2. The system as in claim 1, wherein the encoding of the second data segment is compressed prior to transmitting.

3. The system as in claim 2, wherein the encoding of the second data segment comprises an indication of a set of data blocks in the second data segment not present in the first data segment and an indication of a set of data blocks present in both data segments.

4. The system as in claim 1, wherein the replica system decodes the encoding of the second data segment.

5. The system as in claim 1, wherein the replica system stores the encoding of the second data segment.

6. The system as in claim 1, wherein the replica system stores a decoding of the encoding of the second data segment.

7. The system as in claim 1, wherein the sketch function comprises a hash function.

8. The system as in claim 1, wherein the sketch function comprises a plurality of hash functions.

9. The system as in claim 1, wherein the sketch function comprises one or more functions that return a same value for similar data segments.

10. The system as in claim 1, wherein the sketch function comprises one or more functions that return a similar value for similar data segments.

11. The system as in claim 1, wherein the sketch function comprises one or more functions that may return a same value for similar data segments.

12. The system as in claim 1, wherein the sketch function comprises one or more functions that may return a similar value for similar data segments.

13. The system as in claim 12, wherein sketch function values are determined to be similar based on one or more of the following methods: numeric difference, hamming distance, locality-sensitive-hashing, or nearest-neighbor-search.

14. The system as in claim 1, wherein the first data segment is identified based at least in part on one or more of the following: temporal locality, spatial locality, ease of access, expected compression, or frequency of selection for other compressed segments.

15. The system as in claim 1, wherein the second data segment is similar to one or more data segments on both the primary and replica systems in addition to the first data segment.

16. The system as in claim 15, wherein the encoding of the second data segment is based at least in part on the first data segment and the one or more additional similar data segments.

17. The system as in claim 1, wherein the second data segment was stored as an encoding of a third data segment.

18. A method for processing data comprising:
    storing a first data stream or a first data block in a primary system using a first plurality of segments, wherein the first plurality of segments includes a first data segment;
    selecting a second data segment on the primary system for replication;
    determining that the first data segment is similar to the second data segment using a sketch function;
    causing storing of a second data stream or a second data block in a replica system using a second plurality of segments, wherein the second plurality of segments includes a first data segment copy of the first data segment;
    determining, using a processor, an encoding of the second data segment, wherein the determining of the encoding of the second data segment comprises determining a difference between the first data segment and the second data segment, wherein the encoded second data segment comprises the determined difference between the first data segment and the second data segment and a reference to the first data segment copy of the first data segment;
    comparing a size of the encoding of the second data segment with an original size of the second data segment;
    in the event that a difference in the size of the encoding of the second data segment and the original size of the second data segment is greater than or equal to a threshold, transmitting the encoding of the second data segment to the replica system for storage from the primary system, wherein the encoding of the second data segment is decoded for storage in the replica system; and
    in the event that the difference in the size of the encoding of the second data segment and the original size of the second data segment is less than a threshold, transmitting the second data segment to the replica system for storage from the primary system.

19. A computer program product for processing data, the computer program product being embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
    storing a first data stream or a first data block in a primary system using a first plurality of segments, wherein the first plurality of segments includes a first data segment;
    selecting a second data segment on the primary system for replication;
    determining that the first data segment is similar to the second data segment using a sketch function;
    causing storing of a second data stream or a second data block in a replica system using a second plurality of segments, wherein the second plurality of segments includes a first data segment copy of the first data segment;
    determining, using a processor, an encoding of the second data segment, wherein the determining of the encoding of the second data segment comprises determining a difference between the first data segment and the second data segment, wherein the encoded second data segment comprises the determined difference between the first data segment and the second data segment and a reference to the first data segment copy of the first data segment;
    comparing a size of the encoding of the second data segment with an original size of the second data segment;
    in
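
Two steps recited in the independent claims lend themselves to a short illustration: the similarity test using a sketch function, and the size comparison that decides whether the encoding or the original segment is transmitted. The following is a rough sketch under stated assumptions, not the patented implementation. The min-hash-over-sliding-windows design, the window width, the number of hash functions, and the names `sketch`, `looks_similar`, and `choose_payload` are all hypothetical; the claims only require that the sketch function tends to return the same or similar values for similar segments.

```python
import hashlib


def sketch(data: bytes, num_hashes: int = 4, width: int = 8) -> tuple:
    """Hypothetical sketch function built from several salted hash functions.

    For each hash function, keep the minimum hash over all sliding windows.
    Segments that share most of their windows tend to share these minima.
    """
    starts = range(max(1, len(data) - width + 1))
    features = []
    for i in range(num_hashes):
        salt = bytes([i])  # a distinct salt yields a distinct hash function
        features.append(min(
            hashlib.blake2b(data[s:s + width], digest_size=8, salt=salt).digest()
            for s in starts
        ))
    return tuple(features)


def looks_similar(a: tuple, b: tuple) -> bool:
    """Treat two segments as similar if any sketch feature matches exactly."""
    return any(x == y for x, y in zip(a, b))


def choose_payload(segment: bytes, encoding: bytes, threshold: int) -> tuple:
    """Apply the size comparison from the independent claims.

    If the encoding saves at least `threshold` bytes, send the encoding;
    otherwise send the original segment.
    """
    savings = len(segment) - len(encoding)
    if savings >= threshold:
        return ("encoded", encoding)
    return ("raw", segment)


segment = b"x" * 100
kind_small, _ = choose_payload(segment, b"y" * 10, threshold=50)  # saves 90 bytes
kind_large, _ = choose_payload(segment, b"y" * 90, threshold=50)  # saves 10 bytes
```

Here `kind_small` is `"encoded"` and `kind_large` is `"raw"`: a delta that barely shrinks the segment is not worth the decoding work on the replica, so the original bytes are sent instead.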

Classifications

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • based on delta files · CPC title

  • for networked environments · CPC title

  • by selection of backup contents · CPC title

  • implemented as replicated file system · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9418133B2 cover?
Data replication with delta compression is disclosed. A primary system and a replica system are determined to both have an identical first data segment that is similar to a second data segment. The second data segment is encoded, wherein the encoding refers to the first data segment.
Who is the assignee on this patent?
EMC Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/1756. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 16, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).