Stream-based data deduplication with cache synchronization

US9451000B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9451000-B2
Application numberUS-201314140102-A
CountryUS
Kind codeB2
Filing dateDec 24, 2013
Priority dateDec 27, 2012
Publication dateSep 20, 2016
Grant dateSep 20, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Stream-based data deduplication is provided in a multi-tenant shared infrastructure but without requiring “paired” endpoints having synchronized data dictionaries. Data objects processed by the dedupe functionality are treated as objects that can be fetched as needed. As such, a decoding peer does not need to maintain a symmetric library for the origin. Rather, if the peer does not have the chunks in cache that it needs, it follows a conventional content delivery network procedure to retrieve them. In this way, if dictionaries between pairs of sending and receiving peers are out-of-sync, relevant sections are then re-synchronized on-demand. The approach does not require that libraries maintained at a particular pair of sender and receiving peers are the same. Rather, the technique enables a peer, in effect, to “backfill” its dictionary on-the-fly. On-the-wire compression techniques are provided to reduce the amount of data transmitted between the peers.

First claim

Opening claim text (preview).

What is claimed is as follows: 1. A non-transitory computer-readable medium having stored thereon instructions that, when executed on a parent data processing node and a child data processing node, carry out the following operations, the parent and child data processing nodes having respective libraries: as a stream traverses the parent data processing node, breaking the stream into chunks; for a particular chunk of the stream, determining, at the parent data processing node and using a directed cyclic graph (DCG) associated with the parent data processing node, a likelihood that the child data processing node already has the chunk, wherein the directed cyclic graph comprises one or more nodes, wherein a node represents a block of data and has associated therewith a label denoting a fingerprint associated with the block of data, and wherein a given node in the directed cyclic graph has a state selected from a set of states; based on the determination, sending, from the parent data processing node to the child processing node, one of: the chunk, a reference to the chunk, and a compact data structure representing a sequence of chunks as identified in the directed cyclic graph associated with the parent data processing node; as the stream begins to be decoded at the child data processing node, and for at least one reference or compact data structure in the stream, determining whether the reference or the compact data structure, as the case may be, is associated with a chunk stored in the library associated with the child data processing system, wherein the determining step uses a directed cyclic graph (DCG) associated with the child data processing node; and when the reference or compact data structure is associated with a chunk stored in the library, incorporating data associated with the chunk back into the stream; and when the reference is associated with a chunk or the compact data structure that is missing, performing an on-demand request to the parent data processing node to obtain data corresponding to the chunk or the compact data structure. 2. The computer-readable medium as described in claim 1 wherein the on-demand request is a request for chunks associated with a missing compact data structure. 3. The computer-readable medium as described in claim 1 wherein the set of states include a terminal state, an intermediate state, and an overflow state, wherein a terminal state denotes that the node has a degree out equal to zero, wherein an intermediate state denotes that the node has a degree out equal to one, and wherein an overflow state denotes that the node has a degree out greater than one. 4. The computer-readable medium as described in claim 3 wherein the compact data structure is a strand comprising a sequence of intermediate nodes that are connected together. 5. The computer-readable medium as described in claim 1 wherein the DCG associated with the child data processing node is updated following receipt of the chunks identified in the on-demand request to maintain synchronization between caches associated with the respective parent data and child data processing nodes. 6. A method operative across a parent data processing node and a child data processing node, the parent and child data processing nodes having respective libraries, comprising: as a stream traverses the parent data processing node, breaking the stream into chunks; for a particular chunk of the stream, determining, at the parent data processing node and using a directed cyclic graph (DCG) associated with the parent data processing node, a likelihood that the child data processing node already has the chunk, wherein the directed cyclic graph comprises one or more nodes, wherein a node represents a block of data and has associated therewith a label denoting a fingerprint associated with the block of data, and wherein a given node in the directed cyclic graph has a state selected from a set of states; based on the determination, sending, from the parent data processing node to the child processing node, one of: the chunk, a reference to the chunk, and a compact data structure representing a sequence of chunks as identified in the directed cyclic graph associated with the parent data processing node; as the stream begins to be decoded at the child data processing node, and for at least one reference or compact data structure in the stream, determining whether the reference or the compact data structure, as the case may be, is associated with a chunk stored in the library associated with the child data processing system, wherein the determining step uses a directed cyclic graph (DCG) associated with the child data processing node; and when the reference or compact data structure is associated with a chunk stored in the library, incorporating data associated with the chunk back into the stream; and when the reference is associated with a chunk or the compact data structure that is missing, performing an on-demand request to the parent data processing node to obtain data corresponding to the chunk or the compact data structure; wherein at least one of the steps is carried out in software executing in a hardware element. 7. The method as described in claim 6 wherein the on-demand request is a request for chunks associated with a missing compact data structure. 8. The method as described in claim 6 wherein the set of states include a terminal state, an intermediate state, and an overflow state, wherein a terminal state denotes that the node has a degree out equal to zero, wherein an intermediate state denotes that the node has a degree out equal to one, and wherein an overflow state denotes that the node has a degree out greater than one. 9. The method as described in claim 8 wherein the compact data structure is a strand comprising a sequence of intermediate nodes that are connected together. 10. The method as described in claim 7 wherein the DCG associated with the child data processing node is updated following receipt of the chunks identified in the on-demand request to maintain synchronization between caches associated with the respective parent data and child data processing nodes.

Assignees

Inventors

Classifications

  • H04L65/60Primary

    Network streaming of media packets · CPC title

  • specially adapted for file transfer, e.g. file transfer protocol [FTP] · CPC title

  • Electricity · mapped topic

  • Pre-fetching or pre-delivering data based on network characteristics · CPC title

  • Reducing the amount or size of exchanged application data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9451000B2 cover?
Stream-based data deduplication is provided in a multi-tenant shared infrastructure but without requiring “paired” endpoints having synchronized data dictionaries. Data objects processed by the dedupe functionality are treated as objects that can be fetched as needed. As such, a decoding peer does not need to maintain a symmetric library for the origin. Rather, if the peer does not have the chu…
Who is the assignee on this patent?
Akamai Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L65/60. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Sep 20 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).