Stream-based data deduplication with peer node prediction

US9420058B2 · US · B2

Patent metadata
Publication number: US-9420058-B2
Application number: US-201314140116-A
Country: US
Kind code: B2
Filing date: Dec 24, 2013
Priority date: Dec 27, 2012
Publication date: Aug 16, 2016
Grant date: Aug 16, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Stream-based data deduplication is provided in a multi-tenant shared infrastructure but without requiring “paired” endpoints having synchronized data dictionaries. Data objects processed by the dedupe functionality are treated as objects that can be fetched as needed. As such, a decoding peer does not need to maintain a symmetric library for the origin. Rather, if the peer does not have the chunks in cache that it needs, it follows a conventional content delivery network procedure to retrieve them. In this way, if dictionaries between pairs of sending and receiving peers are out-of-sync, relevant sections are then re-synchronized on-demand. The approach does not require that libraries maintained at a particular pair of sender and receiving peers are the same. Rather, the technique enables a peer, in effect, to “backfill” its dictionary on-the-fly. On-the-wire compression techniques are provided to reduce the amount of data transmitted between the peers.
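The core idea in the abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: it uses fixed-size chunks and SHA-256 fingerprints as the "references", and `fetch_chunk` is a hypothetical callback standing in for the content delivery network retrieval the abstract describes. The sender never consults the receiver's dictionary, so the two dictionaries may drift apart; the receiver backfills any missing chunk on demand. On-the-wire compression is not modeled.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Assumption: fixed-size chunks keep the sketch short; dedupe engines
# commonly choose chunk boundaries with content-defined chunking instead.
CHUNK_SIZE = 8

# A token is either ("raw", chunk_bytes) or ("ref", sha256_digest).
Token = Tuple[str, bytes]


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the stream into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encode(data: bytes, sender_dict: Dict[bytes, bytes]) -> List[Token]:
    """Replace chunks already in the sender's dictionary with references.

    The sender does not track what the receiver holds, so the two
    dictionaries are never required to be synchronized.
    """
    tokens: List[Token] = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).digest()
        if digest in sender_dict:
            tokens.append(("ref", digest))
        else:
            sender_dict[digest] = chunk
            tokens.append(("raw", chunk))
    return tokens


def decode(
    tokens: List[Token],
    receiver_dict: Dict[bytes, bytes],
    fetch_chunk: Callable[[bytes], bytes],  # hypothetical on-demand fetch
) -> bytes:
    """Rebuild the stream, backfilling any referenced chunk that is missing."""
    parts: List[bytes] = []
    for kind, value in tokens:
        if kind == "raw":
            receiver_dict[hashlib.sha256(value).digest()] = value
            parts.append(value)
            continue
        chunk = receiver_dict.get(value)
        if chunk is None:
            # Out of sync: fetch the chunk like any other cacheable object.
            chunk = fetch_chunk(value)
            if hashlib.sha256(chunk).digest() != value:
                raise ValueError("fetched chunk does not match its reference")
            receiver_dict[value] = chunk  # backfill for later references
        parts.append(chunk)
    return b"".join(parts)
```

For example, if the sender has already seen a page while serving some other edge, re-encoding that page yields only references. A fresh receiver can still decode it: it fetches each distinct chunk once and answers later references from its backfilled dictionary.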

First claim

What is claimed is as follows:

1. A method operative in an overlay network comprising a sending peer and a receiving peer, wherein the sending and receiving peers provide stream-based data deduplication by examining data that flows through the sending peer and receiving peer and replacing blocks of the data with references that point into data dictionaries associated with each of the peers, the method comprising: maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph represents temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to the stream-based data deduplication; in response to receipt at the receiving peer of a request for a page, the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of the page; upon receipt of the hinting request at the sending peer, the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page; and returning the hinting response to the receiving peer to facilitate a pre-warming operation at the receiving peer during the stream-based data deduplication of the page; wherein the hinting request and the hinting response are generated in software executing in a hardware element.

2. The method as described in claim 1 wherein the sending peer returns the hinting response to the receiving peer while going forward to an origin to fetch the page.

3. The method as described in claim 2 wherein the receiving peer uses information in the hinting response to pre-warm an associated cache with the blocks of data that the sending peer predicts are expected to compose the page.

4. The method as described in claim 2 wherein the sending peer ceases sending information associated with one or more hinting responses when the data from the origin begins to arrive on the sending peer.

5. The method as described in claim 1 wherein the information that annotates a directed cyclic graph identifies a URI-host and path from which one or more blocks of data originate, and a number of times that a page associated with the URI-host and path has led to given other content represented in the annotated directed cyclic graph.

6. The method as described in claim 5 wherein a hinting request generated by the receiving peer predicts the blocks of data the sending peer is expected to utilize when a value of the number reaches a configurable threshold.

7. The method as described in claim 1 wherein the hinting response also predicts what blocks of data are expected to be included in one or more objects that depend from the page.

8. The method as described in claim 7 wherein the one or more objects are one of: embedded page objects, and at least one linked page along with its associated embedded objects.

9. A method operative in an overlay network comprising a sending peer and a receiving peer, wherein the sending and receiving peers provide stream-based data deduplication by examining data that flows through the sending peer and receiving peer and replacing blocks of the data with references that point into data dictionaries associated with each of the peers, the sending peer associated with an origin, and the receiving peer associated with an overlay network edge, the method comprising: maintaining a directed cyclic graph in association with the sending peer; maintaining a directed cyclic graph in association with the receiving peer; wherein each directed cyclic graph represents temporal and ordered relationships among blocks of data that have been seen in the data stream by the respective peer, the directed cyclic graph being annotated with information from which the respective peer can generate a prediction about blocks of data that are subject to the stream-based data deduplication; using the annotated directed cyclic graphs to enforce a compression protocol across the sending and receiving peers wherein, in response to receipt at the receiving peer of a request for a page hosted at the origin, the page and the embedded objects of the page are pre-warmed into the receiving peer and delivered to a requested client in one round trip as measured from the requesting client to the origin; wherein the compression protocol is carried out in software executing in one or more hardware elements.

10. The method as described in claim 9 wherein the compression protocol includes: in response to receipt at the receiving peer of a request for the page hosted at the origin, the receiving peer generating a hinting request that predicts what blocks of data the sending peer is expected to utilize during stream-based data deduplication of the page; upon receipt of the hinting request at the sending peer, the sending peer generating a hinting response that predicts what blocks of data are expected to compose the page; and returning the hinting response to the receiving peer to facilitate a pre-warming operation at the receiving peer during the stream-based data deduplication of the page.

11. The method as described in claim 10 wherein the sending peer returns the hinting response to the receiving peer while going forward to an origin to fetch the page.

12. The method as described in claim 9 wherein the information that annotates a directed cyclic graph identifies a URI-host and path from which one or more blocks of data originate, and a number of times that a page associated with the URI-host and path has led to given other content represented in the annotated directed cyclic graph.

13. The method as described in claim 12 wherein a hinting request generated by the receiving peer predicts the blocks of data the sending peer is expected to utilize when a value of the number reaches a configurable threshold.
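Claims 1, 3, 5 and 6 combine a per-peer graph of blocks, annotations keyed by URI-host and path, a configurable count threshold, and hint-driven cache pre-warming. The Python sketch below is one hypothetical reading of that flow, not the claimed implementation: `BlockGraph`, `hinting_request`, `hinting_response`, `pre_warm`, `fetch_chunk` and the count semantics (number of page loads in which a block appeared) are illustrative assumptions. It does not model the network exchange, how the sender uses the content of a hinting request, the origin fetch of claim 2, or the hint cut-off of claim 4.

```python
from collections import Counter, defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

PageKey = Tuple[str, str]  # (URI host, path); hypothetical key shape


class BlockGraph:
    """Per-peer graph of block transitions, annotated with per-page counts."""

    def __init__(self) -> None:
        # edges[a][b]: how often block b directly followed block a in a stream.
        # Repeated blocks can form cycles (a -> b -> a), which is allowed.
        self.edges: DefaultDict[bytes, Counter] = defaultdict(Counter)
        # page_counts[page][b]: number of loads of `page` that led to block b.
        self.page_counts: DefaultDict[PageKey, Counter] = defaultdict(Counter)

    def observe(self, page: PageKey, blocks: List[bytes]) -> None:
        """Record the ordered blocks seen while serving one load of `page`."""
        for prev, nxt in zip(blocks, blocks[1:]):
            self.edges[prev][nxt] += 1
        # dict.fromkeys de-duplicates while keeping first-seen order.
        for block in dict.fromkeys(blocks):
            self.page_counts[page][block] += 1

    def predict(self, page: PageKey, threshold: int) -> List[bytes]:
        """Blocks whose count for this page has reached the threshold."""
        counts = self.page_counts.get(page, Counter())
        return [block for block, n in counts.items() if n >= threshold]


def hinting_request(receiver: BlockGraph, page: PageKey, threshold: int) -> List[bytes]:
    # Receiver's prediction of the blocks the sender will use for this page.
    return receiver.predict(page, threshold)


def hinting_response(sender: BlockGraph, page: PageKey, threshold: int) -> List[bytes]:
    # Sender's prediction of the blocks expected to compose the page.
    return sender.predict(page, threshold)


def pre_warm(cache: Dict[bytes, bytes], hinted: List[bytes],
             fetch_chunk: Callable[[bytes], bytes]) -> int:
    """Load hinted blocks that are not cached yet; return how many were loaded."""
    loaded = 0
    for block in hinted:
        if block not in cache:
            cache[block] = fetch_chunk(block)  # hypothetical fetch callback
            loaded += 1
    return loaded
```

With a threshold of 2, a sender that has served a page three times hints every block of that page, while a receiver that has seen the page only once does not hint anything yet; the receiver then pre-warms only the hinted blocks it is missing.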

Assignees

Akamai Tech Inc

Inventors

Classifications

  • characterised by resources being split in blocks or fragments · CPC title

  • Pre-fetching or pre-delivering data based on network characteristics · CPC title

  • Electricity · mapped topic

  • Protocols · CPC title

Frequently asked questions

What does patent US9420058B2 cover?
It covers stream-based data deduplication between a sending peer and a receiving peer in a multi-tenant overlay network, without requiring "paired" endpoints whose data dictionaries are kept synchronized; a peer that lacks chunks it needs retrieves them on demand, as a content delivery network would. The claims add annotated directed cyclic graphs at each peer, which are used to generate hinting requests and responses so that the receiving peer can pre-warm its cache with the blocks of data expected to compose a requested page.
Who is the assignee on this patent?
Akamai Tech Inc
What technology area does this patent fall under?
Primary CPC classification H04L67/5681. Mapped technology areas include Electricity.
When was this patent published?
It was published on Aug 16, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations found in our corpus, or other publications sharing the same primary CPC).