Data de-duplication for information storage systems
US-8954399-B1 · Feb 10, 2015 · US
US11178201B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11178201-B2 |
| Application number | US-201916565754-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 10, 2019 |
| Priority date | Dec 27, 2012 |
| Publication date | Nov 16, 2021 |
| Grant date | Nov 16, 2021 |
Official abstract text for this publication.
Stream-based data deduplication is provided in a multi-tenant shared infrastructure but without requiring “paired” endpoints having synchronized data dictionaries. Data objects processed by the dedupe functionality are treated as objects that can be fetched as needed. As such, a decoding peer does not need to maintain a symmetric library for the origin. Rather, if the peer does not have the chunks in cache that it needs, it follows a conventional content delivery network procedure to retrieve them. In this way, if dictionaries between pairs of sending and receiving peers are out-of-sync, relevant sections are then re-synchronized on-demand. The approach does not require that libraries maintained at a particular pair of sender and receiving peers are the same. Rather, the technique enables a peer, in effect, to “backfill” its dictionary on-the-fly. On-the-wire compression techniques are provided to reduce the amount of data transmitted between the peers.
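The on-demand "backfill" behavior described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation. The `Peer` class, the `("raw", ...)` / `("ref", ...)` token shapes, the use of SHA-1 as the fingerprint, and the `fetch` callback (standing in for the content delivery network retrieval step) are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple, Union

Token = Tuple[str, Union[bytes, str]]  # ("raw", chunk) or ("ref", fingerprint)


def fingerprint(chunk: bytes) -> str:
    # SHA-1 is an assumption; the source does not name a fingerprint function.
    return hashlib.sha1(chunk).hexdigest()


class Peer:
    """Hypothetical peer holding its own, possibly out-of-sync, dictionary."""

    def __init__(self, fetch: Optional[Callable[[str], bytes]] = None) -> None:
        self.dictionary: Dict[str, bytes] = {}
        self.fetch = fetch        # stand-in for a CDN-style retrieval
        self.fetch_count = 0      # how many chunks were backfilled

    def encode(self, chunks: List[bytes]) -> List[Token]:
        """Replace chunks already in the dictionary with references."""
        tokens: List[Token] = []
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp in self.dictionary:
                tokens.append(("ref", fp))
            else:
                self.dictionary[fp] = chunk
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens: List[Token]) -> bytes:
        """Rebuild the stream, fetching any chunk the dictionary lacks."""
        parts: List[bytes] = []
        for kind, value in tokens:
            if kind == "raw":
                self.dictionary[fingerprint(value)] = value
                parts.append(value)
            else:
                if value not in self.dictionary:
                    # Cache miss: backfill this entry on demand.
                    self.dictionary[value] = self.fetch(value)
                    self.fetch_count += 1
                parts.append(self.dictionary[value])
        return b"".join(parts)
```

In this sketch, a receiver that never saw an earlier stream can still decode a later one: each unresolved reference triggers one fetch, after which the chunk is served from the local dictionary.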
Opening claim text (preview).
What is claimed is as follows:
1. A data deduplication system, the system comprising: a sending peer entity maintaining a first data dictionary, and a receiving peer entity maintaining a second data dictionary, the sending and receiving peer entities comprising processes, wherein each of the sending and receiving peer entities further comprises stream-based data deduplication software executed by a hardware processor that is configured to examine a data stream that flows through the entity and to replace blocks of data with references that point into the entity's associated data dictionary; the sending peer entity further including: a directed cyclic graph representing temporal and ordered relationships among blocks of data that have been seen in the data stream by the sending peer entity, the directed cyclic graph comprising one or more nodes, wherein a node represents a block of data and has associated therewith a label denoting a fingerprint associated with the block of data; and an encoder that uses information in the directed cyclic graph to replace one or more references to blocks of data that have been seen in the data stream by the sending peer entity with a compact data representation; wherein the compact data representation is associated with a stretch of nodes with degree out of one that are connected together in the directed cyclic graph, the compact data representation defined by a tuple: {a fingerprint of a starting node in the stretch of nodes, a number of nodes in the stretch of nodes, and a hash of the nodes below the starting fingerprint}.
2. The system as described in claim 1 wherein the encoder also includes a deflation algorithm that removes one or more looped occurrences of the stretch of nodes.
3. The system as described in claim 1 wherein the directed cyclic graph also includes at least one overflow node with degree out greater than one.
4. The system as described in claim 1 wherein a given node in the directed cyclic graph has a state selected from a set of states.
5. The system as described in claim 1 wherein the data is a file that exhibits low entropy as measured by an extent to which the file changes from one version to a next version.
6. The system as described in claim 1 further including applying a coding scheme to the reduce a size of a token of the directed cyclic graph.
7. The system as described in claim 6 wherein the coding scheme is a Huffman coding.
8. The system as described in claim 1 wherein the sending peer entity is associated with an edge server in an overlay network.
9. The system as described in claim 1 wherein the sending and receiving peer entities are co-processes located on peer computing nodes.
10. The system as described in claim 1 wherein the fingerprint is a binary compressed representation of the block of data.
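To make the tuple in claim 1 concrete, the following sketch builds a small graph of fingerprints and collapses a run of single-successor nodes into `(starting fingerprint, node count, hash of the nodes below the start)`. It is one possible reading, not the claimed encoder. The `FingerprintGraph` class, its method names, the SHA-1 hash over `|`-joined fingerprints, and the choice to include the terminal node reached by the walk in the count are all assumptions.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

Compact = Tuple[str, int, str]  # (start fingerprint, node count, hash of the rest)


def _hash_nodes(fingerprints: List[str]) -> str:
    # Hash choice and "|" separator are assumptions for this sketch.
    return hashlib.sha1("|".join(fingerprints).encode()).hexdigest()


class FingerprintGraph:
    """Hypothetical directed graph (cycles allowed) over block fingerprints."""

    def __init__(self) -> None:
        self.succ: Dict[str, Set[str]] = {}

    def observe(self, fingerprints: List[str]) -> None:
        """Record the order in which blocks were seen in a stream."""
        for fp in fingerprints:
            self.succ.setdefault(fp, set())
        for a, b in zip(fingerprints, fingerprints[1:]):
            self.succ[a].add(b)

    def stretch(self, start: str) -> List[str]:
        """Follow edges while the current node has exactly one successor."""
        nodes, seen, cur = [start], {start}, start
        while len(self.succ.get(cur, ())) == 1:
            (nxt,) = self.succ[cur]
            if nxt in seen:       # the graph may be cyclic; stop at a loop
                break
            nodes.append(nxt)
            seen.add(nxt)
            cur = nxt
        return nodes

    def compact(self, start: str) -> Compact:
        """Collapse the stretch beginning at `start` into a tuple."""
        nodes = self.stretch(start)
        return (start, len(nodes), _hash_nodes(nodes[1:]))

    def expand(self, token: Compact) -> Optional[List[str]]:
        """Recover the fingerprints, or None if this graph disagrees."""
        start, count, digest = token
        nodes = self.stretch(start)[:count]
        if len(nodes) != count or _hash_nodes(nodes[1:]) != digest:
            return None           # a mismatch would need a fallback path
        return nodes
```

The hash lets a peer detect that its own graph has diverged from the sender's. In this sketch, once a node in the middle of the stretch gains a second successor, an older tuple no longer expands and `expand` returns `None`.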
Pre-fetching or pre-delivering data based on network characteristics · CPC title
Reducing the amount or size of exchanged application data · CPC title
Network streaming of media packets · CPC title
Pairs of inter-processing entities at each side of the network, e.g. split proxies · CPC title
specially adapted for file transfer, e.g. file transfer protocol [FTP] · CPC title