Inline wire speed deduplication system

US9401967B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9401967-B2
Application number  US-79703210-A
Country             US
Kind code           B2
Filing date         Jun 9, 2010
Priority date       Jun 9, 2010
Publication date    Jul 26, 2016
Grant date          Jul 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems for performing inline wire speed data deduplication are described herein. Some embodiments include a device for inline data deduplication that includes one or more input ports for receiving an input data stream containing duplicates, one or more output ports for providing a data deduplicated output data stream, and an inline data deduplication engine coupled to one or more input ports and one or more output ports to process input data containing duplicates into output data which is data deduplicated, where the inline data deduplication engine has an inline data deduplication bandwidth of at least 4 Gigabytes per second.
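As a rough illustration of the data flow the abstract describes (a stream containing duplicates goes in, a deduplicated stream comes out), the following minimal Python sketch splits an input into chunks, derives an identifier for each chunk, and emits each distinct chunk only once. Fixed-size chunking, SHA-256 identifiers, the record layout, and the names `dedupe_stream`, `rebuild`, and `CHUNK_SIZE` are all assumptions made for illustration. The patent describes hardware logic, and nothing here reflects its actual chunking scheme, its identifier function, or its stated bandwidth of at least 4 Gigabytes per second.

```python
import hashlib
from typing import Iterable, Iterator, Optional, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; not taken from the patent

# An output record is ("data", chunk_id, payload) for a first occurrence,
# or ("ref", chunk_id, None) for a chunk that was already emitted.
Record = Tuple[str, str, Optional[bytes]]


def dedupe_stream(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Record]:
    """Yield one record per chunk, sending each distinct chunk body only once."""
    seen = set()  # identifiers of chunks already placed in the output
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumption: SHA-256 stands in for the claimed "chunk identifier".
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id in seen:
            yield ("ref", chunk_id, None)
        else:
            seen.add(chunk_id)
            yield ("data", chunk_id, chunk)


def rebuild(records: Iterable[Record]) -> bytes:
    """Reverse the sketch above: expand references back into chunk bodies."""
    store = {}
    out = bytearray()
    for kind, chunk_id, payload in records:
        if kind == "data":
            store[chunk_id] = payload
        out += store[chunk_id]
    return bytes(out)
```

For an input made of three 4096-byte chunks in which the third repeats the first, `dedupe_stream` yields two `"data"` records followed by one `"ref"` record, and `rebuild` restores the original bytes.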

First claim

Opening claim text (preview).

What is claimed is:

1. A device for inline data deduplication, comprising: one or more input ports for receiving an input data stream containing duplicates; one or more output ports for providing a data deduplicated output data stream; and an inline data deduplication engine coupled to said one or more input ports and said one or more output ports to process input data containing duplicates into output data which is data deduplicated, said inline data deduplication engine having an inline data deduplication bandwidth of at least 4 Gigabytes per second, wherein said inline data deduplication engine comprises: frame memory comprising at least some of the received input data stream and at least some output data provided for inclusion in the output data stream; chunking logic for subdividing input data extracted from the input data stream into input data chunks; chunk identifier logic for generating a chunk identifier for each of the input data chunks based at least in part upon data within the input data chunk, wherein each chunk identifier is uniquely associated with a particular sequence of chunk data; and one or more data compression engines each comprising: a plurality of hash memories each associated with a different lane of a plurality of lanes, and each lane comprising data bytes from at least one of the input data chunks; an array comprising array elements each comprising a plurality of validity bits, wherein each validity bit within an array element corresponds to a different lane of the plurality of lanes; control logic, coupled to the plurality of hash memories and the array, that initiates a read of a hash memory entry if a corresponding validity bit indicates that said entry is valid; and an encoder, coupled to the plurality of hash memories and the control logic, that compresses at least the data bytes for the lane associated with the hash memory comprising the valid entry if said valid entry comprises data that matches the lane data bytes; wherein the one or more data compression engines each operates at least at a rate that is the lower of the bandwidth of an input port of the one or more input ports from which uncompressed data is received and the bandwidth of an output port of the one or more output ports to which compressed data is directed.

2. A device for inline data deduplication, comprising: one or more input ports for receiving an input data stream containing duplicates; one or more output ports for providing a data deduplicated output data stream; and an inline data deduplication engine coupled to said one or more input ports and said one or more output ports to process input data containing duplicates into output data which is data deduplicated, said inline data deduplication engine having an inline data deduplication bandwidth of at least 4 Gigabytes per second, wherein said inline data deduplication engine comprises: frame memory comprising at least some of the received input data stream and at least some output data provided for inclusion in the output data stream; chunking logic for subdividing input data extracted from the input data stream into input data chunks; chunk identifier logic for generating a chunk identifier for each of the input data chunks based at least in part upon data within the input data chunk, wherein each chunk identifier is uniquely associated with a particular sequence of chunk data; Bloom filter logic for identifying as non-matching data chunks at least some input data chunks that do not match any previously processed data chunks already provided as part of the output data stream; Bloom filter array memory for storing Bloom filter status bits; and processing logic for identifying non-matching data chunks not already identified by the Bloom filter, and for controlling the inclusion within the output data stream of the non-matching data chunks identified by the Bloom filter and the processing logic; wherein the identification of non-matching data chunks by the Bloom filter and the processing logic is based at least in part on the chunk identifier.

3. The device of claim 2, wherein said inline data deduplication engine further comprises a Bloom filter cache memory comprising at least some of the Bloom filter status bits most recently accessed by the Bloom filter logic; and wherein if a first input/output (I/O) operation to access a first Bloom filter status bit stored within the Bloom filter cache memory is followed by a second I/O operation to access the same first Bloom filter status bit or to access a second Bloom filter status bit stored within the Bloom filter cache memory, the second I/O operation will not be held off pending completion of the first I/O operation.

4. A device for inline data deduplication, comprising: one or more input ports for receiving an input data stream containing duplicates; one or more output ports for providing a data deduplicated output data stream; and an inline data deduplication engine coupled to said one or more input ports and said one or more output ports to process input data containing duplicates into output data which is data deduplicated, said inline data deduplication engine having an inline data deduplication bandwidth of at least 4 Gigabytes per second, wherein said inline data deduplication engine comprises: frame memory comprising at least some of the received input data stream and at least some output data provided for inclusion in the output data stream; chunking logic for subdividing input data extracted from the input data stream into input data chunks; chunk identifier logic for generating a chunk identifier for each of the input data chunks based at least in part upon data within the input data chunk, wherein each chunk identifier is uniquely associated with a particular sequence of chunk data; a content addressable storage (CAS) hash index table, at least part of the chunk identifier being used as an index to locate a pointer within the CAS hash index table; and wherein the pointer, if valid, points to groups of one or more CAS entries corresponding to the index, each of the one or more CAS entries comprising a second pointer to a metadata record describing a non-matching data chunk that does not match any previously processed data chunks already provided as part of the output data stream, and further comprising any remaining chunk identifier bits not used as the index.

5. The device of claim 4, wherein a matching input data chunk is identified if a CAS entry is found that corresponds to an index derived from the chunk identifier of the matching input data chunk, and that includes remaining chunk identifier bits that match the corresponding remaining chunk identifier bits of the matching input data chunk.

6. The device of claim 4, wherein said inline data deduplication engine further comprises CAS cache memory; and wherein at least some of the one or more CAS entries most recently accessed by said inline data deduplication engine are stored within the CAS cache memory.

7. The device of claim 6, wherein a collection of adjacent groups of CAS entries are read into the CAS cache memory; and wherein at least some of the CAS entries read into the CAS cache memory describe related non-matching data chunks.

8. The device of claim 4, wherein said inline data deduplication engine further comprises metadata cache memory; wherein at least some metadata records most recently accessed by said inline data deduplication engine are stored in the metadata cache as part of one or more metadata pages; and wherein at least some metadata records within one of the one or more metadata pages describe related non-matching data chunks.

9. A data deduplication method performed by an inline dedupli
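The Bloom filter pre-check of claim 2 and the index-plus-remaining-bits lookup of claims 4 and 5 can be illustrated together in software. The sketch below is a loose model under stated assumptions, not the claimed hardware: SHA-256 stands in for the chunk identifier, the low 16 bits of the identifier serve as the table index, Bloom filter bit positions are taken from 32-bit slices of the identifier, and the names `BloomFilter`, `CasIndex`, `is_duplicate`, and `INDEX_BITS` are hypothetical. A Python dict of lists stands in for the pointer to a group of CAS entries, and a plain dict stands in for the metadata record.

```python
import hashlib
from typing import Iterator, Optional

INDEX_BITS = 16  # hypothetical: low 16 identifier bits index the table


def chunk_identifier(chunk: bytes) -> int:
    # Assumption: SHA-256 stands in for the claimed chunk identifier.
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big")


class BloomFilter:
    """Bit array in which a clear bit proves a chunk was never seen."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, ident: int) -> Iterator[int]:
        # Use disjoint 32-bit slices of the identifier as hash values.
        for i in range(self.num_hashes):
            yield ((ident >> (32 * i)) & 0xFFFFFFFF) % self.num_bits

    def add(self, ident: int) -> None:
        for p in self._positions(ident):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, ident: int) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(ident))


class CasIndex:
    """Table index -> group of (remaining identifier bits, metadata) entries."""

    def __init__(self) -> None:
        self.table = {}

    @staticmethod
    def _split(ident: int):
        return ident & ((1 << INDEX_BITS) - 1), ident >> INDEX_BITS

    def lookup(self, ident: int) -> Optional[dict]:
        index, remaining = self._split(ident)
        for entry_remaining, metadata in self.table.get(index, []):
            if entry_remaining == remaining:  # the match test of claim 5
                return metadata
        return None

    def insert(self, ident: int, metadata: dict) -> None:
        index, remaining = self._split(ident)
        self.table.setdefault(index, []).append((remaining, metadata))


def is_duplicate(chunk: bytes, bloom: BloomFilter, cas: CasIndex,
                 metadata: dict) -> bool:
    """Return True for a chunk seen before; otherwise record it."""
    ident = chunk_identifier(chunk)
    # A Bloom miss is definitive; a Bloom hit must be confirmed in the index.
    if bloom.might_contain(ident) and cas.lookup(ident) is not None:
        return True
    bloom.add(ident)
    cas.insert(ident, metadata)
    return False
```

The design point this illustrates is that the Bloom filter can only rule chunks out: a miss skips the index lookup entirely, while a hit may be a false positive and still needs the index to confirm it.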

Classifications

  • Aggregation; Duplicate elimination

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

  • using compression, e.g. sparse files

  • Data stream processing; Continuous queries

  • De-duplication techniques

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9401967B2 cover?
Systems for performing inline wire speed data deduplication are described herein. Some embodiments include a device for inline data deduplication that includes one or more input ports for receiving an input data stream containing duplicates, one or more output ports for providing a data deduplicated output data stream, and an inline data deduplication engine coupled to one or more input ports a…
Who is the assignee on this patent?
Sabaa Amr, Kumar Pashupati, Vu Bao, and 6 more
What technology area does this patent fall under?
Primary CPC classification G06F16/24556. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 26, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).