Distributed data deduplication in enterprise networks

US2016162218A1 · US · A1

Patent metadata
Field | Value
Publication number | US-2016162218-A1
Application number | US-201414559495-A
Country | US
Kind code | A1
Filing date | Dec 3, 2014
Priority date | Dec 3, 2014
Publication date | Jun 9, 2016
Grant date |

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Distributed data deduplication may include or utilize containers attached to nodes or byte caches in a cluster or enterprise networks. The containers may store a mapping of byte caches and hashes the byte caches hold. An encoding byte cache may communicate with its attached container to determine which nodes should send which hash values, and may encode an output stream accordingly. Decoding byte cache decompresses the output stream by communicating with its attached container for receiving hash values and associated content from one or more byte caches specified in the output stream.
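
The container's role described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the `Container` class, its method names, and the choice to pick the first holder alphabetically are assumptions made for illustration only. It shows a mapping from hash values to the byte caches that hold them, and a lookup that tells an encoding byte cache which hashes another cache can already supply.

```python
from collections import defaultdict


class Container:
    """Hypothetical container: tracks which byte caches hold which hashes."""

    def __init__(self):
        # hash value -> set of byte cache identifiers known to store it
        self.hash_to_caches = defaultdict(set)

    def register(self, cache_id, hashes):
        """Record that `cache_id` stores every hash in `hashes`."""
        for h in hashes:
            self.hash_to_caches[h].add(cache_id)

    def locate(self, hashes, exclude=None):
        """Return {hash: cache_id} for hashes held by a cache other than `exclude`."""
        found = {}
        for h in hashes:
            holders = sorted(self.hash_to_caches.get(h, set()) - {exclude})
            if holders:
                # Placeholder choice (assumption): take the first holder.
                # The claims mention a weighing algorithm without specifying it.
                found[h] = holders[0]
        return found


container = Container()
container.register("bc1", ["h1", "h2"])
container.register("bc2", ["h2"])
# Ask on behalf of bc2: which of these hashes can another byte cache supply?
print(container.locate(["h1", "h2", "h3"], exclude="bc2"))
```

Hashes missing from the returned mapping are the ones the encoder would have to send together with their content.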

First claim

Opening claim text (preview).

We claim:

1. A method of providing distributed data deduplication in enterprise network, comprising: receiving a byte stream by a controller of a byte cache, the byte cache being one of a plurality of byte caches in the enterprise network; encoding the byte stream by the controller by generating one or more hash values associated with one or more regions of the byte stream; storing the one or more hash values and associated one or more regions in a storage of the byte cache if the one or more hash values and associated one or more regions do not exist in the storage of the byte cache; querying a container logic associated with the byte cache to determine which of the one or more hash values to send; responsive to a response from the container indicating that the one or more hash values do not exist in other byte caches in the enterprise network, attaching all of the one or more hash values and the associated one or more regions to an output stream; responsive to a response from the container including a hash value and byte cache identifier pair indicating that the hash value exists in a byte cache identified by the byte cache identifier, attaching the hash value and byte cache identifier pair received from the container in the output stream along with non-redundant data of the byte stream and said one or more hash values not identified in the response from the container; creating a transmission control protocol connection to a receiving byte cache in the enterprise network; and transmitting the output stream to the receiving byte cache.

2. The method of claim 1, wherein to respond to the querying from the byte cache, the container logic searches a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content.

3. The method of claim 2, wherein responsive to finding more than one byte cache storing one or more of the hash values, utilizing a weighing algorithm to select a hash value to byte cache identifier pair to send to the byte cache.

4. The method of claim 1, further comprising: decoding the output stream received at the receiving byte cache by decompressing the output stream into a message using the hash values included in the output stream; sending the decompressed message to a destination; updating the map to include the receiving byte cache and the hash values mapping; and broadcasting that the receiving byte cache stores the hash values included in the output stream.

5. The method of claim 4, wherein the decoding further comprises counting a number of hits for the hash values included in the output stream.

6. The method of claim 5, further comprising updating a timer associated with the hash values in the output stream that hit in the receiving byte cache, the timer used for replacement strategy.

7. The method of claim 4, wherein responsive to receiving the output stream that contains the hash value and byte cache identifier pair, requesting from the byte cache identified by the byte cache identifier, the hash value and associated represented content.

8. A computer readable storage medium storing a program of instructions executable by a machine to perform a method of providing distributed data deduplication in enterprise network, the method comprising: receiving a byte stream by a controller of a byte cache, the byte cache being one of a plurality of byte caches in the enterprise network; encoding the byte stream by the controller by generating one or more hash values associated with one or more regions of the byte stream; storing the one or more hash values and associated one or more regions in a storage of the byte cache if the one or more hash values and associated one or more regions do not exist in the storage of the byte cache; querying a container logic associated with the byte cache to determine which of the one or more hash values to send; responsive to a response from the container indicating that the one or more hash values do not exist in other byte caches in the enterprise network, attaching all of the one or more hash values and the associated one or more regions to an output stream; responsive to a response from the container including a hash value and byte cache identifier pair indicating that the hash value exists in a byte cache identified by the byte cache identifier, attaching the hash value and byte cache identifier pair received from the container in the output stream along with non-redundant data of the byte stream and said one or more hash values not identified in the response from the container; creating a transmission control protocol connection to a receiving byte cache in the enterprise network; and transmitting the output stream to the receiving byte cache.

9. The computer readable storage medium of claim 8, wherein to respond to the querying from the byte cache, the container logic searches a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content.

10. The computer readable storage medium of claim 9, wherein responsive to finding more than one byte cache storing one or more of the hash values, utilizing a weighing algorithm to select a hash value to byte cache identifier pair to send to the byte cache.

11. The computer readable storage medium of claim 8, further comprising: decoding the output stream received at the receiving byte cache by decompressing the output stream into a message using the hash values included in the output stream; sending the decompressed message to a destination; updating the map to include the receiving byte cache and the hash values mapping; and broadcasting that the receiving byte cache stores the hash values included in the output stream.

12. The computer readable storage medium of claim 11, wherein the decoding further comprises counting a number of hits for the hash values included in the output stream.

13. The computer readable storage medium of claim 12, further comprising updating a timer associated with the hash values in the output stream that hit in the receiving byte cache, the timer used for replacement strategy.

14. The computer readable storage medium of claim 11, wherein responsive to receiving the output stream that contains the hash value and byte cache identifier pair, requesting from the byte cache identified by the byte cache identifier, the hash value and associated represented content.

15. A system of providing distributed data deduplication in enterprise network, comprising: a byte cache comprising a controller logic and memory, the byte cache being one of a plurality of byte caches in the enterprise network, the controller logic of the byte cache operable to receive a byte stream and encode the byte stream by generating one or more hash values associated with one or more regions of the byte stream, the controller logic of the byte cache further operable to store the one or more hash values and associated one or more regions in the memory if the one or more hash values and associated one or more regions do not exist in the memory; and a container connected to the byte cache, the container comprising container logic and container memory, the container memory operable to store a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content, the container operable to receive a query from the byte cache controller requesting which of the one or more
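
As a rough illustration of the encode and decode steps recited in claims 1, 4, and 7, the sketch below may help. It rests on several assumptions that the claims do not state: regions are fixed-size chunks (`REGION_SIZE`), hashes are SHA-256 digests, the container's response is modeled as a plain dictionary (`remote_index`), and the TCP transport is replaced by passing a Python list. All function and variable names are hypothetical.

```python
import hashlib

REGION_SIZE = 4  # assumption: fixed-size regions; the claims do not say how regions are chosen


def encode(stream, local_store, remote_index):
    """Encode `stream` into a list of output-stream entries.

    local_store:  dict hash -> region bytes held by the encoding byte cache
    remote_index: dict hash -> byte cache id, standing in for the container's response
    """
    output = []
    for i in range(0, len(stream), REGION_SIZE):
        region = stream[i:i + REGION_SIZE]
        h = hashlib.sha256(region).hexdigest()
        local_store.setdefault(h, region)  # store only if not already present
        if h in remote_index:
            # Another byte cache holds this region: send a (hash, cache id) pair.
            output.append(("ref", h, remote_index[h]))
        else:
            # Not known elsewhere: send the hash together with the region itself.
            output.append(("data", h, region))
    return output


def decode(output, local_store, peers):
    """Rebuild the message; `peers` maps byte cache id -> that cache's hash store."""
    message = b""
    for kind, h, payload in output:
        if kind == "ref":
            # Use the local copy if present, else fetch from the named byte cache.
            region = local_store[h] if h in local_store else peers[payload][h]
        else:
            region = payload
        local_store[h] = region  # the receiving cache now holds this hash too
        message += region
    return message


# A peer cache "bc2" has already seen b"abcdefgh".
peer_store = {}
encode(b"abcdefgh", peer_store, {})
remote_index = {h: "bc2" for h in peer_store}

out = encode(b"abcdWXYZ", {}, remote_index)
assert decode(out, {}, {"bc2": peer_store}) == b"abcdWXYZ"
```

In this sketch only the `WXYZ` region travels as raw bytes; the `abcd` region is replaced by a reference that the receiver resolves against `bc2`. Map updates, broadcasting, hit counting, and the replacement timer from claims 4 to 6 are left out.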

Assignees

  • IBM

Inventors

Classifications

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • in relation to data integrity, e.g. data losses, bit errors · CPC title

  • G06F3/0641 · Primary

    De-duplication techniques · CPC title

  • Physics · mapped topic

  • Saving storage space on storage systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016162218A1 cover?
Distributed data deduplication may include or utilize containers attached to nodes or byte caches in a cluster or enterprise networks. The containers may store a mapping of byte caches and hashes the byte caches hold. An encoding byte cache may communicate with its attached container to determine which nodes should send which hash values, and may encode an output stream accordingly. Decoding by…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F3/0641. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 9, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).