Synchronized data deduplication
US-8930306-B1 · Jan 6, 2015 · US
US2016162218A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016162218-A1 |
| Application number | US-201414559495-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 3, 2014 |
| Priority date | Dec 3, 2014 |
| Publication date | Jun 9, 2016 |
| Grant date | — |
Official abstract text for this publication.
Distributed data deduplication may include or utilize containers attached to nodes or byte caches in a cluster or enterprise networks. The containers may store a mapping of byte caches and hashes the byte caches hold. An encoding byte cache may communicate with its attached container to determine which nodes should send which hash values, and may encode an output stream accordingly. Decoding byte cache decompresses the output stream by communicating with its attached container for receiving hash values and associated content from one or more byte caches specified in the output stream.
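As a rough illustration of the mapping the abstract describes, the sketch below models a container as a dictionary from hash values to the identifiers of the byte caches that hold them. The class name `Container`, the methods `register` and `locate`, the cache identifiers, and the choice of SHA-1 are all assumptions made for this example; the abstract does not specify a data structure or a hash function.

```python
import hashlib
from collections import defaultdict


class Container:
    """Hypothetical container: maps a hash value to the byte caches holding it."""

    def __init__(self):
        # hash value -> set of byte cache identifiers (illustrative layout)
        self.hash_to_caches = defaultdict(set)

    def register(self, cache_id, hash_value):
        """Record that the byte cache `cache_id` stores `hash_value`."""
        self.hash_to_caches[hash_value].add(cache_id)

    def locate(self, hash_value, exclude=None):
        """Return sorted ids of caches, other than `exclude`, holding the hash."""
        holders = self.hash_to_caches.get(hash_value, ())
        return sorted(c for c in holders if c != exclude)


def region_hash(region: bytes) -> str:
    # SHA-1 is an assumption; the source does not name a hash function.
    return hashlib.sha1(region).hexdigest()


container = Container()
h = region_hash(b"hello world")
container.register("cache-B", h)
container.register("cache-C", h)

# An encoding cache "cache-A" asks which other caches already hold this hash.
holders = container.locate(h, exclude="cache-A")
unknown = container.locate(region_hash(b"never seen"))
```

Under these assumptions, an encoding byte cache would send only a reference for any hash that `locate` reports as held elsewhere, and full content otherwise.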
Opening claim text (preview).
We claim:
1. A method of providing distributed data deduplication in enterprise network, comprising:
receiving a byte stream by a controller of a byte cache, the byte cache being one of a plurality of byte caches in the enterprise network;
encoding the byte stream by the controller by generating one or more hash values associated with one or more regions of the byte stream;
storing the one or more hash values and associated one or more regions in a storage of the byte cache if the one or more hash values and associated one or more regions do not exist in the storage of the byte cache;
querying a container logic associated with the byte cache to determine which of the one or more hash values to send;
responsive to a response from the container indicating that the one or more hash values do not exist in other byte caches in the enterprise network, attaching all of the one or more hash values and the associated one or more regions to an output stream;
responsive to a response from the container including a hash value and byte cache identifier pair indicating that the hash value exists in a byte cache identified by the byte cache identifier, attaching the hash value and byte cache identifier pair received from the container in the output stream along with non-redundant data of the byte stream and said one or more hash values not identified in the response from the container;
creating a transmission control protocol connection to a receiving byte cache in the enterprise network; and
transmitting the output stream to the receiving byte cache.
2. The method of claim 1, wherein to respond to the querying from the byte cache, the container logic searches a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content.
3. The method of claim 2, wherein responsive to finding more than one byte cache storing one or more of the hash values, utilizing a weighing algorithm to select a hash value to byte cache identifier pair to send to the byte cache.
4. The method of claim 1, further comprising:
decoding the output stream received at the receiving byte cache by decompressing the output stream into a message using the hash values included in the output stream;
sending the decompressed message to a destination;
updating the map to include the receiving byte cache and the hash values mapping; and
broadcasting that the receiving byte cache stores the hash values included in the output stream.
5. The method of claim 4, wherein the decoding further comprises counting a number of hits for the hash values included in the output stream.
6. The method of claim 5, further comprising updating a timer associated with the hash values in the output stream that hit in the receiving byte cache, the timer used for replacement strategy.
7. The method of claim 4, wherein responsive to receiving the output stream that contains the hash value and byte cache identifier pair, requesting from the byte cache identified by the byte cache identifier, the hash value and associated represented content.
8. A computer readable storage medium storing a program of instructions executable by a machine to perform a method of providing distributed data deduplication in enterprise network, the method comprising:
receiving a byte stream by a controller of a byte cache, the byte cache being one of a plurality of byte caches in the enterprise network;
encoding the byte stream by the controller by generating one or more hash values associated with one or more regions of the byte stream;
storing the one or more hash values and associated one or more regions in a storage of the byte cache if the one or more hash values and associated one or more regions do not exist in the storage of the byte cache;
querying a container logic associated with the byte cache to determine which of the one or more hash values to send;
responsive to a response from the container indicating that the one or more hash values do not exist in other byte caches in the enterprise network, attaching all of the one or more hash values and the associated one or more regions to an output stream;
responsive to a response from the container including a hash value and byte cache identifier pair indicating that the hash value exists in a byte cache identified by the byte cache identifier, attaching the hash value and byte cache identifier pair received from the container in the output stream along with non-redundant data of the byte stream and said one or more hash values not identified in the response from the container;
creating a transmission control protocol connection to a receiving byte cache in the enterprise network; and
transmitting the output stream to the receiving byte cache.
9. The computer readable storage medium of claim 8, wherein to respond to the querying from the byte cache, the container logic searches a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content.
10. The computer readable storage medium of claim 9, wherein responsive to finding more than one byte cache storing one or more of the hash values, utilizing a weighing algorithm to select a hash value to byte cache identifier pair to send to the byte cache.
11. The computer readable storage medium of claim 8, further comprising:
decoding the output stream received at the receiving byte cache by decompressing the output stream into a message using the hash values included in the output stream;
sending the decompressed message to a destination;
updating the map to include the receiving byte cache and the hash values mapping; and
broadcasting that the receiving byte cache stores the hash values included in the output stream.
12. The computer readable storage medium of claim 11, wherein the decoding further comprises counting a number of hits for the hash values included in the output stream.
13. The computer readable storage medium of claim 12, further comprising updating a timer associated with the hash values in the output stream that hit in the receiving byte cache, the timer used for replacement strategy.
14. The computer readable storage medium of claim 11, wherein responsive to receiving the output stream that contains the hash value and byte cache identifier pair, requesting from the byte cache identified by the byte cache identifier, the hash value and associated represented content.
15. A system of providing distributed data deduplication in enterprise network, comprising:
a byte cache comprising a controller logic and memory, the byte cache being one of a plurality of byte caches in the enterprise network, the controller logic of the byte cache operable to receive a byte stream and encode the byte stream by generating one or more hash values associated with one or more regions of the byte stream, the controller logic of the byte cache further operable to store the one or more hash values and associated one or more regions in the memory if the one or more hash values and associated one or more regions do not exist in the memory; and
a container connected to the byte cache, the container comprising container logic and container memory, the container memory operable to store a map containing hash value to byte cache identifier mappings indicating which byte caches of the enterprise network store which hash values and associated content, the container operable to receive a query from the byte cache controller requesting which of the one or more
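The encode and decode steps recited in claims 1 and 4 through 7 can be sketched in a few lines, under several simplifying assumptions that are not in the claim text: regions are fixed-size chunks, hashes are SHA-1 digests, the TCP connection and broadcast are replaced by in-process calls, and the "weighing algorithm" of claim 3 is reduced to picking the first identifier in sorted order. All class, method, and variable names here are hypothetical.

```python
import hashlib
import time

REGION_SIZE = 8  # assumption: fixed-size regions; the claims do not specify chunking


def split_regions(data: bytes, size: int = REGION_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


class Container:
    """Hypothetical shared map: hash value -> ids of byte caches storing it."""

    def __init__(self):
        self.map = {}

    def update(self, cache_id, hashes):
        for h in hashes:
            self.map.setdefault(h, set()).add(cache_id)

    def query(self, asking_id, hashes):
        """Return {hash: cache_id} for hashes held by some other byte cache."""
        found = {}
        for h in hashes:
            others = sorted(self.map.get(h, set()) - {asking_id})
            if others:
                found[h] = others[0]  # stand-in for the weighing algorithm
        return found


class ByteCache:
    def __init__(self, cache_id, container, peers):
        self.id, self.container, self.peers = cache_id, container, peers
        self.store = {}      # hash -> region content
        self.hits = {}       # hash -> hit count (claim 5)
        self.last_used = {}  # hash -> timestamp for replacement (claim 6)
        peers[cache_id] = self

    def encode(self, data: bytes):
        regions = split_regions(data)
        hashes = [hashlib.sha1(r).hexdigest() for r in regions]
        for h, r in zip(hashes, regions):
            self.store.setdefault(h, r)  # store only if not already present
        located = self.container.query(self.id, hashes)
        # Known elsewhere: send (hash, cache id). Otherwise: send hash and region.
        return [("ref", h, located[h]) if h in located else ("data", h, r)
                for h, r in zip(hashes, regions)]

    def decode(self, stream):
        parts = []
        for kind, h, payload in stream:
            if h in self.store:  # local hit
                self.hits[h] = self.hits.get(h, 0) + 1
                self.last_used[h] = time.monotonic()
                region = self.store[h]
            elif kind == "data":
                region = payload
            else:  # fetch content from the byte cache named in the stream (claim 7)
                region = self.peers[payload].store[h]
            self.store[h] = region
            parts.append(region)
        # Update the map so other caches can find these hashes here (claim 4).
        self.container.update(self.id, [h for _, h, _ in stream])
        return b"".join(parts)


container, peers = Container(), {}
a, b, c, d = (ByteCache(name, container, peers) for name in "ABCD")
msg = b"abcdefgh12345678"

first = a.encode(msg)      # nothing known yet, so full regions are attached
at_b = b.decode(first)     # B stores the regions and updates the map
second = c.encode(msg)     # the container now points C at cache B
at_d = d.decode(second)    # D fetches the regions from B
at_b_again = b.decode(second)  # B already holds them: counted as hits
```

In this toy run the second stream carries only hash and identifier pairs, which is the bandwidth saving the claims aim at; a real deployment would also need eviction and failure handling, which this sketch omits.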
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
in relation to data integrity, e.g. data losses, bit errors · CPC title
De-duplication techniques · CPC title
Physics · mapped topic
Saving storage space on storage systems · CPC title