Data Deduplication Using Multi-Chunk Predictive Encoding
US-2018196609-A1 · Jul 12, 2018 · US
US12499249B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499249-B2 |
| Application number | US-202418808863-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 19, 2024 |
| Priority date | Apr 24, 2018 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
Transitioning leadership in a cluster of nodes, including: initiating, by two or more nodes among a cluster of nodes, a leadership transition, wherein: a first node transmits a first secret key identifier to each of the other nodes in the cluster of nodes; and a second node transmits a second secret key identifier to each of the other nodes in the cluster of nodes; updating, by each node and based at least in part on a resolution policy, the current secret key identifier to be the second secret key identifier instead of the first secret key identifier; and transitioning, based at least in part on the second secret key identifier being selected to be the current secret key identifier, the second node to be a leader node of the cluster of nodes.
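The transition described in the abstract can be sketched as a small simulation. This is an illustrative sketch only, not the patented implementation: the `Node` class, the `transition_leadership` function, and the `highest_key_wins` policy are hypothetical names, and the abstract does not say what the resolution policy is. The sketch assumes a deterministic policy (the lexicographically largest key identifier wins) so that every node reaches the same result independently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster member (hypothetical model, not from the patent text)."""
    name: str
    current_key_id: str
    # Maps proposer name -> secret key identifier received from that proposer.
    received: dict = field(default_factory=dict)

    def receive(self, sender: str, key_id: str) -> None:
        self.received[sender] = key_id

    def resolve(self, policy) -> str:
        """Apply the resolution policy and adopt the winning key identifier."""
        winner_name, winner_key = policy(self.received)
        self.current_key_id = winner_key
        return winner_name


def highest_key_wins(proposals: dict) -> tuple:
    # ASSUMPTION: the real resolution policy is unspecified; this one simply
    # picks the lexicographically largest key identifier.
    return max(proposals.items(), key=lambda item: item[1])


def transition_leadership(nodes, proposals, policy=highest_key_wins) -> str:
    """Deliver every proposal to every node, then let each node resolve."""
    for sender, key_id in proposals.items():
        for node in nodes:
            node.receive(sender, key_id)
    leaders = {node.resolve(policy) for node in nodes}
    if len(leaders) != 1:
        raise RuntimeError("nodes disagree on the new leader")
    return leaders.pop()
```

With two competing proposers, for example `transition_leadership(nodes, {"a": "key-001", "b": "key-002"})`, every node under this assumed policy adopts the second key identifier and the node that proposed it becomes the leader.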
Opening claim text (preview).
What is claimed is:

1. A method comprising: initiating a leadership transition to a new leader node among a cluster of nodes, wherein one or more nodes of the cluster of nodes transmit secret key identifiers to other nodes in the cluster of nodes; selecting, based at least in part on a resolution policy, a secret key identifier associated with a first node, of the one or more nodes, that is distinct from a current leader node; and transitioning, based at least in part on the selected secret key identifier being selected to be a current secret key identifier, the first node to be the new leader node of the cluster of nodes.
2. The method of claim 1, wherein the leadership transition is initiated by at least one node in the cluster of nodes, wherein the cluster of nodes are included in a deduplication cluster that comprises multiple servers within an intermediate computing system between one or more client devices and a backend cloud storage service.
3. The method of claim 2, wherein a data store on the intermediate computing system uses a consistent data storage model, and wherein the data store for the backend cloud storage service uses an eventually consistent data storage model.
4. The method of claim 1, wherein the method further comprises: receiving, via an application program interface of a front-end process, a stream of data; splitting the stream of data into blocks of data; hashing the blocks of data; determining, whether a hash value for a block of data from among the blocks of data, is a duplicate of a hash value for a stored block of data; responsive to the hash value for the block of data not matching the hash value for the stored block of data, routing the block of data to a process from among the cluster of nodes; and repeating, for each given hash for a given block of data of the blocks of data: determining whether the given hash is a duplicate of some hash value for some stored block of data, and responsive to the given hash value for the given block of data not matching some hash value for some stored block of data, routing the given block of data to a process from among the cluster of nodes.
5. The method of claim 4, further comprising: distributing, to different ones of the cluster of nodes, hashed blocks of data that are not duplicates of stored data.
6. The method of claim 5, further comprising: sending, from a process that has received a hashed block of data to a remote data store, one or more portions of the block of data, wherein the one or more portions of the block of data correspond to one or more transactions.
7. The method of claim 6, wherein the one or more transactions are recorded within a transaction log for the process, and wherein each process among the cluster of nodes generates a transaction log corresponding to data sent to the remote data store.
8. The method of claim 7, wherein the front-end process receives the stream of bytes of data from a client device via a communication interface that is compatible with a communication interface provided by the remote data store.
9. The method of claim 7, wherein the remote data store is an object store provided by a cloud services provider.
10. The method of claim 1, wherein at least one process among the cluster of nodes operates in parallel with at least one other process among the cluster of nodes.
11. A system comprising: a memory; and a processing device operably coupled to the memory, the processing device configured to: initiate a leadership transition to a new leader node among a cluster of nodes, wherein one or more nodes of the cluster of nodes transmit secret key identifiers to other nodes in the cluster of nodes; select, based at least in part on a resolution policy, a secret key identifier associated with a first node, of the one or more nodes, that is distinct from a current leader node; and transition, based at least in part on the selected secret key identifier being selected to be a current secret key identifier, the first node to be the new leader node of the cluster of nodes.
12. The system of claim 11, wherein the system comprises multiple servers within an intermediate computing system between one or more client devices and a backend cloud storage service.
13. The system of claim 12, wherein a data store on the intermediate computing system uses a consistent data storage model, and wherein the data store for the backend cloud storage service uses an eventually consistent data storage model.
14. The system of claim 11, wherein the processing device is further configured to: receive, via an application program interface of a front-end process, a stream of data; split the stream of data into blocks of data; hashing the blocks of data; determine, whether a hash value for a block of data from among the blocks of data, is a duplicate of a hash value for a stored block of data; responsive to the hash value for the block of data not matching the hash value for the stored block of data, route the block of data to a process from among the cluster of nodes; and repeat, for each given hash for a given block of data of the blocks of data: determine whether the given hash is a duplicate of some hash value for some stored block of data, and responsive to the given hash value for the given block of data not matching some hash value for some stored block of data, route the given block of data to a process from among the cluster of nodes.
15. The system of claim 14, wherein the processing device is further configured to: distribute, to different ones of the cluster of nodes, hashed blocks of data that are not duplicates of stored data.
16. The system of claim 15, wherein the processing device is further configured to: send, from a process that has received a hashed block of data to a remote data store, one or more portions of the block of data, wherein the one or more portions of the block of data correspond to one or more transactions.
17. The system of claim 16, wherein the one or more transactions are recorded within a transaction log for the process, and wherein each process among the cluster of nodes generates a transaction log corresponding to data sent to the remote data store.
18. The system of claim 17, wherein the front-end process receives the stream of bytes of data from a client device via a communication interface that is compatible with a communication interface provided by the remote data store.
19. The system of claim 17, wherein the remote data store is an object store provided by a cloud services provider.
20. A non-transitory computer readable medium storing instructions that, when executed, cause a processing device to: initiate a leadership transition to a new leader node among a cluster of nodes, wherein one or more nodes of the cluster of nodes transmit secret key identifiers to other nodes in the cluster of nodes; select, based at least in part on a resolution policy, a secret key identifier associated with a first node, of the one or more nodes, that is distinct from a current leader node; and transition, based at least in part on the selected secret key identifier being selected to be a current secret key identifier, the first node to be the new leader node of the cluster of nodes.
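The deduplication steps recited in claims 4 through 7 (split a stream into blocks, hash each block, skip blocks whose hash is already stored, route the rest to a process, and record a per-process transaction log) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the claims do not name a hash function, a block size, or a routing rule, so the sketch uses SHA-256, fixed-size blocks, and routing by hash value modulo the number of processes. The names `split_into_blocks` and `dedupe_and_route` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # ASSUMPTION: tiny fixed-size blocks, chosen only for readability


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split a byte stream into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def dedupe_and_route(data: bytes, known_hashes: set, num_processes: int,
                     block_size: int = BLOCK_SIZE):
    """Route non-duplicate blocks to processes and log what each one received.

    known_hashes stands in for the index of stored block hashes and is
    updated in place. Returns (routed, logs), both keyed by process index.
    """
    routed = {p: [] for p in range(num_processes)}
    logs = {p: [] for p in range(num_processes)}
    for block in split_into_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest in known_hashes:
            continue  # duplicate of a stored block: nothing to route
        known_hashes.add(digest)
        # ASSUMPTION: routing rule is hash value modulo the process count.
        target = int(digest, 16) % num_processes
        routed[target].append(block)
        logs[target].append(digest)  # per-process transaction log entry
    return routed, logs
```

Calling `dedupe_and_route(b"aaaabbbbaaaacccc", set(), 2)` routes three unique blocks, because the repeated `aaaa` block is dropped; a second call with the same data and the same `known_hashes` set routes nothing.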
Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title
Ensuring data consistency and integrity · CPC title
Updates performed during online database operations; commit processing · CPC title
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Revocation or update of secret information, e.g. encryption key update or rekeying · CPC title