Data compression optimization based on client clusters
US-10694002-B1 · Jun 23, 2020 · US
US11977959B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11977959-B2 |
| Application number | US-201916412970-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2019 |
| Priority date | May 15, 2019 |
| Publication date | May 7, 2024 |
| Grant date | May 7, 2024 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Disclosed are techniques for compressing data in a data storage system comprising searching a cluster of nearest neighbors, wherein the cluster has been created using a locality sensitive hashing algorithm, to determine if a data block can be compressed. In alternate embodiments, nearest neighbor clusters can be formed using unsupervised learning. Additionally, nearest neighbors can also be formed in alternate embodiments using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.
Opening claim text (preview).
What is claimed is: 1. A method for compressing data in a data storage system comprising: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include an entropy, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 2. The method of claim 1 , further comprising: compressing the XOR map. 3. The method of claim 1 , wherein the locality sensitive hashing function is a secure hash algorithm 1 (“SHA-1”) or a Message Digest 5 (“MD5”) algorithm. 4. The method of claim 1 , wherein the cluster is created in an unsupervised learning environment. 5. The method of claim 1 , wherein the cluster is created using an offload engine. 6. The method of claim 5 , further comprising: identifying a hot block. 7. The method of claim 5 , further comprising: computing a message digest. 8. The method of claim 1 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm. 9. A system for compressing data comprising: a memory comprising computer executable instructions; a processor executing the computer executable instructions, the computer-executable instructions when executed by the processor cause the processor to perform operations comprising: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include a chi square test, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 10. The system of claim 9 , further comprising: compressing the XOR map. 11. The system of claim 9 , wherein the locality sensitive hashing function is a secure hash algorithm 1 (“SHA-1”) or a Message Digest 5 (“MD5”) algorithm. 12. The system of claim 9 , wherein the cluster is created in an unsupervised learning environment. 13. The system of claim 9 , wherein the cluster is created using the offload engine. 14. The system of claim 13 , further comprising: computing a message digest. 15. The system of claim 13 , further comprising: computing an integrity value. 16. The system of claim 9 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm. 17. A non-transitory, computer readable medium comprising code stored thereon that, when executed, performs the following acts: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include a Pearson correlation coefficient, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 18. The non-transitory, computer readable medium of claim 17 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.
Machine learning · CPC title
Logical and Boolean instructions, e.g. XOR, NOT · CPC title
with fixed number of clusters, e.g. K-means clustering · CPC title
Distances to closest patterns, e.g. nearest neighbour classification · CPC title
Non-supervised learning, e.g. competitive learning · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.