What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue May 07 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Data compression using nearest neighbor cluster

US11977959B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11977959-B2
Application number	US-201916412970-A
Country	US
Kind code	B2
Filing date	May 15, 2019
Priority date	May 15, 2019
Publication date	May 7, 2024
Grant date	May 7, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are techniques for compressing data in a data storage system comprising searching a cluster of nearest neighbors, wherein the cluster has been created using a locality sensitive hashing algorithm, to determine if a data block can be compressed. In alternate embodiments, nearest neighbor clusters can be formed using unsupervised learning. Additionally, nearest neighbors can also be formed in alternate embodiments using one or more of the following algorithms: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for compressing data in a data storage system comprising: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include an entropy, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 2. The method of claim 1 , further comprising: compressing the XOR map. 3. The method of claim 1 , wherein the locality sensitive hashing function is a secure hash algorithm 1 (“SHA-1”) or a Message Digest 5 (“MD5”) algorithm. 4. The method of claim 1 , wherein the cluster is created in an unsupervised learning environment. 5. The method of claim 1 , wherein the cluster is created using an offload engine. 6. The method of claim 5 , further comprising: identifying a hot block. 7. The method of claim 5 , further comprising: computing a message digest. 8. The method of claim 1 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm. 9. A system for compressing data comprising: a memory comprising computer executable instructions; a processor executing the computer executable instructions, the computer-executable instructions when executed by the processor cause the processor to perform operations comprising: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include a chi square test, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 10. The system of claim 9 , further comprising: compressing the XOR map. 11. The system of claim 9 , wherein the locality sensitive hashing function is a secure hash algorithm 1 (“SHA-1”) or a Message Digest 5 (“MD5”) algorithm. 12. The system of claim 9 , wherein the cluster is created in an unsupervised learning environment. 13. The system of claim 9 , wherein the cluster is created using the offload engine. 14. The system of claim 13 , further comprising: computing a message digest. 15. The system of claim 13 , further comprising: computing an integrity value. 16. The system of claim 9 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm. 17. A non-transitory, computer readable medium comprising code stored thereon that, when executed, performs the following acts: searching a cluster of nearest neighbors to determine if a data block can be compressed, wherein the cluster has been created using a locality sensitive hashing function, the cluster has one or more statistical features that include a Pearson correlation coefficient, and determination of nearest neighbors is made by evaluating a plurality of hash values placed in a coordinate system having at least four dimensions in order to determine a distance between each neighbor; generating an exclusive OR (XOR) map representing how at least two data elements within the cluster differ from one another; using the XOR map with the cluster to maintain a difference between the at least two data elements within the cluster; and using an offload engine to calculate a RAID parity. 18. The non-transitory, computer readable medium of claim 17 , wherein, instead of using locality sensitive hashing to create the cluster of nearest neighbors, one or more of the following algorithms has been used to create the cluster of nearest neighbors: a k-means clustering algorithm, a k-medoids clustering algorithm, a mean shift algorithm, a generalized method of moment (GMM) algorithm, or a density based spatial clustering of applications with noise (DBSCAN) algorithm.

Assignees

Emc Ip Holding Co Llc

Inventors

Classifications

G06N20/00Primary
Machine learning · CPC title
G06F9/30029
Logical and Boolean instructions, e.g. XOR, NOT · CPC title
G06F18/23213
with fixed number of clusters, e.g. K-means clustering · CPC title
G06F18/24147
Distances to closest patterns, e.g. nearest neighbour classification · CPC title
G06N3/088
Non-supervised learning, e.g. competitive learning · CPC title

Patent family

Related publications grouped by family.

View patent family 73228683

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11977959B2 cover?: Disclosed are techniques for compressing data in a data storage system comprising searching a cluster of nearest neighbors, wherein the cluster has been created using a locality sensitive hashing algorithm, to determine if a data block can be compressed. In alternate embodiments, nearest neighbor clusters can be formed using unsupervised learning. Additionally, nearest neighbors can also be for…
Who is the assignee on this patent?: Emc Ip Holding Co Llc
What technology area does this patent fall under?: Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue May 07 2024 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Data compression optimization based on client clusters

System and method for comparing plurality of documents

Scalable artificial intelligence driven configuration management

Data deduplication

Distributed deduplication using locality sensitive hashing

Frequently asked questions