What technology area does this patent fall under?

Primary CPC classification H03M7/3084. Mapped technology areas include Electricity.

When was this patent published?

Publication date Tue Apr 03 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Data compression by hamming distance categorization

US9935652B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9935652-B1
Application number	US-201715722073-A
Country	US
Kind code	B1
Filing date	Oct 2, 2017
Priority date	Oct 2, 2017
Publication date	Apr 3, 2018
Grant date	Apr 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data is compressed based on non-identical similarity between a first data set and a second data set. A representation of the differences is used to represent one of the data sets. For example, a probabilistically unique value may be generated as a new block label. Probabilistic comparison of the new block label with a plurality of training labels associated with training blocks produces a plurality of training labels that are potentially similar to the new block label. The Hamming distance between each potentially similar training label and the new block label is determined to select the training label with the smallest calculated Hamming distance from the new block label. A bitmap of differences between the new block and the training block associated with the selected training label is compressed and stored as a compressed representation of the new block.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus comprising: physical storage comprising a plurality of persistent storage devices; and at least one computing node comprising at least one processor and memory, the processor running a compression algorithm that compresses a new block by: generating a probabilistically unique value that is used as a new block label; performing a probabilistic comparison of the new block label with a plurality of training labels, each training label being uniquely associated with a different training block, thereby identifiying a plurality of training labels that are potentially similar to the new block label; calculating a Hamming distance between each potentially similar training label and the new block label; selecting the training label associated with the smallest calculated Hamming distance; generating a bitmap of differences between the new block and the training block associated with the selected training label; storing the bitmap as a compressed representation of the new block; and discarding the new block. 2. The apparatus of claim 1 comprising a hash function that generates the probabilistically unique value that is used as the new block label. 3. The apparatus of claim 1 comprising a Bloom filter that performs the probabilistic comparison of the new block label with the plurality of training labels. 4. The apparatus of claim 1 comprising an XOR function that generates the bitmap of differences between the new block and the training block associated with the selected training label. 5. The apparatus of claim 4 wherein the compression algorithm compresses the XOR bitmap with Run Length Limited encoding. 6. The apparatus of claim 5 wherein the compressed XOR bitmap is stored in the physical storage. 7. The apparatus of claim 6 wherein a copy of the selected training label is associated with the compressed XOR bitmap and stored in the physical storage. 8. The apparatus of claim 1 wherein the training labels are part of a pre-trained discrimination network. 9. The apparatus of claim 8 wherein the discrimination network is retrained based on data stored in the physical storage. 10. The apparatus of claim 7 wherein the new block is recovered by using the stored training label to locate the training block, decompressing the XOR bitmap, and XORing the training block with the XOR bitmap. 11. A method comprising: in a storage system comprising physical storage and at least one computing node comprising at least one processor and memory, compressing a new block by: generating a probabilistically unique value that is used as a new block label; performing a probabilistic comparison of the new block label with a plurality of training labels, each training label being uniquely associated with a different training block, thereby identifiying a plurality of training labels that are potentially similar to the new block label; calculating a Hamming distance between each potentially similar training label and the new block label; selecting the training label associated with the smallest calculated Hamming distance; generating a bitmap of differences between the new block and the training block associated with the selected training label; storing the bitmap as a compressed representation of the new block; and discarding the new block. 12. The method of claim 11 comprising hashing the new block to generate the probabilistically unique value that is used as the new block label. 13. The method of claim 11 using a Bloom filter to perform the probabilistic comparison of the new block label with the plurality of training labels. 14. The method of claim 11 comprising using an XOR function to generate the bitmap of differences between the new block and the training block associated with the selected training label. 15. The method of claim 14 comprising compressing the XOR bitmap with Run Length Limited encoding. 16. The method of claim 15 comprising storing the compressed XOR bitmap in the physical storage. 17. The method of claim 16 comprising associating a copy of the selected training label with the compressed XOR bitmap in the physical storage. 18. The method of claim 11 comprising pre-training the discrimination network. 19. The method of claim 18 comprising retraining the discrimination network based on data stored in the physical storage. 20. The method of claim 17 comprising recovering the new block by using the stored training label to locate the training block, decompressing the XOR bitmap, and XORing the training block with the XOR bitmap.

Assignees

Emc Ip Holding Co Llc

Inventors

Classifications

G06F17/301
Physics · mapped topic
G06F17/18
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
H03M7/3084Primary
using adaptive string matching, e.g. the Lempel-Ziv method · CPC title
H03M7/46Primary
Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title
H03M7/3079
Context modeling · CPC title

Patent family

Related publications grouped by family.

View patent family 61711551

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9935652B1 cover?: Data is compressed based on non-identical similarity between a first data set and a second data set. A representation of the differences is used to represent one of the data sets. For example, a probabilistically unique value may be generated as a new block label. Probabilistic comparison of the new block label with a plurality of training labels associated with training blocks produces a plura…
Who is the assignee on this patent?: Emc Ip Holding Co Llc
What technology area does this patent fall under?: Primary CPC classification H03M7/3084. Mapped technology areas include Electricity.
When was this patent published?: Publication date Tue Apr 03 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).