Device and method for encoding input data based on hamming distance and/or weight
US-9369486-B2 · Jun 14, 2016 · US
US9935652B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9935652-B1 |
| Application number | US-201715722073-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 2, 2017 |
| Priority date | Oct 2, 2017 |
| Publication date | Apr 3, 2018 |
| Grant date | Apr 3, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Data is compressed based on non-identical similarity between a first data set and a second data set. A representation of the differences is used to represent one of the data sets. For example, a probabilistically unique value may be generated as a new block label. Probabilistic comparison of the new block label with a plurality of training labels associated with training blocks produces a plurality of training labels that are potentially similar to the new block label. The Hamming distance between each potentially similar training label and the new block label is determined to select the training label with the smallest calculated Hamming distance from the new block label. A bitmap of differences between the new block and the training block associated with the selected training label is compressed and stored as a compressed representation of the new block.
Opening claim text (preview).
What is claimed is: 1. An apparatus comprising: physical storage comprising a plurality of persistent storage devices; and at least one computing node comprising at least one processor and memory, the processor running a compression algorithm that compresses a new block by: generating a probabilistically unique value that is used as a new block label; performing a probabilistic comparison of the new block label with a plurality of training labels, each training label being uniquely associated with a different training block, thereby identifiying a plurality of training labels that are potentially similar to the new block label; calculating a Hamming distance between each potentially similar training label and the new block label; selecting the training label associated with the smallest calculated Hamming distance; generating a bitmap of differences between the new block and the training block associated with the selected training label; storing the bitmap as a compressed representation of the new block; and discarding the new block. 2. The apparatus of claim 1 comprising a hash function that generates the probabilistically unique value that is used as the new block label. 3. The apparatus of claim 1 comprising a Bloom filter that performs the probabilistic comparison of the new block label with the plurality of training labels. 4. The apparatus of claim 1 comprising an XOR function that generates the bitmap of differences between the new block and the training block associated with the selected training label. 5. The apparatus of claim 4 wherein the compression algorithm compresses the XOR bitmap with Run Length Limited encoding. 6. The apparatus of claim 5 wherein the compressed XOR bitmap is stored in the physical storage. 7. The apparatus of claim 6 wherein a copy of the selected training label is associated with the compressed XOR bitmap and stored in the physical storage. 8. The apparatus of claim 1 wherein the training labels are part of a pre-trained discrimination network. 9. The apparatus of claim 8 wherein the discrimination network is retrained based on data stored in the physical storage. 10. The apparatus of claim 7 wherein the new block is recovered by using the stored training label to locate the training block, decompressing the XOR bitmap, and XORing the training block with the XOR bitmap. 11. A method comprising: in a storage system comprising physical storage and at least one computing node comprising at least one processor and memory, compressing a new block by: generating a probabilistically unique value that is used as a new block label; performing a probabilistic comparison of the new block label with a plurality of training labels, each training label being uniquely associated with a different training block, thereby identifiying a plurality of training labels that are potentially similar to the new block label; calculating a Hamming distance between each potentially similar training label and the new block label; selecting the training label associated with the smallest calculated Hamming distance; generating a bitmap of differences between the new block and the training block associated with the selected training label; storing the bitmap as a compressed representation of the new block; and discarding the new block. 12. The method of claim 11 comprising hashing the new block to generate the probabilistically unique value that is used as the new block label. 13. The method of claim 11 using a Bloom filter to perform the probabilistic comparison of the new block label with the plurality of training labels. 14. The method of claim 11 comprising using an XOR function to generate the bitmap of differences between the new block and the training block associated with the selected training label. 15. The method of claim 14 comprising compressing the XOR bitmap with Run Length Limited encoding. 16. The method of claim 15 comprising storing the compressed XOR bitmap in the physical storage. 17. The method of claim 16 comprising associating a copy of the selected training label with the compressed XOR bitmap in the physical storage. 18. The method of claim 11 comprising pre-training the discrimination network. 19. The method of claim 18 comprising retraining the discrimination network based on data stored in the physical storage. 20. The method of claim 17 comprising recovering the new block by using the stored training label to locate the training block, decompressing the XOR bitmap, and XORing the training block with the XOR bitmap.
Physics · mapped topic
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
using adaptive string matching, e.g. the Lempel-Ziv method · CPC title
Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title
Context modeling · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.