Data compression processing method and apparatus, and computer-readable storage medium

US11797204B2 · US · B2

Patent metadata
Publication number: US-11797204-B2
Application number: US-202117464904-A
Country: US
Kind code: B2
Filing date: Sep 2, 2021
Priority date: Jun 17, 2019
Publication date: Oct 24, 2023
Grant date: Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing method includes obtaining a plurality of data blocks, determining a first data block and a second data block from the data blocks, where the first data block has a first hash value, and the second data block has a second hash value, where the first hash value is obtained by performing calculation on the first data block based on a hash algorithm and the second hash value is obtained by performing calculation on the second data block based on the hash algorithm, and combining and compressing the first data block and the second data block based on a degree of similarity of the first data block and the second data block.
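
The benefit of compressing similar blocks together can be illustrated with a small experiment. The sketch below is not taken from the patent: it assumes zlib as the compressor, and the function name `compressed_sizes` and the test data are hypothetical. It compares the size of two near-identical blocks compressed as one stream against the same blocks compressed independently.

```python
import random
import zlib
from typing import Tuple


def compressed_sizes(block_a: bytes, block_b: bytes) -> Tuple[int, int]:
    """Return (combined_size, separate_size) in bytes, both using zlib."""
    # Combined: one stream, so the second block can reference the first.
    combined = len(zlib.compress(block_a + block_b))
    # Separate: two independent streams with no shared history.
    separate = len(zlib.compress(block_a)) + len(zlib.compress(block_b))
    return combined, separate


# Hypothetical input: a 4 KiB block of seeded pseudo-random bytes
# and a near copy that has four bytes overwritten.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(4096))
near_copy = base[:100] + b"ABCD" + base[104:]

combined, separate = compressed_sizes(base, near_copy)
print(f"combined: {combined} bytes, separate: {separate} bytes")
```

Because the second block lies within zlib's 32 KiB back-reference window, most of it can be encoded as matches against the first block, which is why the combined stream comes out smaller for this kind of input.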

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing method comprising: obtaining a plurality of data blocks; determining a first data block and a second data block from the data blocks, wherein the first data block has a first hash value that is based on a first calculation on the first data block using a hash algorithm, and wherein the second data block has a second hash value that is based on a second calculation on the second data block using the hash algorithm; determining that the first data block and the second data block meet a similarity condition based on a degree of similarity of the first hash value and the second hash value; determining whether a data reduction ratio corresponding to a target data block to be obtained by combining and compressing the first data block and the second data block reaches a reduction ratio threshold; when determining that the data reduction ratio reaches the reduction ratio threshold, combining and compressing the first data block and the second data block using a first compression algorithm; and when determining that the data reduction ratio does not reach the reduction ratio threshold, separately compressing the first data block and the second data block using a second compression algorithm different from the first compression algorithm.

2. The data processing method of claim 1, wherein the hash algorithm is a locality-sensitive hash algorithm.

3. The data processing method of claim 2, further comprising: segmenting the first data block into a plurality of data sub-blocks of different lengths; calculating a hash value of each of the data sub-blocks; performing combination calculation on hash values of the data sub-blocks to obtain a locality-sensitive hash value corresponding to the first data block; and setting the locality-sensitive hash value as the first hash value.

4. The data processing method of claim 3, further comprising: identifying that a difference between the first hash value and the second hash value is less than a similarity threshold; and obtaining the degree of similarity based on the identifying.

5. The data processing method of claim 4, further comprising identifying that a Jaccard distance between the first hash value and the second hash value is less than a first distance threshold.

6. The data processing method of claim 4, further comprising identifying that a Euclidean distance between the first hash value and the second hash value is less than a second distance threshold.

7. The data processing method of claim 4, further comprising identifying that a Hamming distance between the first hash value and the second hash value is less than a third distance threshold.

8. The data processing method of claim 1, after combining and compressing the first data block and the second data block, the method further comprising: adding a first combination compression identifier to first metadata information corresponding to the first data block to indicate that a first compression manner of the first data block is combination compression; and adding a second combination compression identifier to second metadata information corresponding to the second data block to indicate that a second compression manner of the second data block is the combination compression.

9. The data processing method of claim 8, wherein after combining and compressing the first data block and the second data block, the data processing method further comprises: adding a first location identifier to the first metadata information to indicate a first location of the first data block in the target data block; and adding a second location identifier to the second metadata information to indicate a second location of the second data block in the target data block.

10. The data processing method of claim 1, after combining and compressing the first data block and the second data block, the method further comprising: determining whether a data length of a combined and compressed target data block exceeds a storage granularity; when determining that the data length of the combined and compressed target data block exceeds the storage granularity, splitting the combined and compressed target data block into several granularities based on a granularity unit and adding a flag to an end of each segment of data to identify consecutive data block address; and when determining that the data length of the combined and compressed target data block is less than the storage granularity, adding 0 to an end of the combined and compressed target data block.

11. A data processing apparatus comprising: a communications interface; and a processor coupled to the communications interface and configured to execute instructions stored in a memory to cause the data processing apparatus to: obtain a plurality of data blocks; determine a first data block and a second data block from the data blocks, wherein the first data block has a first hash value that is based on a first calculation on the first data block based on a hash algorithm, and wherein the second data block has a second hash value that is based on a second calculation on the second data block based on the hash algorithm; determine that the first data block and the second data block meet a similarity condition based on a degree of similarity of the first hash value and the second hash value; determine whether a data reduction ratio corresponding to a target data block to be obtained by combining and compressing the first data block and the second data block reaches a reduction ratio threshold; when determining that the data reduction ratio reaches the reduction ratio threshold, combine and compress the first data block and the second data block using a first compression algorithm; and when determining that the data reduction ratio does not reach the reduction ratio threshold, separately compress the first data block and the second data block using a second compression algorithm different from the first compression algorithm.

12. The data processing apparatus of claim 11, wherein the hash algorithm is a locality-sensitive hash algorithm.

13. The data processing apparatus of claim 12, wherein the processor further causes the data processing apparatus to: segment the first data block into a plurality of data sub-blocks of different lengths; calculate a hash value of each of the data sub-blocks; perform combination calculation on hash values of the data sub-blocks to obtain a locality-sensitive hash value corresponding to the first data block; and set the locality-sensitive hash value as the first hash value.

14. The data processing apparatus of claim 13, wherein the processor further causes the data processing apparatus to: identify that a Jaccard distance between the first hash value and the second hash value is less than a first distance threshold; identify that a Euclidean distance between the first hash value and the second hash value is less than a second distance threshold; or identify that a Hamming distance between the first hash value and the second hash value is less than a third distance threshold.

15. The data processing apparatus of claim 11, wherein the processor further causes the data processing apparatus to: identify that a difference between the first hash value and the second hash value is less than a similarity threshold; and obtain the degree of similarity based on the difference between the first hash value and the second hash value.

16. The data processing apparatus of claim 11, wherein after combining and compressing the first data block and the second data block, the pro
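
The flow recited in claims 1, 3, and 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The claim text shown here does not say how sub-blocks are chosen or how their hashes are combined, so the sketch assumes overlapping windows of 4 and 8 bytes and a SimHash-style bit vote. It also assumes zlib as the "first compression algorithm", lzma as the "second compression algorithm", a reduction ratio defined as original size divided by compressed size, and a fallback to separate compression when the blocks are not similar. All names (`lsh_value`, `hamming_distance`, `compress_pair`) and both threshold values are hypothetical.

```python
import hashlib
import lzma
import random
import zlib
from typing import List, Tuple

HASH_BITS = 64  # assumed width of the locality-sensitive hash


def lsh_value(block: bytes, window_sizes: Tuple[int, ...] = (4, 8)) -> int:
    """SimHash-style hash: each sub-block hash votes on every output bit."""
    votes = [0] * HASH_BITS
    for size in window_sizes:  # sub-blocks of different lengths (assumed)
        for start in range(len(block) - size + 1):
            sub_block = block[start:start + size]
            digest = hashlib.blake2b(sub_block, digest_size=8).digest()
            h = int.from_bytes(digest, "big")
            for bit in range(HASH_BITS):
                votes[bit] += 1 if (h >> bit) & 1 else -1
    # Combine the votes: a bit is set when most sub-block hashes set it.
    return sum(1 << bit for bit in range(HASH_BITS) if votes[bit] > 0)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hash values differ."""
    return bin(a ^ b).count("1")


def compress_pair(
    block_a: bytes,
    block_b: bytes,
    distance_threshold: int = 16,   # assumed value
    ratio_threshold: float = 1.2,   # assumed value
) -> Tuple[str, List[bytes]]:
    """Return ("combined", [payload]) or ("separate", [payload_a, payload_b])."""
    distance = hamming_distance(lsh_value(block_a), lsh_value(block_b))
    if distance < distance_threshold:
        # Trial-compress the combined block to measure the reduction ratio.
        target = zlib.compress(block_a + block_b)
        ratio = (len(block_a) + len(block_b)) / len(target)
        if ratio >= ratio_threshold:
            return "combined", [target]
    # Ratio too low, or blocks not similar (the latter is an assumed fallback).
    return "separate", [lzma.compress(block_a), lzma.compress(block_b)]


# Hypothetical data: a seeded pseudo-random block, a near copy, an unrelated block.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(2048))
near_copy = base[:100] + b"ABCD" + base[104:]
unrelated = bytes(rng.getrandbits(8) for _ in range(2048))

print(compress_pair(base, near_copy)[0])
print(compress_pair(base, unrelated)[0])
```

Measuring the ratio by trial compression is a simplification chosen for brevity; the claim only requires determining whether the ratio of the target block "to be obtained" reaches the threshold, which leaves room for an estimate instead.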

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • G06F3/064 · Primary

    Management of blocks · CPC title

  • G06F3/0608 · Primary

    Saving storage space on storage systems · CPC title

  • Single storage device · CPC title

  • Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction · CPC title

  • Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11797204B2 cover?
A data processing method includes obtaining a plurality of data blocks, determining a first data block and a second data block from the data blocks, where the first data block has a first hash value, and the second data block has a second hash value, where the first hash value is obtained by performing calculation on the first data block based on a hash algorithm and the second hash value is ob…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/064. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 24, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).