What technology area does this patent fall under?

Primary CPC classification G06F3/064. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 24, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page: publications cited in our corpus, or other publications sharing the same primary CPC.

Data compression processing method and apparatus, and computer-readable storage medium

US11797204B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11797204-B2
Application number	US-202117464904-A
Country	US
Kind code	B2
Filing date	Sep 2, 2021
Priority date	Jun 17, 2019
Publication date	Oct 24, 2023
Grant date	Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data processing method includes obtaining a plurality of data blocks, determining a first data block and a second data block from the data blocks, where the first data block has a first hash value, and the second data block has a second hash value, where the first hash value is obtained by performing calculation on the first data block based on a hash algorithm and the second hash value is obtained by performing calculation on the second data block based on the hash algorithm, and combining and compressing the first data block and the second data block based on a degree of similarity of the first data block and the second data block.
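The abstract leaves the hash algorithm open. Dependent claims 2, 3 and 7 narrow it to a locality-sensitive hash built from variable-length sub-blocks and compared by Hamming distance. The Python sketch below is one plausible reading under stated assumptions, not Huawei's implementation: the toy content-defined chunking rule, BLAKE2b sub-block hashes, SimHash-style bit voting as the "combination calculation", and the names `sub_blocks`, `simhash64`, `hamming`, `is_similar` and the threshold value are all hypothetical.

```python
import hashlib


def sub_blocks(data: bytes, mask: int = 0x3F, min_len: int = 16):
    """Yield variable-length sub-blocks using a toy content-defined boundary rule (hypothetical)."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        # With the default mask, the masked bits depend only on the last 6 bytes,
        # so cut points follow the content rather than fixed offsets.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_len and (rolling & mask) == 0:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]  # trailing sub-block


def simhash64(data: bytes) -> int:
    """Combine per-sub-block hashes into one 64-bit SimHash fingerprint (assumed combination step)."""
    votes = [0] * 64
    for chunk in sub_blocks(data):
        h = int.from_bytes(hashlib.blake2b(chunk, digest_size=8).digest(), "big")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    # A fingerprint bit is set when most sub-block hashes have that bit set.
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(x: int, y: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")


def is_similar(a: bytes, b: bytes, distance_threshold: int = 8) -> bool:
    """Similarity condition in the style of claim 7: Hamming distance less than a threshold."""
    return hamming(simhash64(a), simhash64(b)) < distance_threshold


if __name__ == "__main__":
    base = bytes(range(256)) * 16
    edited = base[:1000] + b"patched" + base[1007:]
    print(hamming(simhash64(base), simhash64(edited)), is_similar(base, edited))
```

SimHash is only one locality-sensitive scheme. Claims 5 and 6 also allow a Jaccard or Euclidean distance test, which would pair more naturally with MinHash signatures or real-valued feature vectors, respectively.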

First claim

Opening claim text (preview).

What is claimed is:

1. A data processing method comprising: obtaining a plurality of data blocks; determining a first data block and a second data block from the data blocks, wherein the first data block has a first hash value that is based on a first calculation on the first data block using a hash algorithm, and wherein the second data block has a second hash value that is based on a second calculation on the second data block using the hash algorithm; determining that the first data block and the second data block meet a similarity condition based on a degree of similarity of the first hash value and the second hash value; determining whether a data reduction ratio corresponding to a target data block to be obtained by combining and compressing the first data block and the second data block reaches a reduction ratio threshold; when determining that the data reduction ratio reaches the reduction ratio threshold, combining and compressing the first data block and the second data block using a first compression algorithm; and when determining that the data reduction ratio does not reach the reduction ratio threshold, separately compressing the first data block and the second data block using a second compression algorithm different from the first compression algorithm.

2. The data processing method of claim 1, wherein the hash algorithm is a locality-sensitive hash algorithm.

3. The data processing method of claim 2, further comprising: segmenting the first data block into a plurality of data sub-blocks of different lengths; calculating a hash value of each of the data sub-blocks; performing combination calculation on hash values of the data sub-blocks to obtain a locality-sensitive hash value corresponding to the first data block; and setting the locality-sensitive hash value as the first hash value.

4. The data processing method of claim 3, further comprising: identifying that a difference between the first hash value and the second hash value is less than a similarity threshold; and obtaining the degree of similarity based on the identifying.

5. The data processing method of claim 4, further comprising identifying that a Jaccard distance between the first hash value and the second hash value is less than a first distance threshold.

6. The data processing method of claim 4, further comprising identifying that a Euclidean distance between the first hash value and the second hash value is less than a second distance threshold.

7. The data processing method of claim 4, further comprising identifying that a Hamming distance between the first hash value and the second hash value is less than a third distance threshold.

8. The data processing method of claim 1, after combining and compressing the first data block and the second data block, the method further comprising: adding a first combination compression identifier to first metadata information corresponding to the first data block to indicate that a first compression manner of the first data block is combination compression; and adding a second combination compression identifier to second metadata information corresponding to the second data block to indicate that a second compression manner of the second data block is the combination compression.

9. The data processing method of claim 8, wherein after combining and compressing the first data block and the second data block, the data processing method further comprises: adding a first location identifier to the first metadata information to indicate a first location of the first data block in the target data block; and adding a second location identifier to the second metadata information to indicate a second location of the second data block in the target data block.

10. The data processing method of claim 1, after combining and compressing the first data block and the second data block, the method further comprising: determining whether a data length of a combined and compressed target data block exceeds a storage granularity; when determining that the data length of the combined and compressed target data block exceeds the storage granularity, splitting the combined and compressed target data block into several granularities based on a granularity unit and adding a flag to an end of each segment of data to identify consecutive data block address; and when determining that the data length of the combined and compressed target data block is less than the storage granularity, adding 0 to an end of the combined and compressed target data block.

11. A data processing apparatus comprising: a communications interface; and a processor coupled to the communications interface and configured to execute instructions stored in a memory to cause the data processing apparatus to: obtain a plurality of data blocks; determine a first data block and a second data block from the data blocks, wherein the first data block has a first hash value that is based on a first calculation on the first data block based on a hash algorithm, and wherein the second data block has a second hash value that is based on a second calculation on the second data block based on the hash algorithm; determine that the first data block and the second data block meet a similarity condition based on a degree of similarity of the first hash value and the second hash value; determine whether a data reduction ratio corresponding to a target data block to be obtained by combining and compressing the first data block and the second data block reaches a reduction ratio threshold; when determining that the data reduction ratio reaches the reduction ratio threshold, combine and compress the first data block and the second data block using a first compression algorithm; and when determining that the data reduction ratio does not reach the reduction ratio threshold, separately compress the first data block and the second data block using a second compression algorithm different from the first compression algorithm.

12. The data processing apparatus of claim 11, wherein the hash algorithm is a locality-sensitive hash algorithm.

13. The data processing apparatus of claim 12, wherein the processor further causes the data processing apparatus to: segment the first data block into a plurality of data sub-blocks of different lengths; calculate a hash value of each of the data sub-blocks; perform combination calculation on hash values of the data sub-blocks to obtain a locality-sensitive hash value corresponding to the first data block; and set the locality-sensitive hash value as the first hash value.

14. The data processing apparatus of claim 13, wherein the processor further causes the data processing apparatus to: identify that a Jaccard distance between the first hash value and the second hash value is less than a first distance threshold; identify that a Euclidean distance between the first hash value and the second hash value is less than a second distance threshold; or identify that a Hamming distance between the first hash value and the second hash value is less than a third distance threshold.

15. The data processing apparatus of claim 11, wherein the processor further causes the data processing apparatus to: identify that a difference between the first hash value and the second hash value is less than a similarity threshold; and obtain the degree of similarity based on the difference between the first hash value and the second hash value.

16. The data processing apparatus of claim 11, wherein after combining and compressing the first data block and the second data block, the pro
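The decision step of claim 1 and the bookkeeping of claims 8 to 10 can be sketched in Python as follows. None of these details are specified by the patent, so treat each as an assumption: the data reduction ratio is taken as original size divided by compressed size; it is measured by trial-compressing the concatenated pair rather than estimated; the first and second algorithms are DEFLATE (`zlib`) and bzip2 (`bz2`); and the per-segment flag is one trailing byte (1 = data continues in the next unit, 0 = last unit). `BlockMeta`, `compress_pair` and `fit_to_granularity` are hypothetical names.

```python
from __future__ import annotations

import bz2
import zlib
from dataclasses import dataclass


@dataclass
class BlockMeta:
    """Hypothetical per-block metadata (claims 8 and 9)."""
    combined: bool = False  # combination compression identifier
    offset: int = 0         # location identifier: start of this block in the target data block
    length: int = 0         # uncompressed length of this block


def compress_pair(a: bytes, b: bytes, threshold: float = 2.0):
    """Combine and compress a similar pair if the reduction ratio reaches the threshold (claim 1)."""
    target = zlib.compress(a + b, 9)          # first compression algorithm (assumed: DEFLATE)
    ratio = (len(a) + len(b)) / len(target)   # assumed definition: original size / compressed size
    if ratio >= threshold:
        meta_a = BlockMeta(combined=True, offset=0, length=len(a))
        meta_b = BlockMeta(combined=True, offset=len(a), length=len(b))
        return [target], meta_a, meta_b
    # Threshold not reached: compress each block on its own with a different algorithm (assumed: bzip2).
    return [bz2.compress(a), bz2.compress(b)], BlockMeta(length=len(a)), BlockMeta(length=len(b))


def fit_to_granularity(target: bytes, granularity: int = 4096) -> list[bytes]:
    """Split an oversize target block into flagged units, or zero-pad a short one (claim 10)."""
    if len(target) > granularity:
        payload = granularity - 1  # reserve one byte per unit for the continuation flag (assumed layout)
        segments = []
        for start in range(0, len(target), payload):
            more = start + payload < len(target)
            segments.append(target[start:start + payload] + (b"\x01" if more else b"\x00"))
        return segments
    # At most one unit long: append zero bytes up to the granularity.
    return [target + b"\x00" * (granularity - len(target))]


if __name__ == "__main__":
    first = b"A" * 4096
    second = b"A" * 4000 + b"B" * 96
    payloads, meta_first, meta_second = compress_pair(first, second)
    raw = zlib.decompress(payloads[0])
    print(len(payloads), meta_second, raw[meta_second.offset:meta_second.offset + meta_second.length] == second)
```

Trial compression is the simplest way to determine whether the ratio reaches the threshold, but it costs a full compression pass; a production system might instead estimate the ratio from sampled data. The offset and length fields are what let a reader recover either original block from the shared target block after decompression.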

Assignees

Huawei Tech Co Ltd

Classifications

G06F3/064 (primary): Management of blocks
G06F3/0608 (primary): Saving storage space on storage systems
G06F3/0673: Single storage device
H03M7/30: Compression (speech analysis-synthesis for redundancy reduction G10L19/00; for image communication H04N); Expansion; Suppression of unnecessary data, e.g. redundancy reduction
G06F3/0685: Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays

Patent family

Related publications grouped by family.

Patent family 73748605
