Methods and systems of managing a distributed replica based storage
US-9514014-B2 · Dec 6, 2016 · US
US11609883B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11609883-B2 |
| Application number | US-201815991380-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 29, 2018 |
| Priority date | May 29, 2018 |
| Publication date | Mar 21, 2023 |
| Grant date | Mar 21, 2023 |
Official abstract text for this publication.
An apparatus in one embodiment comprises at least one processing device comprising a processor coupled to a memory. The processing device is configured to identify a dataset to be scanned to generate a compression estimate for that dataset, to designate a scan criterion to be utilized in the scan, and for each of a plurality of pages of the dataset, to scan the page, where scanning the page includes performing a computation on the page to obtain a page result, determining whether or not the page result satisfies the designated scan criterion, and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset. The processing device generates the compression estimate for the dataset based at least in part on contents of the compression estimate table. The scan criterion may comprise, for example, a designated content-based signature prefix, or a designated subset inclusion characteristic defining a polynomial-based signature subspace.
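The prefix-based variant of the scan described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the 4096-byte page size, SHA-1 as the content-based signature, zlib as the compressor, the 8-byte page identifier, and the names `scan_dataset` and `PAGE_SIZE` are all assumptions made for illustration. The sketch samples only pages whose signature begins with the designated prefix, records them in a table, and scales the sampled total by the inverse of the sampling ratio.

```python
import hashlib
import zlib

PAGE_SIZE = 4096  # assumed page size; the abstract does not fix one


def scan_dataset(data: bytes, prefix: bytes = b"\x00", page_size: int = PAGE_SIZE):
    """Sample pages whose signature starts with `prefix` and estimate stored size.

    Returns (table, estimated_bytes). Each table entry maps a short page
    identifier to [counter, compressed_size].
    """
    table = {}
    for offset in range(0, len(data), page_size):
        page = data[offset:offset + page_size]
        signature = hashlib.sha1(page).digest()  # stand-in content-based signature
        if not signature.startswith(prefix):
            continue  # page result does not satisfy the scan criterion
        page_id = signature[:8]  # truncated identifier (assumed length)
        entry = table.get(page_id)
        if entry is None:
            # First time this content is seen: record its compressed size.
            table[page_id] = [1, len(zlib.compress(page))]
        else:
            entry[0] += 1  # duplicate page: bump the counter only
    # Each prefix byte keeps roughly 1/256 of the signature space.
    inverse_sampling_ratio = 256 ** len(prefix)
    partial = sum(size for _, size in table.values())
    return table, partial * inverse_sampling_ratio
```

Because the criterion is applied to the signature rather than to the page position, every copy of a sampled page lands in the same table entry, so the scaled total approximates the size of the dataset after both deduplication and compression. Passing `prefix=b""` in this sketch disables sampling and scans every page.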
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the processing device being configured: to identify a dataset stored on a first storage system to be scanned to generate a first compression estimate for that dataset; to designate a scan criterion to be utilized in the scan; for each of a plurality of pages of the dataset, to scan the page by: performing a computation on the page to obtain a page result; determining whether or not the page result satisfies the designated scan criterion; and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset; to generate the first compression estimate for the dataset based at least in part on contents of the compression estimate table; to generate a second compression estimate for the dataset based at least in part on the contents of the compression estimate table and enhanced compression functionality available in a second storage system; and to automatically migrate the dataset from the first storage system to the second storage system for compression based at least in part on the second compression estimate indicating that a threshold level of enhanced compression is achieved at the second storage system; wherein scanning the plurality of pages of the dataset comprises sequentially scanning through the plurality of pages of the dataset and applying the designated scan criterion individually to each of the page results of respective ones of the plurality of pages as part of the scanning; wherein the designated scan criterion utilized in the scanning of the plurality of pages of the dataset defines a subspace of a total scan space for the scan; and wherein the designated scan criterion utilized in the scanning of the plurality of pages of the dataset further establishes a sampling ratio of the scanned pages as part of the scanning, based at least in part on the defined subspace of the total scan space for the scan.
2. The apparatus of claim 1 wherein the processing device is implemented in one of: a host device configured to communicate over a network with the first storage system that stores the dataset; and the first storage system that stores the dataset.
3. The apparatus of claim 1 wherein the dataset comprises a set of one or more logical storage volumes of the first storage system.
4. The apparatus of claim 1 wherein the designated scan criterion comprises a designated content-based signature prefix and scanning the page comprises: computing a content-based signature for the page; comparing an initial portion of the content-based signature to the designated content-based signature prefix; and responsive to a match between the initial portion and the designated content-based signature prefix, updating a corresponding entry of the compression estimate table for the dataset.
5. The apparatus of claim 4 wherein the designated content-based signature prefix comprises a specified number of initial content-based signature bytes with the initial bytes each having a designated value.
6. The apparatus of claim 1 wherein the designated scan criterion comprises a designated subset inclusion characteristic and scanning the page comprises: computing a polynomial-based signature for the page; determining whether or not the polynomial-based signature satisfies the designated subset inclusion characteristic; and responsive to the polynomial-based signature satisfying the designated subset inclusion characteristic, computing a content-based signature for the page and updating a corresponding entry of the compression estimate table for the dataset based at least in part on the content-based signature.
7. The apparatus of claim 6 wherein the designated subset inclusion characteristic specifies that application of a designated function to the polynomial-based signature produces a particular result.
8. The apparatus of claim 6 wherein the polynomial-based signature comprises an n-bit cyclic redundancy check (CRC) value.
9. The apparatus of claim 1 wherein updating a corresponding entry of the compression estimate table for a given one of the pages of the dataset comprises one of the following operations (i) and (ii): (i) responsive to a page identifier of the given page not already being present in the compression estimate table, inserting the page identifier into the compression estimate table and setting an associated counter to an initial value; and (ii) responsive to the page identifier already being present in the compression estimate table, incrementing its associated counter.
10. The apparatus of claim 1 wherein the corresponding entry is configured to include a page identifier and further wherein the page identifier comprises a specified number of initial bytes of a content-based signature of that page.
11. The apparatus of claim 1 wherein the compression estimate table for the dataset comprises a plurality of entries for respective ones of the pages of that dataset and wherein each of the entries is configured to include a page identifier that comprises less than an entire content-based signature of its corresponding page.
12. The apparatus of claim 1 wherein generating the first compression estimate for the dataset based at least in part on contents of the compression estimate table further comprises: computing a partial compression estimate based at least in part on compression values associated with respective entries of the compression estimate table; and scaling the partial compression estimate to obtain the first compression estimate for the dataset; wherein scaling the partial compression estimate comprises processing the partial compression estimate utilizing an inverse of the sampling ratio.
13. The apparatus of claim 1 wherein the processing device is configured to adjust one or more characteristics of a storage configuration of the dataset based at least in part on the first compression estimate generated for the dataset.
14. The apparatus of claim 1 wherein the processing device is configured: to generate one or more additional compression estimates for respective ones of one or more additional datasets; and to select a particular one of the datasets for compression based at least in part on their respective compression estimates.
15. A method comprising: identifying a dataset stored on a first storage system to be scanned to generate a first compression estimate for that dataset; designating a scan criterion to be utilized in the scan; for each of a plurality of pages of the dataset, scanning the page by: performing a computation on the page to obtain a page result; determining whether or not the page result satisfies the designated scan criterion; and responsive to the page result satisfying the designated scan criterion, updating a corresponding entry of a compression estimate table for the dataset; generating the first compression estimate for the dataset based at least in part on the contents of the compression estimate table; generating a second compression estimate for the dataset based at least in part on the contents of the compression estimate table and enhanced compression functionality available in a second storage system; and automatically migrating the dataset from the first storage system to the second storage system for compression based at least in part on the second compression estimate indicating that a threshold level of enhanced compression is achieved at the second storage system; wherein scanning the plurality of pages
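The polynomial-signature variant (claims 6 to 8), the two estimates, and the migration decision recited in the claims can be pictured with the following hedged sketch. Everything concrete here is an assumption rather than the claimed implementation: CRC-32 modulo a chosen `modulus` stands in for the designated subset inclusion characteristic, SHA-1 for the content-based signature, zlib levels 1 and 9 for the compression available on the first and second storage systems, and a relative `min_gain` for the threshold level of enhanced compression. The function names are hypothetical.

```python
import hashlib
import zlib


def in_subspace(page: bytes, modulus: int) -> bool:
    """Assumed inclusion test: CRC-32 of the page is divisible by `modulus`."""
    return zlib.crc32(page) % modulus == 0


def estimate_both(pages, modulus: int = 16):
    """Return (table, first_estimate, second_estimate) in bytes.

    Table entries are [counter, size_on_first_system, size_on_second_system].
    """
    table = {}
    for page in pages:
        if not in_subspace(page, modulus):
            continue  # cheap CRC check rejects the page before any hashing
        page_id = hashlib.sha1(page).digest()[:8]  # truncated identifier
        entry = table.get(page_id)
        if entry is None:
            # zlib levels 1 and 9 model baseline and enhanced compression.
            table[page_id] = [1, len(zlib.compress(page, 1)), len(zlib.compress(page, 9))]
        else:
            entry[0] += 1
    # Roughly 1/modulus of distinct pages pass the test, so scale by modulus.
    first = modulus * sum(entry[1] for entry in table.values())
    second = modulus * sum(entry[2] for entry in table.values())
    return table, first, second


def should_migrate(first: int, second: int, min_gain: float = 0.10) -> bool:
    """Migrate when the second system saves at least `min_gain` of the first estimate."""
    if first <= 0:
        return False
    return (first - second) / first >= min_gain
```

The design point this sketch tries to show is ordering: the inexpensive CRC decides membership in the sampled subspace, and the costlier content-based signature is computed only for pages that pass. Setting `modulus=1` admits every page, which is convenient for checking the table logic on small inputs.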
Organizing or formatting or addressing of data · CPC title
based on delta files · CPC title
hash tables · CPC title
Saving storage space on storage systems · CPC title
In-line storage system · CPC title