Techniques for automatically freeing space in a log-structured storage system based on segment fragmentation
US-9778881-B2 · Oct 3, 2017 · US
US10614036B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10614036-B1 |
| Application number | US-201615394376-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 29, 2016 |
| Priority date | Dec 29, 2016 |
| Publication date | Apr 7, 2020 |
| Grant date | Apr 7, 2020 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques have been provided for storing data in a de-duplicated manner on a data storage system in a manner that allows for real-time reference to an index that is too large to fit within memory. This may be accomplished by segmenting the index into smaller segments, stored on disk. Only a subset of the segments may be loaded into memory at a given time. A predictive filter is stored in memory for each segment, allowing a de-duplication driver to quickly predict whether any given new block is likely to be indexed by each segment. Since identical blocks are often stored in long identical sequences (e.g., upon copying a disk image to a disk for a virtual machine), once a segment stored on disk is referenced many times in a short period, it is loaded into memory to allow the remainder of the long sequence to be de-duplicated.
Opening claim text (preview).
What is claimed is: 1. A method of storing data in a de-duplicated manner on a data storage system, the method comprising: receiving a block of data to be stored to a logical address of the data storage system; filtering the block of data against each of a plurality of predictive filters to predict whether the block of data is likely to be indexed on each respective segment of a plurality of segments of a de-duplication index, the plurality of segments being partitioned into a first set of segments stored within system memory of the data storage system and a second set of segments not stored within system memory of the data storage system, each of the plurality of predictive filters, including the predictive filters corresponding to the segments of the second set, being stored within system memory, wherein filtering includes reading the plurality of predictive filters from system memory; in response to determining that the block of data is predicted to likely be indexed by each of a subset of the segments stored within system memory, checking whether the block of data is actually indexed by any of the segments of the subset; and in response to determining that the block of data is actually indexed by a particular segment of the subset of the segments stored within system memory, the particular segment indicating a location on the data storage system where the data block was previously stored: causing the logical address of the data storage system to point to the indicated location on the data storage system where the data block was previously stored; and refraining from writing the received block of data to the logical address of the data storage system, wherein filtering the block of data against each of the plurality of predictive filters includes updating a respective counter for each segment based on a prediction result of the predictive filter corresponding to that segment, and wherein the method further comprises swapping a set of the segments out of system memory based at least in part on the respective counters of the set of the segments. 2. The method of claim 1 wherein filtering the block of data against each of the plurality of predictive filters further includes: hashing the block of data using a predetermined hash function to yield a hashed value; and applying each predictive filter to the hashed value to yield a binary prediction about whether or not the block of data is likely to be indexed on a segment corresponding to that predictive filter. 3. The method of claim 2 wherein: filtering the block of data against each of the plurality of predictive filters further includes updating the respective counter for each segment based on whether the predictive filter corresponding to that segment yielded an affirmative or negative prediction value; and the method further comprises: comparing the respective counters for each segment of the plurality of segments; swapping a first segment from the second set of segments into system memory; and swapping a second segment from the first set of segments out of system memory, the second segment having a smaller counter than the first segment. 4. The method of claim 3 wherein updating the respective counter for each segment based on whether the predictive filter corresponding to that segment yielded an affirmative or negative prediction value includes: incrementing the respective counter for segments whose predictive filter yielded an affirmative prediction value; and decrementing the respective counter for segments whose predictive filter yielded a negative prediction value. 5. The method of claim 3 wherein the method further comprises: comparing the respective counters for each segment of the plurality of segments to a threshold minimum; and in response to determining that a counter for a third segment has a value smaller than the threshold minimum, deleting the third segment from the data storage system. 6. The method of claim 2 wherein each predictive filter is a Bloom filter. 7. The method of claim 2 wherein the method further comprises: receiving another block of data to be stored to another logical address of the data storage system; hashing the other block of data using the predetermined hash function to yield another hashed value; applying each predictive filter to the other hashed value to yield another binary prediction about whether or not the other block of data is likely to be indexed on each segment corresponding to that predictive filter; and in response to determining that the other block of data is not predicted to be indexed by any segment of the first set of segments stored within system memory: writing the other block of data to a new location on the data storage system; adding the other hashed value and a pointer to the new location to an open segment of the first set of segments stored within system memory; and updating the predictive filter for the open segment to account for the other hashed value having been added. 8. The method of claim 7 wherein the method further comprises: in response to adding the other hashed value and the pointer to the new location to the open segment, determining that the open segment has become full; in response to determining that the open segment has become full, closing the open segment and creating a new open segment as part of the de-duplication index within the first set stored within system memory. 9. The method of claim 2 wherein the method further comprises: receiving another block of data to be stored to another logical address of the data storage system; hashing the other block of data using the predetermined hash function to yield another hashed value; applying each predictive filter to the other hashed value to yield another binary prediction about whether or not the other block of data is likely to be indexed on each segment corresponding to that predictive filter; in response to determining that the other block of data is predicted to likely be indexed by each of a subset of the segments stored within system memory, checking whether the other block of data is actually indexed by any of the segments of the subset; in response to determining that the block of data is not actually indexed by any segment of the subset of the segments stored within system memory: writing the other block of data to a new location on the data storage system; adding the other hashed value and a pointer to the new location to an open segment of the first set of segments stored within system memory; and updating the predictive filter for the open segment to account for the other hashed value having been added. 10. The method of claim 9 wherein the method further comprises: in response to adding the other hashed value and the pointer to the new location to the open segment, determining that the open segment has become full; in response to determining that the open segment has become full, closing the open segment and creating a new open segment as part of the de-duplication index within the first set stored within system memory. 11. The method of claim 1 wherein receiving, filtering, determining that the block of data is predicted to likely be indexed by each of the subset of the segments stored within system memory, and checking are performed without reading from persistent storage of the data storage system. 12. An apparatus comprising: persistent storage devices providing data storage; network interface circuitry configured to communicate with a host over a network; and processing circuitry coupled to memory to form a control circuit configured to store data in a de-duplicated manner in data storage by: receiving a b
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
Hash-based (content-based indexing of textual data G06F16/31) · CPC title
for networked environments · CPC title
Database-specific techniques · CPC title
using de-duplication of the data · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.