Who is the assignee on this patent?

Emc Ip Holding Co Llc, AMC IP Holding Company LLC

What technology area does this patent fall under?

Primary CPC classification G06F16/1748. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Apr 07 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Techniques for de-duplicating data storage systems using a segmented index

US10614036B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10614036-B1
Application number	US-201615394376-A
Country	US
Kind code	B1
Filing date	Dec 29, 2016
Priority date	Dec 29, 2016
Publication date	Apr 7, 2020
Grant date	Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques have been provided for storing data in a de-duplicated manner on a data storage system in a manner that allows for real-time reference to an index that is too large to fit within memory. This may be accomplished by segmenting the index into smaller segments, stored on disk. Only a subset of the segments may be loaded into memory at a given time. A predictive filter is stored in memory for each segment, allowing a de-duplication driver to quickly predict whether any given new block is likely to be indexed by each segment. Since identical blocks are often stored in long identical sequences (e.g., upon copying a disk image to a disk for a virtual machine), once a segment stored on disk is referenced many times in a short period, it is loaded into memory to allow the remainder of the long sequence to be de-duplicated.

First claim

Opening claim text (preview).

What is claimed is: 1. A method of storing data in a de-duplicated manner on a data storage system, the method comprising: receiving a block of data to be stored to a logical address of the data storage system; filtering the block of data against each of a plurality of predictive filters to predict whether the block of data is likely to be indexed on each respective segment of a plurality of segments of a de-duplication index, the plurality of segments being partitioned into a first set of segments stored within system memory of the data storage system and a second set of segments not stored within system memory of the data storage system, each of the plurality of predictive filters, including the predictive filters corresponding to the segments of the second set, being stored within system memory, wherein filtering includes reading the plurality of predictive filters from system memory; in response to determining that the block of data is predicted to likely be indexed by each of a subset of the segments stored within system memory, checking whether the block of data is actually indexed by any of the segments of the subset; and in response to determining that the block of data is actually indexed by a particular segment of the subset of the segments stored within system memory, the particular segment indicating a location on the data storage system where the data block was previously stored: causing the logical address of the data storage system to point to the indicated location on the data storage system where the data block was previously stored; and refraining from writing the received block of data to the logical address of the data storage system, wherein filtering the block of data against each of the plurality of predictive filters includes updating a respective counter for each segment based on a prediction result of the predictive filter corresponding to that segment, and wherein the method further comprises swapping a set of the segments out of system memory based at least in part on the respective counters of the set of the segments. 2. The method of claim 1 wherein filtering the block of data against each of the plurality of predictive filters further includes: hashing the block of data using a predetermined hash function to yield a hashed value; and applying each predictive filter to the hashed value to yield a binary prediction about whether or not the block of data is likely to be indexed on a segment corresponding to that predictive filter. 3. The method of claim 2 wherein: filtering the block of data against each of the plurality of predictive filters further includes updating the respective counter for each segment based on whether the predictive filter corresponding to that segment yielded an affirmative or negative prediction value; and the method further comprises: comparing the respective counters for each segment of the plurality of segments; swapping a first segment from the second set of segments into system memory; and swapping a second segment from the first set of segments out of system memory, the second segment having a smaller counter than the first segment. 4. The method of claim 3 wherein updating the respective counter for each segment based on whether the predictive filter corresponding to that segment yielded an affirmative or negative prediction value includes: incrementing the respective counter for segments whose predictive filter yielded an affirmative prediction value; and decrementing the respective counter for segments whose predictive filter yielded a negative prediction value. 5. The method of claim 3 wherein the method further comprises: comparing the respective counters for each segment of the plurality of segments to a threshold minimum; and in response to determining that a counter for a third segment has a value smaller than the threshold minimum, deleting the third segment from the data storage system. 6. The method of claim 2 wherein each predictive filter is a Bloom filter. 7. The method of claim 2 wherein the method further comprises: receiving another block of data to be stored to another logical address of the data storage system; hashing the other block of data using the predetermined hash function to yield another hashed value; applying each predictive filter to the other hashed value to yield another binary prediction about whether or not the other block of data is likely to be indexed on each segment corresponding to that predictive filter; and in response to determining that the other block of data is not predicted to be indexed by any segment of the first set of segments stored within system memory: writing the other block of data to a new location on the data storage system; adding the other hashed value and a pointer to the new location to an open segment of the first set of segments stored within system memory; and updating the predictive filter for the open segment to account for the other hashed value having been added. 8. The method of claim 7 wherein the method further comprises: in response to adding the other hashed value and the pointer to the new location to the open segment, determining that the open segment has become full; in response to determining that the open segment has become full, closing the open segment and creating a new open segment as part of the de-duplication index within the first set stored within system memory. 9. The method of claim 2 wherein the method further comprises: receiving another block of data to be stored to another logical address of the data storage system; hashing the other block of data using the predetermined hash function to yield another hashed value; applying each predictive filter to the other hashed value to yield another binary prediction about whether or not the other block of data is likely to be indexed on each segment corresponding to that predictive filter; in response to determining that the other block of data is predicted to likely be indexed by each of a subset of the segments stored within system memory, checking whether the other block of data is actually indexed by any of the segments of the subset; in response to determining that the block of data is not actually indexed by any segment of the subset of the segments stored within system memory: writing the other block of data to a new location on the data storage system; adding the other hashed value and a pointer to the new location to an open segment of the first set of segments stored within system memory; and updating the predictive filter for the open segment to account for the other hashed value having been added. 10. The method of claim 9 wherein the method further comprises: in response to adding the other hashed value and the pointer to the new location to the open segment, determining that the open segment has become full; in response to determining that the open segment has become full, closing the open segment and creating a new open segment as part of the de-duplication index within the first set stored within system memory. 11. The method of claim 1 wherein receiving, filtering, determining that the block of data is predicted to likely be indexed by each of the subset of the segments stored within system memory, and checking are performed without reading from persistent storage of the data storage system. 12. An apparatus comprising: persistent storage devices providing data storage; network interface circuitry configured to communicate with a host over a network; and processing circuitry coupled to memory to form a control circuit configured to store data in a de-duplicated manner in data storage by: receiving a b

Assignees

Inventors

Classifications

G06F16/1748Primary
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
G06F16/137
Hash-based (content-based indexing of textual data G06F16/31) · CPC title
G06F11/1464
for networked environments · CPC title
G06F2201/80
Database-specific techniques · CPC title
G06F11/1453
using de-duplication of the data · CPC title

Patent family

Related publications grouped by family.

View patent family 70056569

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614036B1 cover?: Techniques have been provided for storing data in a de-duplicated manner on a data storage system in a manner that allows for real-time reference to an index that is too large to fit within memory. This may be accomplished by segmenting the index into smaller segments, stored on disk. Only a subset of the segments may be loaded into memory at a given time. A predictive filter is stored in memor…
Who is the assignee on this patent?: Emc Ip Holding Co Llc, AMC IP Holding Company LLC
What technology area does this patent fall under?: Primary CPC classification G06F16/1748. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Apr 07 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Techniques for automatically freeing space in a log-structured storage system based on segment fragmentation

Automated token renewal using OTP-based authentication codes

Range-based data deduplication

Method for operating controller and method for operating device including the same

Multi-level distributed hash table for data storage in a hierarchically arranged network

Frequently asked questions