What technology area does this patent fall under?

Primary CPC classification G06F3/0608. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Apr 19 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and methods for storage data deduplication

US11307765B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11307765-B2
Application number	US-201916356818-A
Country	US
Kind code	B2
Filing date	Mar 18, 2019
Priority date	Jul 27, 2015
Publication date	Apr 19, 2022
Grant date	Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data in a storage system is deduplicated after receiving from at least one writing entity requests for a plurality of write operations for a corresponding plurality of data blocks in a storage object. The received blocks are buffered and sorted in order and a sequence of clumps is created from the buffered blocks, where each clump comprises a grouping of at least one of the sorted, buffered blocks. A boundary is determined between at least one pair of clumps based at least in part on the content of at least one of the buffered blocks, and it is then determined whether at least one of the clumps is a duplicate of a previously stored clump.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method for managing data in a storage system comprising: receiving, from at least one writing entity, requests for a plurality of write operations; wherein the plurality of write operations are for writing a plurality of received blocks to storage; wherein each block of the plurality of received blocks is associated with a Logical Block Address (LBA); creating a sequence of clumps from the plurality of received blocks, wherein each clump comprises a group of blocks, and wherein a first clump includes a first group of blocks in which the LBAs of the blocks are not contiguous in LBA space, wherein creating the sequence of clumps comprises: compressing each of the plurality of received blocks, sorting the plurality of compressed blocks in LBA order, and packing groups of blocks into corresponding clumps; creating a data structure for storing LBA-to-clump mappings; for each block of the plurality of received blocks, storing a corresponding entry in the data structure; within each entry in the data structure, storing: the LBA that is associated with the received block to which the entry corresponds, and a reference tuple that includes a clump identifier for the clump that stores data for the corresponding LBA of the entry and a position of the block within the clump; and after creating the data structure, using entries in the data structure to determine the clumps that store data for the received blocks. 2. The method of claim 1 wherein: the plurality of received blocks includes a particular block; the particular block is associated with a particular LBA; within the sequence of clumps, data for the particular block is stored in a particular clump; storing a corresponding entry in the data structure includes storing, within the data structure, a particular entry for the particular LBA; wherein the particular entry includes the particular LBA and a particular reference; and the particular reference includes: a particular clump identifier that identifies the particular clump; and a particular block position that indicates a position, within the particular clump, of the data that corresponds to the particular LBA. 3. The method of claim 2 wherein: the particular reference is a first reference associated with a first snapshot; the particular clump was written to storage during the first snapshot; the particular entry includes a second reference associated with a second snapshot; new data associated with the particular LBA was written to storage during the second snapshot; and the second reference includes: a second clump identifier that identifies a second clump that contains the new data; and a second block position that indicates a position, within the second clump, of the new data. 4. The method of claim 3 wherein the particular entry includes an indication of a snapshot ID of a snapshot during which the particular reference was created. 5. The method of claim 1 , in which the clump identifier stored in each reference is a clump fingerprint. 6. The method of claim 5 , in which the fingerprint is a hash value of the contents of the clump. 7. The method of claim 3 , further comprising scanning entries of the data structure to identify, as dead data blocks, those data blocks that are referenced by stale references, wherein a stale reference is a reference that (a) is associated with an LBA whose entry includes multiple references; and (b) is associated with a snapshot that is older than at least one other snapshot identified in the entry for the LBA. 8. The method of claim 7 , further comprising repacking clumps so as to retain only clumps having no dead data blocks. 9. The method of claim 1 , further comprising storing the data structure as a Log-Structured Merge tree. 10. One or more non-transitory computer-readable media storing instructions for managing data in a storage system, wherein the instructions, when executed by one or more processors, cause: receiving, from at least one writing entity, requests for a plurality of write operations; wherein the plurality of write operations are for writing a plurality of received blocks to storage; wherein each block of the plurality of received blocks is associated with a Logical Block Address (LBA); creating a sequence of clumps from the plurality of received blocks, wherein each clump comprises a group of blocks, and wherein a first clump includes a first group of blocks in which the LBAs of the blocks are not contiguous in LBA space, wherein creating the sequence of clumps comprises: compressing each of the plurality of received blocks; sorting the plurality of compressed blocks in LBA order; and packing groups of blocks into corresponding clumps; creating a data structure for storing LBA-to-clump mappings; for each block of the plurality of received blocks, storing a corresponding entry in the data structure; within each entry in the data structure, storing: the LBA that is associated with the received block to which the entry corresponds, and a reference tuple that includes a clump identifier for the clump that stores data for the corresponding LBA of the entry and a position of the block within the clump; and after creating the data structure, using entries in the data structure to determine the clumps that store data for the received blocks. 11. The one or more non-transitory computer-readable media of claim 10 wherein: the plurality of received blocks includes a particular block; the particular block is associated with a particular LBA; within the sequence of clumps, data for the particular block is stored in a particular clump; storing a corresponding entry in the data structure includes storing, within the data structure, a particular entry for the particular LBA; wherein the particular entry includes the particular LBA and a particular reference; and the particular reference includes: a particular clump identifier that identifies the particular clump; and a particular block position that indicates a position, within the particular clump, of the data that corresponds to the particular LBA. 12. The one or more non-transitory computer-readable media of claim 11 wherein: the particular reference is a first reference associated with a first snapshot; the particular clump was written to storage during the first snapshot; the particular entry includes a second reference associated with a second snapshot; new data associated with the particular LBA was written to storage during the second snapshot; and the second reference includes: a second clump identifier that identifies a second clump that contains the new data; and a second block position that indicates a position, within the second clump, of the new data. 13. The one or more non-transitory computer-readable media of claim 12 wherein the particular entry includes an indication of a snapshot ID of a snapshot during which the particular reference was created. 14. The one or more non-transitory computer-readable media of claim 10 , in which the clump identifier stored in each reference is a clump fingerprint. 15. The one or more non-transitory computer-readable media of claim 14 , in which the fingerprint is a hash value of the contents of the clump. 16. The one or more non-transitory computer-readable media of claim 12 , wherein execution of the instructions further causes scanning entries of the data structure to identify, as dead data blocks, those data blocks that are referenced by stale references, wherein a stale reference is a reference that (a) is associated with an L

Assignees

Vmware Inc

Inventors

Classifications

G06F3/0608Primary
Saving storage space on storage systems · CPC title
G06F3/0641
De-duplication techniques · CPC title
G06F3/06Primary
Digital input from, or digital output to, record carriers {, e.g. RAID, emulated record carriers or networked record carriers} · CPC title
G06F16/285
Clustering or classification · CPC title
G06F16/22
Indexing; Data structures therefor; Storage structures · CPC title

Patent family

Related publications grouped by family.

View patent family 57882811

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11307765B2 cover?: Data in a storage system is deduplicated after receiving from at least one writing entity requests for a plurality of write operations for a corresponding plurality of data blocks in a storage object. The received blocks are buffered and sorted in order and a sequence of clumps is created from the buffered blocks, where each clump comprises a grouping of at least one of the sorted, buffered blo…
Who is the assignee on this patent?: Vmware Inc
What technology area does this patent fall under?: Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Apr 19 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).