Systems and methods for rebuilding a cache index
US-10331561-B1 · Jun 25, 2019 · US
US11789917B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11789917-B2 |
| Application number | US-202217583365-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 25, 2022 |
| Priority date | Jan 25, 2022 |
| Publication date | Oct 17, 2023 |
| Grant date | Oct 17, 2023 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A storage control system receives a first data block to be written to a primary storage, and generates a content signature for the first data block. The storage control system adds a first entry for the first data block into a persistent deduplication database. The first entry comprises a key which comprises the content signature for the first data block. The persistent deduplication database comprises a tree data structure which comprises elements that are configured to store entries for data blocks. The storage control system merges the entries of at least two elements of the tree data structure to generate a set of merged entries which comprises the first entry for the first data block, and a second entry for a second data block, and commences a deduplication process in response to determining that the first entry and the second entry in the set of merged entries have matching keys.
Opening claim text (preview).
What is claimed is: 1. A method, comprising: receiving, by a storage control system, a first data block to be written to a primary storage; generating, by the storage control system, a content signature for the first data block; adding, by the storage control system, a first entry for the first data block into a persistent deduplication database, wherein the first entry for the first data block comprises a key which comprises the content signature for the first data block, and wherein the persistent deduplication database comprises a tree data structure which comprises elements that are configured to store entries for data blocks; merging, by the storage control system, the entries of at least two elements of the tree data structure to generate a set of merged entries which comprises the first entry for the first data block; determining, by the storage control system, whether the set of merged entries includes a second data block having a second entry with a key that matches the key of the first entry of the first data block; and commencing, by the storage control system, a deduplication process to determine if the first entry and the second entry correspond to duplicate data blocks, in response to determining that the first entry and the second entry in the set of merged entries have matching keys. 2. The method of claim 1 , wherein the tree data structure comprises a log-structured merge tree data structure, and wherein the elements comprise segments of the log-structured merge tree. 3. The method of claim 1 , wherein the tree data structure comprises a B-epsilon tree data structure, and wherein the elements comprise message buffers of internal nodes of the B-epsilon tree data structure. 4. The method of claim 1 , further comprising assigning, by the storage control system, a first virtual address for a logical address of the first data block, wherein the first entry further comprises the first virtual address and the logical address of the first data block, wherein the content signature and the first virtual address comprise a key:value pair for the first entry. 5. The method of claim 4 , further comprising: performing, by the storage control system, the deduplication process to determine whether the first data block comprises a same or similar content as the second data block; and in response to determining that the first data block comprises the same or similar content as the second data block: utilizing, by the storage control system, the logical address included in the first entry for the first data block, and a second virtual address included in the second entry for the second data block to update a logical-to-virtual pointer from the logical address of the first data block to the second virtual address included in the second entry for the second data block; and incrementing, by the storage control system, a reference count for the second virtual address. 6. The method of claim 1 , wherein generating the content signature for the first data block comprises generating a cryptographic hash value for content of the first data block. 7. The method of claim 1 , further comprising: writing, by the storage control system, the first data block to a persistent write cache; returning, by the storage control system, an acknowledge message to indicate that the first data block is written to the primary storage, in response to the first data block being stored in the persistent write cache; and destaging, by the storage control system, the first data block from the persistent write cache without writing the first data block to the primary storage, in response to the deduplication process validating that the first data block is a duplicate of existing data block in the primary storage. 8. The method of claim 1 , further comprising: searching, by the storage control system, a cached deduplication database to determine if the cached deduplication database comprises a cached entry having a content signature which matches the content signature of the first data block; in response to the storage control system determining that there is no cached entry having a content signature which matches the content signature of the first data block, the storage control system adds the first entry for the first data block into the persistent deduplication database; and in response to the storage control system identifying a given cached entry having a content signature which matches the content signature of the first data block, the storage control system performing an inline deduplication process to determine whether the first data block comprises a same or similar content as a third data block which is associated with the given cached entry. 9. An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code is executable by one or more processors to implement a method which comprises: receiving, by a storage control system, a first data block to be written to a primary storage; generating, by the storage control system, a content signature for the first data block; adding, by the storage control system, a first entry for the first data block into a persistent deduplication database, wherein the first entry for the first data block comprises a key which comprises the content signature for the first data block, and wherein the persistent deduplication database comprises a tree data structure which comprises elements that are configured to store entries for data blocks; merging, by the storage control system, the entries of at least two elements of the tree data structure to generate a set of merged entries which comprises the first entry for the first data block; determining, by the storage control system, whether the set of merged entries includes a second data block having a second entry with a key that matches the key of the first entry of the first data block; and commencing, by the storage control system, a deduplication process to determine if the first entry and the second entry correspond to duplicate data blocks, in response to determining that the first entry and the second entry in the set of merged entries have matching keys. 10. The article of manufacture of claim 9 , wherein the tree data structure comprises a log-structured merge tree data structure, and wherein the elements comprise segments of the log-structured merge tree. 11. The article of manufacture of claim 9 , wherein the tree data structure comprises a B-epsilon tree data structure, and wherein the elements comprise message buffers of internal nodes of the B-epsilon tree data structure. 12. The article of manufacture of claim 9 , further comprising program code which is executable by the one or more processors to implement a method which comprises assigning, by the storage control system, a first virtual address for a logical address of the first data block, wherein the first entry further comprises the first virtual address and the logical address of the first data block, wherein the content signature and the first virtual address comprise a key:value pair for the first entry. 13. The article of manufacture of claim 12 , further comprising program code which is executable by the one or more processors to implement a method which comprises: performing, by the storage control system, the deduplication process to determine whether the first data block comprises a same or similar content as the second data block; and in response to determining that the first data block comprises the same or similar content as the second data block: utilizing, by the storage control system
Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title
Trees, e.g. B+trees · CPC title
De-duplication implemented within the file system, e.g. based on file segments (de-duplication techniques in storage systems for the management of data blocks G06F3/0641) · CPC title
De-duplication techniques · CPC title
Caching, prefetching or hoarding of files · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.