System and method for efficient background deduplication during hardening

US11720484B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11720484-B2
Application number  US-202016940952-A
Country             US
Kind code           B2
Filing date         Jul 28, 2020
Priority date       Jul 28, 2020
Publication date    Aug 8, 2023
Grant date          Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and computer system for identifying, by a computing device, content in a first bucket in a first cache. It may be determined that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket may be unique. The first portion of the content in the first bucket may be deduplicated from the first cache. The second portion of the content may be stored in a second bucket in a second cache.
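
The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `fingerprint` and `harden_bucket`, the use of SHA-256 to detect duplicates, and the modelling of buckets as a Python list and dict are all assumptions made for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint used to spot duplicates (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def harden_bucket(first_bucket: list[bytes],
                  second_bucket: dict[str, bytes]) -> tuple[int, int]:
    """Move the entries of a first-cache bucket into a second-cache bucket.

    Entries whose fingerprint is already present in the second bucket are
    treated as duplicates and dropped; the remaining entries are stored.
    Returns the pair (deduplicated, stored).
    """
    deduplicated = 0
    stored = 0
    for data in first_bucket:
        key = fingerprint(data)
        if key in second_bucket:
            deduplicated += 1          # duplicate portion: not written again
        else:
            second_bucket[key] = data  # unique portion: kept in the second cache
            stored += 1
    first_bucket.clear()               # the first-cache bucket is emptied
    return deduplicated, stored
```

Calling `harden_bucket([b"block-A", b"block-B"], second)` with `second` already holding `block-A` drops one entry as a duplicate and stores one new entry.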

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying, by a computing device, content in a first bucket in a first cache level of a multi-level cache system; determining that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket is unique; hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system; deduplicating, during the hardening, the first portion of the content in the first bucket from the first cache level; and storing, during the hardening, the second portion of the content in a second bucket in the second cache level, wherein a capacity of the second cache level is larger than a capacity of the first cache level.

2. The computer-implemented method of claim 1 wherein the first cache level of the multi-level cache system is in-memory.

3. The computer-implemented method of claim 1 wherein the second cache level of the multi-level cache system is in persistent storage.

4. The computer-implemented method of claim 1 wherein deduplicating the first portion of the content in the first bucket from the first cache level is based upon, at least in part, a threshold workload.

5. The computer-implemented method of claim 1 wherein deduplicating the first portion of the content in the first bucket from the first cache level includes identifying the first portion of the content in a log of potential deduplication candidates.

6. The computer-implemented method of claim 5 wherein deduplicating the first portion of the content in the first bucket from the first cache level further includes scanning the log of potential deduplication candidates to identify the first portion of the content in the first bucket as the duplicate.

7. The computer implemented method of claim 1, wherein: hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system, includes reading a first bucket in the second cache level, the first bucket in second cache level corresponding to the first bucket in the first cache level; and deduplicating, during the hardening, the first portion of the content in the first bucket from the first cache level, includes performing a lookup of the first portion and the second portion of the content relative to the read first bucket in the second cache level.

8. The computer implemented method of claim 1, wherein hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system includes determining a threshold level of fullness in the first bucket of the first cache level has been reached.

9. A computer program product residing on a computer readable storage medium having a plurality of instructions stored thereon which, when executed across one or more processors, causes at least a portion of the one or more processors to perform operations comprising: identifying, by a computing device, content in a first bucket in a first cache level of a multi-level cache system; determining that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket is unique; hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system; deduplicating, during the hardening, the first portion of the content in the first bucket from the first cache level; and storing, during the hardening, the second portion of the content in a second bucket in the second cache level, wherein a capacity of the second cache level is larger than a capacity of the first cache level.

10. The computer program product of claim 9 wherein the first cache level of the multi-level cache system is in-memory.

11. The computer program product of claim 9 wherein the second cache level of the multi-level cache system is in persistent storage.

12. The computer program product of claim 9 wherein deduplicating the first portion of the content in the first bucket from the first cache level is based upon, at least in part, a threshold workload.

13. The computer program product of claim 9 wherein deduplicating the first portion of the content in the first bucket from the first cache level includes identifying the first portion of the content in a log.

14. The computer program product of claim 13 wherein deduplicating the first portion of the content in the first bucket from the first cache level further includes scanning the log to identify the first portion of the content in the first bucket as the duplicate.

15. The computer program produce of claim 9, wherein: hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system, includes reading a first bucket in the second cache level, the first bucket in second cache level corresponding to the first bucket in the first cache level; and deduplicating, during the hardening, the first portion of the content in the first bucket from the first cache level, includes performing a lookup of the first portion and the second portion of the content relative to the read first bucket in the second cache level.

16. A computing system including one or more processors and one or more memories configured to perform operations comprising: identifying, by a computing device, content in a first bucket in a first cache level of a multi-level cache system; determining that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket is unique; hardening the first portion and the second portion of the content to a second cache level of the multi-level cache system, including reading a first bucket in the second cache level, the first bucket in second cache level corresponding to the first bucket in the first cache level; deduplicating, during the hardening, the first portion of the content in the first bucket from the first cache level, including performing a lookup of the first portion and the second portion of the content relative to the read first bucket in the second cache level; and storing, during the hardening, the second portion of the content in a second bucket in the second cache level, wherein a capacity of the second cache level is larger than a capacity of the first cache level.

17. The computing system of claim 16 wherein the first cache level of the multi-level cache system is in-memory.

18. The computing system of claim 16 wherein deduplicating the first portion of the content in the first bucket from the first cache level is based upon, at least in part, a threshold workload.

19. The computing system of claim 16 wherein deduplicating the first portion of the content in the first bucket from the first cache level includes identifying the first portion of the content in a log.

20. The computing system of claim 19 wherein deduplicating the first portion of the content in the first bucket from the first cache level further includes scanning the log to identify the first portion of the content in the first bucket as the duplicate.
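
Several dependent claims add conditions around the hardening step: a fullness threshold that triggers it, a workload threshold that gates deduplication, and a lookup against the corresponding bucket in the second cache level. The sketch below is one possible reading of how these could fit together, not the claimed implementation. The class `TwoLevelCache`, its parameters (`num_buckets`, `fullness_threshold`, `max_load`), the SHA-256 digest, and the choice to defer hardening when load is high are all hypothetical. The log of potential deduplication candidates is not modelled.

```python
import hashlib
from collections import defaultdict


def digest(data: bytes) -> bytes:
    """Fingerprint of a block (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


class TwoLevelCache:
    """Hypothetical two-level cache: a small level 1 and a larger level 2."""

    def __init__(self, num_buckets=4, fullness_threshold=3, max_load=0.8):
        self.num_buckets = num_buckets
        self.fullness_threshold = fullness_threshold  # entries that trigger hardening
        self.max_load = max_load                      # load above this defers the work
        self.level1 = defaultdict(list)  # bucket index -> [(digest, data)]
        self.level2 = defaultdict(dict)  # bucket index -> {digest: data}

    def bucket_index(self, key: bytes) -> int:
        # The same index is used at both levels, so buckets correspond.
        return int.from_bytes(key[:8], "big") % self.num_buckets

    def write(self, data: bytes, load: float = 0.0):
        """Add a block to level 1 and harden its bucket once it is full enough."""
        key = digest(data)
        idx = self.bucket_index(key)
        self.level1[idx].append((key, data))
        if len(self.level1[idx]) >= self.fullness_threshold:
            return self.harden(idx, load)
        return None

    def harden(self, idx: int, load: float = 0.0):
        """Move one level-1 bucket to level 2, deduplicating along the way.

        Returns (deduplicated, stored), or None when the work is deferred.
        """
        if load > self.max_load:
            return None                 # too busy: leave the bucket in level 1
        target = self.level2[idx]       # read the corresponding level-2 bucket
        deduplicated = 0
        stored = 0
        for key, data in self.level1[idx]:
            if key in target:           # lookup against the bucket just read
                deduplicated += 1
            else:
                target[key] = data
                stored += 1
        self.level1[idx].clear()
        return deduplicated, stored
```

With `num_buckets=1` and `fullness_threshold=2`, writing `b"a"` and then `b"b"` hardens the bucket and stores both blocks; a later write of `b"a"` followed by `b"c"` hardens again, dropping `b"a"` as a duplicate and storing only `b"c"`. Deferring the whole hardening pass under high load is a simplification chosen to keep the sketch short.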

Assignees

Inventors

Classifications

  • G06F12/023 · Primary

    Free address space management · CPC title

  • with two or more cache hierarchy levels (with multilevel cache hierarchies G06F12/0811) · CPC title

  • Data transfer between cache memory and other subsystems, e.g. storage devices or host systems · CPC title

  • Structured object, e.g. database record · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720484B2 cover?
A method, computer program product, and computer system for identifying, by a computing device, content in a first bucket in a first cache. It may be determined that a first portion of the content in the first bucket is a duplicate, wherein a second portion of the content in the first bucket may be unique. The first portion of the content in the first bucket may be deduplicated from the first c…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F12/023. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 8, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).