What technology area does this patent fall under?

Primary CPC classification G06F3/0655. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 24, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Reducing memory usage in storing metadata

US11797220B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11797220-B2
Application number	US-202117408007-A
Country	US
Kind code	B2
Filing date	Aug 20, 2021
Priority date	Aug 20, 2021
Publication date	Oct 24, 2023
Grant date	Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to include one or more entries corresponding to one or more determined duplicate data chunks.
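The two-phase flow in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the names `ChunkFileEntry`, `chunk_id`, `ingest`, and `find_duplicates` are hypothetical, SHA-256 is an assumed choice of chunk identifier, and the threshold is assumed to mean "the identifier appears in at least this many entries of the first data structure".

```python
from collections import defaultdict
from dataclasses import dataclass, field
import hashlib


@dataclass
class ChunkFileEntry:
    """One entry of the first data structure: a chunk file and its chunks."""
    chunk_file_id: str
    # chunk identifier -> byte offset of that chunk inside the chunk file
    offsets: dict[str, int] = field(default_factory=dict)


def chunk_id(data: bytes) -> str:
    """Assumed identifier scheme: SHA-256 hex digest of the chunk contents."""
    return hashlib.sha256(data).hexdigest()


def ingest(chunk_files: dict[str, list[bytes]]) -> list[ChunkFileEntry]:
    """Phase 1: record chunk identifiers and offsets per chunk file.

    No duplicate lookup happens here; duplicates are simply stored.
    """
    first = []
    for file_id, chunks in chunk_files.items():
        entry = ChunkFileEntry(file_id)
        offset = 0
        for data in chunks:
            entry.offsets.setdefault(chunk_id(data), offset)
            offset += len(data)
        first.append(entry)
    return first


def find_duplicates(first: list[ChunkFileEntry], threshold: int = 2) -> dict[str, str]:
    """Phase 2 (after ingestion): build the second data structure.

    Maps each chunk identifier found in at least `threshold` entries
    to the first chunk file that stores it.
    """
    holders = defaultdict(list)
    for entry in first:
        for cid in entry.offsets:
            holders[cid].append(entry.chunk_file_id)
    return {cid: files[0] for cid, files in holders.items() if len(files) >= threshold}


first = ingest({"cf1": [b"aaa", b"bbb"], "cf2": [b"bbb", b"ccc"]})
second = find_duplicates(first)
```

In this sketch only the chunk `b"bbb"`, which is stored in both chunk files, gets an entry in the second structure; chunks stored once never appear there, which is the source of the memory saving the title refers to.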

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: ingesting data from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure, wherein the first data structure includes a first plurality of entries that includes a first entry, wherein the first entry of the first data structure corresponds to a first chunk file of the one or more chunk files and associates a chunk file identifier of the first chunk file with a first set of one or more chunk identifiers associated with a first set of one or more data chunks stored in the first chunk file, wherein the first entry includes a corresponding offset for the first set of one or more chunk identifiers within the first chunk file; and after data ingestion is complete, determining one or more duplicate data chunks that were stored during the data ingestion and updating a second data structure to include one or more entries corresponding to the one or more determined duplicate data chunks, wherein the second data structure is comprised of a second plurality of entries, wherein each of the second plurality of entries associates a corresponding chunk identifier of a stored data chunk with a corresponding chunk file identifier of a chunk file storing the stored data chunk, wherein determining the one or more duplicate data chunks that were stored during the data ingestion includes identifying a threshold number of entries associated with the first data structure that include a first chunk identifier included in the first set of one or more chunk identifiers and updating the second data structure to include a new entry that associates the first chunk identifier corresponding to a first data chunk with the first chunk file storing the first data chunk.
2. The method of claim 1, wherein the plurality of data chunks are variable sized data chunks.
3. The method of claim 1, wherein the first data structure and the second data structure are stored in a memory of a storage system.
4. The method of claim 1, wherein ingesting the data from the source system includes generating a tree data structure that enables the plurality of data chunks to be located.
5. The method of claim 1, wherein ingesting the data from the source system includes generating the corresponding chunk identifiers for each of the plurality of data chunks.
6. The method of claim 1, wherein determining the one or more duplicate data chunks that were stored during the data ingestion includes: selecting the first entry of the first data structure; and determining whether the first chunk identifier associated with the first entry is a same chunk identifier associated with a threshold number of other entries of the first data structure.
7. The method of claim 6, wherein determining the one or more duplicate data chunks that were stored during the data ingestion further includes updating the second data structure to include the first entry that associates the first chunk identifier corresponding to the first data chunk with the first chunk file storing the first data chunk in response to determining that the first chunk identifier associated with the first entry is the same chunk identifier associated with the threshold number of other entries of the first data structure.
8. The method of claim 7, further comprising deleting the first data chunk corresponding to the first chunk identifier associated with the first entry from one or more chunk files corresponding to the threshold number of other entries.
9. The method of claim 8, further comprising updating the other entries to unreference the first chunk identifier associated with the first data chunk.
10. The method of claim 1, wherein determining the one or more duplicate data chunks that were stored during the data ingestion includes: selecting a second entry of the first data structure; and determining whether a corresponding chunk identifier associated with the selected second entry is a same chunk identifier associated with a threshold number of other entries of the first data structure.
11. The method of claim 10, wherein determining the one or more duplicate data chunks that were stored during the data ingestion further includes modifying the chunk identifier associated with the selected second entry of the first data structure to be a different chunk identifier in response to determining that the chunk identifier associated with the selected second entry of the first data structure is not the same chunk identifier associated with the threshold number of other entries of the first data structure.
12. The method of claim 11, further comprising updating the selected second entry of the first data structure to reference the different chunk identifier in place of the chunk identifier associated with the selected second entry of the first data structure.
13. The method of claim 12, further comprising updating a node of a tree data structure that references the chunk identifier associated with the selected second entry to reference the different chunk identifier.
14. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: ingesting data from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure, wherein the first data structure includes a first plurality of entries that includes a first entry, wherein the first entry of the first data structure corresponds to a first chunk file of the one or more chunk files and associates a chunk file identifier of the first chunk file with a first set of one or more chunk identifiers associated with a first set of one or more data chunks stored in the first chunk file, wherein the first entry includes a corresponding offset for the first set of one or more chunk identifiers within the first chunk file; and after data ingestion is complete, determining one or more duplicate data chunks that were stored during the data ingestion and updating a second data structure to include one or more entries corresponding to the one or more determined duplicate data chunks, wherein the second data structure is comprised of a second plurality of entries, wherein each of the second plurality of entries associates a corresponding chunk identifier of a stored data chunk with a corresponding chunk file identifier of a chunk file storing the stored data chunk, wherein determining the one or more duplicate data chunks that were stored during the data ingestion includes identifying a threshold number of entries associated with the first data structure that include a first chunk identifier included in the first set of one or more chunk identifiers and updating the second data structure to include a new entry that associates the first chunk identifier corresponding to a first data chunk with the first chunk file storing the first data chunk.
15. The computer program product of claim 14, wherein the plurality of data chunks are variable sized data chunks.
16. A system, comprising: one or more processors configured to: ingest data from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure, wherein the first data structure includes a first plurality of entries that includes a first entry, wherein the first entry of the first data structur
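Claims 8 and 9 describe a cleanup step: once a duplicate chunk has an entry in the second data structure, the redundant copies are deleted from the other chunk files and the corresponding entries of the first data structure stop referencing it. The sketch below is only an illustration under simplifying assumptions: both structures are modeled as plain dictionaries, the function name `drop_redundant_copies` is hypothetical, and the actual removal of bytes from a chunk file is not modeled.

```python
def drop_redundant_copies(
    first: dict[str, dict[str, int]],
    second: dict[str, str],
) -> list[tuple[str, str]]:
    """Unreference duplicate chunks everywhere except the chunk file that keeps them.

    first:  chunk file id -> {chunk id -> offset}   (first data structure)
    second: chunk id -> chunk file id that keeps it (second data structure)
    Returns the (chunk file id, chunk id) pairs that were unreferenced.
    """
    removed = []
    for file_id, offsets in first.items():
        # Iterate over a copy of the keys because entries are deleted in the loop.
        for cid in list(offsets):
            keeper = second.get(cid)
            if keeper is not None and keeper != file_id:
                del offsets[cid]  # a real system would also reclaim the chunk's bytes
                removed.append((file_id, cid))
    return removed


first = {"cf1": {"A": 0, "B": 3}, "cf2": {"B": 0, "C": 3}}
second = {"B": "cf1"}
removed = drop_redundant_copies(first, second)
```

After the call, chunk `B` is referenced only by `cf1`, the chunk file named in the second structure, while the unique chunks `A` and `C` are untouched.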

Assignees

Cohesity Inc

Inventors

Classifications

G06F3/0655 · Primary
Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices · CPC title
G06F3/0608
Saving storage space on storage systems · CPC title
G06F3/0652
Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket · CPC title
G06F3/0679
Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP] · CPC title
G06F3/0641 · Primary
De-duplication techniques · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 85229391

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11797220B2 cover?: Data is ingested from a source system including by storing a plurality of data chunks in one or more chunk files and storing corresponding chunk identifiers associated with the plurality of data chunks in a first data structure. After data ingestion is complete, one or more duplicate data chunks that were stored during the data ingestion are determined and a second data structure is updated to …
Who is the assignee on this patent?: Cohesity Inc