What technology area does this patent fall under?

Primary CPC classification G06F16/178. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 11, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Storage and synchronization of metadata in a distributed storage system

US11468015B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11468015-B2
Application number	US-202016919712-A
Country	US
Kind code	B2
Filing date	Jul 2, 2020
Priority date	Dec 7, 2017
Publication date	Oct 11, 2022
Grant date	Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A client machine writes to and reads from a virtual disk on a remote storage platform. Metadata is generated and stored in replicas on different metadata nodes of the storage platform. A modified log-structured merge tree is used to store and compact string-sorted tables of metadata. During file storage and compaction, a consistent file identification scheme is used across all metadata nodes. A fingerprint file is calculated for each SST (metadata) file on disk that includes hash values corresponding to regions of the SST file. To synchronize, the fingerprint files of two SST files are compared, and if any hash values are missing from a fingerprint file then the key-value-timestamp triplets corresponding to these missing hash values are sent to the SST file that is missing them. The SST file is compacted with the missing triplets to create a new version of the SST file. The synchronization is bi-directional.
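The synchronization step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes an SST file can be modeled as a key-sorted list of (key, value, timestamp) tuples, that each region holds a fixed number of triplets (one by default), and that SHA-256 is the hash. The names fingerprint, compact, and synchronize are hypothetical and do not appear in the patent.

```python
import hashlib

# Assumed in-memory model of an SST file: a key-sorted list of triplets.
Triplet = tuple[str, str, int]  # (key, value, timestamp)


def fingerprint(sst: list[Triplet], region_size: int = 1) -> dict[str, list[Triplet]]:
    """Map the hash of each region to the triplets stored in that region."""
    regions = {}
    for start in range(0, len(sst), region_size):
        region = sst[start:start + region_size]
        digest = hashlib.sha256(repr(region).encode()).hexdigest()
        regions[digest] = region
    return regions


def compact(*tables: list[Triplet]) -> list[Triplet]:
    """Merge tables, keeping only the newest triplet for each key."""
    newest = {}
    for table in tables:
        for key, value, ts in table:
            if key not in newest or ts > newest[key][1]:
                newest[key] = (value, ts)
    return sorted((key, value, ts) for key, (value, ts) in newest.items())


def synchronize(sst_a: list[Triplet], sst_b: list[Triplet]):
    """Bi-directional sync: each side receives the triplets behind the hashes it lacks."""
    fp_a, fp_b = fingerprint(sst_a), fingerprint(sst_b)
    missing_in_b = [t for h, region in fp_a.items() if h not in fp_b for t in region]
    missing_in_a = [t for h, region in fp_b.items() if h not in fp_a for t in region]
    # Each replica compacts its own table with what it was missing.
    return compact(sst_a, missing_in_a), compact(sst_b, missing_in_b)


replica_a = [("k1", "v1", 1), ("k2", "v2", 5)]
replica_b = [("k2", "old", 3), ("k3", "v3", 2)]
new_a, new_b = synchronize(replica_a, replica_b)
```

In this sketch both replicas converge to the same table after one round, because each side compacts its own triplets with the ones it was missing and the newest timestamp wins for every key.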

First claim

Opening claim text (preview).

We claim:

1. A method comprising: storing metadata in a first memory block of a first computer node among a plurality of computer nodes in a data storage platform, wherein the metadata comprises information about a mutation of data that occurred in the data storage platform, wherein the mutation comprises a writing of the data into the data storage platform, and wherein the metadata comprises one or more of: whether the data was written successfully or failed to be written into a computer node of the data storage platform, and a name of a virtual disk within the data storage platform where the data was written; flushing the first memory block, including the metadata stored therein, to a table that is stored on disk of the first computer node, wherein the metadata from the first memory block is written into the table as a first key-value-timestamp triplet, wherein the table is organized as a string-sorted table that comprises key-value-timestamp triplets, including the first key-value-timestamp triplet, and wherein the table is sorted by key of the key-value-timestamp triplets therein; calculating a fingerprint value for the table and storing the fingerprint value in the data storage platform; compacting, on disk of the first computer node, the table with other tables comprising metadata to produce a new table that comprises key-value-timestamp triplets from the table and from the other tables, wherein older key-value-timestamp triplets having a same key as a newer key-value-timestamp triplet are not included in the new table; and storing on disk of the first computer node the new table; wherein each computer node in the plurality of computer nodes in the data storage platform comprises one or more hardware processors.

2. A method as recited in claim 1 further comprising: generating the metadata, by a controller virtual machine that causes the data to be written to a second computer node in the data storage platform, wherein the controller virtual machine executes on a computer server in communication with the data storage platform.

3. A method as recited in claim 1 further comprising: storing the metadata in a second memory block of a second one of the plurality of computer nodes, wherein the first and second memory blocks have a same identifier and wherein the first and second computer nodes use a same scheme for identifying memory blocks.

4. A method as recited in claim 3 further comprising: skipping at least one identifier used in the scheme on the second computer node in order that the first and second memory blocks have the same identifier.

5. A method comprising: by a controller virtual machine that causes a block of data to be written to a second computer node in data storage platform that comprises computer nodes, generating metadata for the block of data, wherein the metadata pertains to the block of data as stored in the data storage platform, and wherein the controller virtual machine executes on a computer server in communication with a data storage platform that comprises computer nodes, including the second computer node; by a first computer node among the computer nodes of the data storage platform, storing the metadata, received from the controller virtual machine, as a first key-value-timestamp triplet in a first memory block of the first computer node, wherein the first memory block has a first identifier; by the first computer node, flushing the first memory block when full, including the first key-value-timestamp triplet, to a table stored as a metadata file on disk of the first computer node, wherein the table comprises a plurality of key-value-timestamp triplets, including the metadata from the first memory block written into the table as the first key-value-timestamp triplet, and wherein the first computer node assigns the first identifier to the metadata file; by the first computer node, calculating fingerprint values for the metadata file and storing the fingerprint values in a fingerprint file in the data storage platform, wherein for each one of a plurality of regions in the metadata file, a corresponding fingerprint value comprises a start-length-hash value triplet, and wherein a hash value in the start-length-hash value triplet is based on contents of a corresponding region in the metadata file; by the first computer node, compacting the table in the metadata file with other tables in other metadata files on disk of the first computer node to produce a new metadata file comprising a new table of key-value-timestamp triplets, wherein older key-value-timestamp triplets having a same key as a newer key-value-timestamp triplet are not included in the new table; and storing the new metadata file to disk of the first computer node; wherein each computer node in the data storage platform comprises one or more hardware processors.

6. The method of claim 5, wherein each key-value-timestamp triplet in the metadata file comprises information of where a corresponding block of data is stored in the data storage platform and a timestamp of a write request issued by the controller virtual machine.

7. The method of claim 5, wherein the table in the metadata file is organized as a string-sorted table sorted by key of the plurality of key-value-timestamp triplets therein.

8. The method of claim 5, wherein the new table in the new metadata file resulting from the compacting is organized as a string-sorted table sorted by key of the key-value-timestamp triplets therein.

9. The method of claim 5, wherein the first computer node is distinct from the second computer node, which hosts the block of data.

10. The method of claim 5, wherein the first computer node is the same as the second computer node, and wherein the block of data and the metadata file are stored on disk at the same computer node.

11. The method of claim 5, wherein a metadata module executing at the first computer node performs the storing of the metadata received from the controller virtual machine, the flushing of the first memory block, the calculating of the fingerprint values, and the compacting.

12. The method of claim 5 further comprising: by a third one of the computer nodes of the data storage platform, storing the metadata received from the controller virtual machine, in a second memory block of the third computer node, wherein the first and third computer nodes use a same scheme for identifying memory blocks, and wherein the second memory block at the third computer node has the first identifier.

13. The method of claim 12 further comprising: skipping at least one identifier used in the scheme on the third computer node in order that the first and second memory blocks have the same identifier.

14. The method of claim 5, wherein metadata pertaining to the block of data is stored in a first plurality of computer nodes of the data storage platform, including the first computer node, and wherein the block of data is stored in a second plurality of computer nodes of the data storage platform, including the second computer node, and wherein the first plurality and the second plurality differ by at least one computer node.

15. The method of claim 5, further comprising: by the first computer node, after the compacting, retaining the new metadata file and deleting the metadata file and the other metadata files.

16. The method of claim 5 further comprising: by the first computer node, calculating new fingerprint values for the new metadata file and storing the new fingerprint values in a new fingerprint file in the data storage platform; and by the first computer
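The flush and fingerprint steps recited in claim 5 can be pictured with a short Python sketch. It is illustrative only and makes several assumptions that are not in the claims: a memory block is full after a fixed number of entries, the metadata file is serialized as one JSON line per triplet, regions are fixed-length byte ranges, and SHA-256 is the hash. MemoryBlock and fingerprint_regions are hypothetical names.

```python
import hashlib
import json


class MemoryBlock:
    """Assumed model of an in-memory block of key-value-timestamp triplets."""

    def __init__(self, identifier: int, capacity: int = 3):
        self.identifier = identifier
        self.capacity = capacity
        self.entries = []

    def put(self, key: str, value: str, timestamp: int) -> None:
        self.entries.append((key, value, timestamp))

    def is_full(self) -> bool:
        return len(self.entries) >= self.capacity

    def flush(self) -> tuple[int, bytes]:
        """Serialize the triplets sorted by key; the file reuses the block's identifier."""
        table = sorted(self.entries, key=lambda t: (t[0], t[2]))
        data = "".join(json.dumps(list(t)) + "\n" for t in table).encode()
        self.entries = []
        return self.identifier, data


def fingerprint_regions(data: bytes, region_len: int = 32) -> list[tuple[int, int, str]]:
    """Return one (start, length, hash) triplet per fixed-length region of the file."""
    triplets = []
    for start in range(0, len(data), region_len):
        chunk = data[start:start + region_len]
        triplets.append((start, len(chunk), hashlib.sha256(chunk).hexdigest()))
    return triplets


block = MemoryBlock(identifier=7, capacity=2)
block.put("vdisk2/blk9", "node-C", 20)
block.put("vdisk1/blk0", "node-A", 10)
if block.is_full():
    file_id, metadata_file = block.flush()
    fingerprint_file = fingerprint_regions(metadata_file, region_len=16)
```

Reusing the block identifier as the file identifier mirrors the claim language that the first computer node assigns the first identifier to the metadata file. How identifiers are generated or skipped across nodes is not modeled here.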

Assignees

Commvault Systems Inc

Classifications

G06F16/2471 · Distributed queries
G06F11/1464 · for networked environments
G06F16/275 · Synchronous replication
G06F16/182 · Distributed file systems
G06F16/24573 · using data annotations, e.g. user-defined metadata

Patent family

Related publications grouped by family.

View patent family 71993860

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468015B2 cover?: A client machine writes to and reads from a virtual disk on a remote storage platform. Metadata is generated and stored in replicas on different metadata nodes of the storage platform. A modified log-structured merge tree is used to store and compact string-sorted tables of metadata. During file storage and compaction, a consistent file identification scheme is used across all metadata nodes. A…
Who is the assignee on this patent?: Commvault Systems Inc