What technology area does this patent fall under?

Primary CPC classification G06F16/178. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 27, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Synchronization of metadata in a distributed storage system

US11455280B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11455280-B2
Application number	US-202016919721-A
Country	US
Kind code	B2
Filing date	Jul 2, 2020
Priority date	Dec 7, 2017
Publication date	Sep 27, 2022
Grant date	Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A client machine writes to and reads from a virtual disk on a remote storage platform. Metadata is generated and stored in replicas on different metadata nodes of the storage platform. A modified log-structured merge tree is used to store and compact string-sorted tables of metadata. During file storage and compaction, a consistent file identification scheme is used across all metadata nodes. A fingerprint file is calculated for each SST (metadata) file on disk that includes hash values corresponding to regions of the SST file. To synchronize, the fingerprint files of two SST files are compared, and if any hash values are missing from a fingerprint file then the key-value-timestamp triples corresponding to these missing hash values are sent to the SST file that is missing them. The SST file is compacted with the missing triples to create a new version of the SST file. The synchronization is bi-directional.
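The synchronization described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each region of an SST file holds exactly one key-value-timestamp triple, uses SHA-256 as the hash, and models compaction as a sorted merge. The names `region_hash`, `fingerprint`, `missing_triples`, `compact`, and `sync_bidirectional` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical in-memory model of one SST entry: (key, value, timestamp).
Triple = Tuple[str, str, int]


def region_hash(triple: Triple) -> str:
    """Hash one region; assumes one triple per region (a simplification)."""
    key, value, ts = triple
    return hashlib.sha256(f"{key}\x00{value}\x00{ts}".encode()).hexdigest()


def fingerprint(sst: List[Triple]) -> Dict[str, Triple]:
    """Map each region hash to its triple.

    Only the hashes (the dict keys) are compared across replicas; the
    triples are looked up on the side that owns them.
    """
    return {region_hash(t): t for t in sst}


def missing_triples(source: List[Triple], target: List[Triple]) -> List[Triple]:
    """Triples whose hashes appear in source's fingerprint but not target's."""
    src_fp, dst_fp = fingerprint(source), fingerprint(target)
    return [src_fp[h] for h in src_fp if h not in dst_fp]


def compact(sst: List[Triple], incoming: List[Triple]) -> List[Triple]:
    """Merge incoming triples into a new, sorted version of the SST.

    Simplified: keeps every distinct triple, ordered by key, value, timestamp.
    """
    return sorted(set(sst) | set(incoming))


def sync_bidirectional(a: List[Triple], b: List[Triple]):
    """Send each replica the triples it lacks, then compact both."""
    to_b = missing_triples(a, b)
    to_a = missing_triples(b, a)
    return compact(a, to_a), compact(b, to_b)


replica_a = [("k1", "v1", 1), ("k2", "v2", 2)]
replica_b = [("k2", "v2", 2), ("k3", "v3", 3)]
new_a, new_b = sync_bidirectional(replica_a, replica_b)
```

After the call, both replicas hold the same three triples. Only the triples missing on each side cross the network in this model, which is the point of comparing fingerprints rather than whole files.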

First claim

Opening claim text (preview).

We claim:

1. A system comprising: a plurality of computer nodes, wherein each computer node among the plurality of computer nodes comprises one or more data storage drives and is configured to: retrieve from one of the plurality of computer nodes a first fingerprint file that includes a plurality of first hash values, wherein each first hash value among the plurality of first hash values corresponds to a first region of a first metadata file, wherein the first metadata file includes a first plurality of key-value-timestamp triples each of which uniquely identifies a portion of metadata that pertains to a particular block of data that has been stored to a computer node among the plurality of computer nodes, and wherein each first region of the first metadata file comprises at least part of a key-value-timestamp triple among the first plurality of key-value-timestamp triples; based on indicia that a second metadata file is a replica of the first metadata file, bi-directionally synchronize the first metadata file and the second metadata file, wherein synchronizing bi-directionally comprises: retrieve from a computer node among the plurality of computer nodes a second fingerprint file that includes a plurality of second hash values, wherein each second hash value corresponds to a second region of the second metadata file, wherein each second region comprises at least part of a key-value-timestamp triple among a second plurality of key-value-timestamp triples in the second metadata file, and based on determining that first hash values are not present among the plurality of second hash values, identify in the first metadata file one or more key-value-timestamp triples among the first plurality of key-value-timestamp triples that correspond to the first hash values not present among the plurality of second hash values, and update the second metadata file with the one or more key-value-timestamp triples among the first plurality of key-value-timestamp triples that were identified in the first metadata file, wherein the particular block of data has been stored to a first computer node among the plurality of computer nodes, which is associated with the first metadata file, and has also been stored to a second computer node among the plurality of computer nodes, which is distinct from the first computer node and is associated with the second metadata file, and determine whether all second hash values are present among the plurality of first hash values.

2. The system of claim 1, wherein each computer node among the plurality of computer nodes is further configured to: based on determining that second hash values are not present among the plurality of first hash values, identify in the second metadata file one or more key-value-timestamp triples among the second plurality of key-value-timestamp triples that correspond to the second hash values not present among the plurality of first hash values; and update the first metadata file with the one or more key-value-timestamp triples among the second plurality of key-value-timestamp triples that were identified in the second metadata file.

3. The system of claim 2, wherein updating of the first metadata file and updating of the second metadata file synchronizes bi-directionally, between distinct computer nodes among the plurality of computer nodes, metadata files corresponding to the portion of metadata that pertains to the particular block of data.

4. The system of claim 1, wherein the first hash values that are not present among the plurality of second hash values correspond to missing regions of the second metadata file.

5. The system of claim 1, wherein each computer node among the plurality of computer nodes is further configured to create a new version of the second metadata file by compacting the second metadata file as updated with the one or more key-value-timestamp triples among the first plurality of key-value-timestamp triples.

6. The system of claim 1, wherein the first metadata file and the second metadata file are located on different computer nodes among the plurality of computer nodes and have a same file identifier, and wherein the indicia that the second metadata file is a replica of the first metadata file is based on the same file identifier.

7. The system of claim 1, wherein the first metadata file and the second metadata file are stored on disk by respective computer nodes among the plurality of computer nodes that use a same file identification scheme.

8. The system of claim 1, wherein the first fingerprint file and the second fingerprint file are retrieved from a same computer node.

9. The system of claim 1, wherein each first hash value is part of a start-length-hash value triple that uniquely identifies a first region of the first metadata file as stored on disk.

10. The system of claim 1, wherein each of the first metadata file and the second metadata file is organized as a string-sorted-table (SST).

11. A method comprising: retrieving, from a computer node of a data storage platform, a first fingerprint file that includes a plurality of first hash values, wherein each first hash value among the plurality of first hash values corresponds to a first region of a first metadata file, wherein the first metadata file includes a first plurality of key-value-timestamp triples each of which uniquely identifies a portion of metadata that pertains to a particular block of data that has been stored to a computer node of the data storage platform, and wherein each first region of the first metadata file comprises at least part of a key-value-timestamp triple among the first plurality of key-value-timestamp triples; based on indicia that a second metadata file is a replica of the first metadata file, retrieving from a computer node of the data storage platform a second fingerprint file that includes a plurality of second hash values, wherein each second hash value corresponds to a second region of the second metadata file, wherein each second region comprises at least part of a key-value-timestamp triple among a second plurality of key-value-timestamp triples in the second metadata file; based on determining that first hash values are not present among the plurality of second hash values, identifying in the first metadata file one or more key-value-timestamp triples among the first plurality of key-value-timestamp triples that correspond to the first hash values not present among the plurality of second hash values; updating the second metadata file with the one or more key-value-timestamp triples among the first plurality of key-value-timestamp triples identified in the first metadata file; based on determining that second hash values are not present among the plurality of first hash values correspond to missing regions of the first metadata file, identifying in the second metadata file one or more key-value-timestamp triples among the second plurality of key-value-timestamp triples that correspond to the second hash values not present among the plurality of first hash values; and updating the first metadata file with the one or more key-value-timestamp triples among the second plurality of key-value-timestamp triples identified in the second metadata file; and wherein each computer node of the data storage platform comprises one or more data storage drives.

12. The method of claim 11, wherein the first hash values that are not present among the plurality of second hash values correspond to missing regions of the second metadata file, and wherein the second hash values that are not present among the plurality of first hash values correspond to missing regions of the first metadata file.
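Claim 9 describes each hash value as part of a start-length-hash triple that identifies a region of the metadata file as stored on disk. The sketch below shows one way such a fingerprint could be built. It is illustrative only: the fixed `region_size`, the use of SHA-256, and the names `Region`, `build_fingerprint`, and `read_region` are assumptions, not details taken from the patent.

```python
import hashlib
from typing import List, NamedTuple


class Region(NamedTuple):
    """One fingerprint entry: byte offset, byte length, and hash of the region."""
    start: int
    length: int
    digest: str


def build_fingerprint(data: bytes, region_size: int = 64) -> List[Region]:
    """Split the file bytes into regions and hash each one.

    Fixed-size regions are an assumption made for this sketch.
    """
    regions = []
    for start in range(0, len(data), region_size):
        chunk = data[start:start + region_size]
        digest = hashlib.sha256(chunk).hexdigest()
        regions.append(Region(start, len(chunk), digest))
    return regions


def read_region(data: bytes, region: Region) -> bytes:
    """Use the start and length fields to locate the region's bytes."""
    return data[region.start:region.start + region.length]


sst_bytes = b"x" * 150  # stand-in for the on-disk contents of an SST file
fp = build_fingerprint(sst_bytes, region_size=64)
```

Storing the start and length alongside each hash lets a node go straight from a hash that its peer lacks to the bytes (and therefore the triples) that need to be sent, without rescanning the file.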

Assignees

Commvault Systems Inc

Inventors

Classifications

G06F16/178 · Primary
Techniques for file synchronisation in file systems · CPC title
G06F16/24573
using data annotations, e.g. user-defined metadata · CPC title
G06F16/2471
Distributed queries · CPC title
G06F16/182
Distributed file systems · CPC title
G06F16/275
Synchronous replication · CPC title

Patent family

Related publications grouped by family.

View patent family 71993860

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11455280B2 cover?: A client machine writes to and reads from a virtual disk on a remote storage platform. Metadata is generated and stored in replicas on different metadata nodes of the storage platform. A modified log-structured merge tree is used to store and compact string-sorted tables of metadata. During file storage and compaction, a consistent file identification scheme is used across all metadata nodes. A…
Who is the assignee on this patent?: Commvault Systems Inc