US11468015B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11468015-B2 |
| Application number | US-202016919712-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 2, 2020 |
| Priority date | Dec 7, 2017 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Abstract
A client machine writes to and reads from a virtual disk on a remote storage platform. Metadata is generated and stored in replicas on different metadata nodes of the storage platform. A modified log-structured merge tree is used to store and compact string-sorted tables of metadata. During file storage and compaction, a consistent file identification scheme is used across all metadata nodes. A fingerprint file is calculated for each SST (metadata) file on disk that includes hash values corresponding to regions of the SST file. To synchronize, the fingerprint files of two SST files are compared, and if any hash values are missing from a fingerprint file then the key-value-timestamp triplets corresponding to these missing hash values are sent to the SST file that is missing them. The SST file is compacted with the missing triplets to create a new version of the SST file. The synchronization is bi-directional.
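The synchronization step in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Each SST file is modeled as an in-memory list of (key, value, timestamp) triplets sorted by key. A "region" is a hypothetical fixed run of `REGION_SIZE` triplets. SHA-256 over a JSON encoding stands in for whatever hash the platform actually uses. The names `compact`, `regions`, `region_hash`, `fingerprint`, `missing_triplets`, and `synchronize` are illustrative only.

```python
import hashlib
import json
from typing import Dict, List, Set, Tuple

# (key, value, timestamp), as in the abstract's key-value-timestamp triplets.
Triplet = Tuple[str, str, int]

REGION_SIZE = 2  # hypothetical: number of triplets hashed together as one region


def compact(*tables: List[Triplet]) -> List[Triplet]:
    """Merge tables, keep only the newest triplet per key, and sort by key."""
    newest: Dict[str, Triplet] = {}
    for table in tables:
        for key, value, ts in table:
            current = newest.get(key)
            # Higher timestamp wins; equal timestamps are broken by value for determinism.
            if current is None or (ts, value) > (current[2], current[1]):
                newest[key] = (key, value, ts)
    return [newest[key] for key in sorted(newest)]


def regions(table: List[Triplet]) -> List[List[Triplet]]:
    """Split a key-sorted table into consecutive fixed-size regions."""
    return [table[i:i + REGION_SIZE] for i in range(0, len(table), REGION_SIZE)]


def region_hash(region: List[Triplet]) -> str:
    """Hash a region's contents (SHA-256 over a JSON encoding, an assumption)."""
    return hashlib.sha256(json.dumps(region).encode("utf-8")).hexdigest()


def fingerprint(table: List[Triplet]) -> List[str]:
    """The fingerprint file: one hash value per region of the table."""
    return [region_hash(region) for region in regions(table)]


def missing_triplets(src: List[Triplet], dst_fingerprint: Set[str]) -> List[Triplet]:
    """Triplets from every src region whose hash is absent from dst's fingerprint."""
    out: List[Triplet] = []
    for region in regions(src):
        if region_hash(region) not in dst_fingerprint:
            out.extend(region)
    return out


def synchronize(a: List[Triplet], b: List[Triplet]) -> Tuple[List[Triplet], List[Triplet]]:
    """Bi-directional sync: each side compacts in the triplets it is missing."""
    fp_a, fp_b = set(fingerprint(a)), set(fingerprint(b))
    to_a = missing_triplets(b, fp_a)  # sent from b's side to a
    to_b = missing_triplets(a, fp_b)  # sent from a's side to b
    return compact(a, to_a), compact(b, to_b)
```

Consider two replicas that share their first region but disagree on `k3` and `k4`. Only the differing regions are exchanged, and both sides end with the same key-sorted table. Identical regions hash identically (barring collisions), so any triplet one side lacks lies in a region whose hash is absent from the other side's fingerprint. One exchange in each direction therefore suffices.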
Claims (preview)
We claim:

1. A method comprising: storing metadata in a first memory block of a first computer node among a plurality of computer nodes in a data storage platform, wherein the metadata comprises information about a mutation of data that occurred in the data storage platform, wherein the mutation comprises a writing of the data into the data storage platform, and wherein the metadata comprises one or more of: whether the data was written successfully or failed to be written into a computer node of the data storage platform, and a name of a virtual disk within the data storage platform where the data was written; flushing the first memory block, including the metadata stored therein, to a table that is stored on disk of the first computer node, wherein the metadata from the first memory block is written into the table as a first key-value-timestamp triplet, wherein the table is organized as a string-sorted table that comprises key-value-timestamp triplets, including the first key-value-timestamp triplet, and wherein the table is sorted by key of the key-value-timestamp triplets therein; calculating a fingerprint value for the table and storing the fingerprint value in the data storage platform; compacting, on disk of the first computer node, the table with other tables comprising metadata to produce a new table that comprises key-value-timestamp triplets from the table and from the other tables, wherein older key-value-timestamp triplets having a same key as a newer key-value-timestamp triplet are not included in the new table; and storing on disk of the first computer node the new table; wherein each computer node in the plurality of computer nodes in the data storage platform comprises one or more hardware processors.

2. A method as recited in claim 1 further comprising: generating the metadata, by a controller virtual machine that causes the data to be written to a second computer node in the data storage platform, wherein the controller virtual machine executes on a computer server in communication with the data storage platform.

3. A method as recited in claim 1 further comprising: storing the metadata in a second memory block of a second one of the plurality of computer nodes, wherein the first and second memory blocks have a same identifier and wherein the first and second computer nodes use a same scheme for identifying memory blocks.

4. A method as recited in claim 3 further comprising: skipping at least one identifier used in the scheme on the second computer node in order that the first and second memory blocks have the same identifier.

5. A method comprising: by a controller virtual machine that causes a block of data to be written to a second computer node in data storage platform that comprises computer nodes, generating metadata for the block of data, wherein the metadata pertains to the block of data as stored in the data storage platform, and wherein the controller virtual machine executes on a computer server in communication with a data storage platform that comprises computer nodes, including the second computer node; by a first computer node among the computer nodes of the data storage platform, storing the metadata, received from the controller virtual machine, as a first key-value-timestamp triplet in a first memory block of the first computer node, wherein the first memory block has a first identifier; by the first computer node, flushing the first memory block when full, including the first key-value-timestamp triplet, to a table stored as a metadata file on disk of the first computer node, wherein the table comprises a plurality of key-value-timestamp triplets, including the metadata from the first memory block written into the table as the first key-value-timestamp triplet, and wherein the first computer node assigns the first identifier to the metadata file; by the first computer node, calculating fingerprint values for the metadata file and storing the fingerprint values in a fingerprint file in the data storage platform, wherein for each one of a plurality of regions in the metadata file, a corresponding fingerprint value comprises a start-length-hash value triplet, and wherein a hash value in the start-length-hash value triplet is based on contents of a corresponding region in the metadata file; by the first computer node, compacting the table in the metadata file with other tables in other metadata files on disk of the first computer node to produce a new metadata file comprising a new table of key-value-timestamp triplets, wherein older key-value-timestamp triplets having a same key as a newer key-value-timestamp triplet are not included in the new table; and storing the new metadata file to disk of the first computer node; wherein each computer node in the data storage platform comprises one or more hardware processors.

6. The method of claim 5, wherein each key-value-timestamp triplet in the metadata file comprises information of where a corresponding block of data is stored in the data storage platform and a timestamp of a write request issued by the controller virtual machine.

7. The method of claim 5, wherein the table in the metadata file is organized as a string-sorted table sorted by key of the plurality of key-value-timestamp triplets therein.

8. The method of claim 5, wherein the new table in the new metadata file resulting from the compacting is organized as a string-sorted table sorted by key of the key-value-timestamp triplets therein.

9. The method of claim 5, wherein the first computer node is distinct from the second computer node, which hosts the block of data.

10. The method of claim 5, wherein the first computer node is the same as the second computer node, and wherein the block of data and the metadata file are stored on disk at the same computer node.

11. The method of claim 5, wherein a metadata module executing at the first computer node performs the storing of the metadata received from the controller virtual machine, the flushing of the first memory block, the calculating of the fingerprint values, and the compacting.

12. The method of claim 5 further comprising: by a third one of the computer nodes of the data storage platform, storing the metadata received from the controller virtual machine, in a second memory block of the third computer node, wherein the first and third computer nodes use a same scheme for identifying memory blocks, and wherein the second memory block at the third computer node has the first identifier.

13. The method of claim 12 further comprising: skipping at least one identifier used in the scheme on the third computer node in order that the first and second memory blocks have the same identifier.

14. The method of claim 5, wherein metadata pertaining to the block of data is stored in a first plurality of computer nodes of the data storage platform, including the first computer node, and wherein the block of data is stored in a second plurality of computer nodes of the data storage platform, including the second computer node, and wherein the first plurality and the second plurality differ by at least one computer node.

15. The method of claim 5, further comprising: by the first computer node, after the compacting, retaining the new metadata file and deleting the metadata file and the other metadata files.

16. The method of claim 5 further comprising: by the first computer node, calculating new fingerprint values for the new metadata file and storing the new fingerprint values in a new fingerprint file in the data storage platform; and by the first computer
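The claims add two details that the abstract leaves implicit. First, each fingerprint value is a start-length-hash triplet over a region of the metadata file (claim 5). Second, replicas share an identifier scheme in which a node may skip identifiers so that its memory block receives the same identifier as a peer's (claims 3-4 and 12-13). The Python sketch below illustrates both under hypothetical assumptions, including a flush step that writes a memory block in key order. The following are illustrative choices, not details taken from the patent:

- the tab-separated layout written by `flush`
- the 4096-byte `REGION_BYTES`
- SHA-256 as the hash
- the `BlockIdScheme` counter

```python
import hashlib
from typing import Dict, List, Tuple

REGION_BYTES = 4096  # hypothetical region size; the claims do not specify one


def flush(memory_block: Dict[str, Tuple[str, int]]) -> bytes:
    """Write a memory block as key-sorted 'key<TAB>value<TAB>timestamp' lines.

    The line-oriented layout is an illustrative assumption, not the patent's format.
    """
    lines = [f"{key}\t{value}\t{ts}\n" for key, (value, ts) in sorted(memory_block.items())]
    return "".join(lines).encode("utf-8")


def fingerprint_file(data: bytes, region_bytes: int = REGION_BYTES) -> List[Tuple[int, int, str]]:
    """Return one (start, length, hash) triplet per region of a metadata file."""
    triplets = []
    for start in range(0, len(data), region_bytes):
        region = data[start:start + region_bytes]
        triplets.append((start, len(region), hashlib.sha256(region).hexdigest()))
    return triplets


class BlockIdScheme:
    """Hypothetical sequential identifier scheme shared by all metadata nodes."""

    def __init__(self) -> None:
        self._next = 1

    def allocate(self) -> int:
        """Issue the next identifier for a new memory block."""
        block_id = self._next
        self._next += 1
        return block_id

    def skip_to(self, block_id: int) -> None:
        """Skip unused identifiers so the next allocation returns block_id."""
        if block_id < self._next:
            raise ValueError("identifier already issued on this node")
        self._next = block_id
```

Suppose one node has already issued identifiers 1 to 3 while a replica has issued only 1. Calling `skip_to(4)` on the replica lets both nodes label their next memory block 4, which keeps file identifiers consistent across metadata nodes.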
CPC titles:
- Distributed queries
- for networked environments
- Synchronous replication
- Distributed file systems
- using data annotations, e.g. user-defined metadata