Shuffling file digests stored in data stores of a distributed file system

US10956375B2 · US · B2

Patent metadata
Field | Value
Publication number | US-10956375-B2
Application number | US-201816033603-A
Country | US
Kind code | B2
Filing date | Jul 12, 2018
Priority date | Jul 12, 2018
Publication date | Mar 23, 2021
Grant date | Mar 23, 2021

Abstract

Official abstract text for this publication.

A method includes receiving, at a content provisioning system from one or more client devices, one or more requests for file digests stored in respective data stores of a plurality of data stores in a distributed file system. The file digests are distributed across different ones of the plurality of data stores in the distributed file system. The method also includes determining a location of a given one of the requested file digests in one or more of the plurality of data stores and retrieving the given file digest from the determined location. The method further includes shuffling the distribution of the file digests across the plurality of data stores in the distributed file system.
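The flow in the abstract (receive a request, locate the digest, retrieve it, then shuffle) can be illustrated with a small sketch. This is only a toy model under stated assumptions, not the patented implementation: SHA-256 stands in for the unspecified hash function, each data store is an in-memory set, and the names `file_digest`, `DigestStores`, `locate`, `request`, and `shuffle` are hypothetical. The shuffle step follows the two moves described in the first claim: the requested digest moves to a second store, and one unrequested digest moves from a third store onto the first or second store. The files themselves are not modeled because they stay where they are.

```python
import hashlib
import random


def file_digest(content: bytes) -> str:
    """Hash value of a file's content (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(content).hexdigest()


class DigestStores:
    """Toy model: each data store is a set of digests; files never move."""

    def __init__(self, num_stores: int, seed: int = 0):
        self.stores = [set() for _ in range(num_stores)]
        self.rng = random.Random(seed)  # pseudo-random placement

    def add(self, digest: str) -> None:
        self.rng.choice(self.stores).add(digest)

    def locate(self, digest: str) -> int:
        """Return the index of the store currently holding the digest."""
        for idx, store in enumerate(self.stores):
            if digest in store:
                return idx
        raise KeyError(digest)

    def request(self, digest: str) -> str:
        """Locate and retrieve a digest, then shuffle the distribution."""
        src = self.locate(digest)
        self.shuffle(digest, src)
        return digest

    def shuffle(self, digest: str, src: int) -> None:
        # Move the requested digest from the first store to a second store.
        others = [i for i in range(len(self.stores)) if i != src]
        dst = self.rng.choice(others)
        self.stores[src].remove(digest)
        self.stores[dst].add(digest)
        # Move one unrequested digest from a third store onto src or dst.
        thirds = [i for i in others if i != dst and self.stores[i]]
        if thirds:
            third = self.rng.choice(thirds)
            extra = self.rng.choice(sorted(self.stores[third]))
            self.stores[third].remove(extra)
            self.stores[self.rng.choice([src, dst])].add(extra)
```

After each call to `request`, the requested digest sits in a different store than before, so an observer who watches storage locations cannot assume a digest stays where it was last seen. The total number of digests is unchanged.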

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising:
    receiving, at a content provisioning system from one or more client devices, one or more requests for file digests stored in respective data stores of a plurality of data stores in a distributed file system, the file digests being distributed across different ones of the plurality of data stores in the distributed file system, each file digest of the file digests being associated with a corresponding file stored in the distributed file system, wherein the file digests comprise hash values of respective files stored in the distributed file system, and wherein the hash values provide identifiers for locating the respective files stored in the distributed file system;
    determining, by at least one processing device comprising a processor coupled to a memory, a physical storage location of a given file digest of the requested file digests in a first data store of the plurality of data stores;
    retrieving, by the at least one processing device, the given file digest from the determined physical storage location in the first data store; and
    shuffling, by the at least one processing device, the distribution of the file digests across the plurality of data stores in the distributed file system, wherein the shuffling the distribution of the file digests across the plurality of data stores comprises:
        moving, by the at least one processing device, the given file digest from the determined physical storage location in the first data store to a new physical storage location in a second data store of the plurality of data stores and maintaining a given file associated with the given file digest in the given file's current physical storage location in the distributed file system; and
        moving, by the at least one processing device, at least one additional file digest not associated with any of the one or more requests for file digests from a previous physical storage location in a third data store of the plurality of data stores to a new physical storage location on one of the first and second data stores and maintaining at least one additional file associated with the at least one additional file digest in the at least one additional file's current physical storage location in the distributed file system;
    wherein the shuffling the distribution of the file digests across the plurality of data stores further comprises utilizing a set of swap operations that moves at least a subset of the file digests between two or more of a plurality of nodes in at least one of two or more levels of a tree structure comprising the plurality of nodes.

2. The method of claim 1 wherein the hash values provide unique and uniformly-sized identifiers for locating files stored in the distributed file system.

3. The method of claim 1 wherein the file digests are distributed across the plurality of data stores in the distributed file system by utilizing the tree structure, and wherein the file digests are stored in leaves of the tree structure.

4. The method of claim 3 wherein the tree structure comprises a set of leaf nodes without links between the leaf nodes.

5. The method of claim 3 wherein internal nodes and the leaves of the tree structure are distributed among the data stores in the distributed file system.

6. The method of claim 1 wherein a given node of the tree structure comprises a logical identifier, the logical identifier comprising: a first portion identifying a given one of the plurality of data stores; a second portion identifying a level of the tree structure; and a third portion indicating a physical storage location in the given data store.

7. The method of claim 1 wherein shuffling the distribution of the file digests across the plurality of data stores is performed responsive to each request of the one or more requests received at the content provisioning system.

8. The method of claim 1 wherein shuffling the distribution of the file digests across the plurality of data stores comprises re-distributing physical storage locations of at least a portion of the file digests pseudo-randomly across the plurality of data stores in the distributed file system.

9. The method of claim 1 wherein the plurality of data stores are implemented on a plurality of cloud storage nodes.

10. The method of claim 1 wherein the plurality of data stores provide a distributed hash table architecture.

11. The method of claim 1 wherein the plurality of data stores comprises at least three data stores.

12. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device:
    to receive, at a content provisioning system from one or more client devices, one or more requests for file digests stored in respective data stores of a plurality of data stores in a distributed file system, the file digests being distributed across different ones of the plurality of data stores in the distributed file system, each file digest of the file digests being associated with a corresponding file stored in the distributed file system, wherein the file digests comprise hash values of respective files stored in the distributed file system, and wherein the hash values provide identifiers for locating the respective files stored in the distributed file system;
    to determine a physical storage location of a given file digest of the requested file digests in a first data store of the plurality of data stores;
    to retrieve the given file digest from the determined physical storage location in the first data store; and
    to shuffle the distribution of the file digests across the plurality of data stores in the distributed file system, wherein the shuffling the distribution of the file digests across the plurality of data stores comprises:
        moving the given file digest from the determined physical storage location in the first data store to a new physical storage location in a second data store of the plurality of data stores and maintaining a given file associated with the given file digest in the given file's current physical storage location in the distributed file system; and
        moving at least one additional file digest not associated with any of the one or more requests for file digests from a previous physical storage location in a third data store of the plurality of data stores to a new physical storage location on one of the first and second data stores and maintaining at least one additional file associated with the at least one additional file digest in the at least one additional file's current physical storage location in the distributed file system; and
    wherein the shuffling the distribution of the file digests across the plurality of data stores further comprises utilizing a set of swap operations that moves at least a subset of the file digests between two or more of a plurality of nodes in at least one of two or more levels of a tree structure comprising the plurality of nodes.

13. The computer program product of claim 12 wherein the hash values provide unique and uniformly-sized identifiers for locating files stored in the distributed file system.

14. The computer program product of claim 12 wherein the file digests are distributed across the plurality of data stores in the distributed file system by utilizing the tree structure, and wherein the file digests are stored in leaves of the tree structure.

15. An apparatus comprising: at least one processing device comprising a processor coupled to a memory;
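Claim 6 describes a three-part logical identifier for tree nodes (data store, tree level, physical location), and claims 1 and 8 describe swap operations that re-distribute digests pseudo-randomly among nodes at a level of the tree. The sketch below is one possible reading, offered only as an illustration. The bit widths, the `NodeId` class, and the `swap_level` function are hypothetical, and the Fisher-Yates permutation is a technique chosen for this example rather than something the claims specify.

```python
import random
from dataclasses import dataclass

# Hypothetical field widths; the claims do not specify any sizes.
STORE_BITS, LEVEL_BITS, OFFSET_BITS = 8, 8, 48


@dataclass(frozen=True)
class NodeId:
    store: int   # first portion: which data store holds the node
    level: int   # second portion: level of the tree structure
    offset: int  # third portion: physical location inside that store

    def pack(self) -> int:
        """Combine the three portions into a single integer identifier."""
        return (
            (self.store << (LEVEL_BITS + OFFSET_BITS))
            | (self.level << OFFSET_BITS)
            | self.offset
        )

    @classmethod
    def unpack(cls, value: int) -> "NodeId":
        """Split a packed identifier back into its three portions."""
        offset = value & ((1 << OFFSET_BITS) - 1)
        level = (value >> OFFSET_BITS) & ((1 << LEVEL_BITS) - 1)
        store = value >> (LEVEL_BITS + OFFSET_BITS)
        return cls(store, level, offset)


def swap_level(nodes: dict, level: int, rng: random.Random) -> None:
    """Permute the digests held by nodes at one level using pairwise swaps.

    `nodes` maps NodeId -> digest. Node identifiers stay fixed; only the
    digests move between them (Fisher-Yates, an assumption of this sketch).
    """
    ids = sorted((n for n in nodes if n.level == level), key=NodeId.pack)
    for i in range(len(ids) - 1, 0, -1):
        j = rng.randint(0, i)
        nodes[ids[i]], nodes[ids[j]] = nodes[ids[j]], nodes[ids[i]]
```

Because the store index is part of each node identifier, swapping digests between two nodes that live in different stores moves a digest across data stores without touching the file it refers to.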

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

  • G06F16/134 (Primary)

    Distributed indices

  • Hash-based (content-based indexing of textual data G06F16/31)

  • Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof (details of archiving G06F16/11)

  • Details of migration of file systems (migration mechanisms in storage systems G06F3/0647)

  • G06F16/182 (Primary)

    Distributed file systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10956375B2 cover?
A method includes receiving, at a content provisioning system from one or more client devices, one or more requests for file digests stored in respective data stores of a plurality of data stores in a distributed file system. The file digests are distributed across different ones of the plurality of data stores in the distributed file system. The method also includes determining a location of a…
Who is the assignee on this patent?
EMC IP Holding Co LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/134. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 23, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).