Coherent distributed logging
US-10158642-B2 · Dec 18, 2018 · US
US11741050B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11741050-B2 |
| Application number | US-202117162501-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 29, 2021 |
| Priority date | Jan 29, 2021 |
| Publication date | Aug 29, 2023 |
| Grant date | Aug 29, 2023 |
Official abstract text for this publication.
Techniques are disclosed relating to managing distributed storage of data for various entities according to classifications for these entities. A database node of a distributed storage system may receive, from a first entity of a plurality of entities, a request to store a set of data. The database node may further obtain metadata associated with the first entity, wherein the metadata specifies one of a plurality of classifications for the entities. The database node may provide the set of data to one or more of a plurality of caches for storage. The caches may be located in two or more availability zones and are configured to store the set of data based on the classification for the first entity identified in the metadata associated with the first entity. The database node may also store the set of data in a shared object storage coupled to the database node.
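The write path described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the classification names (`gold`, `bronze`), the `REPLICAS_BY_CLASS` policy table, the `Cache` and `DatabaseNode` classes, and the use of plain dictionaries for the caches and the shared object storage are all hypothetical and do not appear in the publication.

```python
from dataclasses import dataclass, field

# Hypothetical policy: number of availability zones that hold a cached copy,
# keyed by entity classification. The publication names no specific classes.
REPLICAS_BY_CLASS = {"gold": 2, "bronze": 1}


@dataclass
class Cache:
    zone: str                                  # availability zone of this cache
    data: dict = field(default_factory=dict)   # (entity, key) -> value


@dataclass
class DatabaseNode:
    caches: list            # caches spread across availability zones
    object_store: dict      # stand-in for the shared object storage
    entity_metadata: dict   # entity id -> classification

    def store(self, entity, key, value):
        """Cache the value per the entity's classification, then persist it."""
        classification = self.entity_metadata[entity]
        copies = REPLICAS_BY_CLASS[classification]

        # Choose at most one cache per zone until the copy count is reached.
        chosen, seen_zones = [], set()
        for cache in self.caches:
            if len(chosen) == copies:
                break
            if cache.zone not in seen_zones:
                chosen.append(cache)
                seen_zones.add(cache.zone)

        for cache in chosen:
            cache.data[(entity, key)] = value

        # Every write also lands in the shared object storage.
        self.object_store[(entity, key)] = value
        return [cache.zone for cache in chosen]
```

With this sketch, a `gold` entity's write is cached in two distinct zones while a `bronze` entity's write is cached once; both writes are persisted to the object store regardless of classification.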
Opening claim text (preview).
What is claimed is:
1. A method for managing distributed data storage using a plurality of caches and a shared object storage, comprising: receiving, by a database node of a distributed storage system from a first entity of a plurality of entities, a request to store a set of data; obtaining, by the database node, metadata associated with the first entity, wherein the metadata specifies one of a plurality of classifications for the plurality of entities, and wherein the distributed storage system manages distributed storage of data for the plurality of entities; providing, by the database node, the set of data to one or more of the plurality of caches for storage, wherein the plurality of caches are configured to store the set of data based on a classification for the first entity identified in the metadata associated with the first entity, and wherein the plurality of caches are located in two or more availability zones that are geographically separated and across which data is replicated; and storing, by the database node in the shared object storage coupled to the database node, the set of data, wherein the shared object storage provides a higher bandwidth than the plurality of caches.
2. The method of claim 1, wherein the obtaining is performed by: receiving, from a cluster manager of the distributed storage system, metadata associated with different ones of the plurality of entities, wherein the metadata includes tags maintained by the cluster manager for different sets of data based on classifications for entities associated with the different sets of data.
3. The method of claim 2, wherein a tag for the set of data indicates respective availability zones of a number of caches storing copies of the set of data.
4. The method of claim 1, wherein the plurality of caches are further configured to allocate a larger amount of cache space for the first entity than for a second entity of the plurality of entities based on a classification for the first entity indicated in the metadata associated with the first entity and a classification for the second entity indicated in metadata associated with the second entity.
5. The method of claim 1, wherein the plurality of caches are further configured to: in response to a system failure: determine, based on the classification for the first entity, to repopulate data for the first entity into one or more of the plurality of caches from the shared object storage; and determine, based on a classification for a second entity, to not repopulate data for the second entity of the plurality of entities into one or more of the plurality of caches, wherein repopulating data for the first entity is performed without being provoked by a cache miss.
6. The method of claim 1, wherein the plurality of caches are further configured to: store, based on the classification for the first entity, multiple copies of data for the first entity across different availability zones; and store, based on a classification for a second entity of the plurality of entities, a single copy of data for the second entity.
7. The method of claim 1, wherein the plurality of caches are further configured to: perform, based on a classification for a second entity, cache evictions of data stored for the second entity prior to performing cache evictions of data stored for the first entity.
8. The method of claim 1, further comprising: receiving, by the database node from a second entity of the plurality of entities, a request for a second set of data; identifying, by the database node based on metadata associated with the second entity, a first cache of the plurality of caches storing the second set of data, wherein the first cache is located in a first availability zone; and responding, by the database node, to the request for the second set of data, wherein the responding is performed based on retrieving the second set of data from the first cache.
9. The method of claim 8, further comprising: determining, by the database node, that a cache miss has occurred in the first cache; and determining, by the database node based on a classification of an entity associated with the cache miss, whether to service a query for data missing from the first cache using data from the shared object storage.
10. The method of claim 1, further comprising: receiving, by the database node from a second entity of the plurality of entities, a request for a second set of data; and retrieving, by the database node from the shared object storage, the second set of data, wherein the retrieving is performed based on a classification indicated in metadata associated with the second entity.
11. A distributed storage system, comprising: at least one processor; a data cluster comprising a plurality of storage caches separated into a plurality of availability zones; a shared object storage coupled to the data cluster; a plurality of database nodes located in the plurality of availability zones that are geographically distributed and across which data is replicated; and wherein a first database node in a first availability zone is executable by the at least one processor to cause the distributed storage system to: receive a request for a first set of data from a first entity of a plurality of entities for which the distributed storage system managed distributed storage of data; obtain metadata associated with the first entity, wherein the metadata specifies one of a plurality of classifications for the plurality of entities; communicate with a first cache of the plurality of storage caches for retrieving the first set of data, wherein the first cache is located in the first availability zone, and wherein the plurality of storage caches are configured to store data for the plurality of entities based on the plurality of classifications; identify, based on the communication, that a cache miss has occurred; determine, based on the cache miss and a classification for the first entity specified in the metadata associated with the first entity, whether to respond to the request for the first set of data using the shared object storage or a second cache located in a second availability zone, wherein the plurality of storage caches provide lower latency data retrieval than the shared object storage; and respond, based on determining to use the second cache located in the second availability zone, to the request for the first set of data.
12. The distributed storage system of claim 11, wherein the metadata associated with the first entity is obtained from a cluster manager of the distributed storage system, wherein metadata maintained by the cluster manager for different ones of the plurality of entities indicates respective availability zones in which data for different ones of the plurality of entities are stored.
13. The distributed storage system of claim 11, wherein the first database node is further configured to: receive, from the first entity, a request to store a second set of data; provide the second set of data to one or more of a plurality of caches for storage; and store the second set of data in the shared object storage.
14. The distributed storage system of claim 13, wherein providing the second set of data to the one or more of the plurality of caches for storage includes providing instructions specifying to store multiple copies of the second set of data in multiple caches located across different availability zones, wherein the instructions are provided based on the classification for the first entity.
15. The distributed storage system of claim 11, wherein the plurality of storage
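The classification-driven behaviors recited in the claims (choice of fallback source on a cache miss, eviction ordering, and proactive repopulation after a failure) can be illustrated with a small policy table. This is a hedged sketch only: the `POLICY` table, its field names, the classification names `gold` and `bronze`, and the three helper functions are assumptions made for the example, and the caches and object storage are modeled as plain dictionaries.

```python
# Hypothetical per-classification policy; the claims name no concrete classes.
POLICY = {
    "gold":   {"miss_source": "remote_cache", "evict_priority": 1, "repopulate": True},
    "bronze": {"miss_source": "object_store", "evict_priority": 0, "repopulate": False},
}


def read(entity, key, classification, local_cache, remote_cache, object_store):
    """Return (value, source), choosing the miss fallback by classification."""
    k = (entity, key)
    if k in local_cache:
        return local_cache[k], "local_cache"
    # Cache miss in the local zone: the classification decides whether to try
    # a cache in a second availability zone before the shared object storage.
    if POLICY[classification]["miss_source"] == "remote_cache" and k in remote_cache:
        return remote_cache[k], "remote_cache"
    return object_store[k], "object_store"


def eviction_order(cached_keys, metadata):
    """Order (entity, key) pairs so lower-priority classes are evicted first."""
    return sorted(cached_keys, key=lambda ek: POLICY[metadata[ek[0]]]["evict_priority"])


def entities_to_repopulate(metadata):
    """Entities whose data is reloaded from object storage after a failure,
    without waiting for a cache miss to trigger the reload."""
    return sorted(e for e, c in metadata.items() if POLICY[c]["repopulate"])
```

Keeping the policy in a single table makes it easy to see that one classification value drives all three decisions, which is the common thread across the dependent claims.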
Caching, prefetching or hoarding of files · CPC title
File search processing · CPC title
Query results presentation · CPC title
File meta data generation · CPC title
implemented using Network-attached Storage [NAS] architecture (distributed or networked storage systems G06F3/067; protocols for distributed storage of data in a network H04L67/1097) · CPC title