Halo based file system replication

US9807164B2 · US · B2

Patent metadata
Publication number: US-9807164-B2
Application number: US-201414341547-A
Country: US
Kind code: B2
Filing date: Jul 25, 2014
Priority date: Jul 25, 2014
Publication date: Oct 31, 2017
Grant date: Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosure is directed to replicating datasets between data storage servers in a distributed computer network synchronously and asynchronously (“the technology”). A replication interface receives a request from a client to store a dataset in the distributed computer network. The replication interface identifies a first set of storage servers that are within a halo defined by the client. The replication interface replicates the dataset to the first set of the storage servers synchronously, and a remaining set of the storage servers, e.g., storage servers that are outside of the halo asynchronously. The replication interface can perform the synchronous and asynchronous replication simultaneously. The halo can be determined based on various parameters, including a halo latency, which indicates a permissible latency threshold between the client and a storage server to which the dataset is to be replicated synchronously.
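
As a rough illustration of the flow the abstract describes, the Python sketch below splits servers into those inside and outside a halo by comparing a measured latency with a halo latency threshold. It then writes to the in-halo servers synchronously while a background thread handles the out-of-halo servers. The names `Server`, `split_by_halo`, `replicate`, and the in-memory `store` dictionary are assumptions made for illustration and do not come from the patent. A real deployment would measure latency over the network and would likely hand the asynchronous leg to a separate replication daemon.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a storage server."""
    name: str
    latency_ms: float  # assumed: round-trip latency measured from the client
    store: dict = field(default_factory=dict)

    def write(self, key, data):
        self.store[key] = data


def split_by_halo(servers, halo_latency_ms):
    """Return (inside, outside). A server is inside the halo when its
    latency does not exceed the permissible threshold."""
    inside = [s for s in servers if s.latency_ms <= halo_latency_ms]
    outside = [s for s in servers if s.latency_ms > halo_latency_ms]
    return inside, outside


def replicate(servers, key, data, halo_latency_ms):
    """Write synchronously inside the halo and asynchronously outside it.

    Returns the background thread so a caller can join it if needed."""
    inside, outside = split_by_halo(servers, halo_latency_ms)

    def write_outside():
        for s in outside:
            s.write(key, data)

    # Start the asynchronous leg first so both legs run concurrently.
    background = threading.Thread(target=write_outside)
    background.start()

    # Synchronous leg: this function does not return until these writes finish.
    for s in inside:
        s.write(key, data)
    return background


servers = [
    Server("local-a", 2.0),
    Server("local-b", 4.5),
    Server("remote-c", 80.0),
]
worker = replicate(servers, "file.txt", b"hello", halo_latency_ms=5.0)
worker.join()  # only for this demo; a client would not normally wait here
```

In this sketch the caller is blocked only by the two low-latency servers. The high-latency server receives the same data without holding up the write.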

First claim

Opening claim text (preview).

I claim:

1. A method performed by a computing system, comprising: receiving, at a client computing system in a distributed computer network having a plurality of data storage servers, a request to store a dataset in the distributed computer network, the data storage servers configured to be read-write storage servers; identifying, by the client computing system and based on a halo latency parameter specified in a storage policy, a first set of the data storage servers that are within a halo group defined by the halo latency parameter and a second set of the data storage servers that are outside of the halo group, the halo latency parameter indicating a permissible threshold of a latency between the client computing system and a data storage server of the data storage servers to which the dataset is synchronously replicated, the latency being a time period elapsed between a dispatch of a request from the client computing system to the data storage server and a receipt of the response from the data storage server by the client computing system; and replicating the dataset to: the first set of the data storage servers synchronously, and the second set of the data storage servers asynchronously, the dataset concurrently replicated to the first set and the second set.

2. The method of claim 1, wherein identifying the first set of the storage servers that are within the halo group includes identifying the first set of the storage servers whose corresponding latencies do not exceed the permissible threshold.

3. The method of claim 1, wherein identifying the first set of the storage servers within the halo group includes identifying a first group of the storage servers whose corresponding latencies do not exceed the permissible threshold as being in an “online” state and a second group of the storage servers whose corresponding latencies exceed the permissible threshold as being in an “stand-by” state.

4. The method of claim 3, wherein replicating the dataset to the first set of the storage servers includes replicating the dataset to the first group of the storage servers that are in the “online” state.

5. The method of claim 1, wherein replicating the dataset to the second set of the storage servers includes replicating, by a daemon program, the dataset from the first set of the storage servers to the second set of the storage servers.

6. The method of claim 5, wherein the daemon program is configured to identify a first group of the storage servers that are in a second halo group as the second set of the storage servers, the second halo group including the first group of the storage servers having an infinite latency between a specified storage server of the first set of the storage servers on which the daemon program is executing and the first group of the storage servers.

7. The method of claim 1, wherein identifying the first set of the storage servers within the halo group further includes: determining a number of replicas of the dataset to be stored in the distributed computer network, determining whether the number of replicas exceed a maximum number of replicas to be stored at the first set of the storage servers, responsive to a determination the number of replicas exceed the maximum number of replicas of the first set of the storage servers, identifying a subset of the second set of the storage servers to store a remaining number of replicas that exceed the maximum number of replicas.

8. The method of claim 7 further comprising: replicating the remaining number of replicas of the dataset to the subset of the second set of the storage servers.

9. The method of claim 1, wherein the client is one of a plurality of clients and at least some of the clients have halo parameters with different latency thresholds.

10. The method of claim 1, wherein the distributed computer network is a GlusterFS distributed storage file system.

11. A system, comprising: a processor; a first module configured to store a storage policy used in storing a plurality of datasets at a plurality of storage servers in a distributed computer network, wherein at least some of the storage servers are configured to store a replica of a dataset of the datasets stored in another storage server of the storage servers, the storage policy including a halo parameter, the halo parameter including a set of tags associated with the storage servers, each of the set of tags describing a first attribute associated with the storage servers; a second module that is configured to work in cooperation with the processor to receive a request from a client in the distributed computer network to store a first dataset of the datasets in the distributed computer network; a third module that is configured to work in cooperation with the processor to identify a first set of the storage servers having one or more tags that match with the set of tags in the halo parameter as “online” storage servers and a second set of the storage servers whose one or more tags do not match with the set of tags in the halo parameter as “stand-by” storage servers; and a fourth module that is configured to work in cooperation with the processor to synchronously replicate the first dataset to the “online” storage servers and asynchronously replicate the first dataset to the “stand-by” storage servers, the first dataset concurrently replicated to the “online” storage servers and the “stand-by” storage servers.

12. The system of claim 11, wherein one of the set of tags in the halo parameter is a latency tag, the latency tag describing a permissible latency threshold between the client and a storage server of the storage servers for replicating the first dataset to the storage server synchronously.

13. The system of claim 12, wherein the third module is configured to identify that the first set of the storage servers match with the halo parameter if latencies of each of the first set of the storage servers with the client do not exceed the permissible latency threshold.

14. The system of claim 11, wherein the storage servers are configured to store the datasets in a plurality of bricks, wherein a brick of the bricks is a smallest storage unit of a storage server of the servers, and wherein a group of the bricks from the storage servers form a volume.

15. The system of claim 14, wherein the client is configured to access a subset of the datasets stored at the group of the bricks by mounting the volume on the client, wherein the group of the bricks in the volume contain replicas of the subset of the datasets.

16. The system of claim 14, wherein the third module is further configured to identify the “online” storage servers and the “stand-by” storage servers by identifying a first subset of the group of the bricks in the volume corresponding to the “online” storage servers as “online” bricks and a second subset of the group of the bricks in the volume corresponding to the “stand-by” storage servers as “stand-by” bricks.

17. A non-transitory computer-readable storage medium storing computer-readable instructions, comprising: instructions for receiving, at a first client in a distributed computer network having a plurality of storage servers, a request to store a first dataset in the distributed computer network, the storage servers configured to be read-write storage servers; instructions for identifying, by the first client and based on a storage policy, a first set of the storage servers to which the first dataset is to be replicated synchronously, the identifying based on a halo latency parameter of the storage policy and a number of repli
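
Two of the selection rules in the claims above can be sketched in a few lines of Python: the tag matching of claim 11 and the replica overflow of claims 7 and 8. This is one possible reading and not the patented implementation. The function names, the `key=value` tag strings, and the rule that a single shared tag is enough to count as a match are all assumptions made for illustration.

```python
def classify_by_tags(server_tags, halo_tags):
    """Label each server "online" if it shares at least one tag with the
    halo parameter, otherwise "stand-by" (assumed reading of claim 11)."""
    return {
        name: "online" if tags & halo_tags else "stand-by"
        for name, tags in server_tags.items()
    }


def place_replicas(online, standby, num_replicas, max_online):
    """Choose synchronous targets up to max_online; any replicas beyond
    that go to a subset of the stand-by servers (assumed reading of
    claims 7 and 8)."""
    sync_targets = online[:min(num_replicas, max_online)]
    remaining = num_replicas - len(sync_targets)
    overflow_targets = standby[:max(remaining, 0)]
    return sync_targets, overflow_targets


# Hypothetical servers and tags, invented for this example.
server_tags = {
    "s1": {"region=us-east", "rack=12"},
    "s2": {"region=us-east", "rack=40"},
    "s3": {"region=eu-west", "rack=7"},
}
states = classify_by_tags(server_tags, halo_tags={"region=us-east"})
online = [name for name, state in states.items() if state == "online"]
standby = [name for name, state in states.items() if state == "stand-by"]

# Three replicas requested, but at most two may live on "online" servers.
sync_targets, overflow_targets = place_replicas(
    online, standby, num_replicas=3, max_online=2
)
```

With three replicas requested and a cap of two on the "online" servers, the third replica is assigned to a "stand-by" server.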

Assignees

Facebook Inc

Inventors

Classifications

  • based on client or server locations

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9807164B2 cover?
The disclosure is directed to replicating datasets between data storage servers in a distributed computer network synchronously and asynchronously (“the technology”). A replication interface receives a request from a client to store a dataset in the distributed computer network. The replication interface identifies a first set of storage servers that are within a halo defined by the client. The…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification H04L67/1095. Mapped technology areas include Electricity.
When was this patent published?
Oct 31, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).