Placement policy

US9268808B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9268808-B2
Application number  US-201213731722-A
Country             US
Kind code           B2
Filing date         Dec 31, 2012
Priority date       Dec 31, 2012
Publication date    Feb 23, 2016
Grant date          Feb 23, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A region-based placement policy that can be used to achieve a better distribution of data in a clustered storage system is disclosed herein. The clustered storage system includes a master module to implement the region-based placement policy for storing one or more copies of a received data across many data nodes of the clustered storage system. When implementing the region-based placement policy, the master module splits the received data into one or more regions, where each region includes a contiguous portion of the received data. Further, for each of the plurality of regions, the master module stores complete copies of the region in a subset of the data nodes.
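The region-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `split_into_regions` and `place_regions`, the fixed `region_size`, and the round-robin choice of node subset are all assumptions made for illustration. The sketch only shows the two properties the abstract states: each region is a contiguous portion of the data, and every complete copy of a region lands on a subset of the data nodes.

```python
from typing import Dict, List, Sequence


def split_into_regions(rows: Sequence[str], region_size: int) -> List[List[str]]:
    """Split the received rows into contiguous, non-overlapping regions.

    region_size is a hypothetical tuning parameter, not a value from the patent.
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    return [list(rows[i:i + region_size]) for i in range(0, len(rows), region_size)]


def place_regions(num_regions: int, nodes: Sequence[str], copies: int) -> Dict[int, List[str]]:
    """Choose, for each region, the subset of nodes that holds its complete copies.

    The round-robin selection below is an assumed policy chosen for simplicity.
    """
    if copies > len(nodes):
        raise ValueError("not enough data nodes for the requested number of copies")
    placement: Dict[int, List[str]] = {}
    for region_id in range(num_regions):
        start = (region_id * copies) % len(nodes)
        # Consecutive nodes (wrapping around) form this region's subset.
        placement[region_id] = [nodes[(start + k) % len(nodes)] for k in range(copies)]
    return placement


rows = [f"row{i}" for i in range(10)]
regions = split_into_regions(rows, region_size=4)
placement = place_regions(len(regions), ["n1", "n2", "n3", "n4", "n5", "n6"], copies=3)
```

Because placement is decided per region rather than per block, all data belonging to one region stays on the same few nodes in this sketch.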

First claim

Opening claim text (preview).

What is claimed is:

1. A clustered storage system comprising: a plurality of data nodes for receiving and storing data associated with the clustered storage system, wherein: at least some of the plurality of data nodes comprise a memory and one or more processors; the data nodes are arranged in multiple racks such that at least one of the racks includes at least two data nodes; and the data is received from one or more client systems serviced by the clustered storage system; and a master module that is in communication with one or more of the plurality of data nodes and that facilitates storage of data in the plurality of data nodes, wherein the master module is configured, when executed by one or more processors, to: receive client data from a client system of the one or more client systems, wherein the client data comprises a data table including a plurality of rows and columns; split the client data into a plurality of regions, each region including a contiguous set of the rows of the data table; and for at least one selected region of the plurality of regions, divide the selected region into two or more data files such that each data item in the selected region with a common first column identifier is in a first of the two or more data files and each data item in the selected region with a common second column identifier is in a second of the two or more data files; create a first replica and a second replica of the selected region; select a primary rack with a primary data node; store the selected region, including the two or more data files, in the primary data node of the primary rack; select a secondary rack, different from the primary rack, with at least a secondary data node and a tertiary data node different from the secondary data node; store the first replica of the selected region, including first replicas of the data files, in the secondary data node of the secondary rack; and store the second replica of the selected region, including second replicas of the data files, in the tertiary data node of the secondary rack, wherein the clustered storage system uses the primary rack to respond to at least one data request before and/or during a data request handled by the secondary rack.

2. The clustered storage system of claim 1, wherein selecting the primary rack and selecting the secondary rack are each directed according to a placement policy maintained in association with the master module.

3. The clustered storage system of claim 1, wherein storing the first replica includes: processing, by the master module, a plurality of write requests, each write request corresponding to one or more of the data files associated with the selected region.

4. The clustered storage system of claim 1, wherein the data associated with at least one data file of the one or more data files is stored as one or more data blocks in an associated data node.

5. The clustered storage system of claim 1, wherein the plurality of data nodes constitute at least a portion of a Hadoop clustered storage system.

6. The clustered storage system of claim 1, wherein the master module constitutes at least a portion of a HBase database layer.

7. The clustered storage system of claim 1, wherein: dividing the selected region into two or more data files comprises dividing at least one selected row of the selected region such that a first portion of the selected row is in the first of the two or more data files and a second portion of the selected row is in the second of the two or more data files.

8. A computer-implemented method for storing data in a clustered storage system, the clustered storage system including a plurality of storage nodes, arranged in multiple racks such that at least one of the racks includes at least two data nodes, the plurality of storage nodes operable to store the data associated with the clustered storage system, the method comprising: receiving, by a master module associated with the clustered storage system, client data to be stored in the clustered storage system, the client data received from one or more client systems serviced by the clustered storage system and comprising a data table including a plurality of rows and columns; splitting, by the master module, the client data into a plurality of regions, at least one region including a contiguous set of the rows of the data table; and for at least one selected region of the plurality of regions, dividing the selected region into two or more data files such that each data item in the selected region with a common first column identifier is in a first of the two or more data files and each data item in the selected region with a common second column identifier is in a second of the two or more data files; creating a first replica and a second replica of the selected region; selecting a primary rack with a primary data node; storing the selected region, including the two or more data files, in the primary data node of the primary rack; selecting a secondary rack, different from the primary rack, with at least a secondary data node and a tertiary data node different from the secondary data node; storing the first replica of the selected region, including first replicas of the data files, in the secondary data node of the secondary rack; and storing the second replica of the selected region, including second replicas of the data files, in the tertiary data node of the secondary rack, wherein the clustered storage system uses the primary rack to respond to at least one data request before and/or during a data request handled by the secondary rack.

9. The method of claim 8, wherein the storing of each replica includes processing a plurality of sub-write requests, each sub-write request associated with storage corresponding to the one or more of the data files.

10. The method of claim 8, wherein the data associated with a data file is stored as one or more data blocks in a storage node.

11. A computer-implemented method for storing data in a clustered storage system, the method comprising: receiving, by a storage server associated with the clustered storage system, data to be stored in the clustered storage system, the clustered storage system including a plurality of storage nodes for storing the data, wherein the storage nodes are arranged in multiple racks; splitting, by the storage server, the data into one or more regions, each region constituting a contiguous portion of the received data; assigning, by the storage server, each region to one of a plurality of region servers, wherein each region server manages data access on behalf of the region; for at least one selected region of the one or more regions, dividing, based on data columns, the selected region into two or more data files by dividing at least one selected row of the selected region such that a first portion of the selected row is in a first of the two or more data files and a second portion of the selected row is in a second of the two or more data files; and determining, by the storage server, multiple storage nodes of the plurality of storage nodes to store the selected region in; wherein: at least a first one of the multiple storage nodes for the selected region is located in a first of the multiple racks, at least a second one of the multiple storage nodes, other than the first one of the multiple storage nodes, and a third one of the multiple storage nodes, other than the first and second ones of the multiple storage nodes, for the selected region are located in a second of the multiple racks other than the first of the multiple racks, and the clustered storage system uses the first of the multiple racks to respond to at
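The column-based division and rack placement recited in claim 1 can be sketched in Python as follows. This is a hedged illustration only: the `Item` tuple layout, the function names `divide_by_column` and `choose_placement`, and the rule of taking the first eligible secondary rack are assumptions, not details taken from the patent. The sketch keeps the claimed constraints: items sharing a column identifier go into the same data file, the primary copy sits on a node of the primary rack, and both replicas sit on two different nodes of a single secondary rack.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical data item layout: (row_key, column_identifier, value).
Item = Tuple[str, str, str]


def divide_by_column(region: List[Item]) -> Dict[str, List[Item]]:
    """Group a region's items into one data file per column identifier."""
    files: Dict[str, List[Item]] = defaultdict(list)
    for item in region:
        files[item[1]].append(item)
    return dict(files)


def choose_placement(racks: Dict[str, List[str]], primary_rack: str) -> Dict[str, Tuple[str, str]]:
    """Pick (rack, node) pairs for the primary copy and the two replicas.

    Assumed policy: use the first node of the primary rack, then the first
    other rack that has at least two data nodes for both replicas.
    """
    if not racks.get(primary_rack):
        raise ValueError("primary rack has no data nodes")
    primary = (primary_rack, racks[primary_rack][0])
    for rack, nodes in racks.items():
        if rack != primary_rack and len(nodes) >= 2:
            return {
                "primary": primary,
                "first_replica": (rack, nodes[0]),
                "second_replica": (rack, nodes[1]),
            }
    raise ValueError("no secondary rack with at least two data nodes")


region = [("r1", "cf1", "a"), ("r1", "cf2", "b"), ("r2", "cf1", "c")]
data_files = divide_by_column(region)
placement = choose_placement(
    {"rackA": ["a1"], "rackB": ["b1"], "rackC": ["c1", "c2"]}, primary_rack="rackA"
)
```

In the usage lines, row `r1` ends up with one portion in the `cf1` file and another in the `cf2` file, which mirrors the row division described in claim 7.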

Assignees

Facebook Inc

Inventors

Classifications

  • Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof (details of archiving G06F16/11)

  • G06F16/134 (primary): Distributed indices

  • Provision of network file services by network file servers, e.g. by using NFS, CIFS (network file access protocols H04L67/1097)

  • Tablespace storage structures; Management thereof

  • based on file chunks

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9268808B2 cover?
A region-based placement policy that can be used to achieve a better distribution of data in a clustered storage system is disclosed herein. The clustered storage system includes a master module to implement the region-based placement policy for storing one or more copies of a received data across many data nodes of the clustered storage system. When implementing the region-based placement poli…
Who is the assignee on this patent?
Facebook Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/134. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 23, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).