Using append only log format in data storage cluster with distributed zones for determining parity of reliability groups
US-8972478-B1 · Mar 3, 2015 · US
US9740403B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9740403-B2 |
| Application number | US-201514636055-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 2, 2015 |
| Priority date | May 23, 2012 |
| Publication date | Aug 22, 2017 |
| Grant date | Aug 22, 2017 |
Abstract
Techniques for a data storage cluster and a method for maintaining and updating reliability data and reducing data communication between nodes are disclosed herein. Each data object is written to a single data zone on a data node within the data storage cluster. Each data object includes one or more data chunks, and the data chunks of a data object are written to a data node in an append-only log format. When parity is determined for a reliability group including the data zone, there is no need to transmit data from other data nodes where the rest of the data zones of the reliability group reside. Thus, inter-node data communication for determining reliability data is reduced.
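The abstract's key point is that, when every data zone is written append-only, the parity node can update parity using only the chunk that was just written. The sketch below illustrates why, under assumptions the source does not state: single XOR parity (the patent says only "parity"), fixed-size chunks, and in-memory lists standing in for zones on separate nodes. The names `DataZone`, `ParityZone`, `write_chunk`, `reconstruct` and `CHUNK_SIZE` are hypothetical.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 8  # hypothetical chunk size, kept tiny for illustration


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


@dataclass
class DataZone:
    """Append-only data zone: chunks are only ever added at the tail."""
    chunks: list = field(default_factory=list)

    def append(self, chunk: bytes) -> int:
        offset = len(self.chunks)
        self.chunks.append(chunk)
        return offset


@dataclass
class ParityZone:
    """Parity zone of one reliability group; slot i covers offset i of every data zone."""
    slots: list = field(default_factory=list)

    def absorb(self, offset: int, chunk: bytes) -> None:
        # Unwritten offsets behave as all-zero chunks, so grow with zero slots.
        while len(self.slots) <= offset:
            self.slots.append(bytes(CHUNK_SIZE))
        # The target offset was unwritten before this append, so
        # new parity = old parity XOR new chunk; no other data zone is read.
        self.slots[offset] = xor_bytes(self.slots[offset], chunk)


def write_chunk(zone: DataZone, parity: ParityZone, chunk: bytes) -> int:
    """Append a chunk locally, then ship only that chunk to the parity node."""
    if len(chunk) != CHUNK_SIZE:
        raise ValueError("chunks must be CHUNK_SIZE bytes")
    offset = zone.append(chunk)
    parity.absorb(offset, chunk)  # parity lands at the same offset (cf. claim 2)
    return offset


def reconstruct(offset: int, surviving: list, parity: ParityZone) -> bytes:
    """Rebuild a lost zone's chunk at `offset` from parity and the surviving zones."""
    out = parity.slots[offset]
    for zone in surviving:
        if offset < len(zone.chunks):
            out = xor_bytes(out, zone.chunks[offset])
    return out
```

For example, appending `b"AAAAAAAA"` and then `b"CCCCCCCC"` to zone `a` and `b"BBBBBBBB"` to zone `b` leaves parity slot 0 as A XOR B and slot 1 as C; `reconstruct(0, [b], p)` then recovers zone `a`'s first chunk without ever having read zone `b` during the writes.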
Claims (opening text, preview)
What is claimed is:
1. A method, comprising: receiving, at a first node of a plurality of nodes within a data storage cluster, a request for storing a data object including one or more data chunks, wherein a signature of each of the one or more data chunks is determined and is sent to a metadata server of the data storage cluster; writing, by the first node, the received one or more data chunks to a data zone in an append-only log format upon determining the data zone to write the received one or more data chunks, wherein the data zone is assigned to a reliability group defined across more than one of the plurality of nodes within the data storage cluster; sending, by the first node, the written one or more data chunks of the data object to a second node of the plurality of nodes within the data storage cluster, wherein the second node includes a parity zone assigned to the reliability group to which the data zone of the first node is assigned; determining parity chunks for the reliability group at the second node based on the sent one or more data chunks wherein the determining of the parity values does not require use of information from nodes other than the first and second nodes; and writing, by the first node, the determined parity chunks to a parity zone of the second node in the append-only log format.
2. The method of claim 1, wherein each parity chunk of the parity chunks is written to the parity zone of the second node at an offset at which a corresponding data chunk of the data chunks is written to the data zone of the first node.
3. The method of claim 1, wherein the signature of the data chunks are determined by a hash function.
4. The method of claim 1, further comprising: deduplicating, by the first node, one or more data chunks when a metadata server matches the signature of the deduplicated data chunks with one or more entries in a global chunk map; transmitting to the first node the locations of the deduplicated data chunks according to the global chunk map; and instructing, by the first node, to write the data chunks other than the deduplicated data chunks to the data zone of the first node.
5. The method of claim 4, further comprising: recording by first node, the locations of the deduplicated data chunks to an object record of the data object in the first node, wherein the object record is an inode.
6. The method of claim 1, further comprising: determining, by the first node, to clean a portion of the data zone of the first node, when the portion is no longer allocated to any data objects stored in the data storage cluster; sending, by the first node, data in the portion of the data zone of the first node to the second node; cleaning, by the first node, the portion of the data zone by marking the portion with a predetermined value; and updating, by the first node, a corresponding portion of the parity zone of the second node, by combining the data in the portion of the data zone from data of the corresponding portion of the parity zone.
7. The method of claim 1, wherein the data chunks are written to the data zone of the first node in an append-only log format so that the data zone is being written in an increasing order.
8. The method of claim 1, wherein a second data zone of a third node of the plurality of nodes within the data storage cluster is assigned to the reliability group, along with the data zone of the first node and the parity zone of the second node, and wherein the parity chunks determination on the second node after receiving data chunks from the first node does not require use of data from the second data zone of the third node assigned to the reliability group.
9. The method of claim 1, further comprising: receiving, by first node, a request for storing a second data object including one or more data chunks; writing, by the first node, the data chunks of the second data object to a second data zone of a third node in an append-only log format, wherein the second data zone is assigned to the reliability group to which the data zone of the first node and the parity zone of the second node are assigned; sending, by the first node, the data chunks of the second data object to the second node of the plurality of nodes within the data storage cluster; and updating, by the first node, parity values for the reliability group at the second node based on the data chunks of the second data object received by the second node, wherein the updating of the parity values does not require use of information from nodes other than the second and third nodes.
10. A non-transitory computer readable medium having stored thereon instructions for managing storage comprising executable code which when executed by one or more processors, causes the processors to perform steps comprising: receiving a request for storing a data object including one or more data chunks, wherein a signature of each of the one or more data chunks is determined and is sent to a metadata server of the data storage cluster; writing the received one or more data chunks to a data zone of a first node in an append-only log format upon determining the data zone to write the received one or more data chunks, wherein the data zone is assigned to a reliability group defined across more than one of the plurality of nodes within the data storage cluster; sending the written one or more data chunks of the data object to a second node of the plurality of data nodes within the data storage cluster, wherein the second node includes a parity zone assigned to the reliability group to which the data zone of the first node is assigned; determining parity chunks for the reliability group at the second node based on the sent one or more data chunks wherein the determining of the parity values does not require use of information from nodes other than the first and second nodes; and writing the determined parity chunks to a parity zone of the second node in the append-only log format.
11. The medium as set forth in claim 10 wherein each parity chunk of the parity chunks is written to the parity zone of the second node at an offset at which a corresponding data chunk of the data chunks is written to the data zone of the first node.
12. The medium as set forth in claim 10 further comprising: wherein the signature of the data chunks are determined by a hash function; deduplicating one or more data chunks when a metadata server matches the signature of the deduplicated data chunks with one or more entries in a global chunk map; transmitting to the first node the locations of the deduplicated data chunks according to the global chunk map; and instructing to write the data chunks other than the deduplicated data chunks to the data zone of the first node.
13. The medium as set forth in claim 10 wherein: the data chunks are written to the data zone of the first node in an append-only log format so that the data zone is being written in an increasing order; and wherein a second data zone of a third node of the plurality of nodes within the data storage cluster is assigned to the reliability group, along with the data zone of the first node and the parity zone of the second node, and wherein the parity chunks determination on the second node after receiving data chunks from the first node does not require use of data from the second data zone of the third node assigned to the reliability group.
14. The medium as set forth in claim 10 further comprising: receiving a request for storing a second data object including one or more data chunks; writing the data chunks of the second data object to a second data zone of a third data node
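Claims 3 to 6 add content-hash deduplication through a global chunk map and a cleaning step that removes a freed chunk's contribution from parity. The sketch below is a minimal illustration under assumptions the claims do not fix: SHA-256 as the hash function, XOR parity, all-zero bytes as the "predetermined value", and a shared in-memory list standing in for the parity zone on the second node. `MetadataServer`, `Node`, `store_object` and `clean` are hypothetical names, and dropping the stale map entry during cleaning is our own assumption rather than a recited step.

```python
import hashlib

CHUNK_SIZE = 8                   # hypothetical chunk size
CLEAN_VALUE = bytes(CHUNK_SIZE)  # assumed "predetermined value": all zeros


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def signature(chunk: bytes) -> str:
    # Claim 3 requires only "a hash function"; SHA-256 is an assumption.
    return hashlib.sha256(chunk).hexdigest()


class MetadataServer:
    """Hypothetical metadata server holding the global chunk map (signature -> location)."""

    def __init__(self) -> None:
        self.global_chunk_map: dict = {}

    def lookup(self, sig: str):
        return self.global_chunk_map.get(sig)

    def register(self, sig: str, location: tuple) -> None:
        self.global_chunk_map[sig] = location

    def forget(self, sig: str) -> None:
        self.global_chunk_map.pop(sig, None)


class Node:
    """One data zone on a node, plus a shared list modelling the group's remote parity zone."""

    def __init__(self, node_id: str, parity_slots: list) -> None:
        self.node_id = node_id
        self.zone: list = []        # append-only data zone
        self.parity = parity_slots  # parity zone on the parity node, shared by the group

    def _xor_into_parity(self, offset: int, chunk: bytes) -> None:
        while len(self.parity) <= offset:
            self.parity.append(bytes(CHUNK_SIZE))
        self.parity[offset] = xor_bytes(self.parity[offset], chunk)

    def store_object(self, chunks: list, meta: MetadataServer) -> list:
        """Write only chunks with unknown signatures; return an inode-like list of locations."""
        record = []
        for chunk in chunks:
            sig = signature(chunk)
            location = meta.lookup(sig)
            if location is None:  # new content: append it and update parity at the same offset
                offset = len(self.zone)
                self.zone.append(chunk)
                self._xor_into_parity(offset, chunk)
                location = (self.node_id, offset)
                meta.register(sig, location)
            record.append(location)  # duplicates simply reuse the known location
        return record

    def clean(self, offset: int, meta: MetadataServer) -> None:
        """Reclaim an offset that the caller has verified no object still references."""
        old = self.zone[offset]
        self._xor_into_parity(offset, old)  # XOR the old data back out of parity
        self.zone[offset] = CLEAN_VALUE     # mark the portion with the predetermined value
        meta.forget(signature(old))         # assumption: drop the stale map entry
```

For instance, storing `[A, B, A]` on node-1 appends only two chunks and records offset 0 twice, and a later object `[B, C]` on node-3 appends only `C`. Removing old data by XORing it out of parity relies on XOR being its own inverse; an erasure code with per-zone coefficients would need the matching coefficient instead.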
Using snapshots, i.e. a logical point-in-time copy of the data · CPC title
in relation to data integrity, e.g. data losses, bit errors · CPC title
Improving I/O performance · CPC title
Management of blocks · CPC title
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title