US9830327B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9830327-B2 |
| Application number | US-201415033853-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 26, 2014 |
| Priority date | Nov 29, 2013 |
| Publication date | Nov 28, 2017 |
| Grant date | Nov 28, 2017 |
Official abstract text for this publication.
A method, a device, a node and a system for managing file in distributed data warehouse are provided. The method includes: acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node; suspending, by the data node, the deleting instruction; and deleting, by the data node, a data block corresponding to the data block identifier after a condition is met, thereby resolving the technical issue that an accidentally deleted file can not be recovered by setting a trash in the management node in some cases and ensuring the data security of the Hadoop system.
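The deferred deletion described in the abstract can be illustrated with a small sketch. This is not the patent's implementation: the class `DataNode`, its method names, the in-memory dictionaries, and the injectable `clock` are all hypothetical. It assumes the two release conditions named in the claims, a time threshold and a client-issued emptying instruction.

```python
import time
from typing import Callable, Dict, List


class DataNode:
    """Toy data node that defers block deletion. All names are hypothetical."""

    def __init__(self, blocks: Dict[str, bytes], threshold_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.blocks = dict(blocks)
        # block identifier -> time at which the deleting instruction was suspended
        self.delay_queue: Dict[str, float] = {}
        self.threshold_seconds = threshold_seconds
        self.clock = clock

    def on_delete_instruction(self, block_id: str) -> None:
        """Suspend the instruction: queue the identifier, keep the block data."""
        if block_id in self.blocks:
            self.delay_queue.setdefault(block_id, self.clock())

    def purge_expired(self) -> List[str]:
        """Delete blocks whose time in the delay queue reached the threshold."""
        now = self.clock()
        expired = [b for b, queued_at in self.delay_queue.items()
                   if now - queued_at >= self.threshold_seconds]
        return self._delete(expired)

    def empty_queue(self) -> List[str]:
        """Handle an emptying instruction: delete every queued block."""
        return self._delete(list(self.delay_queue))

    def _delete(self, block_ids: List[str]) -> List[str]:
        for block_id in block_ids:
            del self.delay_queue[block_id]
            self.blocks.pop(block_id, None)
        return block_ids
```

Until `purge_expired` or `empty_queue` runs, the block data stays on the node. That window is what makes recovery of an accidentally deleted file possible in this sketch.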
Opening claim text (preview).
The invention claimed is:

1. A method for managing file in distributed data warehouse, comprising: acquiring, by a data node, a deleting instruction carrying a data block identifier, wherein the deleting instruction is sent by a management node; suspending, by the data node, the deleting instruction; and deleting, by the data node, a data block corresponding to the data block identifier after a condition is met; wherein the process of suspending, by the data node, the deleting instruction comprises storing the data block identifier into a delay queue; wherein the process of deleting, by the data node, the data block corresponding to the data block identifier after the condition is met comprises: deleting, by the data node, data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue wherein before deleting the data blocks corresponding to all the data block identifiers in the delay queue in response to an emptying instruction sent by a client for emptying the data blocks corresponding to all the data block identifiers in the delay queue, the method further comprises: determining data blocks in the data node corresponding to all the data block identifiers in the delay queue; calculating a parameter of occupation of the determined data blocks in the data node; and sending the parameter of occupation to the management node, wherein the client determines after checking the parameter of occupation whether to send to the data node the emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue; wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, the process of calculating a parameter of occupation of the determined data blocks in the data node comprises: calculating a storage space occupied by the determined data blocks in the data node as the delay deleting storage space; and calculating a percentage of an entire storage space of the data node occupied by the delay deleting storage space as the delay deleting percentage.

2. The method according to claim 1, wherein the process of deleting, by the data node, a data block corresponding to the data block identifier after a condition is met comprises: deleting, by the data node, the data block corresponding to the data block identifier in a case that a period since the data block identifier is stored into the delay queue reaches a predetermined time threshold.

3. The method according to claim 1, wherein after storing the data block identifier into the delay queue, the method further comprises: receiving a recovering instruction sent by the management node for recovering the data block corresponding to the data block identifier stored in the delay queue; and sending to the management node a report carrying data block identifiers of all the data blocks stored in the data node, so that the management node creates a mapping from the data block identifier to the data node based on the data block identifiers in the received report.

4. The method according to claim 2, wherein the method further comprises: receiving a time configuration instruction carrying a specified time length sent by the client, wherein the time configuration instruction is utilized in dynamic configuration of the predetermined time threshold; and updating the predetermined time threshold to the specified time length based on the time configuration instruction.

5. The method according to claim 3, wherein the method further comprises: receiving a time configuration instruction carrying a specified time length sent by the client, wherein the time configuration instruction is utilized in dynamic configuration of the predetermined time threshold; and updating the predetermined time threshold to the specified time length based on the time configuration instruction.

6. A method for managing file in distributed data warehouse, comprising: receiving from a client an instruction for deleting a specified file; determining, by a management node, a data block which belongs to the specified file and is stored in a data node; sending to the data node, by the management node, a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met; receiving a file recovering instruction sent by the client for recovering the specified file; recovering an eligible first correspondence relation, wherein the eligible first correspondence relation is a first correspondence relation which is backed-up before the deleting instruction is sent and a time point for backup is closest to a time point of sending the deleting instruction, and the first correspondence relation comprises a relation between the specified file and a data block identifier of a data block in the specified file; and recovering a second correspondence relation, wherein the second correspondence relation is a mapping from the data block identifier of the data block to the data node storing the data block; wherein the method further comprises: sending to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue; and receiving a parameter of occupation sent by the data node, wherein the parameter of occupation comprises a delay deleting storage space and a delay deleting percentage, wherein the delay deleting storage space is a storage space occupied by the data blocks corresponding to all the data block identifiers in the delay queue of the data node, and the delay deleting percentage is a percentage of the entire storage space of the data node occupied by the delay deleting storage space, such that the client determines after checking the parameter of occupation whether to send to the data node an emptying instruction for emptying the data blocks corresponding to all the data block identifiers in the delay queue.

7. The method according to claim 6, wherein the data block identifier is stored into a delay queue by the data node, wherein the process of recovering the second correspondence relation comprises: sending to the data node a recovering instruction for recovering the data block corresponding to the data block identifier stored in the delay queue, wherein the data node sends to the management node a report carrying data block identifiers of all the data blocks stored in the data node after receiving the recovering instruction; receiving the report sent by the data node; and creating a mapping from the data block identifier to the data node based on the data block identifiers in the received report.

8. A method for managing file in distributed data warehouse, wherein the method comprises: sending to a management node an instruction for deleting a specified file, wherein the instruction for deleting the specified file is utilized by the management node to determine a data block which belongs to the specified file and is stored in a data node, and the management node sends to the data node a deleting instruction carrying a data block identifier of the data block, wherein the deleting instruction is suspended by the data node, until the data node deletes the data block corresponding to the data block identifier after a condition is met; wherein the data block identifier is stored into a delay queue by the data node, the method further comprises: checking a parameter of occupation sent to the management node by each data node, wherei
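Two calculations in the claims lend themselves to a short worked sketch: the parameter of occupation (claims 1 and 6) and the choice of the "eligible" backup of the file-to-block relation (claim 6). The function names, the byte-based units, and the `(timestamp, mapping)` backup representation below are assumptions made for illustration. The claims do not prescribe any of them.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

Backup = Tuple[float, dict]  # (backup time, file -> block identifiers), hypothetical shape


def occupation_parameter(block_sizes: Dict[str, int],
                         delay_queue: Iterable[str],
                         capacity_bytes: int) -> Tuple[int, float]:
    """Return (delay deleting storage space, delay deleting percentage)."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    # Space held by blocks whose identifiers sit in the delay queue.
    space = sum(block_sizes[b] for b in delay_queue if b in block_sizes)
    # Share of the node's entire storage space taken by that space.
    percentage = 100.0 * space / capacity_bytes
    return space, percentage


def eligible_backup(backups: Sequence[Backup],
                    delete_sent_at: float) -> Optional[Backup]:
    """Pick the backup taken before the deleting instruction and closest to it."""
    earlier = [b for b in backups if b[0] < delete_sent_at]
    return max(earlier, key=lambda b: b[0], default=None)
```

For example, with 400 bytes queued on a node whose entire storage space is 2000 bytes, the sketch reports a delay deleting storage space of 400 and a delay deleting percentage of 20.0. A client could check those figures before deciding whether to send an emptying instruction.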
Error detection or correction of the data by redundancy in operations (error detection or correction of the data by redundancy in hardware G06F11/16) · CPC title
Physics · mapped topic
using file system or storage system metadata · CPC title
Details of free space management performed by the file system (saving storage space on storage systems G06F3/0608; management of blocks in storage devices G06F3/064) · CPC title