US11221986B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11221986-B2 |
| Application number | US-201716332775-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 31, 2017 |
| Priority date | May 31, 2017 |
| Publication date | Jan 11, 2022 |
| Grant date | Jan 11, 2022 |
Official abstract text for this publication.
Provided is a data management method capable of deleting intermediate data at an appropriate timing. The data management method in a data analysis system that performs analysis by combining a plurality of input data based on an analysis execution request from a computer includes: a first step, in which a request analysis unit analyzes the analysis execution request from the computer to identify a task, identifies intermediate data generated after execution of each identified task, and generates constraint information that determines whether to delete the identified intermediate data; a second step, in which a task management unit determines whether to delete the intermediate data based on the constraint information for each identified task; and a third step, in which a task execution unit executes the identified task and deletes the intermediate data of the task based on a determination result of the second step.
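The three steps in the abstract can be pictured with a minimal Python sketch. Everything named here is an assumption for illustration only: the function names, the in-memory `store`, the example tasks, and the `must_delete` predicate. The abstract does not say how constraint information is derived, so the predicate stands in for that rule.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a task is (name, function producing its intermediate data).
Task = Tuple[str, Callable[[], object]]


def generate_constraints(tasks: List[Task],
                         must_delete: Callable[[str], bool]) -> Dict[str, bool]:
    """First step (request analysis unit): one delete/keep flag per task."""
    return {name: must_delete(name) for name, _ in tasks}


def decide(constraints: Dict[str, bool], name: str) -> bool:
    """Second step (task management unit): look up the decision for a task."""
    return constraints.get(name, False)


def execute(tasks: List[Task], constraints: Dict[str, bool]) -> Dict[str, object]:
    """Third step (task execution unit): run each task, then delete its
    intermediate data when the second step says so. Returns what is left."""
    store: Dict[str, object] = {}
    for name, run in tasks:
        store[name] = run()
        if decide(constraints, name):
            del store[name]
    return store


# Example tasks and predicate are invented for this sketch.
tasks: List[Task] = [
    ("join_customers", lambda: [("alice", "tokyo")]),
    ("count_rows", lambda: 1),
]
constraints = generate_constraints(tasks, must_delete=lambda n: n.startswith("join"))
remaining = execute(tasks, constraints)
```

In this sketch the tasks do not read each other's intermediate data; a real system would have to delete only after the last consumer has run.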
Opening claim text (preview).
The invention claimed is:
1. A data management method in a data analysis system that performs analysis by combining a plurality of input data based on an analysis execution request from a computer, comprising: a first step, in which a request analysis unit: analyzes the analysis execution request from the computer to identify a plurality of tasks, identifies intermediate data generated by execution of each identified task, and identifies attribute information included in each of the identified intermediate data, and records the identified plurality of tasks, identified intermediate data and identified attribute information in a task management data structure, wherein each of the identified plurality of tasks is associated with a respective identified intermediate data and respective attribute information; identifies deletion target information data structure which defines blacklist information to be deleted; compares the attribute information for each identified intermediate data of the task management data structure with the blacklist information of the deletion target data structure, and determines whether a number of pieces of attribute information of each of the identified intermediate data is equal to or greater than a threshold number of pieces of the blacklist information; generates constraint information that determines whether to delete each of the identified intermediate data based on the comparison, the constraint information comprising countermeasure information for each of the identified intermediate data, wherein, for each of the identified intermediate data: the countermeasure information is set to delete in response to the number of pieces of attribute information of the respective identified intermediate data being equal to or greater than the threshold number of pieces of the blacklist information; and the countermeasure information is set to leave in response to the number of pieces of attribute information of the respective identified intermediate data being less than the threshold number of pieces of the blacklist information; a second step, in which a task management unit determines whether to delete or leave each of the identified intermediate data from the constraint information for each identified task; and a third step, in which a task execution unit executes each of the identified tasks and deletes or leaves intermediate data of each task based on a determination result of the second step.
2. The data management method according to claim 1, wherein the task management unit generates a flow for the task execution unit to execute the plurality of tasks, the flow being generated in a following way, for each task of the plurality of tasks: a task is added to the flow, and then when it is determined, based on the constraint information, that intermediate data generated by execution of the added task includes a number of pieces of the attribute information equal to or greater than the threshold number of pieces of attribute information, a deletion task, for deleting the intermediate data, is added to the flow to be sequentially performed following the added task, and the task execution unit executes the plurality of tasks in accordance with the generated flow.
3. The data management method according to claim 1, wherein the request analysis unit includes, in the constraint information, information indicating a task in which identified intermediate data are finally used, and the task management unit determines to delete the intermediate data when it is determined that the intermediate data generated after execution of an identified task are finally used based on the constraint information.
4. The data management method according to claim 3, wherein the task management unit generates a flow for the task execution unit to execute the plurality of tasks, the flow being generated in a following way, for each of the plurality of tasks: a task is added to the flow, and then when it is determined, based on the constraint information, that intermediate data generated after execution of the added task are finally used, a deletion task, for deleting the intermediate data generated by execution of the added task, is added to the flow to be sequentially performed following the added task, and the task execution unit executes the tasks in accordance with the generated flow.
5. The data management method according to claim 1, wherein the request analysis unit includes, in the constraint information, an analysis result of whether a generation time associated with identified intermediate data is shorter than a predetermined threshold value, and the task management unit determines to delete the intermediate data, when it is determined, based on the constraint information, that the generation time associated with the identified intermediate data is shorter than the threshold value.
6. The data management method according to claim 5, wherein the task management unit generates a flow for the task execution unit to execute the plurality of tasks, the flow being generated in a following way, for each of the plurality of tasks: a task is added to the flow, and then when it is determined, based on the constraint information, that the generation time associated with intermediate data generated after execution of the added task is shorter than the threshold value, a deletion task, for deleting the intermediate data generated by execution of the added task, is added to the flow to be sequentially performed following the added task, and the task execution unit executes the tasks in accordance with the flow.
7. The data management method according to claim 1, further comprising: a fourth step, in which the task execution unit deletes all intermediate data which are not deleted after execution of all the tasks of the analysis execution request; a fifth step, in which the task execution unit transmits a result of executing all the tasks of the analysis execution request to the computer as an execution result of the analysis execution request; and a sixth step, in which the computer outputs the received execution result of the analysis execution request.
8. A data analysis system that performs analysis by combining a plurality of input data based on an analysis execution request from a computer, comprising: a storage device configured to store a program; and a central processing unit (CPU) configured to execute the program stored in the storage device to: analyze the analysis execution request from the computer to identify a plurality of tasks; identify intermediate data generated after execution of each identified task; identify attribute information included in each of the identified intermediate data; record the identified plurality of tasks, identified intermediate data and identified attribute information in a task management data structure, wherein each of the identified plurality of tasks is associated with a respective identified intermediate data and respective attribute information; identify deletion target information which defines blacklist information to be deleted, compare the attribute information for each identified intermediate data of the task management data structure with the blacklist information of the deletion target data structure, and determine whether a number of pieces of attribute information of each of the identified intermediate data is equal to or greater than a threshold number of pieces of the blacklist information; generate constraint information that determines whether to delete each of the identified intermediate data based on the comparison, the constraint information comprising countermeasure information for each of the identified intermediate data; wherein, for each of the identified interm
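As an illustration of the blacklist comparison in claim 1 and the flow generation in claim 2, here is a minimal Python sketch. It is one possible reading rather than the patented implementation: it assumes the threshold test counts how many of an intermediate data item's attributes appear in the blacklist. The names `TaskEntry`, `countermeasure`, and `build_flow`, the `delete:` label, and the example attributes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class TaskEntry:
    """One row of a hypothetical task management data structure."""
    task: str
    intermediate: str
    attributes: Set[str]


def countermeasure(entry: TaskEntry, blacklist: Set[str], threshold: int) -> str:
    """Return "delete" when at least `threshold` attributes are blacklisted,
    otherwise "leave" (assumed reading of the claim 1 comparison)."""
    hits = len(entry.attributes & blacklist)
    return "delete" if hits >= threshold else "leave"


def build_flow(entries: List[TaskEntry], blacklist: Set[str], threshold: int) -> List[str]:
    """Add each task to the flow; when its intermediate data must be deleted,
    add a deletion task directly after it, as claim 2 describes."""
    flow: List[str] = []
    for entry in entries:
        flow.append(entry.task)
        if countermeasure(entry, blacklist, threshold) == "delete":
            flow.append(f"delete:{entry.intermediate}")
    return flow


# Invented example data.
entries = [
    TaskEntry("join", "joined.tmp", {"name", "address", "age"}),
    TaskEntry("aggregate", "stats.tmp", {"age"}),
]
blacklist = {"name", "address", "phone"}
flow = build_flow(entries, blacklist, threshold=2)
# flow == ["join", "delete:joined.tmp", "aggregate"]
```

The same `build_flow` loop could take a different predicate to model the last-use rule of claims 3 and 4 or the generation-time rule of claims 5 and 6.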
Protecting data · CPC title
Delete operations (erasing in storage systems G06F3/0652) · CPC title
Column-oriented storage; Management thereof · CPC title
by program, e.g. task dispatcher, supervisor, operating system · CPC title
Protecting access to data via a platform, e.g. using keys or access control rules · CPC title