Data movement from a database to a distributed file system
US-2015134699-A1 · May 14, 2015 · US
US9600342B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9600342-B2 |
| Application number | US-201514796643-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 10, 2015 |
| Priority date | Jul 10, 2014 |
| Publication date | Mar 21, 2017 |
| Grant date | Mar 21, 2017 |
Official abstract text for this publication.
Various techniques are described herein for creating data partition process schedules and executing such partition schedules using multiple parallel process instances. Data processing tasks initiated by or for applications may be executed by creating and executing partition schedules, in which a number of different process instances are created and each assigned a subset of data to process. Partition schedules may be used to determine a number of process instances to be created, and each process instance may be assigned a unique set of run-time data values corresponding to a unique set of parameters within the data set to be processed by the application. The process instances may operate independently and in parallel to retrieve and process separate partitions of the data required for the overall data processing task initiated by/for the application.
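The partition schedule described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the in-memory `ROWS` list stands in for the data tables, the parameter names `region` and `year` are hypothetical, and threads stand in for the separate process instances. The instance count is the product of the per-parameter unique-value counts, as recited in claim 1.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from math import prod

# Hypothetical in-memory stand-in for the data tables in the backend data store.
ROWS = [
    {"region": "EMEA", "year": 2014, "amount": 10},
    {"region": "EMEA", "year": 2015, "amount": 20},
    {"region": "APAC", "year": 2014, "amount": 30},
    {"region": "APAC", "year": 2015, "amount": 40},
    {"region": "AMER", "year": 2014, "amount": 50},
]
PARAMETERS = ["region", "year"]  # assumed partitioning parameters


def unique_values(rows, parameter):
    """Return the sorted unique values of one parameter within the data set."""
    return sorted({row[parameter] for row in rows})


def build_partition_schedule(rows, parameters):
    """Return one dict of run-time values per process instance."""
    values = [unique_values(rows, p) for p in parameters]
    # Number of instances = product of the unique-value counts.
    instance_count = prod(len(v) for v in values)
    combos = [dict(zip(parameters, c)) for c in product(*values)]
    assert len(combos) == instance_count
    return combos


def process_partition(rows, combo):
    """Retrieve only the rows matching this instance's values and sum them."""
    target = [r for r in rows if all(r[k] == v for k, v in combo.items())]
    return sum(r["amount"] for r in target)


def run_schedule(rows, parameters):
    """Run every partition independently; threads stand in for processes."""
    schedule = build_partition_schedule(rows, parameters)
    with ThreadPoolExecutor(max_workers=len(schedule)) as pool:
        totals = list(pool.map(lambda c: process_partition(rows, c), schedule))
    return schedule, totals


if __name__ == "__main__":
    for combo, total in zip(*run_schedule(ROWS, PARAMETERS)):
        print(combo, total)
```

Because the combinations are a full Cartesian product, some instances may receive a combination with no matching rows (here, `AMER` with `2015`); in this sketch such an instance simply processes an empty partition.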
Opening claim text (preview).
What is claimed is:
1. A process scheduling and management system comprising: a processing unit comprising one or more processors; and memory coupled with and readable by the processing unit and storing therein a set of instructions which, when executed by the processing unit, causes the process scheduling and management system to: identify a plurality of parameters within a data set comprising one or more data tables stored in a backend data store, wherein identifying the plurality of parameters within the data set comprises: receiving a selection of an application class; executing the selected application class; and identifying the plurality of parameters based on the execution of the selected application class; for each parameter of the identified parameters, determine a number of unique values for the parameter within the data set, wherein said determining is performed within the execution of the selected application class; determine a number of process instances to create of a data processing executable component, said determining comprising calculating a number of unique combinations of parameter values by multiplying together the determined number of unique values for each of the plurality of identified parameters; create the determined number of process instances of the data processing executable component; and provide to each of the process instances data corresponding to a unique combination of values of the identified parameters within the data set, wherein the unique combinations of values for the process instances are determined independently of the backend data store storing the data tables, and wherein each of the process instances is configured to retrieve a unique set of target data from the data tables, based on the unique combination of values provided to the process instance.
2. The process scheduling and management system of claim 1, wherein each of the process instances is executed within an application layer of the system, and wherein each of the process instances is configured to retrieve its unique set of target data from the backend data store, the backend data store comprising at least one of a database server or a data cache in the application layer.
3. The process scheduling and management system of claim 2, wherein the one or more data tables are not stored as partitioned tables.
4. The process scheduling and management system of claim 1, wherein the plurality of parameters identified within the data set are different from a set of additional parameters used by a partitioning scheme within the backend data store storing the data tables.
5. The process scheduling and management system of claim 1, wherein the determinations of the number of unique values for each parameter, and the determination of the number of process instances to create, are performed after and in response to an initiation of a data processing task by an application on the data set.
6. The process scheduling and management system of claim 1, wherein the unique sets of target data for multiple different process instances are stored in the same tables within the one or more data tables.
7. The process scheduling and management system of claim 1, the memory storing further instructions which, when executed by the processing unit, causes the process scheduling and management system to: establish a child-parent link between each of the created process instances and a parent partition scheduler process.
8. The process scheduling and management system of claim 7, the memory storing further instructions which, when executed by the processing unit, causes the process scheduling and management system to: use the parent partition scheduler process to update a status of one or more of the process instances, in response to user input received via the parent partition scheduler process.
9. The process scheduling and management system of claim 7, the memory storing further instructions which, when executed by the processing unit, causes the process scheduling and management system to: receive, at the parent partition scheduler process, execution status messages from each of the process instances.
10. The process scheduling and management system of claim 1, wherein identifying the plurality of parameters within the data set comprises: receiving one or more user selections corresponding to the plurality of parameters.
11. The process scheduling and management system of claim 1, wherein creating the process instances of the data processing executable component comprises: determining that an application has initiated a data processing task on the data set; determining the plurality of identified parameters and the number of unique values for each of the identified parameters; and creating a record in an application run control table corresponding to each of the unique combination of values of the identified parameters.
12. A method of process scheduling and management, comprising: identifying, by a partition scheduler computing device, a plurality of parameters within a data set comprising one or more data tables, wherein identifying the plurality of parameters within the data set comprises: receiving a selection of an application class; executing the selected application class; and identifying the plurality of parameters based on the execution of the selected application class; determining, by the partition scheduler computing device, for each parameter of the identified parameters, a number of unique values for the parameter within the data set, wherein said determining is performed within the execution of the selected application class; determining, by the partition scheduler computing device, a number of process instances to create of a data processing executable component, said determining comprising calculating a number of unique combinations of parameter values by multiplying together the determined number of unique values for each of the plurality of identified parameters; creating, by the partition scheduler computing device, the determined number of process instances of the data processing executable component; and providing to each of the process instances, by the partition scheduler computing device, data corresponding to a unique combination of values of the identified parameters within the data set, wherein the unique combinations of values for the process instances are determined independently of a backend data store storing the data tables, and wherein each of the process instances is configured to retrieve a unique set of target data from the one or more data tables, based on the unique combination of values provided to the process instance.
13. The method of claim 12, further comprising: executing each of the process instances within an application layer of a computing environment, wherein each of the process instances is configured to retrieve its unique set of target data from the backend data store, the backend data store comprising at least one of a database server or a data cache in the application layer.
14. The method of claim 12, further comprising: establishing a child-parent link between each of the created process instances and a parent partition scheduler process; using the parent partition scheduler process to update a status of one or more of the process instances, in response to user input received via the parent partition scheduler process; and receiving, by the partition scheduler computing device, execution status messages from each of the process instances.
15. The method of claim 12, wherein creating the process instances of the data processing execu
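Claims 7 through 9 and claim 11 describe a parent partition scheduler that links to each child instance, receives execution status messages, updates instance status on user input, and creates one run control record per unique combination of values. A rough Python sketch of that bookkeeping follows. The class names, field names, and status strings (`QUEUED`, `SUCCESS`, `CANCELLED`) are all hypothetical and do not come from the patent text; a plain dictionary stands in for the application run control table.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class RunControlRecord:
    """One row of a hypothetical run control table (one per process instance)."""
    instance_id: int
    parent_id: str          # child-parent link back to the scheduler
    values: dict            # unique combination of parameter values
    status: str = "QUEUED"  # assumed initial status name


@dataclass
class PartitionScheduler:
    """Parent process that owns the records for its child instances."""
    scheduler_id: str
    run_control: dict = field(default_factory=dict)

    def create_records(self, parameter_values):
        """Create one record per unique combination; return the record count."""
        names = list(parameter_values)
        combos = product(*parameter_values.values())
        for instance_id, combo in enumerate(combos, start=1):
            self.run_control[instance_id] = RunControlRecord(
                instance_id, self.scheduler_id, dict(zip(names, combo))
            )
        return len(self.run_control)

    def receive_status(self, instance_id, status):
        """Record an execution status message sent by a child instance."""
        self.run_control[instance_id].status = status

    def cancel(self, instance_id):
        """Update a child's status in response to user input at the parent."""
        self.receive_status(instance_id, "CANCELLED")

    def summary(self):
        """Count child instances by current status."""
        counts = {}
        for record in self.run_control.values():
            counts[record.status] = counts.get(record.status, 0) + 1
        return counts


if __name__ == "__main__":
    scheduler = PartitionScheduler("sched-1")
    scheduler.create_records({"region": ["EMEA", "APAC"], "year": [2014, 2015]})
    scheduler.receive_status(1, "SUCCESS")
    scheduler.cancel(2)
    print(scheduler.summary())
```

Keeping the parent identifier on every record is one simple way to model the child-parent link; a real system would more likely persist these records in a database table and deliver status messages over some inter-process channel.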
Physics · mapped topic
Partitioning or combining of resources · CPC title
Tablespace storage structures; Management thereof · CPC title