Systems and Methods for Efficient Data Preprocessing of Machine Learning Workloads
US-2024403138-A1 · Dec 5, 2024 · US
US9239741B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9239741-B2 |
| Application number | US-201213653308-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 16, 2012 |
| Priority date | Oct 16, 2012 |
| Publication date | Jan 19, 2016 |
| Grant date | Jan 19, 2016 |
Official abstract text for this publication.
An embodiment method for massively parallel processing includes initiating a management instance on an initial machine, the management instance generating an initial partition corresponding to the initial machine, determining a total number of partitions desired for processing a database, the total number of partitions including the initial partition, determining a number of additional machines available to process the database, grouping the initial machine and the additional machines together in a pod, and launching the management instance on the additional machines in the pod to generate the total number of partitions desired for the database. Additional embodiment methods and an embodiment system operable to perform such methods are also disclosed.
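The bootstrap sequence in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. `ManagementInstance`, `Pod`, and `bootstrap_pod` are hypothetical names. Each management instance is assumed to own exactly one partition, and the remaining partitions are assumed to be spread round-robin over the additional machines. The patent text does not specify a placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementInstance:
    """Hypothetical stand-in for one management instance; owns one partition."""
    machine: str
    partition_id: int


@dataclass
class Pod:
    """Hypothetical group of machines that jointly process one database."""
    name: str
    machines: list = field(default_factory=list)
    instances: list = field(default_factory=list)


def bootstrap_pod(pod_name, initial_machine, total_partitions, additional_machines):
    """Sketch of the abstract's steps (assumed round-robin placement)."""
    # Step 1: the management instance starts on the initial machine
    # and generates the initial partition (id 0).
    initial = ManagementInstance(initial_machine, partition_id=0)

    # Steps 2-3: the desired partition count (including the initial one)
    # and the available additional machines are supplied by the caller.
    if total_partitions < 1:
        raise ValueError("the total must include the initial partition")
    if total_partitions > 1 and not additional_machines:
        raise ValueError("no additional machines to host the remaining partitions")

    # Step 4: group the initial machine and the additional machines into a pod.
    pod = Pod(pod_name,
              machines=[initial_machine, *additional_machines],
              instances=[initial])

    # Step 5: launch instances only on the additional machines; the initial
    # instance and partition 0 are never re-created.
    for pid in range(1, total_partitions):
        machine = additional_machines[(pid - 1) % len(additional_machines)]
        pod.instances.append(ManagementInstance(machine, pid))
    return pod
```

For example, `bootstrap_pod("pod-a", "m0", 5, ["m1", "m2"])` leaves partition 0 on `m0` and gives `m1` and `m2` two instances each, the case in which the desired partition count exceeds the number of machines in the pod.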
Opening claim text (preview).
What is claimed is:
1. A method for massively parallel processing, comprising: initiating a management instance on an initial machine, the management instance generating an initial database partition corresponding to the initial machine; determining a total number of database partitions desired for processing a database, the total number of database partitions including the initial database partition; determining a number of additional machines available to process the database; grouping the initial machine and the additional machines together in a pod; generating a database partition-to-processing node mapping chart in accordance with the total number of database partitions and the number of additional machines; storing a pod configuration for the pod, wherein the pod configuration comprises a pod name; and after initiating the management instance on the initial machine, launching the management instance on the additional machines in the pod to generate the total number of database partitions desired for the database, without reinitiating the management instance on the initial machine and without regenerating the initial database partition.
2. The method of claim 1, wherein the pod has a flexible membership over time.
3. The method of claim 1, wherein the pod is expandable over time to add new machines as the new machines become available to process the database.
4. The method of claim 1, further comprising expanding the pod to include a new machine after the management instance has been launched on the additional machines.
5. The method of claim 1, wherein the steps of determining the number of additional machines and grouping the initial machine and the additional machines together are repeated.
6. The method of claim 1, wherein the total number of the database partitions is greater than a sum total number of the initial machines and the additional machines.
7. The method of claim 1, wherein one of the additional machines is running at least two of the management instances.
8. The method of claim 7, further comprising relocating one of the management instances from the additional machine running two of the management instances to a new machine.
9. The method of claim 8, wherein the new machine was added to the pod when the new machine became available to process the database.
10. The method of claim 1, wherein one of the additional machines is running at least two of the management instances, each of the management instances corresponding directly with one of the database partitions.
11. A method for massively parallel processing, comprising: initiating a management instance on an initial machine, the management instance generating an initial database partition corresponding to the initial machine; determining a total number of database partitions desired for processing a database, the total number of database partitions including the initial database partition; determining a number of additional machines available to process the database, a sum total of the additional machines and the initial machine less than the total number of database partitions desired; grouping the initial machine and the additional machines together in a pod; generating a database partition-to-processing node mapping chart in accordance with the total number of database partitions and the number of additional machines; storing a pod configuration for the pod, wherein the pod configuration comprises a pod name; and after initiating the management instance on the initial machine, launching the management instance on the additional machines in the pod to generate the total number of database partitions desired for the database, without reinitiating the management instance on the initial machine and without regenerating the initial database partition.
12. The method of claim 11, wherein the pod has a flexible membership over time.
13. The method of claim 11, wherein the pod is expandable over time to add new machines as the new machines become available to process the database.
14. The method of claim 11, further comprising expanding the pod to include a new machine after the management instance has been launched on the additional machines.
15. The method of claim 11, wherein the steps of determining the number of additional machines and grouping the initial machine and the additional machines together are periodically repeated.
16. The method of claim 11, wherein one of the additional machines is running at least two of the management instances.
17. The method of claim 16, further comprising relocating one of the management instances from the additional machine running two of the management instances to a new machine.
18. The method of claim 17, wherein the new machine was added to the pod when the new machine became available to process the database.
19. A massively parallel processing system, comprising: an initial machine in a pod, the initial machine running a management instance initiated on the initial machine generating an initial database partition for processing a database; an additional machine added to the pod after the management instance is initiated on the initial machine, without reinitiating the management instance on the initial machine and without regenerating the initial database partition, the additional machine running two of the management instances for processing the database, each of the management instances corresponding to an additional database partition; and a pod administrator machine configured to store a pod configuration, wherein the pod configuration comprises a pod name.
20. The massively parallel processing system of claim 19, wherein the pod has a flexible membership permitting a new machine for processing the database to be added.
21. The massively parallel processing system of claim 20, wherein the additional machine running two of the management instances is configured to relocate one of the management instances to the new machine.
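The claims add three concrete pieces: a database partition-to-processing node mapping chart, a stored pod configuration that includes a pod name, and relocation of a management instance from a machine running two instances to a newly added machine. The Python sketch below illustrates them under its own assumptions. The function names and dict layout are hypothetical. The chart pins partition 0 to the initial machine and assigns the rest round-robin over the additional machines. The relocation policy (move the highest-numbered non-initial partition off the most loaded machine) is one arbitrary choice, since the claims do not say how the instance to relocate is selected.

```python
from collections import Counter


def build_mapping_chart(total_partitions, initial_machine, additional_machines):
    """Hypothetical partition-to-processing-node chart.

    Partition 0 stays on the initial machine; the remaining partitions are
    assigned round-robin over the additional machines (an assumed policy).
    """
    if total_partitions > 1 and not additional_machines:
        raise ValueError("no additional machines to host the remaining partitions")
    chart = {0: initial_machine}
    for pid in range(1, total_partitions):
        chart[pid] = additional_machines[(pid - 1) % len(additional_machines)]
    return chart


def make_pod_config(pod_name, chart):
    """Hypothetical pod configuration record; the claims only require a pod name."""
    return {"pod_name": pod_name, "partition_map": dict(chart)}


def expand_and_relocate(chart, new_machine):
    """Return a new chart after moving one instance onto new_machine.

    Finds the machine hosting the most partitions (Counter.most_common breaks
    ties by first appearance) and, if it hosts at least two, moves its
    highest-numbered partition other than the initial partition 0.
    The input chart is left unmodified.
    """
    load = Counter(chart.values())
    busiest, count = load.most_common(1)[0]
    updated = dict(chart)
    if count >= 2:
        pid = max((p for p, m in chart.items() if m == busiest and p != 0),
                  default=None)
        if pid is not None:
            updated[pid] = new_machine
    return updated
```

For example, `build_mapping_chart(5, "m0", ["m1", "m2"])` places partitions 1 and 3 on `m1` and partitions 2 and 4 on `m2`. Calling `expand_and_relocate` on that chart with a new machine `m3` then moves partition 3 from `m1` to `m3`, while partition 0 stays on the initial machine throughout.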
Clust · CPC title
Partitioning or combining of resources · CPC title
Physics · mapped topic
of parallel queries · CPC title
Indexing; Data structures therefor; Storage structures · CPC title