US9507641B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9507641-B1 |
| Application number | US-201514709385-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 11, 2015 |
| Priority date | May 11, 2015 |
| Publication date | Nov 29, 2016 |
| Grant date | Nov 29, 2016 |
Official abstract text for this publication.
Systems and methods for dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD) are provided. During operation a first parallelized portion of an application executing on the PCD is identified. The first parallelized portion comprising a plurality of threads for parallel execution on the PCD. Performance information is obtained about a plurality of processors of the PCD, each of the plurality of processors corresponding to one of the plurality of threads. A number M of workload partition granularities for the plurality of threads is determined, and a total execution cost for each of the M workload partition granularities is determined. An optimal granularity comprising a one of the M workload partition granularities with a lowest total execution cost is determined, and the first parallelized portion is partitioned into a plurality of workloads having the optimal granularity.
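The selection step in the abstract, combined with the cost computation recited in claim 1 (multiply each processor's amount of work by its amount of overhead, then sum across processors), can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Processor` fields, the `make_toy_estimators` helper, the speed-proportional split, and all numbers are hypothetical. The publication text shown here does not specify how the work and overhead amounts are estimated.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Processor:
    name: str
    speed: float          # hypothetical relative performance level
    dispatch_cost: float  # hypothetical per-chunk overhead factor


# An estimator maps (processor, granularity) to a non-negative number.
Estimator = Callable[[Processor, int], float]


def total_execution_cost(granularity: int, processors: Sequence[Processor],
                         work: Estimator, overhead: Estimator) -> float:
    """Sum over processors of (work amount x overhead amount), as in claim 1."""
    return sum(work(p, granularity) * overhead(p, granularity)
               for p in processors)


def pick_granularity(candidates: Iterable[int], processors: Sequence[Processor],
                     work: Estimator, overhead: Estimator) -> int:
    """Return the candidate granularity with the lowest total execution cost."""
    return min(candidates,
               key=lambda g: total_execution_cost(g, processors, work, overhead))


def make_toy_estimators(n_items: int,
                        processors: Sequence[Processor]) -> Tuple[Estimator, Estimator]:
    """Build illustrative estimators (an assumption, not from the patent)."""
    total_speed = sum(p.speed for p in processors)

    def chunks(p: Processor, g: int) -> int:
        # Speed-proportional share, rounded up to whole chunks of size g.
        share = n_items * p.speed / total_speed
        return math.ceil(share / g)

    def work(p: Processor, g: int) -> float:
        # Whole chunks inflate the estimate when chunks are coarse.
        return chunks(p, g) * g / p.speed

    def overhead(p: Processor, g: int) -> float:
        # Every dispatched chunk adds a fixed overhead increment.
        return 1.0 + chunks(p, g) * p.dispatch_cost

    return work, overhead


PROCS = [Processor("big", 2.0, 0.01), Processor("little", 1.0, 0.02)]
WORK, OVERHEAD = make_toy_estimators(1200, PROCS)
CANDIDATES = [10, 100, 600]  # the M candidate granularities (here M = 3)
BEST = pick_granularity(CANDIDATES, PROCS, WORK, OVERHEAD)

if __name__ == "__main__":
    for g in CANDIDATES:
        print(g, total_execution_cost(g, PROCS, WORK, OVERHEAD))
    print("selected granularity:", BEST)
```

With these toy inputs the estimated costs are about 1440, 864, and 1224 for chunk sizes 10, 100, and 600, so the sketch selects 100. Very fine chunks pay for many dispatches, while very coarse chunks inflate the per-processor work estimate.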
Opening claim text (preview).
What is claimed is:
1. A method for a method for providing dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD), the method comprising: identifying a first parallelized portion of an application executing on the heterogeneous multi-processor PCD, the first parallelized portion comprising a plurality of threads for parallel execution on the PCD; obtaining performance information about a plurality of processors of the PCD, each of the plurality of processors corresponding to one of the plurality of threads; determining a number M of workload partition granularities for the plurality of threads, where M is a positive integer; determining a total execution cost for each of the M workload partition granularities, wherein the determination of the total execution cost comprises: determining an amount of processing work that will be performed by each of the plurality of processors, determining an amount of overhead cost that will be incurred by each of the plurality of processors, multiplying the amount of processing work performed by each of the plurality of processors by the amount of overhead incurred by the corresponding one of the plurality of processors, and summing the multiplied values for the plurality of processors; determining a desired granularity comprising a one of the M workload partition granularities with a lowest total execution cost; and partitioning the first parallelized portion of the application into a plurality of workloads having the desired granularity.
2. The method of claim 1, wherein obtaining performance information about the plurality of processors of the PCD comprises: obtaining a present performance level of the plurality of processors of the PCD.
3. The method of claim 2, wherein obtaining the present performance level of the plurality of processors of the PCD comprises querying a system information file.
4. The method of claim 2, wherein obtaining the present performance level of the plurality of processors of the PCD further comprises determining one or more of: a present clock frequency of each of the plurality of processors, a demand from a competing application for one or more of the plurality of processors, a thermal throttling applied to one or more of the plurality of processors, a power throttling applied to one or more of the plurality of processors, or a sleep mode applied to one or more of the plurality of processors.
5. The method of claim 1, wherein determining the amount of overhead cost that will be incurred by each of the plurality of processors further comprises determining for each of the plurality of processors one or more of: a latency involved in dispatching work to the processor, a delay from synchronization when obtaining work from a queue, a delay from signaling that processing has completed, and an idle wait.
6. The method of claim 1, wherein the determination of the total execution cost for each of the M workload partition granularities is based in part on information about the first parallelized portion of the application derived when the application was compiled.
7. The method of claim 1, further comprising: distributing the plurality of workloads having the desired granularity to the plurality of processors.
8. A system for providing dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD): a central processing unit (CPU) containing a plurality of heterogeneous processors; and a memory in communication with the CPU, the memory storing: at least one application being executed by the CPU, logic configured to: identify a first parallelized portion of the application, the first parallelized portion comprising a plurality of threads for parallel processing by the CPU, obtain performance information about a first set of the plurality of processors of the PCD, each of first set of the plurality of processors corresponding to one of the plurality of threads, determine a number M of workload partition granularities for the plurality of threads where M is a positive integer, determine a total execution cost for each of the M workload partition granularities by: determining an amount of processing work that will be performed by each of the first set of the plurality of processors, determining an amount of overhead cost that will be incurred by each of the first set of the plurality of processors, multiplying the amount of processing work performed by each of the first set of the plurality of processors by the amount of overhead incurred by the corresponding one of the first set of the plurality of processors, and summing the multiplied values for the first set of the plurality of processors; determine a desired granularity comprising a one of the M workload partition granularities with a lowest total execution cost, and partition the first parallelized portion of the application into a plurality of workloads having the desired granularity.
9. The system of claim 8, wherein the obtaining performance information about the first set of the plurality of processors of the PCD comprises: obtaining a present performance level of first set of the plurality of processors of the PCD.
10. The system of claim 9, wherein the obtaining a present performance level of first set of the plurality of processors of the PCD comprises querying a system information file.
11. The system of claim 9, wherein obtaining the present performance level of the plurality of processors of the PCD further comprises determining one or more of: a present clock frequency of each of the first set of the plurality of processors, a demand from a competing application for one or more of the first set of the plurality of processors, a thermal throttling applied to one or more of the first set of the plurality of processors, a power throttling applied to one or more of the first set of the plurality of processors, or a sleep mode applied to one or more of the first set of the plurality of processors.
12. The system of claim 8, wherein the determination of the amount of overhead cost that will be incurred by each of the plurality of processors further comprises determining for each of the first set of the plurality of processors one or more of: a latencies involved in dispatching work to the processor, a delay from synchronization when obtaining work from a queue, a delay from signaling that processing has completed, and an idle wait.
13. The system of claim 8, where the determination of the total execution cost for each of the M workload partition granularities is based in part on information about the first parallelized portion of the application derived when the application was compiled.
14. The system of claim 8, wherein the logic is further configured to: distribute the plurality of workloads having the optimal desired granularity to the first set of the plurality of processors.
15. A computer program product comprising a non-transitory computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement a method for dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD), the method comprising: identifying a first parallelized portion of an application executing on the heterogeneous multi-processor PCD, the first parallelized portion comprising a plurality of threads for parallel execution on the PCD; obtaining performance information about a plurality of processors of the PCD, each of the
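Claims 3 and 4 recite obtaining the present performance level by querying a system information file, for example to learn each processor's present clock frequency. A rough Python sketch of that idea follows. It assumes a Linux-style cpufreq sysfs layout (`cpuN/cpufreq/scaling_cur_freq`, reported in kHz), which is one possible reading of "system information file" and is not named in the claims. The helper names `read_cpu_freqs_khz` and `relative_performance` are hypothetical.

```python
from pathlib import Path
from typing import Dict

# Assumption: Linux cpufreq sysfs root; the claims do not name a specific file.
DEFAULT_CPU_ROOT = Path("/sys/devices/system/cpu")


def read_cpu_freqs_khz(cpu_root: Path = DEFAULT_CPU_ROOT) -> Dict[str, int]:
    """Read the current clock frequency (kHz) of each CPU that exposes one."""
    freqs: Dict[str, int] = {}
    for freq_file in sorted(cpu_root.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        cpu_name = freq_file.parent.parent.name  # e.g. "cpu0"
        try:
            freqs[cpu_name] = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            # Offline or unreadable CPUs are skipped rather than guessed.
            continue
    return freqs


def relative_performance(freqs_khz: Dict[str, int]) -> Dict[str, float]:
    """Scale each frequency by the fastest one (a crude proxy for performance)."""
    if not freqs_khz:
        return {}
    fastest = max(freqs_khz.values())
    return {cpu: khz / fastest for cpu, khz in freqs_khz.items()}


if __name__ == "__main__":
    print(relative_performance(read_cpu_freqs_khz()))
```

Clock frequency alone does not capture differences between heterogeneous core types, competing demand, or thermal and power throttling, all of which claim 4 lists as further inputs. A fuller estimate would need to combine several such signals.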
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
considering the load · CPC title
Techniques for rebalancing the load in a distributed system · CPC title
considering hardware capabilities · CPC title
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title