System and method for dynamic granularity control of parallelized work in a portable computing device (PCD)

US9507641B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9507641-B1
Application number	US-201514709385-A
Country	US
Kind code	B1
Filing date	May 11, 2015
Priority date	May 11, 2015
Publication date	Nov 29, 2016
Grant date	Nov 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD) are provided. During operation a first parallelized portion of an application executing on the PCD is identified. The first parallelized portion comprising a plurality of threads for parallel execution on the PCD. Performance information is obtained about a plurality of processors of the PCD, each of the plurality of processors corresponding to one of the plurality of threads. A number M of workload partition granularities for the plurality of threads is determined, and a total execution cost for each of the M workload partition granularities is determined. An optimal granularity comprising a one of the M workload partition granularities with a lowest total execution cost is determined, and the first parallelized portion is partitioned into a plurality of workloads having the optimal granularity.
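The selection loop the abstract describes (evaluate M candidate granularities, keep the cheapest, then partition the work) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names `partition`, `toy_total_cost`, and `choose_granularity`, the two relative processor speeds, the fixed per-chunk overhead, and the greedy earliest-free scheduling used to estimate a cost.

```python
import heapq
from typing import Callable, Sequence


def partition(n_items: int, granularity: int) -> list[range]:
    """Split indices 0..n_items-1 into contiguous chunks of at most `granularity` items."""
    return [range(start, min(start + granularity, n_items))
            for start in range(0, n_items, granularity)]


def toy_total_cost(granularity: int,
                   n_items: int = 1000,
                   speeds: Sequence[float] = (2.0, 1.0),  # hypothetical: one fast, one slow core
                   overhead: float = 5.0) -> float:       # hypothetical fixed cost per chunk
    """Estimate finish time when chunks go to whichever processor frees up first.

    This is a stand-in cost model, not the calculation recited in the claims.
    """
    # Min-heap of (time at which the processor becomes free, processor index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for chunk in partition(n_items, granularity):
        t, i = heapq.heappop(free_at)
        t += overhead + len(chunk) / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


def choose_granularity(candidates: Sequence[int],
                       total_cost: Callable[[int], float]) -> int:
    """Return the candidate granularity with the lowest total cost."""
    return min(candidates, key=total_cost)


candidates = (10, 50, 100, 250, 500, 1000)  # the M candidate granularities
best = choose_granularity(candidates, toy_total_cost)
workloads = partition(1000, best)
print(best, len(workloads))
```

In this toy model, very small chunks pay the per-chunk overhead many times, while very large chunks leave the slower processor holding work long after the faster one is idle. That trade-off is why a search over several granularities can be worthwhile.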

First claim

Opening claim text (preview).

What is claimed is:

1. A method for a method for providing dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD), the method comprising: identifying a first parallelized portion of an application executing on the heterogeneous multi-processor PCD, the first parallelized portion comprising a plurality of threads for parallel execution on the PCD; obtaining performance information about a plurality of processors of the PCD, each of the plurality of processors corresponding to one of the plurality of threads; determining a number M of workload partition granularities for the plurality of threads, where M is a positive integer; determining a total execution cost for each of the M workload partition granularities, wherein the determination of the total execution cost comprises: determining an amount of processing work that will be performed by each of the plurality of processors, determining an amount of overhead cost that will be incurred by each of the plurality of processors, multiplying the amount of processing work performed by each of the plurality of processors by the amount of overhead incurred by the corresponding one of the plurality of processors, and summing the multiplied values for the plurality of processors; determining a desired granularity comprising a one of the M workload partition granularities with a lowest total execution cost; and partitioning the first parallelized portion of the application into a plurality of workloads having the desired granularity.

2. The method of claim 1, wherein obtaining performance information about the plurality of processors of the PCD comprises: obtaining a present performance level of the plurality of processors of the PCD.

3. The method of claim 2, wherein obtaining the present performance level of the plurality of processors of the PCD comprises querying a system information file.

4. The method of claim 2, wherein obtaining the present performance level of the plurality of processors of the PCD further comprises determining one or more of: a present clock frequency of each of the plurality of processors, a demand from a competing application for one or more of the plurality of processors, a thermal throttling applied to one or more of the plurality of processors, a power throttling applied to one or more of the plurality of processors, or a sleep mode applied to one or more of the plurality of processors.

5. The method of claim 1, wherein determining the amount of overhead cost that will be incurred by each of the plurality of processors further comprises determining for each of the plurality of processors one or more of: a latency involved in dispatching work to the processor, a delay from synchronization when obtaining work from a queue, a delay from signaling that processing has completed, and an idle wait.

6. The method of claim 1, wherein the determination of the total execution cost for each of the M workload partition granularities is based in part on information about the first parallelized portion of the application derived when the application was compiled.

7. The method of claim 1, further comprising: distributing the plurality of workloads having the desired granularity to the plurality of processors.

8. A system for providing dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD): a central processing unit (CPU) containing a plurality of heterogeneous processors; and a memory in communication with the CPU, the memory storing: at least one application being executed by the CPU, logic configured to: identify a first parallelized portion of the application, the first parallelized portion comprising a plurality of threads for parallel processing by the CPU, obtain performance information about a first set of the plurality of processors of the PCD, each of first set of the plurality of processors corresponding to one of the plurality of threads, determine a number M of workload partition granularities for the plurality of threads where M is a positive integer, determine a total execution cost for each of the M workload partition granularities by: determining an amount of processing work that will be performed by each of the first set of the plurality of processors, determining an amount of overhead cost that will be incurred by each of the first set of the plurality of processors, multiplying the amount of processing work performed by each of the first set of the plurality of processors by the amount of overhead incurred by the corresponding one of the first set of the plurality of processors, and summing the multiplied values for the first set of the plurality of processors; determine a desired granularity comprising a one of the M workload partition granularities with a lowest total execution cost, and partition the first parallelized portion of the application into a plurality of workloads having the desired granularity.

9. The system of claim 8, wherein the obtaining performance information about the first set of the plurality of processors of the PCD comprises: obtaining a present performance level of first set of the plurality of processors of the PCD.

10. The system of claim 9, wherein the obtaining a present performance level of first set of the plurality of processors of the PCD comprises querying a system information file.

11. The system of claim 9, wherein obtaining the present performance level of the plurality of processors of the PCD further comprises determining one or more of: a present clock frequency of each of the first set of the plurality of processors, a demand from a competing application for one or more of the first set of the plurality of processors, a thermal throttling applied to one or more of the first set of the plurality of processors, a power throttling applied to one or more of the first set of the plurality of processors, or a sleep mode applied to one or more of the first set of the plurality of processors.

12. The system of claim 8, wherein the determination of the amount of overhead cost that will be incurred by each of the plurality of processors further comprises determining for each of the first set of the plurality of processors one or more of: a latencies involved in dispatching work to the processor, a delay from synchronization when obtaining work from a queue, a delay from signaling that processing has completed, and an idle wait.

13. The system of claim 8, where the determination of the total execution cost for each of the M workload partition granularities is based in part on information about the first parallelized portion of the application derived when the application was compiled.

14. The system of claim 8, wherein the logic is further configured to: distribute the plurality of workloads having the optimal desired granularity to the first set of the plurality of processors.

15. A computer program product comprising a non-transitory computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement a method for dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD), the method comprising: identifying a first parallelized portion of an application executing on the heterogeneous multi-processor PCD, the first parallelized portion comprising a plurality of threads for parallel execution on the PCD; obtaining performance information about a plurality of processors of the PCD, each of the…
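The cost calculation recited in claim 1 (multiply each processor's amount of work by its overhead, sum the products across processors, then take the granularity with the lowest total) can be sketched in Python as follows. The class and function names, the units, and all numbers are hypothetical. Adding the four overhead components listed in claim 5 into a single figure is also an assumption of this sketch; the claim only says that one or more of them are determined.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class Overhead:
    """Overhead components named in claim 5 (units are arbitrary here)."""
    dispatch_latency: int
    queue_sync_delay: int
    completion_signal_delay: int
    idle_wait: int

    def total(self) -> int:
        # Assumption: the components are simply added together.
        return (self.dispatch_latency + self.queue_sync_delay
                + self.completion_signal_delay + self.idle_wait)


@dataclass(frozen=True)
class ProcessorEstimate:
    """Estimated work and overhead for one processor at one granularity."""
    work: int
    overhead: Overhead


def total_execution_cost(estimates: Sequence[ProcessorEstimate]) -> int:
    """Sum of (work x overhead) over the processors, as worded in claim 1."""
    return sum(e.work * e.overhead.total() for e in estimates)


def desired_granularity(by_granularity: Mapping[int, Sequence[ProcessorEstimate]]) -> int:
    """Return the granularity whose total execution cost is lowest."""
    return min(by_granularity, key=lambda g: total_execution_cost(by_granularity[g]))


# Hypothetical estimates for M = 2 candidate granularities on two processors.
estimates = {
    64: [ProcessorEstimate(600, Overhead(10, 5, 5, 20)),
         ProcessorEstimate(400, Overhead(20, 10, 10, 10))],
    256: [ProcessorEstimate(700, Overhead(5, 5, 5, 5)),
          ProcessorEstimate(300, Overhead(10, 5, 5, 10))],
}

for g, procs in estimates.items():
    print(g, total_execution_cost(procs))
print("desired granularity:", desired_granularity(estimates))
```

With these made-up inputs the totals are 44000 for granularity 64 and 23000 for granularity 256, so 256 is selected. In practice the per-processor estimates would come from the performance information described in claims 2 to 4, which this sketch does not model.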

Assignees

Qualcomm Inc

Inventors

Classifications

G06F9/5066 · Primary
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
G06F9/505
considering the load · CPC title
G06F9/5083
Techniques for rebalancing the load in a distributed system · CPC title
G06F9/5044 · Primary
considering hardware capabilities · CPC title
Y02D10/00
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

Patent family

Related publications grouped by family.

View patent family 55953387

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9507641B1 cover? Systems and methods for dynamic granularity control of parallelized work in a heterogeneous multi-processor portable computing device (PCD) are provided. During operation a first parallelized portion of an application executing on the PCD is identified. The first parallelized portion comprising a plurality of threads for parallel execution on the PCD. Performance information is obtained about a…
Who is the assignee on this patent? Qualcomm Inc
What technology area does this patent fall under? Primary CPC classification G06F9/5066. Mapped technology areas include Physics.
When was this patent published? Publication date Nov 29, 2016 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).