Apparatus and method for predicting processing performance
US-9336055-B2 · May 10, 2016 · US
US8963932B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-8963932-B1 |
| Application number | US-64144706-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 18, 2006 |
| Priority date | Aug 1, 2006 |
| Publication date | Feb 24, 2015 |
| Grant date | Feb 24, 2015 |
Official abstract text for this publication.
A method of calculating performance parameters for a type of data being executed by a unified processing subunit. In one embodiment, a task (e.g., a draw call) is executed by a processing pipeline (e.g., a GPU). An ALU within a unified processing subunit (e.g., a unified shader processing unit) is queried to determine a type of data (e.g., vertex processing, pixel shading) being processed by the ALU. Performance parameters (e.g., bottleneck and utilization) for the type of data being processed by the ALU are calculated and displayed (e.g., as a stacked graph). Accordingly, software developers can visualize component workloads of a unified processing subunit architecture. As a result, utilization of the unified processing subunit processing a particular type of data may be maximized while bottlenecks are reduced. Therefore, the efficiency of the unified processing subunit and the processing pipeline is improved.
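The flow the abstract describes (query the ALU, tally what it is processing, show the result as a stacked graph) can be sketched as follows. This is an illustration only, not the patented implementation: the list of sampled labels, the function names `type_shares` and `stacked_bar`, and the text rendering are all assumptions made for the example.

```python
from collections import Counter


def type_shares(samples):
    """Return the fraction of queries in which each data type was seen.

    `samples` is a hypothetical list with one entry per ALU query:
    a data-type label such as "vertex" or "pixel", or None when idle.
    """
    total = len(samples)
    if total == 0:
        return {}
    # One counter per data type, incremented once per observed sample.
    counts = Counter(s for s in samples if s is not None)
    return {name: n / total for name, n in counts.items()}


def stacked_bar(shares, width=40):
    """Render the shares as a single text bar, one letter per data type.

    Unused cells are shown as "."; rounding can shift a segment by a cell.
    """
    bar = "".join(name[0].upper() * round(share * width)
                  for name, share in shares.items())
    return bar.ljust(width, ".")


if __name__ == "__main__":
    # Made-up samples: 5 vertex, 10 pixel, 5 idle queries.
    samples = ["vertex"] * 5 + ["pixel"] * 10 + [None] * 5
    print(stacked_bar(type_shares(samples), width=20))
```

Each letter run in the bar corresponds to one data type, so a developer could see at a glance which workload dominates the unified subunit for a given draw call.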
Opening claim text (preview).
What is claimed is:
1. A method of calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a measure of time that said unified processor subunit is processing said data type plus a measure of time that said unified processor subunit pauses an upstream component because said unified processor subunit is busy minus the time which said unified processor subunit is paused because a downstream component is busy and does not accept further data, all over the time required by said processing pipeline to process said executable task, and wherein said calculating is based on a counter operable to increment based on an individual processing of said data type.
2. The method as described in claim 1, wherein said unified processor subunit is operable to process at least two types of data.
3. The method as described in claim 1, wherein said calculating said performance parameters for said unified processor subunit processing said data type comprises: calculating utilization, wherein said utilization is a measure of a percentage that said unified processor subunit is processing said data type over the time said processing pipeline required to process said executable task.
4. The method as described in claim 1, wherein said plurality of processing subunits comprises a plurality of arithmetic logic units (ALUs), wherein said plurality of ALUs partially form an upstream component and a downstream component of said unified processor subunit.
5. The method as described in claim 1, wherein said processor pipeline is a pipeline graphical processing unit (GPU), and wherein said executable task is a draw call processed on said GPU, and wherein said unified processing subunit is capable of processing vertex, geometry, rasterizer and pixel data types.
6. The method as described in claim 1, wherein said method further comprises: outputting said calculated performance parameters for said unified processor subunit processing said data type.
7. The method as described in claim 1 further comprising: displaying calculated performance parameters for a plurality of data types processed by said unified processor subunit in a stacked graph format.
8. A non-transitory computer-useable storage medium having computer-readable program code stored thereon for causing a computer system to execute a method for calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a measure of time that said unified processor subunit is processing said data type plus a measure of time that said unified processor subunit pauses an upstream component because said unified processor subunit is busy minus the time which said unified processor subunit is paused because a downstream component is busy and does not accept further data, all over the time required by said processing pipeline to process said executable task, and wherein said calculating is based on a counter operable to increment based on an individual processing of said data type.
9. The non-transitory computer-useable storage medium as described in claim 8, wherein said unified processor subunit is operable to process at least two types of data.
10. The non-transitory computer-useable storage medium as described in claim 8, wherein said calculating said performance parameters for said unified processor subunit processing said data type comprises: calculating utilization, wherein said utilization is a measure of a percentage that said unified processor subunit is processing said data type over the time said processing pipeline required to process said executable task.
11. The non-transitory computer-useable storage medium as described in claim 8, wherein said plurality of processing subunits comprises a plurality of arithmetic logic units (ALUs), wherein said plurality of ALUs partially form an upstream component and a downstream component of said unified processor subunit.
12. The non-transitory computer-useable storage medium as described in claim 8, wherein said processor pipeline is a pipeline graphical processing unit (GPU), and wherein said executable task is a draw call processed on said GPU, and wherein said unified processing subunit is capable of processing vertex, geometry, rasterizer and pixel data types.
13. The non-transitory computer-useable storage medium as described in claim 8, wherein said method further comprises: outputting said calculated performance parameters for said unified processor subunit processing said data type.
14. The computer-useable storage medium as described in claim 8, wherein said method further comprises: displaying calculated performance parameters for a plurality of data types processed by said unified processor subunit in a stacked graph format.
15. A computer system comprising a processor coupled to a bus, a transmitter/receiver coupled to said bus, and a memory coupled to said bus, wherein said memory comprises instructions that when executed on said processor implement a method for calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters ass
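The bottleneck and utilization measures recited in claims 1 and 3 reduce to simple arithmetic, which the sketch below spells out. It is a minimal reading of the claim language rather than the patentee's code: the `SubunitTimes` container, the field names, and the sample numbers are hypothetical, and the time unit (cycles, microseconds, counter ticks) is left open.

```python
from dataclasses import dataclass


@dataclass
class SubunitTimes:
    """Hypothetical timings for one data type on the unified subunit."""
    busy: float                   # time spent processing the data type
    stalls_upstream: float        # time an upstream component was paused because the subunit was busy
    stalled_by_downstream: float  # time the subunit was paused by a busy downstream component


def bottleneck(t: SubunitTimes, task_time: float) -> float:
    """(busy + upstream stalls caused - downstream stalls suffered) / task time."""
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return (t.busy + t.stalls_upstream - t.stalled_by_downstream) / task_time


def utilization_percent(t: SubunitTimes, task_time: float) -> float:
    """Percentage of the task time spent processing the data type."""
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return 100.0 * t.busy / task_time


if __name__ == "__main__":
    # Made-up numbers for one draw call that took 10 time units in total.
    pixel = SubunitTimes(busy=4.0, stalls_upstream=1.0, stalled_by_downstream=0.5)
    print(bottleneck(pixel, 10.0))
    print(utilization_percent(pixel, 10.0))
```

Subtracting the time the subunit spends waiting on a downstream component keeps it from being blamed for delays it did not cause, which is why the bottleneck figure can differ from utilization for the same data type.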
Visualisation of programs or trace data · CPC title
where the assessed time is active or idle time · CPC title
Performance evaluation by tracing or monitoring · CPC title
Monitoring involving counting · CPC title
Workload generation, e.g. scripts, playback · CPC title