Method and apparatus for visualizing component workloads in a unified shader GPU architecture

US8963932B1 · US · B1

Patent metadata
Publication number: US-8963932-B1
Application number: US-64144706-A
Country: US
Kind code: B1
Filing date: Dec 18, 2006
Priority date: Aug 1, 2006
Publication date: Feb 24, 2015
Grant date: Feb 24, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of calculating performance parameters for a type of data being executed by a unified processing subunit. In one embodiment, a task (e.g., a draw call) is executed by a processing pipeline (e.g., a GPU). An ALU within a unified processing subunit (e.g., a unified shader processing unit) is queried to determine a type of data (e.g., vertex processing, pixel shading) being processed by the ALU. Performance parameters (e.g., bottleneck and utilization) for the type of data being processed by the ALU is calculated and displayed (e.g., stacked graph). Accordingly, software developers can visualize component workloads of a unified processing subunit architecture. As a result, utilization of the unified processing subunit processing a particular data may be maximized while bottleneck is reduced. Therefore, the efficiency of the unified processing subunit and the processing pipeline is improved.
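
As a rough illustration of the stacked-graph idea in the abstract, the sketch below takes hypothetical per-data-type busy-cycle counts for one unified shader unit and draws each type's share of the draw call's total time as segments of a single text bar. The function names, the counter values, and the text rendering are assumptions made for illustration; the patent does not prescribe them.

```python
def utilization_by_type(busy_cycles, total_cycles):
    """Fraction of the task's total cycles spent on each data type."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return {dtype: cycles / total_cycles for dtype, cycles in busy_cycles.items()}


def stacked_bar(utilization, width=40):
    """Render the per-type fractions as one stacked text bar.

    Each data type is drawn with the first letter of its name; the
    rest of the bar ('.') is time the unit was not busy.
    """
    bar = ""
    for dtype, fraction in utilization.items():
        bar += dtype[0].upper() * round(fraction * width)
    return bar.ljust(width, ".")[:width]


# Hypothetical counter readings for one draw call (not from the patent).
busy = {"vertex": 200, "geometry": 50, "pixel": 500}
util = utilization_by_type(busy, total_cycles=1000)
print(stacked_bar(util))
```

Read left to right, the bar shows how much of the draw call the unit spent on vertex, geometry, and pixel work, with the unfilled tail representing time it was not busy.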

First claim

Opening claim text (preview).

What is claimed is:

1. A method of calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a measure of time that said unified processor subunit is processing said data type plus a measure of time that said unified processor subunit pauses an upstream component because said unified processor subunit is busy minus the time which said unified processor subunit is paused because a downstream component is busy and does not accept further data, all over the time required by said processing pipeline to process said executable task, and wherein said calculating is based on a counter operable to increment based on an individual processing of said data type.

2. The method as described in claim 1, wherein said unified processor subunit is operable to process at least two types of data.

3. The method as described in claim 1, wherein said calculating said performance parameters for said unified processor subunit processing said data type comprises: calculating utilization, wherein said utilization is a measure of a percentage that said unified processor subunit is processing said data type over the time said processing pipeline required to process said executable task.

4. The method as described in claim 1, wherein said plurality of processing subunits comprises a plurality of arithmetic logic units (ALUs), wherein said plurality of ALUs partially form an upstream component and a downstream component of said unified processor subunit.

5. The method as described in claim 1, wherein said processor pipeline is a pipeline graphical processing unit (GPU), and wherein said executable task is a draw call processed on said GPU, and wherein said unified processing subunit is capable of processing vertex, geometry, rasterizer and pixel data types.

6. The method as described in claim 1, wherein said method further comprises: outputting said calculated performance parameters for said unified processor subunit processing said data type.

7. The method as described in claim 1 further comprising: displaying calculated performance parameters for a plurality of data types processed by said unified processor subunit in a stacked graph format.

8. A non-transitory computer-useable storage medium having computer-readable program code stored thereon for causing a computer system to execute a method for calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a function of performance parameters associated with said unified processor subunit and parameters associated with said processing pipeline, and wherein said bottleneck is a measure of time that said unified processor subunit is processing said data type plus a measure of time that said unified processor subunit pauses an upstream component because said unified processor subunit is busy minus the time which said unified processor subunit is paused because a downstream component is busy and does not accept further data, all over the time required by said processing pipeline to process said executable task, and wherein said calculating is based on a counter operable to increment based on an individual processing of said data type.

9. The non-transitory computer-useable storage medium as described in claim 8, wherein said unified processor subunit is operable to process at least two types of data.

10. The non-transitory computer-useable storage medium as described in claim 8, wherein said calculating said performance parameters for said unified processor subunit processing said data type comprises: calculating utilization, wherein said utilization is a measure of a percentage that said unified processor subunit is processing said data type over the time said processing pipeline required to process said executable task.

11. The non-transitory computer-useable storage medium as described in claim 8, wherein said plurality of processing subunits comprises a plurality of arithmetic logic units (ALUs), wherein said plurality of ALUs partially form an upstream component and a downstream component of said unified processor subunit.

12. The non-transitory computer-useable storage medium as described in claim 8, wherein said processor pipeline is a pipeline graphical processing unit (GPU), and wherein said executable task is a draw call processed on said GPU, and wherein said unified processing subunit is capable of processing vertex, geometry, rasterizer and pixel data types.

13. The non-transitory computer-useable storage medium as described in claim 8, wherein said method further comprises: outputting said calculated performance parameters for said unified processor subunit processing said data type.

14. The computer-useable storage medium as described in claim 8, wherein said method further comprises: displaying calculated performance parameters for a plurality of data types processed by said unified processor subunit in a stacked graph format.

15. A computer system comprising a processor coupled to a bus, a transmitter/receiver coupled to said bus, and a memory coupled to said bus, wherein said memory comprises instructions that when executed on said processor implement a method for calculating performance parameters for a type of data being executed by a unified processor subunit, said method comprising: executing an executable task on a processor pipeline comprising a plurality of processing subunits and further comprising said unified processor subunit; querying said unified processor subunit and in response thereto determining a data type being processed by said unified processor subunit; and calculating performance parameters for said unified processor subunit processing said data type, wherein said calculating performance parameters comprises calculating a bottleneck that is a measurement of adverse performance of said plurality of processing subunits caused by said unified processor subunit, and wherein said bottleneck is a function of performance parameters ass
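
The bottleneck and utilization definitions in claims 1 and 3 reduce to simple ratios. The following is a minimal sketch, assuming the time measurements are already available as numbers in the same unit (for example, clock cycles). The parameter names are hypothetical, and how the counters are read from hardware is not shown.

```python
def bottleneck(busy_time, stalls_upstream_time, stalled_by_downstream_time, task_time):
    """Bottleneck following the wording of claim 1.

    (time processing the data type
     + time the unit pauses an upstream component because it is busy
     - time the unit is paused because a downstream component is busy)
    divided by the time the pipeline needs for the whole task.
    """
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return (busy_time + stalls_upstream_time - stalled_by_downstream_time) / task_time


def utilization(busy_time, task_time):
    """Utilization following claim 3, as a percentage of the task time."""
    if task_time <= 0:
        raise ValueError("task_time must be positive")
    return 100.0 * busy_time / task_time


# Hypothetical measurements in clock cycles for pixel work in one draw call.
print(bottleneck(500, 150, 50, 1000))  # 0.6
print(utilization(500, 1000))          # 50.0
```

Because time spent waiting on a busy downstream component is subtracted, a unit that is mostly blocked by a later stage scores lower on bottleneck than its raw busy time would suggest.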

Assignees

Nvidia Corp

Inventors

Kiel Jeffrey T, Cornish Derek M

Classifications

  • Visualisation of programs or trace data · CPC title

  • where the assessed time is active or idle time · CPC title

  • Performance evaluation by tracing or monitoring · CPC title

  • Monitoring involving counting · CPC title

  • Workload generation, e.g. scripts, playback · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8963932B1 cover?
A method of calculating performance parameters for a type of data being executed by a unified processing subunit. In one embodiment, a task (e.g., a draw call) is executed by a processing pipeline (e.g., a GPU). An ALU within a unified processing subunit (e.g., a unified shader processing unit) is queried to determine a type of data (e.g., vertex processing, pixel shading) being processed by th…
Who is the assignee on this patent?
Nvidia Corp is the assignee. Kiel Jeffrey T and Cornish Derek M are the named inventors.
What technology area does this patent fall under?
Primary CPC classification G06F11/3423. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 24, 2015 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).