Method and apparatus for minimally intrusive instruction pointer-aware processing resource activity profiling

US11210094B2 · US · B2

Patent metadata
Publication number: US-11210094-B2
Application number: US-201916585427-A
Country: US
Kind code: B2
Filing date: Sep 27, 2019
Priority date: Sep 27, 2019
Publication date: Dec 28, 2021
Grant date: Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for minimally intrusive instruction pointer-aware processing resource activity profiling are disclosed. In one embodiment, a graphics processor includes a grouping of processing resources and control logic that is associated with the grouping of processing resources. The control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state with the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, and shader activity.
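The sampling step described in the abstract can be sketched in software as follows. This is a minimal illustration, not the patented hardware: the names `ResourceState`, `StallReason`, `PRIORITY`, and `sample` are hypothetical, the stall reason names are borrowed from the stall fields listed in the claims on this page, and the priority order is an assumption because the page does not state one. The discard rule (drop a sample when the resource is idle or executing) follows the first claim.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class StallReason(Enum):
    # Names mirror the stall fields listed in the claims; all are illustrative.
    SYNC = "sync"                            # threads waiting to reach a common point
    INSTRUCTION_FETCH = "instruction_fetch"  # instruction fetch from memory is stalled
    SCOREBOARD = "scoreboard"                # data dependency
    SEND = "send"                            # send bus bandwidth limit
    PIPE = "pipe"                            # stall within a pipeline
    INTERNAL = "internal"                    # memory bank collision


# ASSUMPTION: highest priority first. The page does not give the real order.
PRIORITY = list(StallReason)


@dataclass
class ResourceState:
    ip: int                          # instruction pointer address at the sampled cycle
    threads_allocated: bool          # False means the resource is idle
    executing: bool                  # True means an instruction issued this cycle
    pending: frozenset = frozenset() # stall conditions observed this cycle


def sample(state: ResourceState) -> Optional[Tuple[int, StallReason]]:
    """Return (ip, reason) for a stalled sample, or None if the sample is discarded."""
    if not state.threads_allocated or state.executing:
        return None  # idle or executing: discard the sample
    for reason in PRIORITY:
        if reason in state.pending:
            return state.ip, reason  # resolve to the highest-priority reason
    return None  # stalled, but no supported reason was observed


stalled = ResourceState(0x40, True, False,
                        frozenset({StallReason.PIPE, StallReason.SCOREBOARD}))
print(sample(stalled))
```

Resolving each sample to a single reason keeps the per-sample record small (one instruction pointer address plus one reason), which is what the cache unit in the claims receives.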

First claim

Opening claim text (preview).

What is claimed is:

1. A graphics processor, comprising: a grouping of processing resources; control logic that is associated with the grouping of processing resources, the control logic is configured to sample a state of at least one processing resource of the grouping of processing resources and to determine activity data from the state with the activity data including at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, or shader activity, wherein the control logic is configured to discard a state for a chosen cycle that is sampled if the processing resource is idle or executing an instruction.

2. The graphics processor of claim 1, further comprising: a cache unit that is associated with the grouping of processing resources, the cache unit to receive an instruction pointer address and the activity data including a stall reason for each state of processing resources that are associated with the cache unit.

3. The graphics processor of claim 2, wherein each sampling of a state is scheduled for a chosen clock cycle and is minimally intrusive.

4. The graphics processor of claim 1, wherein the control logic is configured to store a stall state when threads are allocated on a processing resource with no instruction being executed for a chosen cycle that is sampled.

5. The graphics processor of claim 1, wherein the control logic is configured to interleave samplings of states of processing resources among the grouping of processing resources and other groupings of processing resources, to resolve the states into one of a number of supported stall reasons, and to prioritize the supported stall reasons based on a priority level of the stall reasons.

6. The graphics processor of claim 1, wherein the supported stalls and reason counts for stalling activity comprise a synch stall field for a stall or delay between threads to reach a common point, an instruction fetch field for an instruction fetch from memory that is stalled, a scoreboard field for a stall based on a data dependency, a send stall field for a send bus bandwidth limit for an processing resource, a pipe stall field for a stall within a pipeline, and an internal stall field for a stall caused from a memory bank collision.

7. A cache structure, comprising: logic to perform operations of the cache structure; and memory coupled to the logic, the memory to store instruction pointer addresses and associated data fields to indicate activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls from sampling of processing resources, wherein the logic is configured to receive an instruction pointer address and activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls for a state of processing resources that are associated with the cache structure.

8. The cache structure of claim 7, wherein the logic is configured to perform an instruction pointer address lookup within the cache structure.

9. The cache structure of claim 8, wherein the logic is configured to build an entry for a new cache line when the instruction pointer lookup misses, to store the instruction pointer address and the activity data in the new cache line, to initialize the identified activity including a stall reason to a count while all other reason counts are initialized to a different count.

10. The cache structure of claim 9, wherein the logic is configured to determine if all available lines of the cache structure are occupied and to perform a capacity-eviction to evict an existing line if all available lines of the cache structure are occupied.

11. The cache structure of claim 8, wherein the logic is configured to determine a hit for instruction pointer address lookup, to perform a read operation of a cache line for the instruction pointer address, to perform a modify operation to increment a count of the identified activity, and to perform a write operation for the cache line.

12. The cache structure of claim 8, wherein the logic is configured for a maximum value eviction when a given cache line has an activity count that reaches a maximum representable value and performs the maximum value eviction by evicting the instruction pointer address and its corresponding data to a circular buffer in a main memory.

13. A method for minimally intrusive profiling of a graphics processing unit (GPU), comprising: receiving, with a cache unit, an instruction pointer address and activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls for a stall state of processing resources that are associated with the cache unit; and performing an instruction pointer address lookup within the cache unit for the received instruction pointer address and associated activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls.

14. The method of claim 13, further comprising: building an entry for a new cache line when the instruction pointer lookup misses.

15. The method of claim 14, further comprising: storing the instruction pointer address and the activity data in the new cache line; and initializing the identified activity including a stall reason to 1 while all other reason counts are initialized to 0.

16. The method of claim 15, further comprising: determining if all available lines of the cache structure are occupied and to perform a capacity-eviction to evict an existing line if all available lines of the cache structure are occupied.

17. The method of claim 14, further comprising: determining a hit for instruction pointer address lookup.

18. The method of claim 17, further comprising: performing a read operation of a cache line for the instruction pointer address; performing a modify operation to increment a count of the identified activity for the instruction pointer address; and performing a write operation for the cache line.

19. The method of claim 18, further comprising: performing a maximum value eviction when a given cache line has an activity count that reaches a maximum representable value, wherein performing the maximum value eviction comprises evicting the instruction pointer address and its corresponding data to a circular buffer in a main memory.

20. The method of claim 13, wherein the activity data includes at least one of stalls and reason counts for stalling activity, instruction types, pipeline utilization, thread utilization, or shader activity.

21. A non-transitory computer-readable storage medium having stored thereon data representing sequences of instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving, with a cache unit of a graphics processing unit (GPU), an instruction pointer address and activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls for a stall state of processing resources that are associated with the cache unit; and performing an instruction pointer address lookup within the cache unit for the received instruction pointer address and associated activity data including different types of stalls and reason counts to count occurrences for each of the different types of stalls.

22. The medium of claim 21, further
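The cache behavior recited in the cache structure and method claims (lookup by instruction pointer address, a new line on a miss with the identified reason set to 1 and the rest to 0, read-modify-write on a hit, capacity eviction, and maximum value eviction to a circular buffer) can be modeled in software roughly as follows. This is a sketch under stated assumptions: `StallCache`, `record`, `REASONS`, and all sizes are hypothetical; choosing the least recently used line for capacity eviction is an assumption; and sending capacity-evicted lines to the same circular buffer is also an assumption, since the claims name that destination only for the maximum value eviction.

```python
from collections import OrderedDict, deque

# Illustrative reason names, taken from the stall fields listed in the claims.
REASONS = ("sync", "instruction_fetch", "scoreboard", "send", "pipe", "internal")


class StallCache:
    """Software model of an instruction-pointer-indexed stall counter cache."""

    def __init__(self, num_lines=4, max_count=255, buffer_slots=1024):
        self.num_lines = num_lines    # ASSUMPTION: capacity in cache lines
        self.max_count = max_count    # maximum representable count per field
        self.lines = OrderedDict()    # ip -> {reason: count}
        # Stand-in for the circular buffer in main memory; oldest entries
        # are overwritten once buffer_slots is exceeded.
        self.buffer = deque(maxlen=buffer_slots)

    def _evict(self, ip):
        # Move the line (address plus its counts) out to the circular buffer.
        self.buffer.append((ip, self.lines.pop(ip)))

    def record(self, ip, reason):
        if ip in self.lines:
            # Hit: read the line, increment the identified count, write back.
            counts = dict(self.lines[ip])   # read
            counts[reason] += 1             # modify
            self.lines[ip] = counts         # write
            self.lines.move_to_end(ip)      # mark as most recently used
            if counts[reason] >= self.max_count:
                self._evict(ip)             # maximum value eviction
            return
        # Miss: make room if every line is occupied, then build a new line.
        if len(self.lines) >= self.num_lines:
            oldest_ip = next(iter(self.lines))
            self._evict(oldest_ip)          # capacity eviction (LRU is assumed)
        self.lines[ip] = {r: (1 if r == reason else 0) for r in REASONS}


cache = StallCache(num_lines=2, max_count=3)
for ip, reason in [(0x10, "pipe"), (0x10, "pipe"), (0x20, "send")]:
    cache.record(ip, reason)
print(cache.lines[0x10]["pipe"], cache.lines[0x20]["send"])
```

Evicting a line when one of its counters saturates means no count is ever lost to overflow; a consumer reading the circular buffer would sum the evicted records per instruction pointer address to recover totals.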

Assignees

Intel Corp

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F9/3009 (Primary)

    Thread control instructions · CPC title

  • Thread allocation · CPC title

  • Cache consistency protocols · CPC title

  • Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210094B2 cover?
Systems and methods for minimally intrusive instruction pointer-aware processing resource activity profiling are disclosed. In one embodiment, a graphics processor includes a grouping of processing resources and control logic that is associated with the grouping of processing resources. The control logic is configured to sample a state of at least one processing resource of the grouping of proc…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 28, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).