US9547530B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9547530-B2 |
| Application number | US-201314069727-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 1, 2013 |
| Priority date | Nov 1, 2013 |
| Publication date | Jan 17, 2017 |
| Grant date | Jan 17, 2017 |
Official abstract text for this publication.
A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided with fixed number of thread state entries for storing thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes as the same thread group threads having thread state data stored in the same storage region and processes threads having thread state data stored in different storage regions as different thread groups.
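The grouping rule in the abstract can be illustrated with a minimal software sketch. This is a toy model, not the patented hardware: the names `ThreadState`, `Region`, and `thread_groups`, and the capacities 4, 2, and 1, are hypothetical choices made for illustration. It shows regions with different fixed entry counts, and threads forming one thread group per region.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    thread_id: int
    pc: int = 0  # thread program counter (next instruction for this thread)


@dataclass
class Region:
    capacity: int  # fixed number of thread state entries (hypothetical sizes)
    entries: list = field(default_factory=list)

    def store(self, state):
        # A region never holds more states than its fixed entry count.
        if len(self.entries) >= self.capacity:
            raise ValueError("region is full")
        self.entries.append(state)


def thread_groups(regions):
    """Return one thread group (list of thread ids) per non-empty region."""
    return [[s.thread_id for s in r.entries] for r in regions if r.entries]


# Three regions with different fixed sizes.
regions = [Region(4), Region(2), Region(1)]
for tid in range(3):
    regions[0].store(ThreadState(tid))
regions[1].store(ThreadState(3))
regions[1].store(ThreadState(4))

groups = thread_groups(regions)
print(groups)  # [[0, 1, 2], [3, 4]]
```

Threads 0 to 2 share the four-entry region and so form one group; threads 3 and 4 share the two-entry region and form another. The one-entry region is empty and contributes no group.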
Opening claim text (preview).
I claim:
1. A data processing apparatus comprising: a processor configured to process a plurality of threads each having associated thread state data; wherein the processor is configured to process the plurality of threads in a plurality of thread groups, each thread group comprising a number of threads processed in parallel with a common program being executed for each thread of the thread group; a plurality of thread state storage regions each comprising a fixed number of thread state entries, where each thread state entry is configured to store the thread state data including a thread program counter indicative of a next program instruction to be executed for a corresponding thread of the plurality of threads; wherein at least two of the thread state storage regions have different fixed numbers of the thread state entries; wherein the processor is configured to automatically select and process within the same thread group threads having thread state data stored in thread state entries of the same thread state storage region, and to automatically select and process within different thread groups threads having thread state data stored in thread state entries of different thread state storage regions; and a migration control circuitry configured to maintain a divergence counter indicative of a number of times a difference in behavior is detected between at least one selected thread and at least one remaining thread; wherein the migration control circuitry is configured to detect said difference in behavior in response to a divergence indicating instruction included in the common program, the divergence indicating instruction comprising a predetermined instruction having an associated divergence indicating flag, and to migrate said at least one selected thread in response to said divergence counter being greater than a predetermined migration threshold.
2. The data processing apparatus according to claim 1, comprising allocation circuitry configured to control allocation of the thread state data for a set of threads to be processed with a specified common program to thread state entries within one or more of the thread state storage regions.
3. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to allocate the thread state data for the set of threads to thread state entries within the fewest possible thread state storage regions.
4. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to allocate the thread state data for the set of threads to thread state entries to minimize a total number of unused thread state entries, said unused thread state entries comprising thread state entries not currently storing thread state data for a thread being processed by the processor which are within the same thread state storage region as another thread state entry currently storing thread state data for a thread being processed by the processor.
5. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to control the allocation of the thread state data based on a group size value specified for the specified common program, the group size value indicative of the number of threads to be processed with said specified common program.
6. The data processing apparatus according to claim 1, wherein the thread state data comprises at least one data value processed or generated by the corresponding thread.
7. The data processing apparatus according to claim 1, wherein each thread group is associated with a thread group program counter indicative of a next program instruction to be executed for the corresponding thread group.
8. The data processing apparatus according to claim 7, wherein the thread state data comprises a thread program counter indicative of a next program instruction to be executed for the corresponding thread, and the thread group program counter for a thread group is derived from the thread program counter for each of the threads of the thread group.
9. The data processing apparatus according to claim 1, comprising migration control circuitry configured to migrate the thread state data for at least one selected thread from a first thread state storage region to a second thread state storage region in response to detecting a migration event.
10. The data processing apparatus according to claim 9, wherein the first thread state storage region has a greater number of the thread state entries than the second thread state storage region.
11. The data processing apparatus according to claim 9, wherein the migration event comprises the migration control circuitry detecting a difference in behaviour between the at least one selected thread and at least one remaining thread having thread state data stored in the first thread state storage region.
12. The data processing apparatus according to claim 9, wherein the migration control circuitry is configured to migrate the thread state data for the at least one selected thread to the second thread state storage region in response to the migration event if the number of threads of the at least one selected thread is less than or equal to the number of thread state entries of the second thread state storage region.
13. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour if the at least one remaining thread does not require further processing operations to be performed and the at least one selected thread requires at least one further processing operation to be performed.
14. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour if the at least one selected thread and the at least one remaining thread require different processing operations to be performed.
15. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour in response to detecting divergent memory accesses triggered by the at least one selected thread and the at least one remaining thread.
16. The data processing apparatus according to claim 15, wherein said divergent memory accesses comprise at least one of: accesses to different cache lines of a cache memory; accesses to different pages of memory; and accesses which cannot be coalesced into a single memory access.
17. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to maintain a divergence counter indicative of a number of times said difference in behaviour is detected between said at least one selected thread and said at least one remaining thread; and the migration control circuitry is configured to migrate said at least one selected thread in response to said divergence counter being greater than a predetermined migration threshold.
18. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour in response to a divergence indicating instruction included in the common program.
19. The data processing apparatus according to claim 18, wherein the divergence indicating instruction comprises a predetermined instruction having an associated divergence indicating flag.
20. The data processing apparatus according to claim 19, wherein the predetermined instruction comprises at least one of a load/store instr
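As a reading aid, two mechanisms from the claims can be sketched in software. This is a hedged toy model under stated assumptions, not the claimed circuitry. All names (`allocate`, `MigrationControl`, `migrate`) are hypothetical. The allocator assumes every region starts empty and combines claims 3 and 4 as a tie-break (fewest regions first, then fewest unused entries), although the claims state them as separate alternatives. The migration check compares against the free entries of the destination, which matches the claim 12 condition only when the destination is empty.

```python
from itertools import combinations


def allocate(capacities, n_threads):
    """Choose indices of empty regions to hold n_threads.

    Exhaustive search: fewest regions first (claim 3), then the smallest
    total capacity, i.e. fewest unused entries (claim 4, used as tie-break).
    Returns None if the threads do not fit at all.
    """
    for k in range(1, len(capacities) + 1):
        fits = [c for c in combinations(range(len(capacities)), k)
                if sum(capacities[i] for i in c) >= n_threads]
        if fits:
            return min(fits, key=lambda c: sum(capacities[i] for i in c))
    return None


class MigrationControl:
    def __init__(self, threshold):
        self.threshold = threshold  # predetermined migration threshold
        self.counter = 0            # divergence counter

    def on_divergence(self):
        """Count one detected difference in behaviour; True means migrate."""
        self.counter += 1
        return self.counter > self.threshold


def migrate(first, second, selected, second_capacity):
    """Move selected thread ids from first to second if they fit."""
    if len(selected) > second_capacity - len(second):
        return False
    for t in selected:
        first.remove(t)
        second.append(t)
    return True


choice = allocate([8, 4, 2, 1], 3)  # one region suffices; the 4-entry one wastes least

ctrl = MigrationControl(threshold=2)
first, second = [0, 1, 2, 3], []
decisions = [ctrl.on_divergence() for _ in range(3)]
if decisions[-1]:
    migrate(first, second, [3], second_capacity=2)
print(choice, decisions, first, second)
```

With a threshold of 2, the first two divergence events only increment the counter; the third pushes it above the threshold and thread 3 moves to the smaller region, leaving threads 0 to 2 grouped together.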
Allocation of resources, e.g. of the central processing unit [CPU] · CPC title
from multiple instruction streams, e.g. multistreaming · CPC title
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
Divergence aspects · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title