Data processing apparatus and method for processing a plurality of threads

US9547530B2 · US · B2

Patent metadata
Publication number: US-9547530-B2
Application number: US-201314069727-A
Country: US
Kind code: B2
Filing date: Nov 1, 2013
Priority date: Nov 1, 2013
Publication date: Jan 17, 2017
Grant date: Jan 17, 2017

Abstract

Official abstract text for this publication.

A data processing apparatus has processing circuitry for processing threads each having thread state data. The threads may be processed in thread groups, with each thread group comprising a number of threads processed in parallel with a common program executed for each thread. Several thread state storage regions are provided, each with a fixed number of thread state entries, and each entry stores thread state data for a corresponding thread. At least two of the storage regions have different fixed numbers of entries. The processing circuitry processes threads whose thread state data is stored in the same storage region as the same thread group, and processes threads whose thread state data is stored in different storage regions as different thread groups.
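The grouping rule in the abstract can be illustrated with a small software model. This is a minimal sketch, not the patented hardware: the names `ThreadState`, `StorageRegion`, and `thread_groups`, and the region sizes 4, 2, and 1, are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadState:
    thread_id: int
    pc: int = 0  # per-thread program counter (part of the thread state data)


@dataclass
class StorageRegion:
    capacity: int  # fixed number of thread state entries in this region
    entries: list = field(default_factory=list)

    def store(self, state):
        # A region can never hold more threads than its fixed entry count.
        if len(self.entries) >= self.capacity:
            raise ValueError("region is full")
        self.entries.append(state)


def thread_groups(regions):
    """Return one thread group (a list of thread ids) per non-empty region."""
    return [[s.thread_id for s in r.entries] for r in regions if r.entries]


# Hypothetical layout: regions with different fixed sizes (4, 2 and 1 entries).
regions = [StorageRegion(4), StorageRegion(2), StorageRegion(1)]
for tid in range(4):
    regions[0].store(ThreadState(tid))
regions[1].store(ThreadState(4))
regions[1].store(ThreadState(5))

groups = thread_groups(regions)
```

In this model, threads 0 to 3 share the 4-entry region and form one group, threads 4 and 5 share the 2-entry region and form another, and the empty 1-entry region contributes no group.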

First claim

Opening claim text (preview).

I claim:

1. A data processing apparatus comprising:
    a processor configured to process a plurality of threads each having associated thread state data;
    wherein the processor is configured to process the plurality of threads in a plurality of thread groups, each thread group comprising a number of threads processed in parallel with a common program being executed for each thread of the thread group;
    a plurality of thread state storage regions each comprising a fixed number of thread state entries, where each thread state entry is configured to store the thread state data including a thread program counter indicative of a next program instruction to be executed for a corresponding thread of the plurality of threads;
    wherein at least two of the thread state storage regions have different fixed numbers of the thread state entries;
    wherein the processor is configured to automatically select and process within the same thread group threads having thread state data stored in thread state entries of the same thread state storage region, and to automatically select and process within different thread groups threads having thread state data stored in thread state entries of different thread state storage regions; and
    a migration control circuitry configured to maintain a divergence counter indicative of a number of times a difference in behavior is detected between at least one selected thread and at least one remaining thread;
    wherein the migration control circuitry is configured to detect said difference in behavior in response to a divergence indicating instruction included in the common program, the divergence indicating instruction comprising a predetermined instruction having an associated divergence indicating flag, and to migrate said at least one selected thread in response to said divergence counter being greater than a predetermined migration threshold.

2. The data processing apparatus according to claim 1, comprising allocation circuitry configured to control allocation of the thread state data for a set of threads to be processed with a specified common program to thread state entries within one or more of the thread state storage regions.

3. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to allocate the thread state data for the set of threads to thread state entries within the fewest possible thread state storage regions.

4. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to allocate the thread state data for the set of threads to thread state entries to minimize a total number of unused thread state entries, said unused thread state entries comprising thread state entries not currently storing thread state data for a thread being processed by the processor which are within the same thread state storage region as another thread state entry currently storing thread state data for a thread being processed by the processor.

5. The data processing apparatus according to claim 2, wherein the allocation circuitry is configured to control the allocation of the thread state data based on a group size value specified for the specified common program, the group size value indicative of the number of threads to be processed with said specified common program.

6. The data processing apparatus according to claim 1, wherein the thread state data comprises at least one data value processed or generated by the corresponding thread.

7. The data processing apparatus according to claim 1, wherein each thread group is associated with a thread group program counter indicative of a next program instruction to be executed for the corresponding thread group.

8. The data processing apparatus according to claim 7, wherein the thread state data comprises a thread program counter indicative of a next program instruction to be executed for the corresponding thread, and the thread group program counter for a thread group is derived from the thread program counter for each of the threads of the thread group.

9. The data processing apparatus according to claim 1, comprising migration control circuitry configured to migrate the thread state data for at least one selected thread from a first thread state storage region to a second thread state storage region in response to detecting a migration event.

10. The data processing apparatus according to claim 9, wherein the first thread state storage region has a greater number of the thread state entries than the second thread state storage region.

11. The data processing apparatus according to claim 9, wherein the migration event comprises the migration control circuitry detecting a difference in behaviour between the at least one selected thread and at least one remaining thread having thread state data stored in the first thread state storage region.

12. The data processing apparatus according to claim 9, wherein the migration control circuitry is configured to migrate the thread state data for the at least one selected thread to the second thread state storage region in response to the migration event if the number of threads of the at least one selected thread is less than or equal to the number of thread state entries of the second thread state storage region.

13. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour if the at least one remaining thread does not require further processing operations to be performed and the at least one selected thread requires at least one further processing operation to be performed.

14. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour if the at least one selected thread and the at least one remaining thread require different processing operations to be performed.

15. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour in response to detecting divergent memory accesses triggered by the at least one selected thread and the at least one remaining thread.

16. The data processing apparatus according to claim 15, wherein said divergent memory accesses comprise at least one of: accesses to different cache lines of a cache memory; accesses to different pages of memory; and accesses which cannot be coalesced into a single memory access.

17. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to maintain a divergence counter indicative of a number of times said difference in behaviour is detected between said at least one selected thread and said at least one remaining thread; and the migration control circuitry is configured to migrate said at least one selected thread in response to said divergence counter being greater than a predetermined migration threshold.

18. The data processing apparatus according to claim 11, wherein the migration control circuitry is configured to detect said difference in behaviour in response to a divergence indicating instruction included in the common program.

19. The data processing apparatus according to claim 18, wherein the divergence indicating instruction comprises a predetermined instruction having an associated divergence indicating flag.

20. The data processing apparatus according to claim 19, wherein the predetermined instruction comprises at least one of a load/store instr
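The allocation policy of claims 3 and 4 and the counter-based migration of claims 1, 12 and 17 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the claimed circuitry: the names `allocate`, `MigrationControl`, `on_flagged_instruction` and `migrate` are hypothetical, the brute-force search is a stand-in for whatever the allocation circuitry does, and the fit check uses the free entries of the target region, which is a simplification of the condition in claim 12.

```python
from itertools import combinations


def allocate(num_threads, free_regions):
    """Choose free regions (given by entry count) to hold num_threads threads.

    Prefers the fewest regions (claim 3) and, among those, the fewest unused
    entries (claim 4). Brute-force search, assumed here for illustration only.
    Returns a tuple of indices into free_regions, or None if nothing fits.
    """
    for k in range(1, len(free_regions) + 1):
        best = None
        for combo in combinations(range(len(free_regions)), k):
            capacity = sum(free_regions[i] for i in combo)
            if capacity < num_threads:
                continue
            unused = capacity - num_threads
            if best is None or unused < best[0]:
                best = (unused, combo)
        if best is not None:
            return best[1]  # smallest k that fits, least waste at that k
    return None


class MigrationControl:
    def __init__(self, threshold):
        self.threshold = threshold  # predetermined migration threshold
        self.divergence_counter = 0

    def on_flagged_instruction(self, diverged):
        """Handle a divergence-indicating instruction; True means migrate."""
        if diverged:
            self.divergence_counter += 1
        return self.divergence_counter > self.threshold


def migrate(selected, source, target, target_capacity):
    """Move selected thread ids from source to target if they fit."""
    if len(selected) > target_capacity - len(target):
        return False
    for tid in selected:
        source.remove(tid)
        target.append(tid)
    return True


# Hypothetical usage: thread 3 keeps diverging from the rest of a 4-wide group.
chosen = allocate(5, [4, 2, 1])          # picks the 4-entry and 1-entry regions
ctrl = MigrationControl(threshold=2)
big, small = [0, 1, 2, 3], []
decisions = [ctrl.on_flagged_instruction(True) for _ in range(3)]
if decisions[-1]:
    migrate([3], big, small, target_capacity=2)
```

With a threshold of 2, the first two detected divergences leave the thread in place, and the third pushes the counter above the threshold, so thread 3 moves to the smaller region and is thereafter processed as a separate thread group.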

Assignees

Advanced Risc Mach Ltd

Inventors

Classifications

  • G06F9/50 (Primary)

    Allocation of resources, e.g. of the central processing unit [CPU] · CPC title

  • G06F9/3851 (Primary)

    from multiple instruction streams, e.g. multistreaming · CPC title

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • Divergence aspects · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Advanced Risc Mach Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/50. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 17, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).