Apparatus and method for gang invariant operation optimizations using dynamic evaluation

US11093250B2 · US · B2

Patent metadata
Publication number: US-11093250-B2
Application number: US-201816147694-A
Country: US
Kind code: B2
Filing date: Sep 29, 2018
Priority date: Sep 29, 2018
Publication date: Aug 17, 2021
Grant date: Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for efficiently processing invariant operations on a parallel execution engine. For example, one embodiment of a processor comprises: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more conditionally invariant operations will be invariant across threads of a parallel execution group and/or across the parallel execution lanes; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across threads of a parallel execution group and/or across the parallel execution lanes.
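The abstract's core mechanism can be modeled in software. At run time, the front end checks whether an operation's inputs are identical across a parallel execution group. If they are, it runs the operation once as a shared thread and shares the result with the other threads. The Python sketch below is illustrative only, and several things in it are assumptions:

- `Op`, `inputs_invariant`, and `run_group` are hypothetical names.
- Each thread is modeled as a per-thread register dictionary.
- "Invariant" is taken to mean that every source operand holds the same value in all threads of the group.

It is not the patented implementation, which the patent describes as front end and scheduler circuitry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Op:
    """Hypothetical operation: dst = fn(*srcs), evaluated once per thread."""
    dst: str
    srcs: Tuple[str, ...]
    fn: Callable[..., int]


def inputs_invariant(op: Op, regs: Sequence[Dict[str, int]]) -> bool:
    """Dynamic check: every source operand holds the same value in all threads."""
    return all(len({r[s] for r in regs}) == 1 for s in op.srcs)


def run_group(ops: List[Op], regs: List[Dict[str, int]]) -> int:
    """Run ops for one parallel execution group; return how many executions were needed."""
    executions = 0
    for op in ops:
        if inputs_invariant(op, regs):
            # Shared thread: compute once from thread 0's operands ...
            result = op.fn(*(regs[0][s] for s in op.srcs))
            executions += 1
            # ... then write the result into every thread's register set.
            for r in regs:
                r[op.dst] = result
        else:
            # Variant operation: every thread computes its own result.
            for r in regs:
                r[op.dst] = op.fn(*(r[s] for s in op.srcs))
                executions += 1
    return executions


if __name__ == "__main__":
    # Four threads share a base address but have distinct thread IDs.
    regs = [{"base": 100, "tid": t} for t in range(4)]
    ops = [
        Op("scaled", ("base",), lambda b: b * 4),           # invariant at run time
        Op("addr", ("scaled", "tid"), lambda s, t: s + t),  # differs per thread
    ]
    print(run_group(ops, regs), [r["addr"] for r in regs])
```

In this example, scaling the shared base address is detected as invariant and executed once. The address computation depends on the thread ID, so it runs per thread. The group therefore needs 5 operation executions instead of 8.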

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes, to produce a same variable value across the threads of the plurality of parallel execution groups; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and a first parallel execution lane to execute the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

2. The processor of claim 1 further comprising: a first set of registers in a first parallel execution lane to store the execution results; and data distribution circuitry to broadcast one or more of the execution results to additional sets of registers within the first parallel execution lane.

3. The processor of claim 1 wherein dynamically evaluating the one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

4. The processor of claim 1 wherein the scheduler is to cause one or more threads to wait for the execution of the shared thread to complete.

5. The processor of claim 1 wherein the threads are microthreads comprising a plurality of microoperations.

6. The processor of claim 5 wherein the front end circuitry further comprises a decoder to generate the microthreads responsive to decoding a plurality of macroinstructions.

7. The processor of claim 5 wherein the front end circuitry is to arrange the microthreads into the parallel execution groups based on instruction pointer values to induce microthread convergence.

8. The processor of claim 1 further comprising: mask storage to store an execution mask having at least one value associated with each parallel execution lane, wherein the front end circuitry is to enable or disable one or more of the parallel execution lanes based on the values associated with the lanes.

9. A method comprising: arranging a plurality of threads into parallel execution groups for execution on a plurality of parallel execution lanes, the threads comprising operations to be executed by execution circuitry within each of the parallel execution lanes; dynamically evaluating one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes to produce a same variable value across the threads of the plurality of parallel execution groups; scheduling a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and executing the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

10. The method of claim 9 further comprising: storing the execution results in a first set of registers in a first parallel execution lane; and broadcasting one or more of the execution results to additional sets of registers within the first parallel execution lane.

11. The method of claim 9 wherein dynamically evaluating one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

12. The method of claim 9 further comprising: causing one or more threads to wait for the execution of the shared thread to complete.

13. The method of claim 9 wherein the threads are microthreads comprising a plurality of microoperations.

14. The method of claim 13 further comprising: generating the microthreads responsive to decoding a plurality of macroinstructions.

15. The method of claim 13 further comprising: arranging the microthreads into the parallel execution groups based on instruction pointer values to induce microthread convergence.

16. The method of claim 9 further comprising: storing an execution mask having at least one value associated with each parallel execution lane; and enabling or disabling one or more of the parallel execution lanes based on the values associated with the lanes in the execution mask.

17. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: arranging a plurality of threads into parallel execution groups for execution on a plurality of parallel execution lanes, the threads comprising operations to be executed by execution circuitry within each of the parallel execution lanes; dynamically evaluating one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes to produce a same variable value across the threads of the plurality of parallel execution groups; scheduling a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and executing the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

18. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: storing the execution results in a first set of registers in a first parallel execution lane; and broadcasting one or more of the execution results to additional sets of registers within the first parallel execution lane.

19. The non-transitory machine-readable medium of claim 17 wherein dynamically evaluating one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

20. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: causing one or more threads to wait for the execution of the shared thread to complete.

21.
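Claim 1 narrows the invariance check to the threads of several parallel execution groups within the same lane, and values may differ from lane to lane. Claim 2 adds a broadcast of the shared thread's results to the other register sets in that lane. The Python model below sketches that per-lane case, and several things in it are assumptions made for illustration:

- Operand values are laid out as a grid, `regs[group][lane]`.
- Each operation takes a single operand.
- `lane_invariant`, `classify`, and `execute_per_lane_shared` are hypothetical names.
- The sequential loop stands in for the waiting in claim 4: each result slot is written only after its lane's shared execution finishes.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: regs[g][lane] is the operand value seen by the thread
# of parallel execution group g that runs in that lane.
Grid = List[List[int]]


def lane_invariant(regs: Grid, lane: int) -> bool:
    """True if the threads of every group in this lane see the same value."""
    return len({group[lane] for group in regs}) == 1


def classify(regs: Grid) -> str:
    """Distinguish the per-lane case of claim 1 from full and no invariance."""
    if not all(lane_invariant(regs, lane) for lane in range(len(regs[0]))):
        return "variant"
    if len({v for group in regs for v in group}) == 1:
        return "fully invariant"
    return "lane invariant"  # same within each lane, different across lanes


def execute_per_lane_shared(regs: Grid, fn: Callable[[int], int]) -> Tuple[Grid, int]:
    """Apply fn to every thread, using one shared execution per invariant lane."""
    num_groups, num_lanes = len(regs), len(regs[0])
    out = [[0] * num_lanes for _ in range(num_groups)]
    executions = 0
    for lane in range(num_lanes):
        if lane_invariant(regs, lane):
            result = fn(regs[0][lane])  # shared thread runs once in this lane
            executions += 1
            for g in range(num_groups):  # write the result into every group's register set in this lane
                out[g][lane] = result
        else:
            for g in range(num_groups):  # fall back to one execution per thread
                out[g][lane] = fn(regs[g][lane])
                executions += 1
    return out, executions


if __name__ == "__main__":
    regs = [[10, 20, 30, 40] for _ in range(3)]  # 3 groups x 4 lanes
    out, n = execute_per_lane_shared(regs, lambda x: x * 2)
    print(classify(regs), n, out)
```

The example uses three groups whose lane values match (10, 20, 30, 40 in every group). `classify` reports the lane-invariant case, and the operation runs 4 times (once per lane) instead of 12.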
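Claims 7 and 8 add two front end details. The first arranges microthreads into groups by instruction pointer to induce convergence. The second is an execution mask that enables or disables individual lanes. The claim text does not specify a grouping policy, so the Python sketch below assumes the simplest one: threads that share an instruction pointer are packed into groups of `width` lanes, and unused lanes are masked off. `form_groups` and `enabled_lanes` are hypothetical helpers, not names from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One group: (instruction pointer, thread id per lane with -1 for an empty lane, execution mask)
Group = Tuple[int, List[int], int]


def form_groups(thread_ips: Dict[int, int], width: int) -> List[Group]:
    """Pack threads that share an instruction pointer into groups of `width` lanes."""
    by_ip: Dict[int, List[int]] = defaultdict(list)
    for tid, ip in sorted(thread_ips.items()):
        by_ip[ip].append(tid)
    groups: List[Group] = []
    for ip in sorted(by_ip):
        tids = by_ip[ip]
        for start in range(0, len(tids), width):
            chunk = tids[start:start + width]
            lanes = chunk + [-1] * (width - len(chunk))
            mask = (1 << len(chunk)) - 1  # bit i set means lane i is enabled
            groups.append((ip, lanes, mask))
    return groups


def enabled_lanes(mask: int, width: int) -> List[int]:
    """Lanes whose execution-mask bit is set; disabled lanes skip the operation."""
    return [lane for lane in range(width) if (mask >> lane) & 1]


if __name__ == "__main__":
    ips = {0: 0x40, 1: 0x80, 2: 0x40, 3: 0x40, 4: 0x80}
    for ip, lanes, mask in form_groups(ips, width=4):
        print(hex(ip), lanes, bin(mask), enabled_lanes(mask, 4))
```

The example has five threads at two instruction pointers and a width of 4. The sketch yields one group per pointer, with masks 0b0111 and 0b0011.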

Assignees

Intel Corp

Inventors

Classifications

  • controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title

  • G06F9/3851 (primary)

    from multiple instruction streams, e.g. multistreaming · CPC title

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • organised in groups of units sharing resources, e.g. clusters · CPC title

  • with global bypass, e.g. between pipelines, between clusters · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11093250B2 cover?
An apparatus and method for efficiently processing invariant operations on a parallel execution engine. For example, one embodiment of a processor comprises: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arran…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 17, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).