What technology area does this patent fall under?

Primary CPC classification G06F9/3851. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 17, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Apparatus and method for gang invariant operation optimizations using dynamic evaluation

US11093250B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11093250-B2
Application number	US-201816147694-A
Country	US
Kind code	B2
Filing date	Sep 29, 2018
Priority date	Sep 29, 2018
Publication date	Aug 17, 2021
Grant date	Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for efficiently processing invariant operations on a parallel execution engine. For example, one embodiment of a processor comprises: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more conditionally invariant operations will be invariant across threads of a parallel execution group and/or across the parallel execution lanes; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across threads of a parallel execution group and/or across the parallel execution lanes.
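A minimal software model makes the abstract's central idea concrete. This is a hypothetical sketch under stated assumptions, not Intel's hardware design. The function names `is_invariant` and `execute_group`, the representation of each thread's operands as a tuple, and modelling an operation as a Python callable are all illustrative choices. At run time, the model checks whether every thread in a parallel execution group presents identical inputs to an operation. If they do, it evaluates the operation once (standing in for the shared thread) and broadcasts the result. Otherwise, each thread evaluates the operation separately.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: each thread supplies a tuple of operand values for one operation.
Inputs = Tuple[int, ...]


def is_invariant(inputs_per_thread: Sequence[Inputs]) -> bool:
    """Dynamic evaluation: treat the operation as invariant for this group if
    every thread presents identical input values (cf. claim 3)."""
    first = inputs_per_thread[0]
    return all(inp == first for inp in inputs_per_thread[1:])


def execute_group(op: Callable[..., int],
                  inputs_per_thread: Sequence[Inputs]) -> Tuple[List[int], int]:
    """Run `op` for every thread in one parallel execution group.

    Returns the per-thread results together with the number of times `op`
    was actually evaluated, so the saving from the shared path is visible.
    """
    if not inputs_per_thread:
        return [], 0
    if is_invariant(inputs_per_thread):
        # Shared-thread path: evaluate once, then broadcast to every thread.
        shared = op(*inputs_per_thread[0])
        return [shared] * len(inputs_per_thread), 1
    # Divergent path: each thread evaluates the operation itself.
    return [op(*inp) for inp in inputs_per_thread], len(inputs_per_thread)
```

The returned evaluation count shows the saving. For example, with an operation computing `a * b + c`, a group of eight threads that all pass `(2, 3, 1)` needs one evaluation instead of eight. Four threads passing `(0, 3, 1)` through `(3, 3, 1)` diverge and need four.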

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a plurality of parallel execution lanes comprising execution circuitry and registers to concurrently execute a plurality of threads; front end circuitry coupled to the plurality of parallel execution lanes, the front end circuitry to arrange the threads into parallel execution groups and schedule operations of the threads to be executed across the parallel execution lanes, wherein the front end circuitry is to dynamically evaluate one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes, to produce a same variable value across the threads of the plurality of parallel execution groups; a scheduler of the front end circuitry to responsively schedule a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and a first parallel execution lane to execute the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

2. The processor of claim 1 further comprising: a first set of registers in a first parallel execution lane to store the execution results; and data distribution circuitry to broadcast one or more of the execution results to additional sets of registers within the first parallel execution lane.

3. The processor of claim 1 wherein dynamically evaluating the one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

4. The processor of claim 1 wherein the scheduler is to cause one or more threads to wait for the execution of the shared thread to complete.

5. The processor of claim 1 wherein the threads are microthreads comprising a plurality of microoperations.

6. The processor of claim 5 wherein the front end circuitry further comprises a decoder to generate the microthreads responsive to decoding a plurality of macroinstructions.

7. The processor of claim 5 wherein the front end circuitry is to arrange the microthreads into the parallel execution groups based on instruction pointer values to induce microthread convergence.

8. The processor of claim 1 further comprising: mask storage to store an execution mask having at least one value associated with each parallel execution lane, wherein the front end circuitry is to enable or disable one or more of the parallel execution lanes based on the values associated with the lanes.

9. A method comprising: arranging a plurality of threads into parallel execution groups for execution on a plurality of parallel execution lanes, the threads comprising operations to be executed by execution circuitry within each of the parallel execution lanes; dynamically evaluating one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes to produce a same variable value across the threads of the plurality of parallel execution groups; scheduling a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and executing the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

10. The method of claim 9 further comprising: storing the execution results in a first set of registers in a first parallel execution lane; and broadcasting one or more of the execution results to additional sets of registers within the first parallel execution lane.

11. The method of claim 9 wherein dynamically evaluating one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

12. The method of claim 9 further comprising: causing one or more threads to wait for the execution of the shared thread to complete.

13. The method of claim 9 wherein the threads are microthreads comprising a plurality of microoperations.

14. The method of claim 13 further comprising: generating the microthreads responsive to decoding a plurality of macroinstructions.

15. The method of claim 13 further comprising: arranging the microthreads into the parallel execution groups based on instruction pointer values to induce microthread convergence.

16. The method of claim 9 further comprising: storing an execution mask having at least one value associated with each parallel execution lane; and enabling or disabling one or more of the parallel execution lanes based on the values associated with the lanes in the execution mask.

17. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of: arranging a plurality of threads into parallel execution groups for execution on a plurality of parallel execution lanes, the threads comprising operations to be executed by execution circuitry within each of the parallel execution lanes; dynamically evaluating one or more variables associated with the operations to determine if one or more of the operations are conditionally invariant, and the determination comprises determining whether the one or more of the operations will be invariant across threads of a plurality of parallel execution groups within a same lane but not across threads of the plurality of parallel execution groups in multiple lanes to produce a same variable value across the threads of the plurality of parallel execution groups; scheduling a shared thread upon a determination that a conditionally invariant operation will be invariant across the threads of the plurality of parallel execution groups; and executing the shared thread to generate execution results and to share the execution results across other threads of the plurality of parallel execution groups.

18. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: storing the execution results in a first set of registers in a first parallel execution lane; and broadcasting one or more of the execution results to additional sets of registers within the first parallel execution lane.

19. The non-transitory machine-readable medium of claim 17 wherein dynamically evaluating one or more variables comprises determining whether input values to the conditionally invariant operation will be identical across the threads of the plurality of parallel execution groups.

20. The non-transitory machine-readable medium of claim 17 further comprising program code to cause the machine to perform the operations of: causing one or more threads to wait for the execution of the shared thread to complete.

21.
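Claim 1 narrows the invariance test to threads of several parallel execution groups that occupy the same lane. Claims 2 and 8 add in-lane broadcast and an execution mask. The Python sketch below models that reading. It is a hypothetical illustration, not the claimed circuitry. The name `run_groups`, the `values[group][lane]` input layout, the single-operand operation, and the use of `None` to mark masked-off lanes are all assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# results[group][lane]; None marks a lane disabled by the execution mask.
Grid = List[List[Optional[int]]]


def run_groups(op: Callable[[int], int],
               values: Sequence[Sequence[int]],
               mask: Sequence[bool]) -> Tuple[Grid, int]:
    """Hypothetical model of claims 1, 2 and 8.

    values[g][l] is the input seen by the thread of parallel execution
    group g that runs in lane l. For each enabled lane, if every group
    presents the same input in that lane, the operation is evaluated once
    (the shared thread) and the result is broadcast to the other groups'
    registers in that same lane. Values may still differ between lanes.
    """
    num_groups = len(values)
    num_lanes = len(mask)
    results: Grid = [[None] * num_lanes for _ in range(num_groups)]
    if num_groups == 0:
        return results, 0
    evaluations = 0
    for lane in range(num_lanes):
        if not mask[lane]:
            continue  # lane disabled by the execution mask (claim 8)
        column = [values[g][lane] for g in range(num_groups)]
        if all(v == column[0] for v in column):
            # Invariant across groups within this lane: one shared
            # evaluation, then broadcast within the lane (claim 2).
            shared = op(column[0])
            evaluations += 1
            for g in range(num_groups):
                results[g][lane] = shared
        else:
            # Divergent lane: every group evaluates its own input.
            for g in range(num_groups):
                results[g][lane] = op(column[g])
                evaluations += 1
    return results, evaluations
```

Consider two groups with inputs `[1, 2, 3, 4]` and `[1, 2, 5, 4]`, lane 3 masked off, and a squaring operation. Lanes 0 and 1 are each invariant within the lane, although their values differ from each other, so each needs one shared evaluation. Lane 2 diverges and is evaluated per group. That gives four evaluations instead of six, with results `[[1, 4, 9, None], [1, 4, 25, None]]`.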

Assignees

Intel Corp

Inventors

Classifications

G06F9/3888
controlled by a single instruction for multiple threads [SIMT] in parallel · CPC title
G06F9/3851 (Primary)
from multiple instruction streams, e.g. multistreaming · CPC title
G06F9/4843
by program, e.g. task dispatcher, supervisor, operating system · CPC title
G06F9/3891
organised in groups of units sharing resources, e.g. clusters · CPC title
G06F9/3828
with global bypass, e.g. between pipelines, between clusters · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 65231026

