Apparatus and method for a hybrid latency-throughput processor

US10664284B2 · US · B2

Patent metadata
Field                 Value
Publication number    US-10664284-B2
Application number    US-201916289075-A
Country               US
Kind code             B2
Filing date           Feb 28, 2019
Priority date         Dec 28, 2012
Publication date      May 26, 2020
Grant date            May 26, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.
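
The distribution step described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the `Segment` class, the `"latency"`/`"throughput"` tags, the `lanes` parameter, and all function names are hypothetical, and the parallel lanes are simulated with an ordinary loop. Both kinds of segment carry the same kind of Python callable, which stands in for the shared instruction set architecture.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    kind: str                  # hypothetical tag: "latency" or "throughput"
    fn: Callable[[int], int]   # same callable type for both kinds (shared "ISA")
    data: List[int]


def latency_unit(seg: Segment) -> List[int]:
    # Models latency-optimized logic: one item at a time, in program order.
    return [seg.fn(x) for x in seg.data]


def throughput_unit(seg: Segment, lanes: int = 4) -> List[int]:
    # Models throughput-optimized logic: items are consumed in batches of
    # `lanes`, standing in for elements that would run in parallel.
    out: List[int] = []
    for start in range(0, len(seg.data), lanes):
        out.extend(seg.fn(x) for x in seg.data[start:start + lanes])
    return out


def dispatch(process: List[Segment]) -> List[Tuple[str, List[int]]]:
    # Identify each segment's type within one process and route it
    # to the matching execution logic.
    trace = []
    for seg in process:
        if seg.kind == "throughput":
            trace.append(("throughput", throughput_unit(seg)))
        else:
            trace.append(("latency", latency_unit(seg)))
    return trace


process = [
    Segment("latency", lambda x: x + 1, [1]),
    Segment("throughput", lambda x: x * x, [1, 2, 3, 4, 5]),
]
trace = dispatch(process)
```

In this toy model the tag is attached by hand; the abstract only states that logic identifies the two code types within a process and does not say how.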

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: a set of latency clusters to execute a main program code, the main program code comprising both latency program code and throughput program code, wherein a current point of execution in the main program code is identified by a primary instruction pointer; a set of throughput clusters comprising one or more processing elements to execute the throughput program code in the main program code; and wherein upon detecting the current point of execution for the main program code reaching a first throughput program code in the main program code, a front end unit of the set of throughput clusters is to distribute the first throughput program code to the one or more processing elements in the set of throughput clusters for execution.

2. The apparatus as in claim 1, wherein upon detecting the current point of execution for the main program code reaching a first throughput program code in the main program code, an XCALL instruction is executed by the set of latency clusters to trigger the front end unit to distribute the first throughput program code.

3. The apparatus as in claim 2, wherein the XCALL instruction is to identify a result register to store results of executing the XCALL instruction, a command register to store one or more commands from the first throughput program code to be executed, and a parameter register to store parameters for executing the one or more commands.

4. The apparatus as in claim 3, wherein responsive to executing of the XCALL instruction, the front end unit is to retrieve a first command of the one or more commands from the command register and associated parameters from the parameter register and distribute the first command and the associated parameters for execution by the one or more processing elements.

5. The apparatus as in claim 1, wherein one or more processing elements in the set of throughput clusters are capable of simultaneously multithreading by simultaneously executing multiple micro-threads.

6. The apparatus as in claim 5, wherein each of the micro-threads includes a respective micro-instruction pointer used by the processing elements to maintain a current point of micro-thread execution.

7. The apparatus as in claim 5, wherein one or more processing elements in the set of throughput clusters are homogeneous processing elements, each capable of executing any one of the micro-threads.

8. The apparatus as in claim 5, wherein one or more processing elements in the set of throughput clusters are heterogeneous processing elements, each designed to execute specific types of micro-threads.

9. A method comprising: executing a main program code on a set of latency clusters, the main program code comprising both latency program code and throughput program code, wherein a current point of execution in the main program code is identified by a primary instruction pointer; and detecting the current point of execution for the main program code reaching a first throughput program code in the main program code and responsively distributing the first throughput program code to one or more processing elements in a set of throughput clusters for execution.

10. The method as in claim 9, wherein responsively distributing the first throughput program code to one or more processing elements in the set of throughput clusters for execution further comprises executing an XCALL instruction by the set of latency clusters.

11. The method as in claim 10, wherein the XCALL instruction is to identify a result register to store results of executing of the XCALL instruction, a command register to store one or more commands from the first throughput program code to be executed, and a parameter register to store parameters for executing the one or more commands.

12. The method as in claim 11, wherein executing the XCALL instruction further comprises: retrieving a first command of the one or more commands from the command register and associated parameters from the parameter register; and distributing the first command and the associated parameters for execution by the one or more processing elements of the set of throughput clusters.

13. The method as in claim 9, wherein one or more processing elements in the set of throughput clusters are capable of simultaneously multithreading by simultaneously executing multiple micro-threads.

14. The method as in claim 13, wherein each of the micro-threads includes a respective micro-instruction pointer used by the processing elements to maintain a current point of micro-thread execution.

15. The method as in claim 13, wherein one or more processing elements in the set of throughput clusters are homogeneous processing elements, each capable of executing any one of the micro-threads.

16. The method as in claim 13, wherein one or more processing elements in the set of throughput clusters are heterogeneous processing elements, each designed to execute specific types of micro-threads.
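
The XCALL flow recited in the claims (a primary instruction pointer reaching throughput code, an XCALL naming result, command, and parameter registers, and a front end unit distributing work to processing elements that run micro-threads) can be sketched as a toy simulation. Everything here is an illustrative assumption: the register names, the dictionary used as a register file, the encoding of a command as a list of Python callables, the round-robin assignment, and the `FrontEnd`, `MicroThread`, `xcall`, and `run_main` names do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MicroThread:
    ops: List[Callable[[int], int]]  # micro-instructions (hypothetical encoding)
    value: int                       # parameter this micro-thread works on
    uip: int = 0                     # micro-instruction pointer

    def step(self) -> bool:
        # Execute one micro-instruction; return False once finished.
        if self.uip >= len(self.ops):
            return False
        self.value = self.ops[self.uip](self.value)
        self.uip += 1
        return True


class FrontEnd:
    def __init__(self, num_elements: int):
        self.num_elements = num_elements

    def distribute(self, command, params: List[int]) -> List[int]:
        threads = [MicroThread(list(command), p) for p in params]
        # Assumed policy: round-robin micro-threads over homogeneous elements.
        elements = [threads[i::self.num_elements] for i in range(self.num_elements)]
        for pe in elements:
            active = list(pe)
            while active:
                # Interleave: each active micro-thread advances one step per pass.
                active = [t for t in active if t.step()]
        return [t.value for t in threads]


def xcall(regs: Dict, result_reg: str, command_reg: str, param_reg: str,
          front_end: FrontEnd) -> None:
    first_command = regs[command_reg][0]  # retrieve the first command
    regs[result_reg] = front_end.distribute(first_command, regs[param_reg])


def run_main(program, regs: Dict, front_end: FrontEnd) -> Dict:
    ip = 0  # primary instruction pointer
    while ip < len(program):
        kind, payload = program[ip]
        if kind == "latency":
            payload(regs)                     # runs on the latency side
        else:                                 # kind == "xcall"
            xcall(regs, *payload, front_end)  # hand off to the throughput side
        ip += 1
    return regs


def load_params(regs: Dict) -> None:
    regs["r_par"] = [10, 20, 30]


regs = {"r_cmd": [[lambda v: v * 2, lambda v: v + 1]], "r_par": [], "r_res": None}
program = [("latency", load_params), ("xcall", ("r_res", "r_cmd", "r_par"))]
run_main(program, regs, FrontEnd(num_elements=2))
```

The model treats the processing elements as homogeneous (any element can run any micro-thread). A heterogeneous variant would instead pick the element by micro-thread type before stepping it.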

Assignees

Intel Corp

Inventors

Classifications

  • using a secondary processor, e.g. coprocessor (peripheral processor G06F13/12)

  • Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

  • with reconfigurable architecture

  • Recovery, e.g. branch miss-prediction, exception handling (error detection or correction G06F11/00)

  • Reconfigurable logic embedded in CPU, e.g. reconfigurable unit

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10664284B2 cover?
An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3851. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 26, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).