Processing core having shared front end unit

US10140129B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10140129-B2
Application number: US-201213730719-A
Country: US
Kind code: B2
Filing date: Dec 28, 2012
Priority date: Dec 28, 2012
Publication date: Nov 27, 2018
Grant date: Nov 27, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads. Each of the plurality of processing units also comprises: i) at least one set of functional units corresponding to a complete instruction set offered by the processor, the at least one set of functional units to execute its respective processing unit's received microcode; ii) registers coupled to the at least one set of functional units to store operands and resultants of the received microcode; iii) data fetch circuitry to fetch input operands for the at least one functional units' execution of the received microcode.
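The arrangement in the abstract, one front end feeding several processing units that each buffer work for their own threads, can be pictured with a small software model. This is only an illustrative sketch, not the patented circuitry: the class names, the text instruction format (`"<op> <dst> <src>..."`), and the thread-to-unit assignment are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DecodedInstruction:
    """Stand-in for the decoded form the front end hands to a processing unit."""
    thread_id: int
    operation: str             # stands in for the micro-code / decoded instruction
    operand_addresses: tuple   # input operand register addresses
    resultant_address: int     # register address that receives the result


@dataclass
class ProcessingUnit:
    """Hypothetical processing unit: owns some threads and a private buffer."""
    name: str
    assigned_threads: set
    buffer: deque = field(default_factory=deque)

    def receive(self, decoded):
        # Store decoded work until the unit's functional units consume it.
        self.buffer.append(decoded)


class FrontEndUnit:
    """Hypothetical shared front end: decodes once, then routes by thread."""

    def __init__(self, processing_units):
        self.processing_units = processing_units

    def decode(self, thread_id, instruction_text):
        # Assumed toy format: "<op> <dst> <src> <src> ..."
        operation, resultant, *operands = instruction_text.split()
        return DecodedInstruction(
            thread_id, operation, tuple(int(a) for a in operands), int(resultant)
        )

    def dispatch(self, thread_id, instruction_text):
        decoded = self.decode(thread_id, instruction_text)
        for unit in self.processing_units:
            if thread_id in unit.assigned_threads:
                unit.receive(decoded)
                return unit
        raise KeyError(f"thread {thread_id} is not assigned to any processing unit")


# Two units share one front end; threads 0 and 1 run on pu0, thread 2 on pu1.
pu0 = ProcessingUnit("pu0", {0, 1})
pu1 = ProcessingUnit("pu1", {2})
front_end = FrontEndUnit([pu0, pu1])
front_end.dispatch(0, "add 3 1 2")
front_end.dispatch(2, "mul 4 5 6")
front_end.dispatch(1, "add 7 8 9")
```

In this model the fetch and decode logic exists once per core, while each processing unit keeps only a buffer and its own execution state, which is the division of labor the abstract describes.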

First claim

Opening claim text (preview).

What is claimed is:

1. A processor having one or more processing cores, each of said one or more processing cores comprising: a front end unit to fetch respective instructions of threads and decode said instructions into respective decoded instructions and input operand and resultant addresses of said instructions; and a plurality of processing units, each of said processing units to be assigned a plurality of said threads, each processing unit coupled to said front end unit and having a respective buffer to receive and store decoded instructions of its assigned plurality of said threads, each of said plurality of processing units comprising: i) a plurality of functional units comprising at least one integer functional unit and at least one floating point functional unit, said plurality of functional units to simultaneously execute its respective processing unit's received, decoded instructions for two or more of its assigned plurality of said threads, ii) registers coupled to said plurality of functional units to store operands and resultants of said received, decoded instructions of its assigned plurality of said threads, iii) data fetch circuitry to fetch input data operands for said plurality of functional units' execution of said received, decoded instructions of its assigned plurality of said threads, and iv) register allocation circuitry to allocate a respective register partition of the registers for each assigned thread of its assigned plurality of said threads.

2. The processor of claim 1 wherein said plurality of functional units are not coupled to any logic circuitry to perform out-of-order execution of said received, decoded instructions.

3. The processor of claim 1 wherein the register allocation circuitry of each of the plurality of processing units is to allocate the respective register partition of the registers for each assigned thread of its assigned plurality of said threads that are to be concurrently executed.

4. The processor of claim 1 wherein said plurality of functional units are not coupled to any logic circuitry to perform speculative execution of said received, decoded instructions.

5. The processor of claim 4 wherein the register allocation circuitry of each of the plurality of processing units is to allocate the respective register partition of the registers for each assigned thread of its assigned plurality of said threads that are to be concurrently executed.

6. The processor of claim 1 wherein said processor does not include circuitry for any of said threads to issue instructions in parallel for any one of said threads.

7. The processor of claim 1 wherein each of the plurality of processing units further comprise register allocation circuitry to allocate a register partition of less than all of the registers for each assigned thread.

8. A method performed by a processor comprising: fetching respective instructions of threads with a front end unit of the processor; decoding said instructions into respective decoded instructions and input operand and resultant addresses of said instructions with the front end unit of the processor; assigning a plurality of said threads to each of a plurality of processing units of a processing core of the processor, each processing unit coupled to said front end unit and having a respective buffer to receive and store decoded instructions of its assigned plurality of said threads; simultaneously executing each respective processing unit's received, decoded instructions for two or more of its assigned plurality of threads with a plurality of functional units of each respective processing unit, the plurality of functional units comprising at least one integer functional unit and at least one floating point functional unit; storing operands and resultants of said received, decoded instructions of its assigned plurality of said threads in registers coupled to said plurality of functional units; fetching input data operands with data fetch circuitry of each respective processing unit for said plurality of functional units' execution of said received, decoded instructions of its assigned plurality of said threads; and allocating a respective register partition of the registers, with register allocation circuitry of each processing unit, for each assigned thread of its assigned plurality of said threads.

9. The method of claim 8 further comprising, at each processing unit performing the following: allocating the respective register partition of the registers for each assigned thread of its assigned plurality of said threads that are to be concurrently executed.

10. The method of claim 8 wherein software assigns a first thread to a first of the plurality of processing units and a second thread to a second of the plurality of processing units.

11. The method of claim 10 wherein said first and second threads are not processed with any speculative execution logic circuitry.

12. The method of claim 10 wherein said first and second threads are not processed with any out-of-order execution logic circuitry.

13. The method of claim 10 wherein said first and second threads do not issue their respective instructions in parallel.

14. A processor comprising: at least two processing cores each having: a front end unit to fetch respective instructions of threads to be processed by its processing core and decode said instructions into respective decoded instructions and input operand and resultant addresses of said instructions; said front end unit coupled to a plurality of processing units of its processing core, each of said plurality of processing units to be assigned a plurality of said threads, each processing unit coupled to said front end unit and having a respective buffer to receive and store decoded instructions and each processing unit to receive input operand and resultant addresses of its assigned plurality of said threads from the front end unit, each of said plurality of processing units comprising: i) a plurality of functional units comprising at least one integer functional unit and at least one floating point functional unit, said plurality of functional units to simultaneously execute its respective processing unit's received, decoded instructions for two or more of its assigned plurality of said threads, ii) registers coupled to said plurality of functional units to store operands and resultants of said received, decoded instructions of its assigned plurality of said threads, iii) data fetch circuitry to fetch input operands for said plurality of functional units' execution of said received, decoded instructions of its assigned plurality of said threads, and iv) register allocation circuitry to allocate a respective register partition of the registers for each assigned thread of its assigned plurality of said threads; an interconnection network coupled to said plurality of processing units; and a cache coupled to said interconnection network.

15. The processor of claim 14 wherein said plurality of functional units are not coupled to any logic circuitry to perform out-of-order execution of said received, decoded instructions.

16. The processor of claim 15 wherein the register allocation circuitry of each of the plurality of processing units is to allocate the respective register partition of the registers for each assigned thread of its assigned plurality of said threads that are to be concurrently executed.

17. The processor of claim 14 wherein said plurality of functional units are not coupled to any logic circuitry to perform speculative execution of said received, decoded instr
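The register allocation element recited in the claims (a respective register partition for each assigned thread, and in claim 7 a partition of less than all of the registers) can be illustrated with a short model. This is a sketch under stated assumptions: the claim text shown here does not say how partitions are sized, so the equal, contiguous split below and the names `RegisterFile`, `allocate`, and `physical` are hypothetical choices for the example.

```python
class RegisterFile:
    """Hypothetical model of one processing unit's registers."""

    def __init__(self, num_registers):
        self.num_registers = num_registers
        self.partitions = {}  # thread_id -> range of physical register indices

    def allocate(self, thread_ids):
        """Give each assigned thread its own partition.

        Assumption: equal-sized contiguous partitions; any leftover
        registers are simply unused in this sketch.
        """
        if not thread_ids:
            raise ValueError("at least one thread is required")
        size = self.num_registers // len(thread_ids)
        if size == 0:
            raise ValueError("more threads than registers")
        self.partitions = {
            thread_id: range(i * size, (i + 1) * size)
            for i, thread_id in enumerate(thread_ids)
        }
        return self.partitions

    def physical(self, thread_id, logical_index):
        """Map a thread-local register number to a physical register."""
        partition = self.partitions[thread_id]
        if not 0 <= logical_index < len(partition):
            raise IndexError("register outside this thread's partition")
        return partition.start + logical_index


# Two concurrently executing threads share a 32-entry register file.
register_file = RegisterFile(32)
register_file.allocate([7, 9])
```

With two threads assigned, each partition covers 16 of the 32 registers, so neither thread can address the other's operands or results through its own register numbers.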

Assignees

Intel Corp

Inventors

Classifications

  • according to context, e.g. thread buffers · CPC title

  • from multiple instruction streams, e.g. multistreaming · CPC title

  • Decoding for concurrent execution · CPC title

  • G06F9/3891 (Primary)

    organised in groups of units sharing resources, e.g. clusters · CPC title

  • Instruction prefetching · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10140129B2 cover?
A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of process…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06F9/3891. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 27, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).