Microprocessor with ALU integrated into load unit

US9501286B2 · US · B2

Patent metadata
Publication number: US-9501286-B2
Application number: US-60916909-A
Country: US
Kind code: B2
Filing date: Oct 30, 2009
Priority date: Aug 7, 2009
Publication date: Nov 22, 2016
Grant date: Nov 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register.
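The data flow in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the patented hardware: the names `LoadAluInstruction`, `LoadUnit`, and `retire`, the dictionary used as a cache, and the 32-bit register width are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

MASK32 = 0xFFFFFFFF  # assumed 32-bit register width (illustrative only)


@dataclass
class LoadAluInstruction:
    """Hypothetical encoding of the instruction described in the abstract."""
    address: int                     # memory address of the source operand
    operation: Callable[[int], int]  # operation to perform on the operand
    dest_register: str               # destination register for the result


class LoadUnit:
    """Toy load unit whose ALU operates on the operand it just read."""

    def __init__(self, cache: Dict[int, int]) -> None:
        self.cache = cache  # a dict stands in for the cache memory

    def execute(self, instr: LoadAluInstruction) -> Tuple[str, int]:
        operand = self.cache[instr.address]         # read operand from cache
        result = instr.operation(operand) & MASK32  # ALU inside the load unit
        # The result is output for later retirement; the operand is never
        # handed to a separate execution unit in this model.
        return instr.dest_register, result


def retire(registers: Dict[str, int], dest: str, value: int) -> None:
    """Write the result to the architectural destination register."""
    registers[dest] = value


cache = {0x1000: 5}
registers: Dict[str, int] = {}
unit = LoadUnit(cache)
dest, value = unit.execute(LoadAluInstruction(0x1000, lambda x: x + 1, "eax"))
retire(registers, dest, value)
```

In this model a plain load is the same path with an identity operation, which loosely mirrors the idea that the load unit handles both kinds of instruction itself.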

First claim

Opening claim text (preview).

We claim:

1. A microprocessor, comprising: a register set, defined by an instruction set architecture of the microprocessor; a cache memory; a superscalar pipelined architecture including a plurality of execution units; and a load unit, coupled to the cache memory, wherein the load unit is distinct from the other execution units of the microprocessor and includes an arithmetic/logic unit (ALU) in which the load unit and the ALU are not separated by a register; wherein the load unit is configured to receive an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored; wherein the load unit is configured to read the source operand from the cache memory and to provide the source operand only to the ALU without registering the source operand before being provided to the ALU; wherein the ALU is configured to perform the operation on the source operand to generate the result; and wherein the load unit is further configured to output the result for subsequent retirement to the destination register.

2. The microprocessor of claim 1, wherein the load unit is further configured to receive a second instruction that specifies a second memory address of a second source operand and a second destination register of the register set to which the second source operand is to be stored without performing an operation on the second source operand, wherein the load unit is configured to execute the first and second instructions in the same number of clock cycles.

3. The microprocessor of claim 1, wherein the load unit further comprises: an address generator; and a bus, coupled to the address generator, the bus configured to forward the result of the instruction to an input of the address generator, wherein the address generator is configured to use the forwarded result to generate a memory address to enable the load unit to access the cache memory for a subsequent load instruction.

4. The microprocessor of claim 1, wherein none of the other execution units are configured to read the cache memory.

5. The microprocessor of claim 1, wherein at least one of the other execution units has an ALU configured to perform the operation specified by the instruction; however, the load unit does not forward the source operand to any of the at least one of the other execution units to perform the operation on the source operand to generate the result.

6. The microprocessor of claim 1, wherein the load unit is configured to execute all instructions that read from the cache memory and the other execution units are configured to execute none of the instructions that read from the cache memory.

7. The microprocessor of claim 1, wherein the instruction further specifies a second source operand, wherein the operation is to be performed on the first source operand and the second source operand to generate the result.

8. The microprocessor of claim 7, wherein the second source operand is provided to the load unit by a register of the register set.

9. The microprocessor of claim 7, further comprising: a storage element, configured to temporarily store the second source operand in response to the memory address of the first operand missing in the cache memory.

10. The microprocessor of claim 1, wherein the load unit requires only two accesses to the register set to execute the instruction.

11. The microprocessor of claim 1, further comprising: an instruction translator, configured to translate a macroinstruction into the instruction executed by the load unit, wherein the macroinstruction is defined by the instruction set architecture.

12. The microprocessor of claim 11, wherein the instruction translator is further configured to translate a second macroinstruction defined by the instruction set architecture into a pair of instructions comprising the instruction as a first instruction and a second instruction, wherein the second instruction is executed by one of the other execution units that receives the result of the first instruction from the load unit and writes the result to the cache memory.

13. The microprocessor of claim 11, wherein the instruction translator is configured to translate first and second macroinstructions defined by the instruction set architecture into the instruction.

14. The microprocessor of claim 11, wherein the instruction set architecture of the microprocessor is an x86 architecture.

15. The microprocessor of claim 1, wherein the operation comprises a zero-extend operation that zero-extends the source operand to a size of the destination register.

16. The microprocessor of claim 1, wherein the operation comprises a Boolean NOT operation that inverts each bit of the source operand.

17. The microprocessor of claim 1, wherein the operation comprises a NEGATE operation that generates a two's complement negation of the source operand.

18. The microprocessor of claim 1, wherein the operation comprises an increment operation that increments the source operand.

19. The microprocessor of claim 1, wherein the operation comprises a decrement operation that decrements the source operand.

20. The microprocessor of claim 1, wherein the operation comprises a sign-extend operation that sign-extends the source operand.

21. The microprocessor of claim 1, wherein the operation comprises a zero detect operation that generates the result as a true value if the source operand is zero and generates the result as a false value if the source operand is non-zero.

22. The microprocessor of claim 1, wherein the operation comprises a ones detect operation that generates the result as a true value if all bits of the source operand are binary ‘1’ and generates the result as a false value otherwise.

23. The microprocessor of claim 1, wherein the operation comprises a data format conversion operation that formats the source operand to a data format that is different from the data format in which the source operand was read from the data cache.

24. The microprocessor of claim 23, wherein the instruction specifies the data format.

25. The microprocessor of claim 1, wherein the operation comprises a Boolean logic operation, wherein the ALU performs the specified Boolean logic operation on the source operand and a second source operand to generate the result.

26. The microprocessor of claim 25, wherein the Boolean logic operation comprises one of the following: AND, OR, XOR, NOR.

27. The microprocessor of claim 1, wherein the operation comprises an arithmetic operation, wherein the ALU performs the specified arithmetic operation on the source operand and a second source operand to generate the result.

28. The microprocessor of claim 27, wherein the arithmetic operation comprises one of the following: ADD, SUBTRACT, MULTIPLY.

29. A method for a microprocessor to process an instruction, comprising: providing the microprocessor with a superscalar pipelined architecture having a register set defined by an instruction set architecture of the microprocessor, a cache memory, execution units, and a load unit including an arithmetic/logic unit (ALU) which together are distinct from the other execution units of the microprocessor and wherein the load unit and the ALU are not separated by a register; rece
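The single-operand operations named in claims 15 to 22 can be sketched as fixed-width integer functions. This is an illustrative software model only: the function names, the explicit `bits`/`src_bits`/`dst_bits` parameters, and the use of Python integers masked to a width are assumptions made for the example, not details taken from the patent.

```python
def mask(bits: int) -> int:
    """All-ones value of the given width, e.g. mask(8) == 0xFF."""
    return (1 << bits) - 1


def zero_extend(value: int, src_bits: int, dst_bits: int) -> int:
    """Keep the low src_bits; upper bits of the dst_bits-wide result are 0."""
    return value & mask(src_bits) & mask(dst_bits)


def sign_extend(value: int, src_bits: int, dst_bits: int) -> int:
    """Replicate the sign bit of a src_bits value up to dst_bits."""
    value &= mask(src_bits)
    if value >> (src_bits - 1):   # sign bit set: treat value as negative
        value -= 1 << src_bits
    return value & mask(dst_bits)


def boolean_not(value: int, bits: int) -> int:
    """Invert each bit of the operand."""
    return ~value & mask(bits)


def negate(value: int, bits: int) -> int:
    """Two's complement negation."""
    return -value & mask(bits)


def increment(value: int, bits: int) -> int:
    return (value + 1) & mask(bits)


def decrement(value: int, bits: int) -> int:
    return (value - 1) & mask(bits)


def zero_detect(value: int, bits: int) -> bool:
    """True if the operand is zero, False otherwise."""
    return (value & mask(bits)) == 0


def ones_detect(value: int, bits: int) -> bool:
    """True if every bit of the operand is 1, False otherwise."""
    return (value & mask(bits)) == mask(bits)
```

For example, under these assumptions `sign_extend(0x80, 8, 32)` yields `0xFFFFFF80`, while `zero_extend(0x80, 8, 32)` yields `0x80`.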

Assignees

Inventors

Classifications

  • controlled in tandem, e.g. multiplier-accumulator · CPC title

  • Instruction analysis, e.g. decoding, instruction word fields · CPC title

  • with dedicated cache, e.g. instruction or stack · CPC title

  • Arithmetic instructions · CPC title

  • Runtime instruction translation, e.g. macros · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9501286B2 cover?
A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source …
Who is the assignee on this patent?
Col Gerard M, Eddy Colin, Hooker Rodney E, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F9/3875. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).