Memory ordering in acceleration hardware
US-10572376-B2 · Feb 25, 2020 · US
US11029958B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11029958-B1 |
| Application number | US-201916729372-A |
| Country | US |
| Kind code | B1 |
| Filing date | Dec 28, 2019 |
| Priority date | Dec 28, 2019 |
| Publication date | Jun 8, 2021 |
| Grant date | Jun 8, 2021 |
Official abstract text for this publication.
Systems, methods, and apparatuses relating to configurable operand size operation circuitry in an operation configurable spatial accelerator are described. In one embodiment, a hardware accelerator includes a plurality of processing elements, a network between the plurality of processing elements to transfer values between the plurality of processing elements, and a first processing element of the plurality of processing elements including a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, configurable operand size operation circuitry coupled to the first plurality of input queues, and a configuration register within the first processing element to store a configuration value that causes the configurable operand size operation circuitry to switch to a first mode for a first multiple bit width from a plurality of selectable multiple bit widths of the configurable operand size operation circuitry, perform a selected operation on a plurality of first multiple bit width values from the first plurality of input queues in series to create a resultant value, and store the resultant value in the at least one first output queue.
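The "in series" operation at a selectable operand width can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patent's circuit: it assumes unsigned operands, wrap-around on overflow, and a counter loaded directly with the configured width. The bit-serial adder and its counter are recited in claim 2; the function name `bit_serial_add` and the example widths are hypothetical.

```python
def bit_serial_add(a: int, b: int, width: int) -> int:
    """Add two unsigned values one bit per step, keeping only `width` bits.

    `width` stands in for the configuration value held in the processing
    element's configuration register (an assumption of this sketch).
    """
    if width <= 0:
        raise ValueError("width must be positive")

    mask = (1 << width) - 1
    a &= mask  # truncate operands to the configured width
    b &= mask

    carry = 0
    result = 0
    bit = 0
    counter = width  # hypothetical counter, set from the configuration value

    while counter > 0:
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        # One-bit full adder: sum bit and carry-out for this position.
        result |= (x ^ y ^ carry) << bit
        carry = (x & y) | (carry & (x ^ y))
        bit += 1
        counter -= 1

    # The final carry is dropped, so the result wraps modulo 2**width.
    return result
```

With these assumptions, `bit_serial_add(200, 100, 8)` wraps to 44, while the same operands at width 16 give 300. Changing only the counter value changes the effective operand size, which is the property the configuration value is described as controlling.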
Opening claim text (preview).
What is claimed is:
1. An apparatus comprising: a plurality of processing elements; a network between the plurality of processing elements to transfer values between the plurality of processing elements; and a first processing element of the plurality of processing elements comprising: a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, configurable operand size operation circuitry coupled to the first plurality of input queues, and a configuration register within the first processing element to store a configuration value that causes the configurable operand size operation circuitry to switch to a first mode for a first multiple bit width from a plurality of selectable multiple bit widths of the configurable operand size operation circuitry, perform a selected operation on a plurality of first multiple bit width values from the first plurality of input queues in series to create a resultant value, and store the resultant value in the at least one first output queue.
2. The apparatus of claim 1, wherein the configurable operand size operation circuitry comprises a bit-serial adder circuit controlled by a counter that is set by the configuration value from the configuration register.
3. The apparatus of claim 1, wherein the configurable operand size operation circuitry comprises a bit-serial multiplier circuit controlled by a counter that is set by the configuration value from the configuration register.
4. The apparatus of claim 1, wherein the at least one first output queue is coupled via the network to a second processing element of the plurality of processing elements comprising: a third plurality of input queues having the multiple bit width coupled to the network, at least one fourth output queue having the multiple bit width coupled to the network, fixed operand size operation circuitry coupled to the first plurality of input queues, and a configuration register within the second processing element to store a second configuration value that causes the fixed operand size operation circuitry to perform a selected operation on the resultant value from the first processing element to create a second resultant value, and store the resultant value in the at least one fourth output queue.
5. The apparatus of claim 1, wherein the first processing element comprises an input controller and an output controller, and, when the first plurality of input queues stores the plurality of first multiple bit width values, the input controller is to send a not empty value to the configurable operand size operation circuitry of the first processing element and when the at least one first output queue is not full, the output controller is to send a not full value to the configurable operand size operation circuitry of the first processing element, and the configurable operand size operation circuitry of the first processing element of the first processing element is to begin the selected operation on the plurality of first multiple bit width values stored in the first plurality of input queues after both the not empty value and the not full value are received.
6. The apparatus of claim 1, wherein the first processing element comprises an output controller, and, when the at least one first output queue is not full, the output controller is to send a not full value to the configurable operand size operation circuitry of the first processing element to cause the first processing element to begin the selected operation on the plurality of first multiple bit width values.
7. The apparatus of claim 1, wherein the first processing element comprises an input controller, and, when the first plurality of input queues stores the plurality of first multiple bit width values, the input controller is to send a not empty value to the configurable operand size operation circuitry of the first processing element to begin the selected operation on the plurality of first multiple bit width values.
8. The apparatus of claim 1, further comprising a bitwise, row and column accessible register file coupled to the first processing element to store the resultant value from the at least one first output queue of the first processing element.
9. A method comprising: coupling a plurality of processing elements with a network to transfer values between the plurality of processing elements, wherein a first processing element of the plurality of processing elements comprises a first plurality of input queues having a multiple bit width coupled to the network, at least one first output queue having the multiple bit width coupled to the network, and configurable operand size operation circuitry coupled to the first plurality of input queues; storing a configuration value in a configuration register within the first processing element that causes the configurable operand size operation circuitry to switch to a first mode for a first multiple bit width from a plurality of selectable multiple bit widths of the configurable operand size operation circuitry; performing a selected operation, specified by the configuration value, with the configurable operand size operation circuitry on a plurality of first multiple bit width values from the first plurality of input queues in series to create a resultant value; and storing the resultant value in the at least one first output queue.
10. The method of claim 9, wherein the performing the selected operation comprises controlling a bit-serial adder circuit of the configurable operand size operation circuitry by a counter that is set by the configuration value from the configuration register.
11. The method of claim 9, wherein the performing the selected operation comprises controlling a bit-serial multiplier circuit of the configurable operand size operation circuitry by a counter that is set by the configuration value from the configuration register.
12. The method of claim 9, further comprising: coupling the at least one first output queue via the network to a second processing element of the plurality of processing elements comprising a third plurality of input queues having the multiple bit width coupled to the network, at least one fourth output queue having the multiple bit width coupled to the network, and fixed operand size operation circuitry coupled to the first plurality of input queues; storing a second configuration value in a configuration register within the second processing element that causes the fixed operand size operation circuitry to perform a selected operation on the resultant value from the first processing element to create a second resultant value; and storing the resultant value in the at least one fourth output queue.
13. The method of claim 9, wherein the first processing element comprises an input controller and an output controller, and, when the first plurality of input queues stores the plurality of first multiple bit width values, the input controller sends a not empty value to the configurable operand size operation circuitry of the first processing element and when the at least one first output queue is not full, the output controller sends a not full value to the configurable operand size operation circuitry of the first processing element, and the configurable operand size operation circuitry of the first processing element of the first processing element begins performing the selected operation on the plurality of first multiple bit width values stored in the first plurality of input queues after both the not empty value and the not full value are received.
14. The method
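
The not empty / not full handshake recited in claims 5 and 13 can be read as a firing rule: the operation starts only once every input queue holds a value and the output queue has room. The following is a minimal software sketch of that rule, not the claimed hardware. The class name `ProcessingElement`, the queue capacity, and the default addition operation are all hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable


class ProcessingElement:
    """Toy model of a processing element with input and output queues."""

    def __init__(self, num_inputs: int = 2, out_capacity: int = 2,
                 op: Callable[..., int] = lambda a, b: a + b) -> None:
        self.inputs = [deque() for _ in range(num_inputs)]
        self.output = deque()
        self.out_capacity = out_capacity  # hypothetical output queue depth
        self.op = op                      # stands in for the selected operation

    def not_empty(self) -> bool:
        # Input controller signal: every input queue holds at least one value.
        return all(len(q) > 0 for q in self.inputs)

    def not_full(self) -> bool:
        # Output controller signal: the output queue can accept a result.
        return len(self.output) < self.out_capacity

    def step(self) -> bool:
        """Fire once if both signals are asserted; report whether it fired."""
        if not (self.not_empty() and self.not_full()):
            return False
        operands = [q.popleft() for q in self.inputs]
        self.output.append(self.op(*operands))
        return True
```

In this model a full output queue stalls the element even when operands are waiting, so backpressure from a downstream consumer propagates upstream without any central scheduler.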
Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution · CPC title
according to execution mode, e.g. mode flag · CPC title
with variable precision · CPC title
Decoding the operand specifier, e.g. specifier format · CPC title
Arithmetic instructions · CPC title