US9766886B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9766886-B2 |
| Application number | US-201113977736-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 16, 2011 |
| Priority date | Dec 16, 2011 |
| Publication date | Sep 19, 2017 |
| Grant date | Sep 19, 2017 |
Official abstract text for this publication.
Instructions and logic provide vector linear interpolation functionality. In some embodiments, responsive to an instruction specifying: a first operand from a set of vector registers, a size of each of the vector elements, a portion of the vector elements upon which to compute linear interpolations, a second operand from a set of vector registers, and a third operand; an execution unit reads a first, a second and a third value of the size of vector elements from corresponding data fields in the first, the second and the third operand respectively and computes an interpolated value as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value.
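Writing the first, second and third values as a, b and c, the expression in the abstract is a·b − b·c + c = c + b·(a − c): a linear interpolation from c (at b = 0) to a (at b = 1), with b as the weight. The Python below is a minimal, non-authoritative sketch of that per-element arithmetic. It assumes floating-point elements held in plain lists, models the "portion of the vector elements" as a hypothetical boolean `mask`, and assumes masked-off lanes keep their previous destination value; the abstract fixes none of these details, and the name `vlerp` is illustrative only.

```python
from typing import List, Optional


def vlerp(a: List[float], b: List[float], c: List[float],
          mask: Optional[List[bool]] = None,
          dest: Optional[List[float]] = None) -> List[float]:
    """Hypothetical model: per lane, a*b - b*c + c (equal to c + b*(a - c)).

    Only lanes where `mask` is True are computed; other lanes keep their
    value from `dest` (an assumed merge-masking behavior).
    """
    n = len(a)
    if mask is None:
        mask = [True] * n          # no mask: compute every lane
    if dest is None:
        dest = [0.0] * n           # assumed initial destination contents
    if not all(len(v) == n for v in (b, c, mask, dest)):
        raise ValueError("all operands must have the same number of lanes")
    out = list(dest)
    for i in range(n):
        if mask[i]:
            # Expression in the abstract's form: first*second - second*third + third.
            out[i] = a[i] * b[i] - b[i] * c[i] + c[i]
    return out


# In exact arithmetic, weight b = 0 yields c and b = 1 yields a.
print(vlerp([10.0, 10.0, 10.0], [0.0, 0.5, 1.0], [2.0, 2.0, 2.0]))  # [2.0, 6.0, 10.0]
print(vlerp([10.0, 10.0], [0.5, 0.5], [2.0, 2.0],
            mask=[True, False], dest=[-1.0, -1.0]))                 # [6.0, -1.0]
```

The two algebraic forms are equal in exact arithmetic but can round differently in floating point; the sketch deliberately evaluates the form the abstract states.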
Opening claim text (preview).
What is claimed is:

1. A processor comprising: one or more vector registers each comprising a plurality of data fields to store values of vector elements; a decode stage to decode a first instruction specifying: a first operand specifying a first vector register of the one or more vector registers, a second operand specifying a second vector register of the one or more vector registers, a third operand, and a size of the vector elements associated with the one or more vector registers; and an execution unit, responsive to the decoded first instruction and for each data field of the plurality of data fields, to: read a first, a second and a third value respectively from corresponding data fields of the vector registers, based on the size of the vector elements, in the first, the second and the third operand; compute a first interpolated value for the vector elements as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value; read a fourth, a fifth and a sixth value respectively from corresponding data fields of the vector registers, based on the size value of the vector elements, in the first, the second and the third operand; and compute a second interpolated value for the vector elements, in parallel with computing the first value multiplied, as the fourth value multiplied by the fifth value minus the fifth value multiplied by the sixth value plus the sixth value.

2. The processor of claim 1, wherein the execution unit, responsive to the decoded first instruction and for each data field of the plurality of data fields, is further to store each computed interpolated value respectively in corresponding data fields of the size of vector elements in a destination operand specified by the first instruction.

3. The processor of claim 1, wherein the specified third operand is one of the one or more vector registers.

4. The processor of claim 1, wherein the specified third operand is a memory location.

5. The processor of claim 1, the first instruction specifying a mask identifying the plurality of data fields, wherein said values read from data fields in source operands corresponds to vector elements in the source operands unmasked by the mask specified by the first instruction.

6. The processor of claim 1, wherein the size of vector elements is 64-bits.

7. The processor of claim 1, wherein the size of vector elements is 32-bits.

8. The processor of claim 1, wherein the size of vector elements is 16-bits.

9. A machine implemented method comprising: reading, by a processor, a first, a second and a third value, respectively from corresponding data fields of a vector element, based on a size in a first, a second and a third operand specified by a first executable instruction; computing, by the processor, a first interpolated value for vector elements as the first value multiplied by the second value minus the second value multiplied by the third value plus the third value; reading, by the processor, a fourth, a fifth and a sixth value respectively from corresponding data fields of the vector element, based on the size value of the vector element, in the first, the second and the third operand; and computing, by the processor, a second interpolated value for the vector elements, in parallel with computing the first value multiplied, as the fourth value multiplied by the fifth value minus the fifth value multiplied by the sixth value plus the sixth value.

10. The machine implemented method of claim 9, wherein the vector element size is 64-bits.

11. The machine implemented method of claim 9, wherein the vector element size is 32-bits.

12. The machine implemented method of claim 9, wherein the size of vector elements is 16-bits.

13. The machine implemented method of claim 9, wherein the specified third operand is one of a set of vector registers.

14. The machine implemented method of claim 9, wherein the specified third operand is a memory location.

15. The machine implemented method of claim 9, the first executable instruction specifying a mask identifying a portion of the data fields of the vector element size, wherein said values read from corresponding data fields in the first, second and third operands corresponds to vector elements in the first, second and third operands unmasked by the mask specified by the first executable instruction.

16. An apparatus comprising: a register/memory access logic circuit to read a first, a second and a third value, respectively from corresponding data fields of a vector element, based on a size of the vector element, in a first, a second and a third operand specified by a first executable instruction; and a partitionable multiplication/addition logic to: compute a first interpolated value for the vector element as the first value multiplied by the second value to generate a first set of partial products and the second value multiplied by the third value to generate a second set of negated partial products, then adding together the first set of partial products plus the second set of negated partial products plus the third value; read a fourth, a fifth and a sixth value respectively from corresponding data fields of the vector element, based on the size value of the vector element, in the first, the second and the third operand; and compute a second interpolated value for the vector element, in parallel with computing the first value multiplied, as the fourth value multiplied by the fifth value minus the fifth value multiplied by the sixth value plus the sixth value.

17. The apparatus of claim 16, wherein said partitionable multiplication/addition logic to store the first interpolated value to a corresponding data field of the size of vector elements in a destination operand specified by the first executable instruction.

18. The apparatus of claim 17, wherein the destination operand specified by the first executable instruction is a vector register different than the first, the second and the third operand specified by the first executable instruction.

19. The apparatus of claim 17, wherein corresponding data fields being read for the third and sixth values were converted up to the vector element size specified by the first executable instruction.

20. The apparatus of claim 19, wherein corresponding data fields being read for the third and sixth values were broadcast from the third operand specified by the first executable instruction.

21. The apparatus of claim 19, wherein corresponding data fields being read for the third and sixth values were swizzled from the third operand specified by the first executable instruction.

22. A processing system comprising: a memory; and a first plurality of processors, each of the first plurality of processors comprising: one or more vector registers each comprising a plurality of data fields to store values of vector elements; a decode stage to decode a first instruction specifying: a first operand specifying a first vector register of the one or more vector registers, a second operand specifying a second vector register of the one or more vector registers, a third operand, and a size of the vector elements associated with the one or more vector registers; a register/memory read access stage to read a first, a second and a third value, respectively from corresponding data fields of the vector registers, based on the size of the vector elements, in the first, the second and the third operand; and an execution stage t
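Claim 16 phrases the arithmetic in terms of partial products: a·b yields one set of partial products, b·c yields a set of negated partial products, and both are summed together with c. The Python below is a hedged model of that idea, not the patented circuit. It assumes integer elements with wraparound modulo 2^W (W being the element size of claims 6 to 8), a plain radix-2 shift-and-add partial-product scheme, two's-complement negation, lane 0 in the low-order bits of a packed register, and a single scalar third operand broadcast to every lane (loosely following claim 20). The claims fix none of these details, and every function name here is hypothetical.

```python
from typing import List


def unpack(reg: int, width: int, lanes: int) -> List[int]:
    """Split a packed register value into `lanes` unsigned `width`-bit elements (lane 0 = low bits)."""
    mask = (1 << width) - 1
    return [(reg >> (i * width)) & mask for i in range(lanes)]


def pack(values: List[int], width: int) -> int:
    """Inverse of unpack: place each element in its own `width`-bit field."""
    mask = (1 << width) - 1
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & mask) << (i * width)
    return reg


def partial_products(x: int, y: int, width: int) -> List[int]:
    """Radix-2 partial products of x*y: x shifted left by i for each set bit i of y, truncated to `width` bits."""
    mask = (1 << width) - 1
    return [(x << i) & mask for i in range(width) if (y >> i) & 1]


def negate(v: int, width: int) -> int:
    """Two's-complement negation within `width` bits."""
    return (~v + 1) & ((1 << width) - 1)


def interp_lane(a: int, b: int, c: int, width: int) -> int:
    """a*b - b*c + c modulo 2**width, summed from (negated) partial products."""
    first = partial_products(a, b, width)                               # terms of a*b
    second = [negate(p, width) for p in partial_products(c, b, width)]  # terms of -(b*c)
    return (sum(first) + sum(second) + c) & ((1 << width) - 1)


def vinterp_packed(reg_a: int, reg_b: int, c_scalar: int,
                   reg_bits: int, width: int) -> int:
    """Interpolate every `width`-bit lane of two packed registers.

    The claims compute lanes in parallel; this model simply loops over them.
    """
    if reg_bits % width:
        raise ValueError("register size must be a multiple of the element size")
    lanes = reg_bits // width
    a = unpack(reg_a, width, lanes)
    b = unpack(reg_b, width, lanes)
    c = [c_scalar & ((1 << width) - 1)] * lanes  # scalar third operand broadcast to all lanes
    return pack([interp_lane(x, y, z, width) for x, y, z in zip(a, b, c)], width)


ra = pack([10, 2, 0, 5], 16)
rb = pack([3, 3, 0, 1], 16)
print(unpack(vinterp_packed(ra, rb, 2, 64, 16), 16, 4))  # [26, 2, 2, 5]
```

In this model the negated partial products sum to −(b·c) modulo 2^W, so both products and the addend c pass through a single summation with no separate subtract step. That is one plausible reading of the "partitionable multiplication/addition logic", not a statement of how the claimed hardware is built.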
Function evaluation by approximation methods, e.g. inter- or extrapolation, smoothing, least mean square method ({G06F17/18 takes precedence}; interpolation for numerical control G05B19/18) · CPC title
having multiple operands in a single register · CPC title
Vector processors · CPC title
Arithmetic instructions · CPC title
with variable precision · CPC title