Accelerating eight-way parallel keccak execution
US-2024211268-A1 · Jun 27, 2024 · US
US9910670B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9910670-B2 |
| Application number | US-201414327534-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 9, 2014 |
| Priority date | Jul 9, 2014 |
| Publication date | Mar 6, 2018 |
| Grant date | Mar 6, 2018 |
Abstract
A processor is described having an instruction execution pipeline. The instruction execution pipeline includes an instruction fetch stage to fetch an instruction. The instruction format of the instruction specifies a first input vector, a second input vector and a third input operand. The instruction execution pipeline comprises an instruction decode stage to decode the instruction. The instruction execution pipeline includes a functional unit to execute the instruction. The functional unit includes a routing network to route a first contiguous group of elements from a first end of one of the input vectors to a second end of the instruction's resultant vector, and, route a second contiguous group of elements from a second end of the other of the input vectors to a first end of the instruction's resultant vector. The first and second ends are opposite vector ends. The first and second groups of contiguous elements are defined from the third input operand. The instruction is not capable of routing non-contiguous groups of elements from the input vectors to the instruction's resultant vector. A software pipeline that uses the instruction is also described.
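The routing behaviour described in the abstract can be modelled in a few lines. The sketch below is a hypothetical Python model, not the patented hardware. It assumes that index 0 is the left end of a vector and uses the orientation of claim 5 (the first end is the right end, the second end is the left end). It treats the third input operand either as a scalar count `k` (claim 2) or as a mask vector (claim 3). The function names `route_scalar` and `route_mask` and the mask encoding are illustrative assumptions.

```python
from typing import List, Sequence


def route_scalar(a: Sequence[int], b: Sequence[int], k: int) -> List[int]:
    """Hypothetical model of the two-source routing instruction.

    Assumes index 0 is the left end. The contiguous group a[k:] (taken from
    a's right end) lands at the result's left end, and the contiguous group
    b[:k] (taken from b's left end) lands at the result's right end, so the
    result is a window that straddles the two input vectors.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("input vectors must have the same length")
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, len(a)]")
    return list(a[k:]) + list(b[:k])


def route_mask(a: Sequence[int], b: Sequence[int], mask: Sequence[int]) -> List[int]:
    """Same routing, with the third operand given as a mask vector.

    Assumed encoding: mask[i] == 1 means result[i] comes from b. The set bits
    must form one contiguous run at the right end; any other pattern would
    require routing a non-contiguous group, which the abstract says the
    instruction cannot do, so it is rejected here.
    """
    n = len(a)
    if len(mask) != n:
        raise ValueError("mask must match the vector length")
    k = sum(mask)
    if list(mask) != [0] * (n - k) + [1] * k:
        raise ValueError("mask must select one contiguous group at the right end")
    return route_scalar(a, b, k)
```

For example, `route_scalar(list(range(8)), list(range(8, 16)), 3)` returns `[3, 4, 5, 6, 7, 8, 9, 10]`, and `route_mask` with the mask `[0, 0, 0, 0, 0, 1, 1, 1]` produces the same vector. A mask such as `[0, 1, 0, 1]` is rejected because it would describe non-contiguous groups.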
Claims (preview)
What is claimed is:
1. A processor comprising: an instruction execution pipeline comprising: an instruction fetch stage to fetch a first instruction and a second instruction, an instruction format of the first instruction specifying a first register storing a first input vector that is misaligned with respect to memory addressing space, a second register storing a second, next input vector that is misaligned with respect to the memory addressing space, a first resultant vector, and a third input operand, and an instruction format of the second instruction specifying the second register storing the second, next input vector that is misaligned with respect to the memory addressing space, a third register storing a third, next input vector that is misaligned with respect to the memory addressing space, a second resultant vector, and a fourth input operand; an instruction decoder to decode said first instruction into a decoded first instruction, and decode said second instruction into a decoded second instruction; and an execution unit to: execute the decoded first instruction to cause a routing network to route a first contiguous group of elements from a first end of the first input vector to a second end of said first resultant vector, route a second contiguous group of elements from a second end of the second, next input vector to a first end of said first resultant vector, said first end of the first input vector and said second end of the second, next input vector being opposite vector ends, and preserve the second, next input vector in the second register after execution of the decoded first instruction, wherein said first and second contiguous groups of elements of the first input vector and the second, next input vector are defined from said third input operand, and execute the decoded second instruction to cause the routing network to route a first contiguous group of elements from a first end of the second, next input vector to a second end of said second resultant vector, route a second contiguous group of elements from a second end of the third, next input vector to a first end of said second resultant vector, said first end of the second, next input vector and said second end of the third, next input vector being opposite vector ends, wherein said first and second contiguous groups of elements of the second, next input vector and the third, next input vector are defined from said fourth input operand.
2. The processor of claim 1, wherein said third input operand is specified as a scalar.
3. The processor of claim 1, wherein said third input operand is embodied with a mask vector.
4. The processor of claim 1, wherein said first end of the first input vector is a left end and said second end of the second, next input vector is a right end.
5. The processor of claim 1, wherein said first end of the first input vector is a right end and said second end of the second, next input vector is a left end.
6. The processor of claim 1, wherein the execution unit is to execute the decoded second instruction to further preserve the third, next input vector in the third register after execution of the decoded second instruction.
7. The processor of claim 1, wherein the first resultant vector is stored as a resultant of the first instruction in a register that is not the first register and not the second register.
8. A non-transitory machine readable medium containing program code stored therein that when processed by a computing system causes a method to be performed, said method comprising: detecting processing of an array having misaligned data rows; compiling said processing of said array into a software pipelined loop a program code sequence having a first instruction and a second instruction, an instruction format of the first instruction specifying a first register storing a first input vector of the array that is misaligned with respect to memory addressing space, a second register storing a second, next input vector of the array that is misaligned with respect to the memory addressing space, a first resultant vector, and a third input operand, and an instruction format of the second instruction specifying the second register storing the second, next input vector of the array that is misaligned with respect to the memory addressing space, a third register storing a third, next input vector of the array that is misaligned with respect to the memory addressing space, a second resultant vector, and a fourth input operand; decoding said first instruction into a decoded first instruction; decoding said second instruction into a decoded second instruction; executing the decoded first instruction to cause a routing network to route a first contiguous group of elements from a first end of the first input vector to a second end of said first resultant vector, route a second contiguous group of elements from a second end of the second, next input vector to a first end of said first resultant vector, said first end of the first input vector and said second end of the second, next input vector being opposite vector ends, and preserve the second, next input vector in the second register after execution of the decoded first instruction, wherein said first and second contiguous groups of elements of the first input vector and the second, next input vector are defined from said third input operand; and executing the decoded second instruction to cause the routing network to route a first contiguous group of elements from a first end of the second, next input vector to a second end of said second resultant vector, route a second contiguous group of elements from a second end of the third, next input vector to a first end of said second resultant vector, said first end of the second, next input vector and said second end of the third, next input vector being opposite vector ends, wherein said first and second contiguous groups of elements of the second, next input vector and the third, next input vector are defined from said fourth input operand.
9. The non-transitory machine readable medium of claim 8, wherein peeling is not used in formulating said program code sequence.
10. The non-transitory machine readable medium of claim 8, wherein said first resultant vector is an aligned row of said array.
11. The non-transitory machine readable medium of claim 10, wherein said program code sequence includes code to process said aligned row.
12. The non-transitory machine readable medium of claim 10, wherein said first resultant vector includes sections of two different rows of said array.
13. The non-transitory machine readable medium of claim 10, wherein the executing of the decoded second instruction further comprises preserving the third, next input vector in the third register after execution of the decoded second instruction.
14. The non-transitory machine readable medium of claim 13, wherein memory accesses for the rows are not aligned with row boundaries.
15. The non-transitory machine readable medium of claim 10, wherein the first resultant vector is stored as a resultant of the first instruction in a register that is not the first register and not the second register.
16. The non-transitory machine readable medium of claim 15, wherein memory accesses for the rows are not aligned with row boundaries.
17. A computing system comprising: a system memory; a processor coupled to said system memory, said processor comprising an instruction execution pipeline comprising: an instruction fetch stage to fetch a first instruction and a second instruction, an instruction format of the firs
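Claims 1 and 8 to 10 describe chaining the instruction in a software-pipelined loop. Each instruction consumes the current and next misaligned vectors and preserves the next vector in its register so that the following instruction can reuse it, and no peeling loop is needed. The sketch below is a hypothetical Python model of that loop, not code from the patent. It assumes a vector length of 8 elements (`VLEN`), rows exactly one vector wide, and a scalar third operand. It also assumes that memory is only read through aligned loads. The names `route`, `aligned_load` and `realign_stream` are illustrative.

```python
from typing import Iterator, List, Sequence

VLEN = 8  # hypothetical vector length, in elements


def route(a: Sequence[int], b: Sequence[int], k: int) -> List[int]:
    """Model of the routing instruction with a scalar third operand: a[k:] then b[:k]."""
    return list(a[k:]) + list(b[:k])


def aligned_load(memory: Sequence[int], addr: int) -> List[int]:
    """Model of an aligned vector load; addr must be a multiple of VLEN."""
    assert addr % VLEN == 0, "only aligned loads are modelled"
    return list(memory[addr:addr + VLEN])


def realign_stream(memory: Sequence[int], start: int, count: int) -> Iterator[List[int]]:
    """Yield `count` rows of VLEN elements that begin at the misaligned address `start`.

    Software-pipelined: the 'next' vector loaded in one iteration is kept (like
    the preserved second register in claim 1) and becomes the 'current' vector
    of the next iteration, so each aligned block is loaded once and no scalar
    peeling loop is needed to reach an aligned address.
    """
    k = start % VLEN                  # third operand: how far the rows are shifted
    base = start - k                  # nearest aligned address at or below start
    cur = aligned_load(memory, base)  # prologue: first aligned load
    for i in range(count):
        nxt = aligned_load(memory, base + (i + 1) * VLEN)
        yield route(cur, nxt, k)      # one realigned row per instruction
        cur = nxt                     # preserved vector feeds the next instruction
```

With `memory = list(range(40))`, `list(realign_stream(memory, 3, 3))` yields the rows `[3, ..., 10]`, `[11, ..., 18]` and `[19, ..., 26]`. As written, the loop always loads one aligned block past the last row it produces. A real code generator would need to guard that final load.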
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
using a mask · CPC title