Accelerating eight-way parallel keccak execution
US-2024211268-A1 · Jun 27, 2024 · US
US12554490B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12554490-B2 |
| Application number | US-202318543036-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 18, 2023 |
| Priority date | Dec 20, 2022 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. The second layer multiplexer array receives a second control signal and selects between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for larger block rotation, avoiding large multiplexer arrays with complex wiring.
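The two-layer scheme in the abstract can be modelled in software to see why two half-width rotations plus a per-byte select yield a full-width rotation. The following is a minimal sketch, not the patented circuit. The function names, the left-rotation direction, and the choice of driving both first-layer arrays with the rotation amount modulo N/2 are assumptions made for illustration. The N-bit select mask, rotated by the same amount as the data, follows the wording of claims 6 and 8.

```python
def rotate_left(block: bytes, amount: int) -> bytes:
    """Model of one first-layer multiplexer array: rotate a block left by `amount` bytes."""
    amount %= len(block)
    return block[amount:] + block[:amount]


def rotate_two_layer(data: bytes, amount: int) -> bytes:
    """Rotate an N-byte block left using two N/2-byte rotators and N 2:1 selects.

    Hypothetical helper: the names and the rotation direction are assumptions.
    """
    n = len(data)
    if n == 0 or n % 2:
        raise ValueError("block length must be a positive even number of bytes")
    half = n // 2
    amount %= n

    # First layer: each half is rotated independently by the same reduced amount.
    first = rotate_left(data[:half], amount % half)
    second = rotate_left(data[half:], amount % half)

    # Second-layer control: an N-bit mask. With no rotation, the lower half of the
    # outputs takes bytes from `first` (bit 0) and the upper half from `second` (bit 1).
    # Rotating the mask by `amount` rotates the output block by the same amount.
    base_mask = [0] * half + [1] * half
    mask = base_mask[amount:] + base_mask[:amount]

    # Second layer: output byte i picks the corresponding byte (index i mod N/2)
    # from one of the two rotated halves, as directed by its mask bit.
    return bytes(second[i % half] if mask[i] else first[i % half] for i in range(n))


print(list(rotate_two_layer(bytes(range(8)), 3)))  # [3, 4, 5, 6, 7, 0, 1, 2]
```

In this model each output position only ever chooses between two candidate bytes, which is the property the abstract credits with avoiding large multiplexer arrays. A hardware implementation could obtain the rotated mask from a lookup table, as claim 9 describes, rather than computing it.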
Opening claim text (preview).
The invention claimed is:
1. An execution unit configured to execute a computer program instruction to perform a byte-wise rotation operation of an input data block, the execution unit comprising: a logic circuit comprising: an input data array to receive an input data block comprising N bytes; two first layer multiplexer arrays, each first layer multiplexer array configured to: receive a first layer data block comprising a respective subset of bytes of the input data block; receive a first layer control signal; rotate the first layer data block by an amount indicated by the first layer control signal; the two first layer multiplexer arrays being configured to respectively output a first rotated first layer data block and a second rotated first layer data block; a second layer multiplexer array configured to receive a second control signal, the second layer multiplexer array comprising N multiplexers, each multiplexer configured to select between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal to output a rotated second layer data block, and a control signal generator, configured to generate the first layer control signal and second layer control signal based on the received computer program instruction.
2. The execution unit of claim 1, wherein each first layer data block comprises N/2 bytes.
3. The execution unit of claim 1, wherein: the input data array is configured to receive an input data block comprising M bytes, where M>N; the logic circuit comprises: four first layer multiplexer arrays, so as to output first to fourth rotated first layer blocks; two second layer multiplexer arrays, a first of the second layer multiplexer arrays configured to receive the first and second rotated first layer blocks and output a first rotated second layer data block, a second of the second layer multiplexer arrays configured to receive third and fourth rotated first layer blocks and output a second rotated second layer block; a third layer multiplexer array configured to receive a third layer control signal, the third layer multiplexer array comprising M multiplexers configured to select between a corresponding byte of the first and second rotated second layer data blocks based on the third control signal to output a rotated third layer data block; and the control signal generator is configured to generate the third layer control signal.
4. The execution unit of claim 1, wherein the logic circuit comprises an intermediate results array configured to receive the output of the first layer multiplexer arrays.
5. The execution unit of claim 1, wherein each first layer multiplexer array comprises a plurality of S:1 multiplexers.
6. The execution unit of claim 1, wherein the second layer control signal comprises a bitmask of N bits, each bit in the bitmask acting as a control signal for a respective one of the N multiplexers of the second multiplexer array.
7. The execution unit of claim 6, wherein the logic circuit comprises a splitter to split the bitmask and supply the respective bits to the respective multiplexers.
8. The execution unit of claim 6, wherein the control signal generator is configured to rotate the bitmask, wherein an amount of rotation of the bitmask results in output of the rotated second layer data block rotated by the same amount.
9. The execution unit of claim 8, wherein the control signal generator is configured to rotate the bitmask by selecting a stored rotated bitmask from a lookup table.
10. The execution unit of claim 3, wherein the third layer control signal is a bitmask of M bits.
11. The execution unit of claim 1, wherein the control signal generator is configured to generate a second layer control signal that causes the second layer multiplexer array to act as a passthrough.
12. The execution unit of any claim 1, wherein the logic circuit comprises: a plurality of data path lanes; one or more clock gates configured to disable one or more of the data path lanes, and the control signal generator is configured to generate a clock gate control signal to control the one or more clock gates.
13. The execution unit of claim 1, wherein the logic circuit comprises a pipeline register disposed between the first layer multiplexer arrays and second layer multiplexer array.
14. The execution unit of claim 1, wherein the computer program instruction is a rotate instruction, configured to rotate the input data block.
15. The execution unit of claim 1, wherein the computer program instruction comprises a plurality of operations, wherein the rotate operation is one of the plurality of operations.
16. The execution unit of claim 15, wherein the computer program instruction comprises: a pack instruction, configured to copy a sequence of consecutive bytes from a first position in a first data block into a second location in a second data block, or an extract instruction, configured to extract a sequence of consecutive bytes from a concatenation of a first data block and a second data block.
17. The execution unit of claim 1, wherein the execution unit is configured generate the first layer control signal, and second layer control signal based on values indicated by the computer program instruction.
18. A processing device comprising a plurality of processing units, wherein at least one of the processing units comprises: an execution unit configured to execute a computer program instruction to perform a byte-wise rotation operation of an input data block, the execution unit comprising: a logic circuit comprising: an input data array to receive an input data block comprising N bytes; two first layer multiplexer arrays, each first layer multiplexer array configured to: receive a first layer data block comprising a respective subset of bytes of the input data block; receive a first layer control signal; rotate the first layer data block by an amount indicated by the first layer control signal; the two first layer multiplexer arrays being configured to respectively output a first rotated first layer data block and a second rotated first layer data block; a second layer multiplexer array configured to receive a second control signal, the second layer multiplexer array comprising N multiplexers, each multiplexer configured to select between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal to output a rotated second layer data block, and a control signal generator, configured to generate the first layer control signal and second layer control signal based on the received computer program instruction.
19. The processing device of claim 18, wherein the processing units are tile processors that communicate via an exchange fabric which implements a time deterministic exchange.
20. A method implemented in an execution unit, the method comprising: receiving an input data block comprising N bytes; supplying first layer data blocks comprising a respective subset of bytes of the input data block to a first and second first layer multiplexer array; supplying a first layer control signal to the first and second first layer multiplexer array; rotating the first layer data blocks by an amount indicated by the control signal to output a first rotated first layer data block and a second rotated first layer data block; supplying the first rotated first layer data block and the second rotated
Clock generators with changeable or programmable clock frequency · CPC title
using a mask · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title