Rotating data blocks

US12554490B2 · US · B2

Patent metadata
Publication number: US-12554490-B2
Application number: US-202318543036-A
Country: US
Kind code: B2
Filing date: Dec 18, 2023
Priority date: Dec 20, 2022
Publication date: Feb 17, 2026
Grant date: Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. The second layer multiplexer array receives a second control signal and selects between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal. The execution unit also includes a control signal generator, configured to generate the first layer control signal and second layer control signal based on a received computer program instruction. Results of smaller block rotations are thus used as partial results for larger block rotation, avoiding large multiplexer arrays with complex wiring.
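
The two-layer scheme in the abstract can be modelled in software. The following is a minimal behavioural sketch, not the patented circuit: the function names are hypothetical, the rotation direction (left, toward lower byte indices) is an assumption, and so is the choice to drive both first layer arrays with the rotation amount reduced modulo the half-block size.

```python
def rotate_half(block: bytes, amount: int) -> bytes:
    """Model of one first layer multiplexer array: rotate a sub-block left."""
    amount %= len(block)
    return block[amount:] + block[:amount]


def two_layer_rotate(data: bytes, amount: int) -> bytes:
    """Rotate `data` left by `amount` bytes using two half-size rotations.

    Assumes len(data) is even. Names and direction are illustrative only.
    """
    n = len(data)
    half = n // 2
    r = amount % n

    # First layer: each array rotates its own half of the input.
    first = rotate_half(data[:half], r)
    second = rotate_half(data[half:], r)

    # Second layer control: one select bit per output byte. The unrotated
    # mask simply concatenates the halves; rotating it by r rotates the output by r.
    base_mask = [0] * half + [1] * half
    mask = base_mask[r:] + base_mask[:r]

    # Second layer: N two-way multiplexers, each choosing between the
    # corresponding byte of the two rotated halves.
    return bytes(second[i % half] if mask[i] else first[i % half] for i in range(n))


print(list(two_layer_rotate(bytes(range(8)), 3)))  # [3, 4, 5, 6, 7, 0, 1, 2]
```

In this model each output byte needs only a two-way choice in the second layer, which is the sense in which the smaller rotations serve as partial results for the larger one.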

First claim

Opening claim text (preview).

The invention claimed is:
1. An execution unit configured to execute a computer program instruction to perform a byte-wise rotation operation of an input data block, the execution unit comprising: a logic circuit comprising: an input data array to receive an input data block comprising N bytes; two first layer multiplexer arrays, each first layer multiplexer array configured to: receive a first layer data block comprising a respective subset of bytes of the input data block; receive a first layer control signal; rotate the first layer data block by an amount indicated by the first layer control signal; the two first layer multiplexer arrays being configured to respectively output a first rotated first layer data block and a second rotated first layer data block; a second layer multiplexer array configured to receive a second control signal, the second layer multiplexer array comprising N multiplexers, each multiplexer configured to select between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal to output a rotated second layer data block, and a control signal generator, configured to generate the first layer control signal and second layer control signal based on the received computer program instruction.
2. The execution unit of claim 1, wherein each first layer data block comprises N/2 bytes.
3. The execution unit of claim 1, wherein: the input data array is configured to receive an input data block comprising M bytes, where M>N; the logic circuit comprises: four first layer multiplexer arrays, so as to output first to fourth rotated first layer blocks; two second layer multiplexer arrays, a first of the second layer multiplexer arrays configured to receive the first and second rotated first layer blocks and output a first rotated second layer data block, a second of the second layer multiplexer arrays configured to receive third and fourth rotated first layer blocks and output a second rotated second layer block; a third layer multiplexer array configured to receive a third layer control signal, the third layer multiplexer array comprising M multiplexers configured to select between a corresponding byte of the first and second rotated second layer data blocks based on the third control signal to output a rotated third layer data block; and the control signal generator is configured to generate the third layer control signal.
4. The execution unit of claim 1, wherein the logic circuit comprises an intermediate results array configured to receive the output of the first layer multiplexer arrays.
5. The execution unit of claim 1, wherein each first layer multiplexer array comprises a plurality of S:1 multiplexers.
6. The execution unit of claim 1, wherein the second layer control signal comprises a bitmask of N bits, each bit in the bitmask acting as a control signal for a respective one of the N multiplexers of the second multiplexer array.
7. The execution unit of claim 6, wherein the logic circuit comprises a splitter to split the bitmask and supply the respective bits to the respective multiplexers.
8. The execution unit of claim 6, wherein the control signal generator is configured to rotate the bitmask, wherein an amount of rotation of the bitmask results in output of the rotated second layer data block rotated by the same amount.
9. The execution unit of claim 8, wherein the control signal generator is configured to rotate the bitmask by selecting a stored rotated bitmask from a lookup table.
10. The execution unit of claim 3, wherein the third layer control signal is a bitmask of M bits.
11. The execution unit of claim 1, wherein the control signal generator is configured to generate a second layer control signal that causes the second layer multiplexer array to act as a passthrough.
12. The execution unit of any claim 1, wherein the logic circuit comprises: a plurality of data path lanes; one or more clock gates configured to disable one or more of the data path lanes, and the control signal generator is configured to generate a clock gate control signal to control the one or more clock gates.
13. The execution unit of claim 1, wherein the logic circuit comprises a pipeline register disposed between the first layer multiplexer arrays and second layer multiplexer array.
14. The execution unit of claim 1, wherein the computer program instruction is a rotate instruction, configured to rotate the input data block.
15. The execution unit of claim 1, wherein the computer program instruction comprises a plurality of operations, wherein the rotate operation is one of the plurality of operations.
16. The execution unit of claim 15, wherein the computer program instruction comprises: a pack instruction, configured to copy a sequence of consecutive bytes from a first position in a first data block into a second location in a second data block, or an extract instruction, configured to extract a sequence of consecutive bytes from a concatenation of a first data block and a second data block.
17. The execution unit of claim 1, wherein the execution unit is configured generate the first layer control signal, and second layer control signal based on values indicated by the computer program instruction.
18. A processing device comprising a plurality of processing units, wherein at least one of the processing units comprises: an execution unit configured to execute a computer program instruction to perform a byte-wise rotation operation of an input data block, the execution unit comprising: a logic circuit comprising: an input data array to receive an input data block comprising N bytes; two first layer multiplexer arrays, each first layer multiplexer array configured to: receive a first layer data block comprising a respective subset of bytes of the input data block; receive a first layer control signal; rotate the first layer data block by an amount indicated by the first layer control signal; the two first layer multiplexer arrays being configured to respectively output a first rotated first layer data block and a second rotated first layer data block; a second layer multiplexer array configured to receive a second control signal, the second layer multiplexer array comprising N multiplexers, each multiplexer configured to select between a corresponding byte of the first and second rotated first layer data blocks based on the second control signal to output a rotated second layer data block, and a control signal generator, configured to generate the first layer control signal and second layer control signal based on the received computer program instruction.
19. The processing device of claim 18, wherein the processing units are tile processors that communicate via an exchange fabric which implements a time deterministic exchange.
20. A method implemented in an execution unit, the method comprising: receiving an input data block comprising N bytes; supplying first layer data blocks comprising a respective subset of bytes of the input data block to a first and second first layer multiplexer array; supplying a first layer control signal to the first and second first layer multiplexer array; rotating the first layer data blocks by an amount indicated by the control signal to output a first rotated first layer data block and a second rotated first layer data block; supplying the first rotated first layer data block and the second rotated
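
Claims 3 and 6 to 10 describe adding a further select layer for larger blocks and driving each select layer with a bitmask that may be read from a lookup table of pre-rotated masks. A rough software sketch of that idea follows. It rests on several assumptions: the names `build_mask_table`, `select_layer`, `layered_rotate` and the `leaf` parameter are hypothetical, rotation is taken to be leftward, and the block length is assumed to be `leaf` times a power of two. It illustrates the layering and is not the claimed hardware.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_mask_table(n: int) -> tuple:
    """Lookup table of pre-rotated select masks for an n-byte select layer.

    table[r][i] is the select bit for output byte i when rotating left by r.
    """
    half = n // 2
    base = (0,) * half + (1,) * half
    return tuple(base[r:] + base[:r] for r in range(n))


def select_layer(first: list, second: list, mask: tuple) -> list:
    """One two-way multiplexer per output byte, steered by one mask bit each."""
    half = len(first)
    return [second[i % half] if bit else first[i % half] for i, bit in enumerate(mask)]


def layered_rotate(data: list, amount: int, leaf: int = 4) -> list:
    """Rotate left by `amount`, reusing smaller rotations as partial results."""
    n = len(data)
    r = amount % n
    if n <= leaf:
        # Stands in for a first layer array of S:1 multiplexers.
        return data[r:] + data[:r]
    half = n // 2
    first = layered_rotate(data[:half], r, leaf)
    second = layered_rotate(data[half:], r, leaf)
    # Select the stored, already rotated mask rather than rotating at run time.
    return select_layer(first, second, build_mask_table(n)[r])
```

With `leaf=4` and a 16-element input, the sketch uses four leaf rotations, two 8-wide select layers, and one 16-wide select layer, mirroring the four/two/one arrangement of claim 3. A rotation amount of 0 selects the unrotated mask, so each select layer simply concatenates its inputs, which is one way to read the passthrough behaviour of claim 11.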

Assignees

Graphcore Ltd

Inventors

Classifications

  • Clock generators with changeable or programmable clock frequency · CPC title

  • using a mask · CPC title

  • Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12554490B2 cover?
An execution unit performs a byte-wise rotation of an input data block. An input data array receives an input data block. Two first layer multiplexer arrays each receive a first layer data block comprising a respective subset of bytes of the input data block and a first layer control signal, and rotate the first layer data block by an amount indicated by the first layer control signal. The seco…
Who is the assignee on this patent?
Graphcore Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/30038. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).