Accelerating eight-way parallel keccak execution
US-2024211268-A1 · Jun 27, 2024 · US
US9405538B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9405538-B2 |
| Application number | US-201213730685-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2012 |
| Priority date | Dec 28, 2012 |
| Publication date | Aug 2, 2016 |
| Grant date | Aug 2, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
An apparatus is described having a functional unit of an instruction execution pipeline. The functional unit has a plurality of compare-and-exchange circuits coupled to network circuitry to implement a vector sorting tree for a vector sorting instruction. Each of the compare-and-exchange circuits has a respective comparison circuit that compares a pair of inputs. Each of the compare-and-exchange circuits have a same sided first output for presenting a higher of the two inputs and a same sided second output for presenting a lower of the two inputs, said comparison circuit to also support said functional unit's execution of a prefix min and/or prefix add instruction.
Opening claim text (preview).
What is claimed is: 1. An apparatus comprising: a functional unit of an instruction execution pipeline comprising a plurality of compare-and-exchange circuits in parallel and coupled to network circuitry to execute a single vector sorting instruction to sort an input operand of one or more vectors into numerical order, each of said compare-and-exchange circuits having a respective comparison circuit that compares a pair of inputs, each of said compare-and-exchange circuits having a first output for presenting a higher of the pair of inputs and a second output for presenting a lower of the pair of inputs; and a memory circuit containing microcode that provides control signals to said network circuitry to loopback each output of the plurality of compare-and-exchange circuits into the plurality of compare-and-exchange circuits until each vector of an input operand of a plurality of vectors is sorted. 2. The apparatus of claim 1 wherein each vector of the input operand of the plurality of vectors is a same size. 3. The apparatus of claim 2 wherein said same size is specified with an immediate operand of said single vector sorting instruction. 4. The apparatus of claim 2 wherein said same size is any one of 2 elements, 4 elements, 8 elements, and 16 elements. 5. The apparatus of claim 1 wherein said functional unit simultaneously sorts each vector of the input operand of the plurality of vectors. 6. The apparatus of claim 1 wherein said network circuitry includes a configurable switching network. 7. The apparatus of claim 1 wherein the control signals comprise a number of cycles to complete the sort of the input operand. 8. An apparatus comprising: a functional unit of an instruction execution pipeline comprising a plurality of compare-and-exchange circuits in parallel and coupled to network circuitry to execute a single vector sorting instruction to sort an input operand of multiple vectors into numerical order, each of said compare-and-exchange circuits having a respective comparison circuit that compares a pair of inputs, each of said compare-and-exchange circuits having a first output for presenting a higher of the pair of inputs and a second output for presenting a lower of the pair of inputs; and a memory circuit containing microcode that provides control signals to said network circuitry to loopback each output of the plurality of compare-and-exchange circuits into the plurality of compare-and-exchange circuits until each vector of the input operand of multiple vectors is sorted. 9. The apparatus of claim 8 wherein each vector of the input operand of multiple vectors is a same size. 10. The apparatus of claim 9 wherein said same size is specified with an immediate operand of said single vector sorting instruction. 11. The apparatus of claim 9 wherein said same size is any one of 2 elements, 4 elements, 8 elements, and 16 elements. 12. The apparatus of claim 8 wherein said functional unit simultaneously sorts each vector of the input operand of multiple vectors. 13. The apparatus of claim 8 wherein said network circuitry includes a configurable switching network. 14. The apparatus of claim 8 wherein the control signals comprise a number of cycles to complete the sort of the input operand. 15. The apparatus of claim 8 further comprising at least one of: an adder to implement a prefix add instruction with said functional unit, and a multiplier to implement a prefix multiply instruction with said functional unit. 16. A method comprising: performing the following with functional unit of an instruction execution pipeline to execute a single vector sorting instruction to sort an input operand of one or more vectors into numerical order: passing elements of the input operand of one or more vectors to a plurality of compare-and-exchange circuits in parallel and coupled to network circuitry, wherein each of said plurality of compare-and-exchange circuits perform the following: compare a pair of said elements, present a higher of the pair of the elements on a first output, and present a lower of the pair of the elements on a second output; and providing control signals to said network circuitry to loopback each output of the plurality of compare-and-exchange circuits into the plurality of compare-and-exchange circuits until each vector of an input operand of a plurality of vectors is sorted. 17. The method of claim 16 wherein each vector of the input operand of the plurality of vectors is a same size. 18. The method of claim 16 further comprising executing at least one of a prefix sum instruction and a prefix add instruction with said functional unit. 19. The method of claim 16 further comprising executing at least one of a prefix min instruction and a prefix max instruction with said functional unit. 20. The method of claim 16 wherein said functional unit simultaneously sorts each vector of the input operand of the plurality of vectors. 21. The method of claim 16 wherein the control signals comprise a number of cycles to complete the sort of the input operand.
Sorting, i.e. extracting data from one or more carriers, rearranging the data in numerical or other ordered sequence, and rerecording the sorted data on the original carrier or on a different carrier or set of carriers {sorting methods in general}(G06F7/36 takes precedence) · CPC title
controlled in tandem, e.g. multiplier-accumulator · CPC title
Arithmetic instructions · CPC title
Compare instructions, e.g. Greater-Than, Equal-To, MINMAX · CPC title
Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.