US9785436B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9785436-B2 |
| Application number | US-201213631071-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2012 |
| Priority date | Sep 28, 2012 |
| Publication date | Oct 10, 2017 |
| Grant date | Oct 10, 2017 |
Official abstract text for this publication.
An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
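The split the abstract describes, where a setup step determines addresses before the gather itself runs, can be modeled in software. The following is a minimal sketch, not the patented hardware: `GatherState`, `gather_setup`, and `gather` are hypothetical names, memory is modeled as a flat Python list, and addresses are formed as base plus index as recited in claim 5.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GatherState:
    """Hypothetical stand-in for the state a setup step leaves behind."""
    addresses: List[int]


def gather_setup(base: int, indices: Sequence[int]) -> GatherState:
    # Setup phase: determine every element address ahead of the gather.
    return GatherState(addresses=[base + i for i in indices])


def gather(memory: Sequence[int], state: GatherState) -> List[int]:
    # Gather phase: only loads remain; the addresses were already computed.
    return [memory[a] for a in state.addresses]


# Toy memory: the element at address a holds the value 100 + a.
memory = list(range(100, 132))
state = gather_setup(base=8, indices=[0, 3, 5, 12])
result = gather(memory, state)
print(state.addresses)  # [8, 11, 13, 20]
print(result)           # [108, 111, 113, 120]
```

Keeping the address computation in its own function mirrors the separation between setup operations and gather operations; in this model the gather step does no arithmetic of its own.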
Opening claim text (preview).
What is claimed is:

1. A processor comprising: a non-transitory machine-readable medium including instructions, which when read by the machine-readable medium, causes the processor to: execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to compute a gather state to be used by subsequent gather operations, wherein the computation of the gather state by the gather setup operations is to include to determine one or more addresses of vector data elements to be gathered by the gather operations, and wherein the gather state comprises a non-architectural processor state stored within one or more internal processor storage locations; and execute the one or more gather operations to gather vector data elements using the gather state computed by the gather setup operations.

2. The processor as in claim 1 wherein the gather setup operations comprise gather setup instructions and wherein the gather operations comprise gather instructions.

3. The processor as in claim 2 further comprising: a decoder to decode the gather setup instructions and gather instructions; and an execution unit to execute the gather setup instructions and gather instructions, wherein the gather setup instructions calculate addresses of the vector data elements to be gathered when executed by the execution unit and wherein the addresses are provided to the decoder for use by the gather instructions during decoding.

4. The processor as in claim 2 further comprising: an instruction fetch unit to fetch the gather setup instructions and the gather instructions from a memory.

5. The processor as in claim 1 further comprising: an index register to store an index value for each of the vector data elements to be gathered; and a base address register to store a base address for the vector data elements, wherein the addresses of the vector data elements to be gathered is to be determined by adding the index value for each vector data element to the base address.

6. The processor as in claim 5 further comprising: a mask register to store a mask bit associated with each of the vector data elements, wherein a first mask bit value indicates that the vector data element associated therewith will be gathered and a second mask bit value indicates that the vector data element associated therewith will not be gathered.

7. The processor as in claim 6 wherein the gather setup operations or the gather operations reset each mask bit from the first mask bit value to the second mask bit value upon generating an address for the vector data element associated with each respective mask bit.

8. The processor as in claim 1 wherein a prior gather operation determines an address of a vector data element to be gathered by a subsequent gather operation, the processor is to execute the subsequent gather operation using the address determined by the prior gather operation.

9. The processor as in claim 1 wherein first and second gather setup operations are executed prior to executing a first gather operation.

10. The processor as in claim 9 wherein the first gather operation uses an address determined by the first gather setup operation and a second gather operation uses an address determined by the second gather setup operation.

11. The processor as in claim 1 wherein the gather state comprises a non-architectural processor state stored within one or more internal processor storage locations.

12. A method comprising: executing one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to compute a gather state to be used by subsequent gather operations, wherein the computation of the gather state by the gather setup operations is to include to determine one or more addresses of vector data elements to be gathered by the gather operations, and wherein the gather state comprises a non-architectural processor state stored within one or more internal processor storage locations; and executing the one or more gather operations to gather vector data elements using the gather state computed by the gather setup operations.

13. The method as in claim 12 wherein the gather setup operations comprise gather setup instructions and wherein the gather operations comprise gather instructions.

14. The method as in claim 13 further comprising: decoding the gather setup instructions and gather instructions; and executing the gather setup instructions and gather instructions, wherein the gather setup instructions calculate addresses of the vector data elements to be gathered when executed and wherein the addresses are provided for use by the gather instructions during decoding.

15. A system comprising: a memory for storing instructions and data; a cache having a plurality of cache levels for caching the instructions and data; and a non-transitory machine-readable medium including instructions, which when read by the machine-readable medium, causes a processor to: execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations, wherein the gather setup operations is further to compute a gather state comprises a non-architectural processor state stored within one or more internal processor storage locations; and execute the one or more gather operations to gather vector data elements using the one or more addresses determined by the gather setup operations.

16. The system as in claim 15 wherein the gather setup operations comprise gather setup instructions and wherein the gather operations comprise gather instructions.

17. The system as in claim 16 further comprising: a decoder to decode the gather setup instructions and gather instructions; and the processor further to execute the gather setup instructions and gather instructions, wherein the gather setup instructions calculate the addresses of data elements to be gathered when executed by the execution unit and wherein the addresses are provided to the decoder for use by the gather instructions during decoding.

18. A processor comprising: a non-transitory machine-readable medium including instructions, which when read by the machine-readable medium, causes the processor to: execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to compute a gather state to be used by subsequent gather operations, wherein the computation of the gather state by the gather setup operations is to include to determine one or more addresses of vector data elements to be gathered by the gather operations, and wherein the gather state comprises a non-architectural processor state stored within one or more internal processor storage locations; and execute the one or more gather operations to gather vector data elements using the gather state computed by the gather setup operations, wherein the gather state allows the gather setup operations and gather operations to be performed without stalling.

19. The processor as in claim 18 wherein the processor is further to: compare the gather state with a gather operation and determine whether a match exists between the gather state and the gather operation; wherein if a match exists, then the gather operation is to execute more efficiently by using the gather state computed by the gather setup operation; and wherein if a match does not exist then the gather operation is
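Claims 5 through 7 describe forming each address as base plus index, gating each element with a mask bit, and resetting that bit once its address has been generated. A minimal software sketch of that bookkeeping follows. The function names are hypothetical, a mask value of 1 is assumed to be the "first mask bit value", and leaving unselected lanes of the destination unchanged is an assumption of this sketch that the claims do not state.

```python
from typing import List, Optional, Sequence, Tuple


def generate_addresses(
    base: int, indices: Sequence[int], mask: Sequence[int]
) -> Tuple[List[Optional[int]], List[int]]:
    """Return per-lane addresses (None for unselected lanes) and the updated mask."""
    addresses: List[Optional[int]] = []
    new_mask = list(mask)
    for lane, (index, bit) in enumerate(zip(indices, mask)):
        if bit == 1:
            addresses.append(base + index)  # claim 5: base address plus index value
            new_mask[lane] = 0              # claim 7: reset the bit once the address exists
        else:
            addresses.append(None)          # claim 6: this element will not be gathered
    return addresses, new_mask


def masked_gather(
    memory: Sequence[int], addresses: Sequence[Optional[int]], dest: Sequence[int]
) -> List[int]:
    # Assumption: lanes without an address keep their previous destination value.
    return [memory[a] if a is not None else old for a, old in zip(addresses, dest)]


memory = [10 * i for i in range(16)]  # toy memory: address a holds 10 * a
addresses, mask_after = generate_addresses(base=4, indices=[0, 2, 4, 6], mask=[1, 0, 1, 1])
gathered = masked_gather(memory, addresses, dest=[-1, -1, -1, -1])
print(addresses)   # [4, None, 8, 10]
print(mask_after)  # [0, 0, 0, 0]
print(gathered)    # [40, -1, 80, 100]
```

In this model an all-zero mask after address generation means no element still awaits an address, which is one way to read the reset behavior in claim 7.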
Bit or string instructions · CPC title
of multiple operands or results {(addressing multiple banks G06F12/06)} · CPC title
LOAD or STORE instructions; Clear instruction · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Instruction analysis, e.g. decoding, instruction word fields · CPC title