US9760372B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9760372-B2 |
| Application number | US-201113224090-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 1, 2011 |
| Priority date | Sep 1, 2011 |
| Publication date | Sep 12, 2017 |
| Grant date | Sep 12, 2017 |
Official abstract text for this publication.
A method for combining data values through associative operations. The method includes, with a processor, arranging any number of data values into a plurality of columns according to natural parallelism of the associative operations and reading each column to a register of an individual processor. The processors are directed to combine the data values in the columns in parallel using a first associative operation. The results of the first associative operation for each column are stored in a register of each processor.
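The abstract's column-wise combination can be illustrated with a minimal sketch. This is not the patented implementation: Python worker threads stand in for the individual processors, local variables stand in for registers, and the helper names `to_columns` and `combine_columns` are hypothetical. Addition is used as the associative operation, and the input is assumed to be non-empty.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def to_columns(values, num_columns):
    """Split a flat, non-empty list into at most num_columns contiguous columns.

    Hypothetical helper; the last column may be shorter than the others.
    """
    height = -(-len(values) // num_columns)  # ceiling division
    return [values[i:i + height] for i in range(0, len(values), height)]


def combine_columns(columns, op):
    """Reduce every column with the associative operation `op`.

    One worker thread per column stands in for one processor per column.
    """
    with ThreadPoolExecutor(max_workers=len(columns)) as pool:
        return list(pool.map(lambda col: reduce(op, col), columns))


values = list(range(1, 13))                        # the data values 1..12
columns = to_columns(values, 4)                    # four columns of three values
partials = combine_columns(columns, operator.add)  # one partial result per column
total = reduce(operator.add, partials)             # combine the partial results

print(columns)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(partials)  # [6, 15, 24, 33]
print(total)     # 78
```

Because the operation is associative, combining each column independently and then combining the partial results gives the same answer as a single left-to-right pass over all the values.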
Opening claim text (preview).
What is claimed is:
1. A method for combining data values through associative operations, the method comprising: arranging data values into a plurality of columns according to natural parallelism of the associative operations to form a first data matrix; reading each column to a register of a respective one of a number of processors; directing each of the processors to execute a respective first one of the associative operations, in parallel, on the data values in their respective columns to which the processors are assigned; storing the respective results of the first associative operation for each respective column to a respective register of each processor; arranging the plurality of columns into a plurality of two dimensional matrices; and arranging the two dimensional matrices into a three dimensional matrix.
2. The method of claim 1, in which arranging data values into a plurality of columns comprises transposing rows in a second data matrix into columns in the data matrix.
3. The method of claim 1, further comprising storing the results of the respective first associative operation to the memory of the computing device.
4. The method of claim 1, further comprising: arranging the results of the first associative operation into a number of primary results columns; reading the data values in each of the individual primary results columns into registers of one of the plurality of processors; and directing each of the processors to combine the data values in the primary results column to produce a primary final result in parallel.
5. The method of claim 4, further comprising arranging the primary results columns into a two dimensional results matrix.
6. A method for combining data values through associative operations, the method comprising: arranging data values into a plurality of columns according to natural parallelism of the associative operations to form a first data matrix; reading each column to a register of a respective one of a number of processors; directing each of the processors to execute a respective first one of the associative operations, in parallel, on the data values in their respective columns to which the processors are assigned; storing the respective results of the first associative operation for each respective column to a respective register of each processor; in which the number of columns is greater than the number of processors, the method further comprising: associating the results of the first associative operation with corresponding columns; identifying unprocessed columns which are not associated with a result; assigning one processor to each unprocessed column; and directing the processors to combine the data values in the unprocessed columns through the first associative operation.
7. A method for combining data values through associative operations, the method comprising: arranging data values into a plurality of columns according to natural parallelism of the associative operations to form a first data matrix; reading each column to a register of a respective one of a number of processors; directing each of the processors to execute a respective first one of the associative operations, in parallel, on the data values in their respective columns to which the processors are assigned; storing the respective results of the first associative operation for each respective column to a respective register of each processor; arranging the results of the first associative operation into a number of primary results columns; reading the data values in each of the individual primary results columns into registers of one of the plurality of processors; directing each of the processors to combine the data values in the primary results column to produce a primary final result in parallel; arranging the primary results columns into a two dimensional results matrix; wherein the number of columns in the two dimensional results matrix is greater than the number of processors, the method further comprising: assigning a processor to each unprocessed column; and directing the assigned processor to combine data values in the unprocessed column to produce a secondary result.
8. The method of claim 7, further comprising combining secondary results from the columns of the two dimensional results matrix by performing the respective first associative operation.
9. The method of claim 7, further comprising combining secondary results from the columns of the two dimensional results matrix by performing a prefix scan.
10. The method of claim 7, further comprising combining secondary results from the columns of the two dimensional results matrix by performing a parallel reduction.
11. A method for combining data values through associative operations, the method comprising: arranging data into a data matrix, having a number of columns, according to natural parallelism of the associative operations, the data matrix being stored in a memory device; assigning each of a plurality of processors to a subset of the data stored in a contiguous memory location; directing the plurality of processors to execute a number of respective associative operations on data values in the subsets to produce intermediate results; writing the intermediate respective results to registers in the processors; and producing a final result from the intermediate results, the final result being stored in a memory of a computing device; in which the number of columns is greater than a number of the processors, the method further comprising: associating the intermediate results with corresponding columns; identifying unprocessed columns which are not associated with an intermediate result; assigning one processor to each unprocessed column; and directing the processors to combine data values in the unprocessed columns through the number of associative operations.
12. The method of claim 11, in which the subset of data stored in a contiguous memory location is a column within the data matrix.
13. The method of claim 11, in which the associative operations are performed on data values in the column without writing to memory, reading from memory, or synchronization to produce the intermediate results.
14. The method of claim 11, in which the subset of data stored in a contiguous memory location is written to a register of the assigned processor.
15. The method of claim 11, in which the data matrix is a two dimensional matrix.
16. The method of claim 11, in which the data matrix is a three dimensional matrix.
17. The method of claim 16, in which the intermediate results comprise a two dimensional matrix.
18. The method of claim 17, in which the final result is a one dimensional matrix.
19. The method of claim 16, in which a number of columns in one of the data matrix or the intermediate results is greater than the number of processors, the method further comprising: assigning a processor to each unprocessed column; and directing the assigned processor to combine data values in the unprocessed column according to the associative operations.
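Two ideas from the claims can be sketched in code: handling more columns than processors by repeatedly assigning processors to unprocessed columns (claims 6, 7 and 11), and re-arranging the first-stage results into primary results columns for a second combination (claims 4 and 8). The sketch below is an illustration under stated assumptions, not the claimed implementation. The names `reduce_in_passes` and `chunk` are hypothetical, a sequential loop stands in for each parallel pass, a dictionary stands in for the association between results and columns, and every column is assumed to be non-empty.

```python
from functools import reduce
import operator


def chunk(values, height):
    """Hypothetical helper: cut a flat list into contiguous columns of `height`."""
    return [values[i:i + height] for i in range(0, len(values), height)]


def reduce_in_passes(columns, op, num_processors):
    """Reduce each column with `op`, at most num_processors columns per pass.

    Returns the per-column results and the number of passes taken.
    """
    results = {}  # column index -> result (associates results with columns)
    passes = 0
    while len(results) < len(columns):
        # Identify columns that are not yet associated with a result.
        unprocessed = [i for i in range(len(columns)) if i not in results]
        # Assign one processor to each unprocessed column, up to the limit.
        batch = unprocessed[:num_processors]
        for i in batch:  # sequential stand-in for one parallel pass
            results[i] = reduce(op, columns[i])
        passes += 1
    return [results[i] for i in range(len(columns))], passes


data = list(range(1, 25))   # the data values 1..24
columns = chunk(data, 3)    # 8 columns, but only 3 processors are assumed

# First stage: one result per column, spread over several passes.
primary, first_passes = reduce_in_passes(columns, operator.add, num_processors=3)

# Second stage: arrange the results into primary results columns and combine again.
results_columns = chunk(primary, 2)
secondary, second_passes = reduce_in_passes(results_columns, operator.add, num_processors=3)

# Combine the secondary results with the same associative operation.
final = reduce(operator.add, secondary)

print(primary, first_passes)     # [6, 15, 24, 33, 42, 51, 60, 69] 3
print(secondary, second_passes)  # [21, 57, 93, 129] 2
print(final)                     # 300
```

With 8 columns and 3 assumed processors, the first stage takes three passes (3, 3, then 2 columns); the 4 primary results columns take two more. The final line could be swapped for a prefix scan or a tree-shaped parallel reduction, the alternatives named in claims 9 and 10.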
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451) · CPC title
Arithmetic instructions · CPC title