US11630800B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11630800-B2 |
| Application number | US-201615141703-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2016 |
| Priority date | May 1, 2015 |
| Publication date | Apr 18, 2023 |
| Grant date | Apr 18, 2023 |
Official abstract text for this publication.
In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one-dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable components include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load/store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms. Consequently, the overall performance of many computer vision applications may be optimized.
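To make the idea of loop collapsing concrete, the following is a minimal software sketch, not the hardware described in the patent. It assumes a flat, row-major image buffer; the names `address_pattern`, `window_sum_nested`, and `window_sum_collapsed` are hypothetical and chosen only for illustration. A Python generator stands in for an address generator, so the loop body no longer needs to know about rows and columns.

```python
def address_pattern(base, rows, cols, row_stride):
    """Hypothetical stand-in for an address generator: yields the flat
    addresses of a rows x cols window in row-major order."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c


def window_sum_nested(buf, base, rows, cols, row_stride):
    """Conventional two-dimensional loop with explicit index arithmetic."""
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += buf[base + r * row_stride + c]
    return total


def window_sum_collapsed(buf, base, rows, cols, row_stride):
    """Same computation as a single one-dimensional loop. The loop body
    only asks for the next address; the pattern lives in the generator."""
    addresses = address_pattern(base, rows, cols, row_stride)
    total = 0
    for _ in range(rows * cols):
        total += buf[next(addresses)]
    return total


# A 4 x 5 image stored flat; sum the 2 x 3 window starting at address 6.
image = list(range(20))
nested = window_sum_nested(image, base=6, rows=2, cols=3, row_stride=5)
collapsed = window_sum_collapsed(image, base=6, rows=2, cols=3, row_stride=5)
```

In this sketch the two functions visit addresses 6, 7, 8, 11, 12, 13 in the same order, so they return the same sum; only the place where the addressing logic lives differs.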
Opening claim text (preview).
The invention claimed is:
1. A system for executing a collapsed multi-dimensional loop, the system comprising: a memory that stores a loop configuration instruction for a multi-dimensional loop and stores a plurality of loop instructions included in a one-dimensional loop; a multi-dimensional address generator that generates a plurality of addresses according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented; a load/store unit that accesses an object based on a first address from the plurality of addresses; and a vector unit that performs one or more operations on the object based on a first loop instruction included in the plurality of loop instructions.
2. The system of claim 1, wherein the loop configuration instruction further specifies a plurality of iteration numbers and a plurality of iteration weights.
3. The system of claim 1, wherein the multi-dimensional address generator further performs an increment or decrement operation based on the address pattern and a second loop instruction included in the plurality of loop instructions.
4. The system of claim 1, wherein at least one of the plurality of loop instructions is associated with a flag, and the system further comprises a branch/predicate unit that generates the flag.
5. The system of claim 4, wherein the branch/predicate unit comprises a modulo counter.
6. The system of claim 1, wherein the loop configuration instruction comprises a very long instruction word (VLIW) instruction.
7. The system of claim 1, wherein the load/store unit includes saturation logic, at least one of the plurality of loop instructions specifies a saturation option, and the saturation logic performs a saturation operation on the object based on the saturation option.
8. The system of claim 1, wherein the load/store unit includes rounding logic, at least one of the plurality of loop instructions specifies a rounding option, and the rounding logic performs a rounding operation on the object based on the rounding option.
9. The system of claim 1, wherein at least one of the plurality of loop instructions specifies at least one of a data type and a data distribution option.
10. The system of claim 1, wherein the first address is further based on a first modifier included in the address pattern that is associated with a current iteration of the one-dimensional loop.
11. A computer-implemented method for executing a collapsed multi-dimensional loop, the method comprising: receiving a configuration instruction for a multi-dimensional loop; generating a plurality of addresses according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented; and executing the collapsed multi-dimensional loop as a single loop based on the plurality of addresses by accessing an object based on a first address from the plurality of addresses.
12. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises performing one or more operations on the object accessed based on the plurality of addresses.
13. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises performing an increment or decrement operation on the object accessed based on the plurality of addresses.
14. The method of claim 11, wherein the configuration instruction further specifies a plurality of iteration numbers and a plurality of iteration weights.
15. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises accessing the object based on the plurality of addresses.
16. The method of claim 15, wherein accessing the object is further based on at least one of a data type and a distribution type.
17. The method of claim 15, wherein executing the collapsed multi-dimensional loop further comprises executing one or more operations on the object based on at least one of a saturation option and a rounding option.
18. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises: computing a flag; and conditionally controlling, based on the flag, operations on the object accessed based on the plurality of addresses.
19. The method of claim 18, wherein computing the flag comprises performing a modulo operation on a counter variable that is associated with the flag.
20. The method of claim 11, wherein executing the collapsed multi-dimensional loop comprises: computing a flag; if the flag matches a first condition, then executing a first operation on the object accessed based on the plurality of addresses; or if the flag does not match an activation condition, then executing a second operation on the object accessed based on the plurality of addresses.
21. A system for executing a computer vision application, the system comprising: a programmable vector processor that: executes a collapsed multi-dimensional loop included in the computer vision application as a single loop based on a plurality of addresses generated according to an address pattern by: precomputing an address modifier for each dimension included in the multi-dimensional loop based on a respective number of iterations for each dimension included in the multi-dimensional loop and a respective weight associated with each dimension included in the multi-dimensional loop; and after precomputing the address modifiers for the dimensions of the multi-dimensional loop, generating the plurality of addresses by iteratively applying the precomputed address modifiers to a base address when a corresponding loop index is incremented in order to access an object based on a first address from the plurality of addresses; a fixed-function accelerator that accelerates a fixed processing operation included in the computer vision application; and a reduced instruction set computer (RISC) core that coordinates the programmable vector processor and the fixed-function accelerator.
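The independent claims recite precomputing one address modifier per dimension from the iteration counts and weights, then applying the modifier of whichever loop index is incremented. The claims do not give a formula, so the sketch below rests on an assumption: the modifier for a dimension is its weight minus the total offset accumulated by all faster-varying dimensions, which rewinds those dimensions to zero in the same step. The function names and the convention that dimension 0 is innermost are hypothetical.

```python
from itertools import product


def precompute_modifiers(counts, weights):
    """Assumed formula: modifier[d] = weights[d] - sum((counts[j] - 1) * weights[j])
    over all faster dimensions j < d. Dimension 0 is the innermost loop."""
    modifiers = []
    rewind = 0  # offset reached by all faster dimensions at their last iteration
    for n, w in zip(counts, weights):
        modifiers.append(w - rewind)
        rewind += (n - 1) * w
    return modifiers


def generate_addresses(base, counts, weights):
    """Walk the collapsed loop once, adding a single precomputed modifier
    per step instead of recomputing base + sum(index * weight)."""
    modifiers = precompute_modifiers(counts, weights)
    indices = [0] * len(counts)
    total_iterations = 1
    for n in counts:
        total_iterations *= n

    addresses = []
    addr = base
    for _ in range(total_iterations):
        addresses.append(addr)
        # Find the lowest dimension whose index can still be incremented;
        # every faster dimension wraps back to zero.
        for d, n in enumerate(counts):
            if indices[d] + 1 < n:
                indices[d] += 1
                addr += modifiers[d]
                break
            indices[d] = 0
    return addresses


def reference_addresses(base, counts, weights):
    """Straightforward nested-loop equivalent, used only for comparison."""
    result = []
    for outer_first in product(*(range(n) for n in reversed(counts))):
        idx = tuple(reversed(outer_first))  # back to innermost-first order
        result.append(base + sum(i * w for i, w in zip(idx, weights)))
    return result


# 3 inner iterations with weight 1, 2 outer iterations with weight 10.
example = generate_addresses(100, counts=[3, 2], weights=[1, 10])
```

For this example the assumed formula gives modifiers of 1 and 8: stepping from address 102 to 110 adds the outer weight of 10 and removes the 2 accumulated by the inner dimension. The design point is that each step costs one addition, whatever the number of dimensions.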
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
to perform conditional operations, e.g. using predicates or guards · CPC title
with multidimensional access, e.g. row/column, matrix · CPC title
Arithmetic instructions · CPC title
data or demand driven · CPC title