US9875104B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9875104-B2 |
| Application number | US-201615014265-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 3, 2016 |
| Priority date | Feb 3, 2016 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
Official abstract text for this publication.
Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
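The address determination described in the abstract can be illustrated with a small sketch. It assumes the address is the sum of each index value times its dimension multiplier, plus an optional base address (as in claim 2). The function name `tensor_address`, the row-major multipliers, and the example shape are hypothetical and chosen only for illustration; the patent describes hardware registers, not a software function.

```python
from typing import Sequence


def tensor_address(
    indices: Sequence[int],
    multipliers: Sequence[int],
    base: int = 0,
) -> int:
    """Return base + sum(index[d] * multiplier[d]) over all dimensions.

    `indices` stands in for the tensor index elements and `multipliers`
    for the dimension multiplier elements (both hypothetical names).
    """
    if len(indices) != len(multipliers):
        raise ValueError("one multiplier is needed per index")
    return base + sum(i * m for i, m in zip(indices, multipliers))


# Assumed example: a 2 x 3 x 4 tensor stored row-major, so the
# multipliers are (3 * 4, 4, 1) = (12, 4, 1).
example = tensor_address(indices=(1, 2, 3), multipliers=(12, 4, 1))
print(example)  # 1*12 + 2*4 + 3*1 = 23
```

Because the multipliers stay constant for a given tensor layout, only the index values change between accesses.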
Opening claim text (preview).
What is claimed is:
1. An apparatus for processing a multi-dimensional tensor input, the apparatus comprising: multiple tensor index elements including a tensor index element for each of multiple nested loops used to traverse an N-dimensional tensor, wherein each tensor index element is a respective first hardware register configured to store an index value for a respective nested loop that is used to traverse a respective dimension of the N-dimensional tensor, and wherein the N-dimensional tensor includes data elements arranged across each of the N-dimensions and N is an integer that is equal to or greater than one, wherein each index value is a value that is updated each time an iteration of the respective nested loop for the index value is performed; multiple dimension multiplier elements, wherein each dimension multiplier element is a respective second hardware register configured to store a multiplier value for a respective dimension of the N-dimensional tensor, wherein each multiplier value is a constant value for the respective dimension of the N-dimensional tensor for the multiplier value and remains constant for each iteration of the respective nested loop that is used to traverse the respective dimension; and one or more hardware processors configured to execute one or more instructions of an instruction set executable by the one or more hardware processors, wherein execution of the one or more instructions causes the one or more hardware processors to perform operations comprising: determining memory addresses for locations in memory for storing data values for a sequence of data elements of the N-dimensional tensor by: for each iteration of an inner loop of the nested loops: determining, for each dimension of the N-dimensional tensor, a product of (i) the index value stored in the tensor index element for the nested loop that is used to traverse the dimension and (ii) the multiplier value for the dimension; and determining a memory address for a data element that corresponds to the iteration of the inner loop based on a sum of the product for each dimension of the N-dimensional tensor; and outputting data indicating the determined memory address for each data element in the sequence of data elements of the N-dimensional tensor.
2. The apparatus of claim 1, wherein determining the memory address for a data element that corresponds to the iteration of the inner loop based on a sum of the product for each dimension of the N-dimensional tensor comprises determining a sum of (i) the sum of the product for each dimension of the N-dimensional tensor and (ii) a base memory address.
3. The apparatus of claim 1, wherein outputting data indicating the determined memory address for each data element comprises sequentially outputting data indicating the determined memory address for each data element in sequence as the memory addresses are determined.
4. The apparatus of claim 1, wherein the one or more processors are further configured to: increment the index value for the inner loop by a first incremental value each time the inner loop completes; and increment an index value for a second loop in which the inner loop is nested by a second incremental value each time the second loop completes.
5. The apparatus of claim 1, wherein the one or more processors are further configured to: receive an instruction to update the index value for the inner loop; after receiving the instruction to update the index value for the inner loop, determining that a difference between the index value for the inner loop and a tensor bound value stored in a tensor bound element for the inner loop satisfies a threshold and in response: increment the index value for the inner loop by a first incremental value.
6. The apparatus of claim 1, wherein the one or more processors are further configured to: receive an instruction to update the index value for the inner loop; after receiving the instruction to update the index value for the inner loop, determining that a difference between the index value for the inner loop and a tensor bound value stored in a tensor bound element for the inner loop does not satisfy a threshold and in response: reset the index value for the inner loop to an initial value for the inner loop; and increment an index value for a second loop in which the inner loop is nested by a second incremental value.
7. The apparatus of claim 1, wherein the one or more processors include one or more arithmetic logic units.
8. A system comprising: one or more hardware processors configured to perform linear operations on a N-dimensional tensor, wherein the N-dimensional tensor has data elements arranged across each of the N dimensions, and wherein N is an integer that is equal to or greater than one; multiple tensor index elements including a tensor index element for each of multiple nested loops used to traverse the N-dimensional tensor, wherein each tensor index element is a respective first hardware register configured to store an index value for a respective nested loop that is used to traverse a respective dimension of the N-dimensional tensor, and wherein each index value is a value that is updated each time an iteration of the respective nested loop for the index value is performed; multiple dimension multiplier elements, wherein each dimension multiplier element is a respective second hardware register configured to store a multiplier value for a respective dimension of the N-dimensional tensor, wherein each multiplier value is a constant value for the respective dimension of the N-dimensional tensor for the multiplier value and remains constant for each iteration of the respective nested loop that is used to traverse the respective dimension; and hardware circuitry configured to: determine memory addresses for a sequence of data elements of the N-dimensional tensor by: for each iteration of an inner loop of the nested loops: determining, for each dimension of the N-dimensional tensor, a product of (i) the index value stored in the tensor index element for the nested loop that is used to traverse the dimension and (ii) the multiplier value for the dimension; and determining a memory address for a data element that corresponds to the iteration of the inner loop based on a sum of the product for each dimension of the N-dimensional tensor; and output data indicating the determined memory address for each data element in the sequence of data elements of the N-dimensional tensor.
9. The system of claim 8, wherein determining the memory address for a data element that corresponds to the iteration of the inner loop based on a sum of the product for each dimension of the N-dimensional tensor comprises determining a sum of (i) the sum of the product for each dimension of the N-dimensional tensor and (ii) a base memory address.
10. The system of claim 8, wherein outputting data indicating the determined memory address for each data element comprises sequentially outputting data indicating the determined memory address for each data element in sequence as the memory addresses are determined.
11. The system of claim 8, wherein the circuitry is further configured to: increment the index value for the inner loop by a first incremental value each time the inner loop completes; and increment an index value for a second loop in which the inner loop is nested by a second incremental value each time the second loop completes.
12. The system of claim 8, wherein the circuitry is further configured to: receive an instruction to update the index value for the inner loop; after receiving the instruction to update the index value for the inner loop, determining t
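The traversal recited in the claims (an index value and a constant multiplier per nested loop, with a bound check deciding whether to increment or reset an index) can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the claimed hardware: the name `traverse_addresses`, an increment of 1, an initial index value of 0, and the threshold test "bound minus index is greater than 1" are all hypothetical choices, since the claims leave the incremental values, initial values, and threshold unspecified.

```python
from typing import Iterator, List


def traverse_addresses(
    bounds: List[int],
    multipliers: List[int],
    base: int = 0,
) -> Iterator[int]:
    """Yield one address per inner-loop iteration.

    bounds[d] and multipliers[d] describe nested loop d; the last entry
    is the inner loop. All names here are illustrative assumptions.
    """
    if len(bounds) != len(multipliers):
        raise ValueError("one multiplier is needed per loop bound")
    if not bounds or any(b < 1 for b in bounds):
        return
    index = [0] * len(bounds)  # stands in for the tensor index elements
    while True:
        # Address = base + sum of (index value * multiplier) per dimension.
        yield base + sum(i * m for i, m in zip(index, multipliers))
        d = len(index) - 1
        while d >= 0:
            if bounds[d] - index[d] > 1:  # assumed threshold test
                index[d] += 1             # assumed increment of 1
                break
            index[d] = 0                  # reset to assumed initial value
            d -= 1                        # carry into the enclosing loop
        if d < 0:
            return                        # outermost loop has finished


# Assumed example: 2 rows of 3 elements, with rows padded to a stride of 4.
addresses = list(traverse_addresses(bounds=[2, 3], multipliers=[4, 1], base=100))
print(addresses)  # [100, 101, 102, 104, 105, 106]
```

The multipliers never change during the traversal; only the index values are updated, which mirrors the split between the two register sets in the claims.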
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
Indexed addressing · CPC title
Program or instruction counter, e.g. incrementing · CPC title
Special purpose registers · CPC title
using stride · CPC title