Image processor complex transfer functions
US-10853908-B2 · Dec 1, 2020 · US
US11321802B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11321802-B2 |
| Application number | US-201916976316-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 21, 2019 |
| Priority date | Feb 27, 2018 |
| Publication date | May 3, 2022 |
| Grant date | May 3, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks, the instruction in the kernel program are replaced with a sequence of instructions that, when executed by an execution lane, causes the execution lane to read a first value from a local memory bank and a second value from the local memory bank on behalf of another execution lane belonging to a different group of execution lanes.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 2. The method of claim 1 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 3. The method of claim 1 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 4. The method of claim 3 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 5. The method of claim 4 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 6. The method of claim 1 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 7. The method of claim 6 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 8. The method of claim 1 , wherein the lookup table is larger than every one of the memory banks. 9. The method of claim 1 , wherein each execution lane can only read from one respective memory bank of the plurality of memory banks. 10. The method of claim 1 , wherein the lookup table value is a structured value having a width that is larger than a size of a register of the image processor, and wherein the sequence of instructions causes an execution lane to perform operations comprising: reading a local lookup table value using a local partitioned index; computing a position, in a transpose buffer, for the local lookup table value, wherein the position depends on a group phase of the execution lane; and storing the local lookup table value in the transpose buffer in association with the computed position. 11. The method of claim 10 , wherein the operations further comprise: receiving a remote lookup table value read from a different memory bank by a different execution lane; computing a second position, in the transpose buffer, for the remote lookup table value based on the group phase of the execution lane; and storing the remote lookup table value in the transpose buffer in association with the second position. 12. The method of claim 10 , wherein the structured value is a vector having multiple elements or a double-width data type. 13. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 14. The system of claim 13 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 15. The system of claim 13 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 16. The system of claim 15 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 17. The system of claim 16 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 18. The system of claim 13 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 19. The system of claim 18 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 20. The system of claim 13 , wherein the lookup table is larger than every one of the memory banks. 21. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a seque
using a mask · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title
Optimisation · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.