Large lookup tables for an image processor

US11321802B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11321802-B2
Application numberUS-201916976316-A
CountryUS
Kind codeB2
Filing dateFeb 21, 2019
Priority dateFeb 27, 2018
Publication dateMay 3, 2022
Grant dateMay 3, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks, the instruction in the kernel program are replaced with a sequence of instructions that, when executed by an execution lane, causes the execution lane to read a first value from a local memory bank and a second value from the local memory bank on behalf of another execution lane belonging to a different group of execution lanes.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 2. The method of claim 1 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 3. The method of claim 1 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 4. The method of claim 3 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 5. The method of claim 4 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 6. The method of claim 1 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 7. The method of claim 6 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 8. The method of claim 1 , wherein the lookup table is larger than every one of the memory banks. 9. The method of claim 1 , wherein each execution lane can only read from one respective memory bank of the plurality of memory banks. 10. The method of claim 1 , wherein the lookup table value is a structured value having a width that is larger than a size of a register of the image processor, and wherein the sequence of instructions causes an execution lane to perform operations comprising: reading a local lookup table value using a local partitioned index; computing a position, in a transpose buffer, for the local lookup table value, wherein the position depends on a group phase of the execution lane; and storing the local lookup table value in the transpose buffer in association with the computed position. 11. The method of claim 10 , wherein the operations further comprise: receiving a remote lookup table value read from a different memory bank by a different execution lane; computing a second position, in the transpose buffer, for the remote lookup table value based on the group phase of the execution lane; and storing the remote lookup table value in the transpose buffer in association with the second position. 12. The method of claim 10 , wherein the structured value is a vector having multiple elements or a double-width data type. 13. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 14. The system of claim 13 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 15. The system of claim 13 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 16. The system of claim 15 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 17. The system of claim 16 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 18. The system of claim 13 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 19. The system of claim 18 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 20. The system of claim 13 , wherein the lookup table is larger than every one of the memory banks. 21. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a seque

Assignees

Inventors

Classifications

  • using a mask · CPC title

  • controlled by a single instruction for multiple data lanes [SIMD] · CPC title

  • Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title

  • Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title

  • Optimisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11321802B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that read…
Who is the assignee on this patent?
Google Llc
What technology area does this patent fall under?
Primary CPC classification G06F9/3891. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 03 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).