What technology area does this patent fall under?

Primary CPC classification G06F9/3891. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue May 03 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Large lookup tables for an image processor

US11321802B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11321802-B2
Application number	US-201916976316-A
Country	US
Kind code	B2
Filing date	Feb 21, 2019
Priority date	Feb 27, 2018
Publication date	May 3, 2022
Grant date	May 3, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks, the instruction in the kernel program are replaced with a sequence of instructions that, when executed by an execution lane, causes the execution lane to read a first value from a local memory bank and a second value from the local memory bank on behalf of another execution lane belonging to a different group of execution lanes.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 2. The method of claim 1 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 3. The method of claim 1 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 4. The method of claim 3 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 5. The method of claim 4 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 6. The method of claim 1 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 7. The method of claim 6 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 8. The method of claim 1 , wherein the lookup table is larger than every one of the memory banks. 9. The method of claim 1 , wherein each execution lane can only read from one respective memory bank of the plurality of memory banks. 10. The method of claim 1 , wherein the lookup table value is a structured value having a width that is larger than a size of a register of the image processor, and wherein the sequence of instructions causes an execution lane to perform operations comprising: reading a local lookup table value using a local partitioned index; computing a position, in a transpose buffer, for the local lookup table value, wherein the position depends on a group phase of the execution lane; and storing the local lookup table value in the transpose buffer in association with the computed position. 11. The method of claim 10 , wherein the operations further comprise: receiving a remote lookup table value read from a different memory bank by a different execution lane; computing a second position, in the transpose buffer, for the remote lookup table value based on the group phase of the execution lane; and storing the remote lookup table value in the transpose buffer in association with the second position. 12. The method of claim 10 , wherein the structured value is a vector having multiple elements or a double-width data type. 13. A system comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a sequence of instructions that, when executed by an execution lane, causes the execution lane to perform operations comprising: reading a local lookup table value from a local memory bank using a local partitioned index, shifting the local partitioned index through the shift-register array and receiving through the shift-register array a remote partitioned index from a different execution lane, reading a remote lookup table value from the local memory bank using the remote partitioned index, and shifting the remote lookup table value back to the different execution lane through the shift-register array. 14. The system of claim 13 , wherein the sequence of instructions causes an execution lane to further perform operations comprising: computing the local partitioned index from an original lookup table index of the kernel program using integer division by a number of partitions of the lookup table. 15. The system of claim 13 , wherein the operations further comprise: receiving a remote lookup table value read by a different execution lane from a remote memory bank. 16. The system of claim 15 , wherein the operations further comprise: selecting between the local lookup table value or the remote lookup table value. 17. The system of claim 16 , wherein the operations further comprise: selecting the local lookup table value if an original lookup table index modulo N is equal to a partition number of the execution lane, wherein N is a number of partitions of the lookup table. 18. The system of claim 13 , wherein the sequence of instructions causes each execution lane to read multiple lookup table values for each single lookup table access in the input kernel program. 19. The system of claim 18 , wherein the lookup table is partitioned such that all even indexes are stored in a first memory bank and all odd indexes are stored in a second memory bank. 20. The system of claim 13 , wherein the lookup table is larger than every one of the memory banks. 21. A computer program product, encoded on one or more non-transitory computer storage media, comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks, wherein multiple execution lanes in each of a plurality of groups of execution lanes are configured to share a same respective memory bank of the plurality of memory banks of the image processor; determining that the kernel program has an instruction that reads a lookup table value for a lookup table partitioned across the plurality of memory banks; and in response, replacing the instruction in the kernel program with a seque

Assignees

Google Llc

Inventors

Classifications

G06F9/30038
using a mask · CPC title
G06F9/3887
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
G06F9/30036
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
G06F8/45
Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title
G06F8/443
Optimisation · CPC title

Patent family

Related publications grouped by family.

View patent family 65724525

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11321802B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for supporting large lookup tables on an image processor. One of the methods includes receiving an input kernel program for an image processor having a two-dimensional array of execution lanes, a shift-register array, and a plurality of memory banks. If the kernel program has an instruction that read…
Who is the assignee on this patent?: Google Llc
What technology area does this patent fall under?: Primary CPC classification G06F9/3891. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue May 03 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).