What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 8, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Configuration of application software on multi-core image processor

US11030005B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11030005-B2
Application number	US-201916657656-A
Country	US
Kind code	B2
Filing date	Oct 18, 2019
Priority date	May 12, 2017
Publication date	Jun 8, 2021
Grant date	Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is described. The method includes calculating data transfer metrics for kernel-to-kernel connections of a program having a plurality of kernels that is to execute on an image processor. The image processor includes a plurality of processing cores and a network connecting the plurality of processing cores. Each of the kernel-to-kernel connections include a producing kernel that is to execute on one of the processing cores and a consuming kernel that is to execute on another one of the processing cores. The consuming kernel is to operate on data generated by the producing kernel. The method also includes assigning kernels of the plurality of kernels to respective ones of the processing cores based on the calculated data transfer metrics.
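As a rough illustration of the idea in the abstract, the sketch below scores a placement of kernels on cores by summing, over every producer-to-consumer connection, the transfer size multiplied by the network distance between the two cores, and then keeps the cheapest placement found by brute force. The bidirectional ring topology, the `ring_hops` helper, the size-times-hops metric, and the four-kernel example pipeline are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import permutations

def ring_hops(a, b, num_cores):
    """Shortest hop count between cores a and b on a bidirectional ring (assumed topology)."""
    d = abs(a - b) % num_cores
    return min(d, num_cores - d)

def total_weight(assignment, connections, num_cores):
    """Sum of transfer_size * hop distance over all kernel-to-kernel connections."""
    return sum(
        size * ring_hops(assignment[producer], assignment[consumer], num_cores)
        for producer, consumer, size in connections
    )

def best_assignment(kernels, connections, num_cores):
    """Enumerate every one-kernel-per-core placement and keep the lowest-weight one."""
    best, best_w = None, None
    for cores in permutations(range(num_cores), len(kernels)):
        candidate = dict(zip(kernels, cores))
        w = total_weight(candidate, connections, num_cores)
        if best_w is None or w < best_w:
            best, best_w = candidate, w
    return best, best_w

# Hypothetical pipeline: (producing kernel, consuming kernel, transfer size)
kernels = ["K1", "K2", "K3", "K4"]
connections = [("K1", "K2", 4), ("K1", "K3", 4), ("K2", "K4", 2), ("K3", "K4", 2)]
assignment, weight = best_assignment(kernels, connections, num_cores=8)
```

Exhaustive enumeration is only practical for a small number of cores and kernels; it is used here because it makes the scoring step easy to see, not because the patent is known to prescribe it.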

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by one or more computers, the method comprising: receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels; generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors; computing a respective total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment; selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

2. The method of claim 1, wherein the device comprises a plurality of line buffer units, and further comprising assigning one or more line buffer units to be a respective source of one or more kernels.

3. The method of claim 2, further comprising assigning one or more line buffer units to be a respective sink of one or more kernels.

4. The method of claim 3, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises: generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

5. The method of claim 4, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

6. The method of claim 4, wherein the transfer distances are based on distances along a network ring of the network.

7. The computing system of claim 1, wherein each stencil processor comprises an execution lane array and a two-dimensional shift-register array.

8. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels; generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors; computing a total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment; selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

9. The one or more non-transitory computer storage media of claim 8, wherein the device comprises a plurality of line buffer units, and wherein the operations further comprise assigning one or more line buffer units to be a respective source of one or more kernels.

10. The one or more non-transitory computer storage media of claim 9, wherein the operations further comprise assigning one or more line buffer units to be a respective sink of one or more kernels.

11. The one or more non-transitory computer storage media of claim 10, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises: generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

12. The one or more non-transitory computer storage media of claim 11, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

13. The one or more non-transitory computer storage media of claim 11, wherein the transfer distances are based on distances along a network ring of the network.

14. The one or more non-transitory computer storage media of claim 8, wherein each stencil processor comprises an execution lane array and a two-dimensional shift-register array.

15. A system, comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising: receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels; generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors; computing a total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment; selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

16. The system of claim 15, wherein the device comprises a plurality of line buffer units, and wherein the operations further comprise assigning one or more line buffer units to be a respective source of one or more kernels.

17. The system of claim 16, wherein the operations further comprise assigning one or more line buffer units to be a respective sink of one or more kernels.

18. The system of claim 17, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises: generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

19. The system of claim 18, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

20. The system of claim 18, wherein the transfer distances are based on distances along a network ring of the network.
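The sink-assignment step recited in claims 4, 11, and 18 (sort line buffer units by transfer distance from the stencil processor, then take the closest one with enough memory) might be sketched as follows. The `LineBufferUnit` class, the `assign_sink` function, the integer ring positions, and the byte counts are hypothetical names and values introduced only for this example; the distance function assumes the ring-based reading of claims 6, 13, and 20.

```python
from dataclasses import dataclass

@dataclass
class LineBufferUnit:
    name: str
    position: int    # node index on the network ring (assumed layout)
    free_bytes: int  # memory still available for buffering

def ring_distance(a, b, ring_size):
    """Shortest distance between two nodes along a bidirectional ring."""
    d = abs(a - b) % ring_size
    return min(d, ring_size - d)

def assign_sink(processor_position, needed_bytes, line_buffers, ring_size):
    """Return the closest line buffer unit with enough free memory, or None."""
    # Build the list of line buffer units sorted by transfer distance.
    ranked = sorted(
        line_buffers,
        key=lambda lb: ring_distance(processor_position, lb.position, ring_size),
    )
    for lb in ranked:
        if lb.free_bytes >= needed_bytes:
            lb.free_bytes -= needed_bytes  # reserve space for the producer's output
            return lb
    return None  # no unit can hold the data

# Hypothetical device: three line buffer units on an 8-node ring.
units = [
    LineBufferUnit("LBU0", position=0, free_bytes=1024),
    LineBufferUnit("LBU1", position=2, free_bytes=4096),
    LineBufferUnit("LBU2", position=5, free_bytes=8192),
]
chosen = assign_sink(processor_position=1, needed_bytes=2048,
                     line_buffers=units, ring_size=8)
```

In this example LBU0 and LBU1 are equally close to the processor at position 1, but LBU0 lacks the memory, so the search falls through to LBU1. Swapping `ring_distance` for a hop-count function over a different topology would give the variant described in claims 5, 12, and 19.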

Assignees

Google LLC

Inventors

Classifications

Y02D10/00
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title
G06T1/20 · Primary
Processor architectures; Processor configuration, e.g. pipelining · CPC title
G06F9/5005 · Primary
to service a request · CPC title
G06F9/505
considering the load · CPC title

Patent family

Related publications grouped by family.

View patent family 61094605

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11030005B2 cover?: A method is described. The method includes calculating data transfer metrics for kernel-to-kernel connections of a program having a plurality of kernels that is to execute on an image processor. The image processor includes a plurality of processing cores and a network connecting the plurality of processing cores. Each of the kernel-to-kernel connections include a producing kernel that is to ex…
Who is the assignee on this patent?: Google LLC