Configuration of application software on multi-core image processor

US11030005B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11030005-B2
Application number: US-201916657656-A
Country: US
Kind code: B2
Filing date: Oct 18, 2019
Priority date: May 12, 2017
Publication date: Jun 8, 2021
Grant date: Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is described. The method includes calculating data transfer metrics for kernel-to-kernel connections of a program having a plurality of kernels that is to execute on an image processor. The image processor includes a plurality of processing cores and a network connecting the plurality of processing cores. Each of the kernel-to-kernel connections include a producing kernel that is to execute on one of the processing cores and a consuming kernel that is to execute on another one of the processing cores. The consuming kernel is to operate on data generated by the producing kernel. The method also includes assigning kernels of the plurality of kernels to respective ones of the processing cores based on the calculated data transfer metrics.
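The assignment step summarized above can be sketched as a small search over candidate placements. This is only an illustration, not the patent's implementation: it assumes the cores sit on a bidirectional ring, that the data transfer metric for a connection is its transfer size multiplied by the hop count between the two cores, and that each kernel gets its own core. All names (`ring_hops`, `total_weight`, `best_assignment`) and the example sizes are hypothetical.

```python
from itertools import permutations


def ring_hops(a, b, num_cores):
    """Shortest hop count between cores a and b on a bidirectional ring (assumed topology)."""
    d = abs(a - b)
    return min(d, num_cores - d)


def total_weight(assignment, connections, num_cores):
    """Sum of transfer_size * hops over every producer -> consumer connection (assumed metric)."""
    return sum(
        size * ring_hops(assignment[producer], assignment[consumer], num_cores)
        for producer, consumer, size in connections
    )


def best_assignment(kernels, connections, num_cores):
    """Score every one-kernel-per-core placement and return the lowest-weight one."""
    best, best_w = None, None
    for cores in permutations(range(num_cores), len(kernels)):
        candidate = dict(zip(kernels, cores))
        w = total_weight(candidate, connections, num_cores)
        if best_w is None or w < best_w:
            best, best_w = candidate, w
    return best, best_w


# Hypothetical pipeline: K1 feeds K2 (large transfer), K2 feeds K3 (small transfer).
kernels = ["K1", "K2", "K3"]
connections = [("K1", "K2", 100), ("K2", "K3", 10)]
placement, weight = best_assignment(kernels, connections, num_cores=4)
```

Exhaustive enumeration grows factorially with the number of kernels, so it only suits small pipelines; a practical tool would likely prune or sample the candidate set.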

First claim

Opening claim text (preview).

The invention claimed is:

1. A method performed by one or more computers, the method comprising:
    receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels;
    generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors;
    computing a respective total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment;
    selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and
    assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

2. The method of claim 1, wherein the device comprises a plurality of line buffer units, and further comprising assigning one or more line buffer units to be a respective source of one or more kernels.

3. The method of claim 2, further comprising assigning one or more line buffer units to be a respective sink of one or more kernels.

4. The method of claim 3, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises:
    generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and
    assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

5. The method of claim 4, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

6. The method of claim 4, wherein the transfer distances are based on distances along a network ring of the network.

7. The computing system of claim 1, wherein each stencil processor comprises an execution lane array and a two-dimensional shift-register array.

8. One or more non-transitory computer storage media encoded with computer program instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
    receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels;
    generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors;
    computing a total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment;
    selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and
    assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

9. The one or more non-transitory computer storage media of claim 8, wherein the device comprises a plurality of line buffer units, and wherein the operations further comprise assigning one or more line buffer units to be a respective source of one or more kernels.

10. The one or more non-transitory computer storage media of claim 9, wherein the operations further comprise assigning one or more line buffer units to be a respective sink of one or more kernels.

11. The one or more non-transitory computer storage media of claim 10, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises:
    generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and
    assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

12. The one or more non-transitory computer storage media of claim 11, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

13. The one or more non-transitory computer storage media of claim 11, wherein the transfer distances are based on distances along a network ring of the network.

14. The one or more non-transitory computer storage media of claim 8, wherein each stencil processor comprises an execution lane array and a two-dimensional shift-register array.

15. A system, comprising: one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
    receiving a request to compute kernel assignments for an image processing pipeline to be executed on a device having a plurality of stencil processors, wherein the image processing pipeline comprises a plurality of kernels;
    generating a plurality of candidate kernel assignments, each candidate kernel assignment assigning each kernel of the image processing pipeline to a respective stencil processor of the plurality of stencil processors;
    computing a total weight for each of the plurality of candidate kernel assignments, the total weight for each candidate kernel assignment being based on respective transfer sizes of data transferred between kernels according to the candidate kernel assignment;
    selecting a candidate kernel assignment according to the respective total weights computed for each of the plurality of candidate kernel assignments; and
    assigning kernels of the plurality of kernels to respective stencil processors according to the selected candidate kernel assignment.

16. The system of claim 15, wherein the device comprises a plurality of line buffer units, and wherein the operations further comprise assigning one or more line buffer units to be a respective source of one or more kernels.

17. The system of claim 16, wherein the operations further comprise assigning one or more line buffer units to be a respective sink of one or more kernels.

18. The system of claim 17, wherein assigning the one or more line buffer units to be a respective sink of one or more kernels comprises:
    generating, for a particular producing kernel assigned to a particular stencil processor, a list of line buffer units sorted by transfer distances from the particular stencil processor; and
    assigning, to the particular producing kernel, a closest line buffer unit having enough memory to buffer data generated by the particular producing kernel.

19. The system of claim 18, wherein the transfer distances are based on a respective number of nodal hops within the network between kernels.

20. The system of claim 18, wherein the transfer distances are based on distances along a network ring of the network.
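The sink-assignment step recited in claims 4, 11, and 18 (sort the line buffer units by transfer distance, then take the closest one with enough memory) might look roughly like the following sketch. It assumes the ring-distance variant of claims 6, 13, and 20; the `LineBufferUnit` fields, the byte-based capacity model, and the function names are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class LineBufferUnit:
    name: str
    position: int    # hypothetical index of the unit's node on the network ring
    free_bytes: int  # hypothetical remaining buffer capacity


def ring_distance(a, b, ring_size):
    """Shortest distance between two ring positions, going either direction."""
    d = abs(a - b) % ring_size
    return min(d, ring_size - d)


def assign_sink(producer_position, required_bytes, units, ring_size):
    """Return the closest unit with enough free memory, or None if none fits."""
    # List of line buffer units sorted by transfer distance from the producer's processor.
    ordered = sorted(
        units,
        key=lambda u: ring_distance(producer_position, u.position, ring_size),
    )
    for unit in ordered:
        if unit.free_bytes >= required_bytes:
            unit.free_bytes -= required_bytes  # reserve the space for this producer
            return unit
    return None


# Hypothetical device: three line buffer units on an 8-node ring.
units = [
    LineBufferUnit("lbu0", position=0, free_bytes=1024),
    LineBufferUnit("lbu1", position=3, free_bytes=4096),
    LineBufferUnit("lbu2", position=6, free_bytes=8192),
]
first = assign_sink(producer_position=2, required_bytes=2048, units=units, ring_size=8)
second = assign_sink(producer_position=2, required_bytes=4096, units=units, ring_size=8)
```

Reserving memory as each sink is assigned keeps later producers from being placed on a unit that is already full; whether the actual tool tracks capacity this way is not stated in the claims.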

Assignees

Google LLC

Inventors

Classifications

  • Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • G06F9/5005 · Primary

    to service a request · CPC title

  • considering the load · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11030005B2 cover?
A method is described. The method includes calculating data transfer metrics for kernel-to-kernel connections of a program having a plurality of kernels that is to execute on an image processor. The image processor includes a plurality of processing cores and a network connecting the plurality of processing cores. Each of the kernel-to-kernel connections include a producing kernel that is to ex…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 8, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).