Performance estimation-based resource allocation for reconfigurable architectures

US11410027B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11410027-B2
Application numberUS-201916572527-A
CountryUS
Kind codeB2
Filing dateSep 16, 2019
Priority dateSep 16, 2019
Publication dateAug 9, 2022
Grant dateAug 9, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute processing time, determining a pipeline number of the PCUs and/or the PMUs required to process the operation unit graph, and iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time taking into account whether the pipeline number of the PCUs and/or the PMUs produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available PCUs and/or PMUs.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method of allocating available physical compute units and/or physical memory units of a reconfigurable data processor to operation units of an operation unit graph for execution thereof, the method including: initializing lower and upper search bounds of generic stage compute processing time required for executing an operation unit of the operation unit graph; selecting, for evaluation, an intermediate stage compute processing time between the lower and upper search bounds of the generic stage compute processing time; determining a pipeline number of the physical compute units and/or the physical memory units required to process a pipeline compute load of the operation unit graph on the reconfigurable data processor, including for each of the operation units of the operation unit graph, determining a specific stage compute processing time required to process a stage compute load of a respective one of the operation units using only one physical compute unit and/or only one physical memory unit, and determining a stage number of the physical compute units and/or the physical memory units required to process the stage compute load of the respective one of the operation units by dividing the specific stage compute processing time with the intermediate stage compute processing time; and summing the stage number of the physical compute units and/or the physical memory units for each of the operation units and producing the pipeline number of the physical compute units and/or the physical memory units; iteratively, initializing new lower and upper search bounds of the generic stage compute processing time and selecting, for evaluation in a next iteration, a new intermediate stage compute processing time between the new lower and upper search bounds of the generic stage compute processing time, taking into account whether the pipeline number of the physical compute units and/or the physical memory units produced for a prior intermediate stage compute processing time in a previous iteration is lower or higher than the available physical compute units and/or physical memory units; terminating the iterative initializing and selecting when the pipeline number of the physical compute units and/or the physical memory units produced for a current intermediate stage compute processing time in a current iteration meets a convergence criteria; allocating the available physical compute units and/or physical memory units to the operation units of the operation unit graph based on the current intermediate stage compute processing time; and executing the operation units of the operation unit graph on the reconfigurable data processor based on the allocation. 2. The computer-implemented method of claim 1 , wherein the convergence criteria is that a difference between the upper search bound and the lower search bound is below a threshold. 3. The computer-implemented method of claim 1 , wherein the lower search bound of the generic stage compute processing time is based on maximum utilization of the reconfigurable data processor and determined by dividing the pipeline compute load of the operation unit graph with total processing capacity of the reconfigurable data processor. 4. The computer-implemented method of claim 3 , wherein the pipeline compute load of the operation unit graph is determined by a total number of floating point operations (FLOP) required to execute the operation unit graph. 5. The computer-implemented method of claim 3 , wherein the total processing capacity of the reconfigurable data processor is determined by a maximum number of FLOP executable by the reconfigurable data processor per second (FLOP/s). 6. The computer-implemented method of claim 3 , wherein the upper search bound of the generic stage compute processing time is based on multiplying the lower search bound of the generic stage compute processing time with a minimum utilization factor. 7. The computer-implemented method of claim 6 , wherein the minimum utilization factor is one hundred. 8. The computer-implemented method of claim 2 , further including continuing the iterative initializing and selecting as long as the difference between the upper search bound and the lower search bound is above a threshold. 9. The computer-implemented method of claim 1 , wherein the intermediate stage compute processing time is an average of the lower and upper search bounds of the generic stage compute processing time. 10. The computer-implemented method of claim 9 , when the pipeline number of the physical compute units and/or the physical memory units produced for the prior intermediate stage compute processing time in the previous iteration is lower than the available physical compute units and/or physical memory units, further including: setting the new upper search bound for the next iteration as the prior intermediate stage compute processing time. 11. The computer-implemented method of claim 9 , when the pipeline number of the physical compute units and/or the physical memory units produced for the prior intermediate stage compute processing time in the previous iteration is higher than the available physical compute units and/or physical memory units, further including: setting the new lower search bound for the next iteration as the prior intermediate stage compute processing time. 12. The computer-implemented method of claim 1 , wherein the stage compute load of the respective one of the operation units is specified as a total number of FLOP required to execute the respective one of the operation units and is determined based on an operation type. 13. The computer-implemented method of claim 1 , wherein the stage compute load of the respective one of the operation units is specified as a total number of FLOP required to execute the respective one of the operation units and is determined based on input dimensionality, and output dimensionality. 14. The computer-implemented method of claim 1 , further including determining the stage number of the physical compute units and/or the physical memory units required to process the stage compute load by rounding up to an integer a result of dividing the stage compute processing time with the intermediate stage compute processing time. 15. The computer-implemented method of claim 1 , further including determining a throughput value based on the current intermediate stage compute processing time. 16. The computer-implemented method of claim 1 , further including determining a pipeline compute processing time required for executing the operation unit graph based on multiplying a number of the operation units of the operation unit graph with the current intermediate stage compute processing time. 17. The computer-implemented method of claim 1 , further including: selecting those operation units of the operation unit graph whose stage compute processing time is relatively greater than most other operation units of the operation unit graph; and allocating additional available physical compute units and/or the physical memory units to the selected operation units. 18. The computer-implemented method of claim 17 , wherein the allocation results in each of the operation units of the operation unit graph having substantially matching stage compute processing time. 19. The computer-implemented method of claim 1 , wherein the operation unit graph is a fused operation unit graph with at least one fused operation unit. 20. The computer-implemented method

Assignees

Inventors

Classifications

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06F8/45Primary

    Exploiting coarse grain parallelism in compilation, i.e. parallelism between groups of instructions · CPC title

  • G06N3/063Primary

    using electronic means · CPC title

  • Browsing; Visualisation therefor (for navigating the web G06F16/954; browsing optimisation for the web G06F16/957) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11410027B2 cover?
The technology disclosed relates to allocating available physical compute units (PCUs) and/or physical memory units (PMUs) of a reconfigurable data processor to operation units of an operation unit graph for execution thereof. In particular, it relates to selecting, for evaluation, an intermediate stage compute processing time between lower and upper search bounds of a generic stage compute pro…
Who is the assignee on this patent?
Sambanova Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06F8/45. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 09 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).