US12536434B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12536434-B2 |
| Application number | US-201917765322-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 27, 2019 |
| Priority date | Oct 25, 2019 |
| Publication date | Jan 27, 2026 |
| Grant date | Jan 27, 2026 |
Official abstract text for this publication.
A computing method and apparatus for a convolutional neural network model. The method comprises: acquiring a computing model of a training task of a convolutional neural network model (S101); then splitting the multiply-accumulate operation in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks (S102); confirming a computing device corresponding to each multiply-add operation task according to the preset correspondence between computing models and computing devices (S103); and finally, computing each multiply-add operation task by utilizing the computing device corresponding to that task (S104). This improves the flexibility of migrating a CNN model training task across different computing devices, or of cooperative computing by different processors, and improves the computing speed.
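Steps S101 to S104 of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MulAddTask` class, the `PRESET_DEVICE` table, and the function names are hypothetical, the "devices" are only labels, and all arithmetic runs on the host. The sketch assumes one multiply-add task per term of a dot product and keys the preset correspondence on the task's position in the chain.

```python
from dataclasses import dataclass


@dataclass
class MulAddTask:
    kind: str      # "first", "intermediate", or "last" position in the chain
    weight: float
    value: float


# Hypothetical preset correspondence between task kind and device label.
PRESET_DEVICE = {"first": "CPU", "intermediate": "GPU", "last": "FPGA"}


def split_mac(weights, values):
    """S102: split sum(w * x) into one multiply-add task per term."""
    n = len(weights)
    tasks = []
    for i, (w, x) in enumerate(zip(weights, values)):
        if i == 0:
            kind = "first"
        elif i == n - 1:
            kind = "last"
        else:
            kind = "intermediate"
        tasks.append(MulAddTask(kind, w, x))
    return tasks


def run_forward(tasks):
    """S103 and S104: look up a device per task, then chain the partial sums."""
    acc = 0.0
    trace = []
    for task in tasks:
        device = PRESET_DEVICE[task.kind]    # S103: device lookup
        product = task.weight * task.value   # multiplication
        # In the forward pass the first task only multiplies;
        # every later task multiplies and adds to the running sum.
        acc = product if task.kind == "first" else acc + product
        trace.append(device)
    return acc, trace


# S101 stand-in: the "computing model" is a single dot product.
weights = [0.5, -1.0, 2.0, 0.25]
values = [4.0, 3.0, 1.5, 8.0]
result, trace = run_forward(split_mac(weights, values))
print(result, trace)
```

In a real deployment each lookup would dispatch to an actual device runtime; here the trace only records which device label each task would be sent to.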
Opening claim text (preview).
The invention claimed is:

1. A computing method of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers, and the method comprises:

acquiring a computing model of a training task of a convolutional neural network model;

splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks;

identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and modified according to subsequent computing requirements; and

performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof;

wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task;

wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation;

the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operation task; and calling a currently available computing device to compute the multiply-add operation task if the current load rate of a computing device corresponding to the multiply-add operation task is greater than the load rate threshold corresponding to a computing device corresponding to the multiply-add operation task.

2. The method according to claim 1, wherein, the acquiring a computing model of a training task of a convolutional neural network model comprises: acquiring a training task of a convolutional neural network model; and processing the training task of a convolutional neural network model by utilizing a deep learning framework to generate a data flow diagram; wherein the data flow diagram is taken as a computing model of a training task of the convolutional neural network model.

3. The method according to claim 1, further comprising the following step after splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks: adding an identifier to each multiply-add operation task; wherein, the identifier is configured to mark the position of each multiply-add operation task in the computing model of the training task of the convolutional neural network model.

4. A non-transitory storage medium, having a computer program stored therein, wherein the computer program, when executed by a computer, causes the computer to perform the following steps:

acquiring a computing model of a training task of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers;

splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks;

identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and can be modified according to subsequent computing requirements; and

performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof;

wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task;

wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation;

the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operatio
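Two details recited in the claims lend themselves to a short illustration: which operations each task position performs in each propagation direction, and the load-rate fallback. The following Python sketch is an assumption-laden reading, not the claimed implementation. The `OPS` table transcribes the claim language directly; `LOAD_THRESHOLD`, `pick_device`, and the threshold values are hypothetical, and "currently available" is assumed here to mean a device whose load does not exceed its own threshold.

```python
# Operations per task position and propagation direction, as recited in the claims.
OPS = {
    ("first", "forward"): ("mul",),
    ("first", "backward"): ("mul", "add"),
    ("intermediate", "forward"): ("mul", "add"),
    ("intermediate", "backward"): ("mul", "add"),
    ("last", "forward"): ("mul", "add"),
    ("last", "backward"): ("mul",),
}

# Hypothetical per-device load-rate thresholds (fraction of capacity).
LOAD_THRESHOLD = {"CPU": 0.9, "GPU": 0.8, "FPGA": 0.7, "AI": 0.85}


def pick_device(preferred, current_load):
    """Return the preset device unless its load rate exceeds its threshold.

    When the preset device is overloaded, return the first other device
    whose load is within its own threshold (assumed meaning of
    "currently available"). If none qualifies, keep the preset device;
    that last fallback is an assumption, not something the claims state.
    """
    if current_load[preferred] <= LOAD_THRESHOLD[preferred]:
        return preferred
    for name, load in current_load.items():
        if name != preferred and load <= LOAD_THRESHOLD[name]:
            return name
    return preferred


busy_gpu = {"CPU": 0.2, "GPU": 0.95, "FPGA": 0.5, "AI": 0.1}
idle_gpu = {"CPU": 0.2, "GPU": 0.5, "FPGA": 0.5, "AI": 0.1}
print(pick_device("GPU", busy_gpu))  # GPU is over its threshold, so another device is chosen
print(pick_device("GPU", idle_gpu))  # GPU is within its threshold, so it is kept
```

The asymmetry in `OPS` mirrors the direction of the chain: the first-place task has nothing to add to in the forward pass, and the last-place task has nothing to add to in the backward pass.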
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Combinations of networks · CPC title