Computing method and apparatus for convolutional neural network model

US12536434B2 · US · B2

Patent metadata
Publication number: US-12536434-B2
Application number: US-201917765322-A
Country: US
Kind code: B2
Filing date: Nov 27, 2019
Priority date: Oct 25, 2019
Publication date: Jan 27, 2026
Grant date: Jan 27, 2026


Abstract


A computing method and apparatus for a convolutional neural network model. The method comprises: acquiring a computing model of a training task of a convolutional neural network model (S101); splitting the multiply-accumulate operation in the computing model of the training task into a plurality of multiply-add operation tasks (S102); identifying the computing device corresponding to each multiply-add operation task according to a preset correspondence between computing models and computing devices (S103); and finally, computing each multiply-add operation task on its corresponding computing device (S104). This improves the flexibility of migrating a CNN model training task across different computing devices or of cooperative computing across different processors, and it increases computing speed.
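Steps S101 to S104 can be sketched in Python as a toy pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the one-layer "model", the `PRESET_DEVICE_MAP` table, the device names, and every function name are hypothetical, and all devices are simulated on the host. The sketch only shows how one multiply-accumulate (here a dot product) can be split into position-tagged multiply-add tasks and routed through a preset lookup table.

```python
from dataclasses import dataclass

# Hypothetical preset correspondence (S103). The patent says the table is set
# per device's computation granularity and may be changed later; these
# entries are illustrative only.
PRESET_DEVICE_MAP = {"first": "GPU", "intermediate": "FPGA", "last": "AI-processor"}


@dataclass
class MulAddTask:
    layer: str     # model layer the task belongs to
    index: int     # position identifier within the split MAC (cf. claim 3)
    kind: str      # "first", "intermediate" or "last"
    weight: float
    x: float


def acquire_model():
    """S101 stand-in: a toy 'model' with one layer computing one dot product."""
    return {"conv1": ([0.5, -1.0, 2.0, 0.25], [4.0, 3.0, 1.0, 8.0])}


def split_mac(layer, weights, inputs):
    """S102: split sum(w_i * x_i) into position-tagged multiply-add tasks."""
    n = len(weights)
    tasks = []
    for i, (w, x) in enumerate(zip(weights, inputs)):
        kind = "first" if i == 0 else ("last" if i == n - 1 else "intermediate")
        tasks.append(MulAddTask(layer, i, kind, w, x))
    return tasks


def assign_devices(tasks):
    """S103: look up each task's device type in the preset table."""
    return [(task, PRESET_DEVICE_MAP[task.kind]) for task in tasks]


def run(assignments):
    """S104: execute tasks in order. Devices are only recorded, not used:
    every task actually runs on the host in this sketch."""
    acc, trace = None, []
    for task, device in assignments:
        product = task.weight * task.x
        # First-place task only multiplies; the rest multiply and accumulate.
        acc = product if task.kind == "first" else acc + product
        trace.append((task.index, device))
    return acc, trace


if __name__ == "__main__":
    for layer, (w, x) in acquire_model().items():
        result, trace = run(assign_devices(split_mac(layer, w, x)))
        print(layer, result, trace)
```

Keeping the device choice in a plain lookup table mirrors the abstract's point that the correspondence is preset and can be changed without touching the splitting or execution steps.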

First claim

Opening claim text (preview).

The invention claimed is:

1. A computing method of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers, and the method comprises:
acquiring a computing model of a training task of a convolutional neural network model;
splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks;
identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and modified according to subsequent computing requirements; and
performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof;
wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task; wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation;
the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operation task; and calling a currently available computing device to compute the multiply-add operation task if the current load rate of a computing device corresponding to the multiply-add operation task is greater than the load rate threshold corresponding to a computing device corresponding to the multiply-add operation task.

2. The method according to claim 1, wherein, the acquiring a computing model of a training task of a convolutional neural network model comprises: acquiring a training task of a convolutional neural network model; and processing the training task of a convolutional neural network model by utilizing a deep learning framework to generate a data flow diagram; wherein the data flow diagram is taken as a computing model of a training task of the convolutional neural network model.

3. The method according to claim 1, further comprising the following step after splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks: adding an identifier to each multiply-add operation task; wherein, the identifier is configured to mark the position of each multiply-add operation task in the computing model of the training task of the convolutional neural network model.

4. A non-transitory storage medium, having a computer program stored therein, wherein the computer program, when executed by a computer, causes the computer to perform the following steps:
acquiring a computing model of a training task of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers;
splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks;
identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and can be modified according to subsequent computing requirements; and
performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof;
wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task; wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation;
the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operatio…
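Claims 1 and 4 add two details beyond the abstract: which arithmetic each task position performs in forward versus backward propagation, and a fallback when the assigned device is overloaded. The sketch below is one possible reading, not the patented method. It assumes that backward propagation walks the chain of tasks in reverse, so the last-place task starts the chain (multiplication only) and the first-place task finishes it (multiplication and addition). It also assumes that a "currently available" device means any other device at or below its own load-rate threshold, with the least-loaded one chosen. `TASK_OPS`, `kinds_for`, `chained_mac`, `pick_device`, and the load figures are hypothetical.

```python
# Arithmetic performed by each task position, as listed in claims 1 and 4.
TASK_OPS = {
    ("first", "forward"): {"mul"},
    ("intermediate", "forward"): {"mul", "add"},
    ("last", "forward"): {"mul", "add"},
    ("first", "backward"): {"mul", "add"},
    ("intermediate", "backward"): {"mul", "add"},
    ("last", "backward"): {"mul"},
}


def kinds_for(n):
    """Position labels for a multiply-accumulate split into n >= 2 tasks."""
    if n < 2:
        raise ValueError("need at least a first-place and a last-place task")
    return ["first"] + ["intermediate"] * (n - 2) + ["last"]


def chained_mac(a, b, direction):
    """Evaluate sum(a[i] * b[i]) as a chain of multiply-add tasks.

    Assumption: the backward pass visits the tasks in reverse order, so the
    task that only multiplies is always the one that starts the chain.
    """
    kinds = kinds_for(len(a))
    order = range(len(a)) if direction == "forward" else reversed(range(len(a)))
    acc = None
    for i in order:
        ops = TASK_OPS[(kinds[i], direction)]
        product = a[i] * b[i]
        acc = acc + product if "add" in ops else product
    return acc


def pick_device(preferred, load, threshold):
    """Keep the preset device unless its load rate exceeds its threshold.

    Assumption: fall back to the least-loaded other device that is within
    its own threshold; keep the preset device if none qualifies.
    """
    if load[preferred] <= threshold[preferred]:
        return preferred
    candidates = [d for d in load if d != preferred and load[d] <= threshold[d]]
    return min(candidates, key=load.get) if candidates else preferred


if __name__ == "__main__":
    w, x = [0.5, -1.0, 2.0, 0.25], [4.0, 3.0, 1.0, 8.0]
    print(chained_mac(w, x, "forward"), chained_mac(w, x, "backward"))
    load = {"CPU": 0.3, "GPU": 0.95, "FPGA": 0.5, "AI-processor": 0.7}
    threshold = {"CPU": 0.8, "GPU": 0.9, "FPGA": 0.8, "AI-processor": 0.8}
    print(pick_device("GPU", load, threshold), pick_device("FPGA", load, threshold))
```

Under this reading, both directions produce the same sum for the same operands; in a real backward pass the operands would be weights and upstream gradients rather than the forward inputs reused here.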

Assignees

Inspur Electronic Information Industry Co Ltd

Inventors

Classifications

  • Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00)

  • G06N3/08 (primary): Learning methods

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Combinations of networks


Frequently asked questions


What does patent US12536434B2 cover?
A method for running the training task of a convolutional neural network model on a heterogeneous platform (CPU, GPU, FPGA, and AI-specific processor). The multiply-accumulate operation of a model layer is split into first-place, intermediate, and last-place multiply-add operation tasks; each task is assigned to a computing device according to a preset correspondence; and the tasks are computed on those devices, with a task handed to another available device when its assigned device's load rate exceeds a threshold.
Who is the assignee on this patent?
Inspur Electronic Information Industry Co Ltd
What technology area does this patent fall under?
The primary CPC classification is G06N3/08 (learning methods). Mapped technology areas include Physics.
When was this patent published?
It was published on January 27, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.