Who is the assignee on this patent?

Inspur Electronic Information Industry Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 27, 2026 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Computing method and apparatus for convolutional neural network model

US12536434B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12536434-B2
Application number	US-201917765322-A
Country	US
Kind code	B2
Filing date	Nov 27, 2019
Priority date	Oct 25, 2019
Publication date	Jan 27, 2026
Grant date	Jan 27, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing method and apparatus for a convolutional neural network model. The method comprises: acquiring a computing model of a training task of a convolutional neural network model (S101); then splitting multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a plurality of multiply-add operation tasks (S102); confirming a computing device corresponding to each multiply-add operation task according to the correlation between a preset computing model and the computing device (S103); and finally, respectively computing each multiply-add operation task by utilizing the computing device corresponding to each multiply-add operation task (S104). The purposes of improving the flexibility of migration of a CNN model training task on different computing devices or cooperative computing of different processors and improving the computing speed are achieved.
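As a rough illustration of step S102, the sketch below splits one multiply-accumulate (a dot product) into first-place, intermediate, and last-place multiply-add tasks and then evaluates them in forward order. The names `MultiplyAddTask`, `split_mac`, and `run_forward` are hypothetical, and the one-product-per-task granularity is an assumption made for the example; this page does not show how the patent actually sizes its tasks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MultiplyAddTask:
    """One piece of a multiply-accumulate (hypothetical structure)."""
    position: str  # "first", "intermediate", or "last"
    index: int     # identifier marking the task's place in the chain
    x: float       # input value
    w: float       # weight value


def split_mac(xs: List[float], ws: List[float]) -> List[MultiplyAddTask]:
    """Split sum(x_i * w_i) into one task per product (assumed granularity)."""
    if len(xs) != len(ws) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    last = len(xs) - 1
    tasks = []
    for i, (x, w) in enumerate(zip(xs, ws)):
        if i == 0:
            position = "first"
        elif i == last:
            position = "last"
        else:
            position = "intermediate"
        tasks.append(MultiplyAddTask(position, i, x, w))
    return tasks


def run_forward(tasks: List[MultiplyAddTask]) -> float:
    """Forward pass: the first task only multiplies, the rest multiply and add."""
    acc = 0.0
    for task in tasks:
        product = task.x * task.w
        if task.position == "first":
            acc = product        # multiplication only
        else:
            acc = acc + product  # multiplication followed by addition
    return acc


tasks = split_mac([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
result = run_forward(tasks)  # 1*4 + 2*5 + 3*6
```

In this sketch every task runs in one process; in the method described above, each task would instead be handed to the computing device mapped to it.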

First claim

Opening claim text (preview).

The invention claimed is:

1. A computing method of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers, and the method comprises: acquiring a computing model of a training task of a convolutional neural network model; splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks; identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and modified according to subsequent computing requirements; and performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof; wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task; wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation; the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operation task; and calling a currently available computing device to compute the multiply-add operation task if the current load rate of a computing device corresponding to the multiply-add operation task is greater than the load rate threshold corresponding to a computing device corresponding to the multiply-add operation task.

2. The method according to claim 1, wherein, the acquiring a computing model of a training task of a convolutional neural network model comprises: acquiring a training task of a convolutional neural network model; and processing the training task of a convolutional neural network model by utilizing a deep learning framework to generate a data flow diagram; wherein the data flow diagram is taken as a computing model of a training task of the convolutional neural network model.

3. The method according to claim 1, further comprising the following step after splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks: adding an identifier to each multiply-add operation task; wherein, the identifier is configured to mark the position of each multiply-add operation task in the computing model of the training task of the convolutional neural network model.

4. A non-transitory storage medium, having a computer program stored therein, wherein the computer program, when executed by a computer, causes the computer to perform the following steps: acquiring a computing model of a training task of a convolutional neural network model deployed on a super-heterogeneous computing platform with different types of available computing devices, wherein the different types of available computing devices include a CPU, a GPU, an FPGA, and an AI-specific processor, wherein the convolution neural network model comprises a plurality of model layers; splitting the multiply-accumulate operation of a layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks; identifying a computing device corresponding to each multiply-add operation task from the different types of available computing devices according to a corresponding relationship between a preset computing model and a computing device, wherein the corresponding relationship is preset according to specifically customized computation implementation granularity of each of the different types of available computing devices and can be modified according to subsequent computing requirements; and performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task, so as to improve the flexibility of migration of the training task of the convolutional neural network model on the different types of available computing devices by cooperative computing and improve the computing speed thereof; wherein the splitting the multiply-accumulate operation of the layer of the plurality of model layers in the computing model of the training task of the convolutional neural network model into a plurality of multiply-add operation tasks comprises: splitting the multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a first-place multiply-add operation task, an intermediate multiply-add operation task, and a last-place multiply-add operation task; wherein, the first-place multiply-add operation task comprises a multiplication computation during forward propagation computation and comprises a multiplication computation and an addition computation during backward propagation computation; the intermediate multiply-add operation task comprises a multiplication computation and an addition computation; and the last-place multiply-add operation task comprises a multiplication computation and an addition computation during forward propagation computation and comprises a multiplication computation during backward propagation computation; the performing computation on each multiply-add operation task of the layer of the plurality of model layers respectively by utilizing the computing device corresponding to each multiply-add operation task further comprises: judging that a current load rate of a computing device corresponding to the multiply-add operation task is greater than a load rate threshold corresponding to a computing device corresponding to the multiply-add operatio…
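The load-rate check at the end of claim 1 can be pictured with the minimal sketch below. It looks up the preset device for a task and, if that device's current load rate exceeds its threshold, hands the task to another device. The function `pick_device`, the example mapping, and the load figures are all hypothetical. Treating "currently available" as "the least-loaded device at or below its own threshold" is an assumption made for the example, not something the claim text states.

```python
from typing import Dict


def pick_device(
    task_kind: str,
    mapping: Dict[str, str],
    load: Dict[str, float],
    threshold: Dict[str, float],
) -> str:
    """Return the device that should compute a multiply-add task."""
    preferred = mapping[task_kind]
    if load[preferred] <= threshold[preferred]:
        return preferred  # preset device is not overloaded

    # Assumption: a device counts as "currently available" when its
    # load rate is at or below its own threshold.
    available = [
        dev for dev in load
        if dev != preferred and load[dev] <= threshold[dev]
    ]
    if not available:
        return preferred  # nothing better to call; keep the preset device
    return min(available, key=lambda dev: load[dev])


# Hypothetical preset relationship between task kinds and devices.
mapping = {"first": "CPU", "intermediate": "GPU", "last": "FPGA"}
load = {"CPU": 0.2, "GPU": 0.95, "FPGA": 0.5, "AI": 0.1}
threshold = {"CPU": 0.8, "GPU": 0.8, "FPGA": 0.8, "AI": 0.8}

chosen_first = pick_device("first", mapping, load, threshold)
chosen_mid = pick_device("intermediate", mapping, load, threshold)
```

Here the GPU is over its threshold, so the intermediate task moves to the least-loaded alternative, while the first-place task stays on its preset CPU.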

Assignees

Inspur Electronic Information Industry Co Ltd

Inventors

Classifications

G06F7/5443
Sum of products (for applications thereof, see the relevant places, e.g. G06F17/10, H03H17/00) · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 69441254

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12536434B2 cover?: A computing method and apparatus for a convolutional neural network model. The method comprises: acquiring a computing model of a training task of a convolutional neural network model (S101); then splitting multiply-accumulate operation in a computing model of a training task of the convolutional neural network model into a plurality of multiply-add operation tasks (S102); confirming a comp…

Related patents

Methods of operating a graphics processing unit (GPU) to train a deep neural network using a GPU local memory and related articles of manufacture

Apparatus and method with neural network

Data parallelism and halo exchange for distributed machine learning

Mixed inference using low and high precision

Scheduling neural network processing

Data parallel processing method and apparatus based on multiple graphic processing units