Method and apparatus for execution of neural network

US11768911B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11768911-B2
Application number: US-202017003354-A
Country: US
Kind code: B2
Filing date: Aug 26, 2020
Priority date: Sep 24, 2019
Publication date: Sep 26, 2023
Grant date: Sep 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to methods and apparatuses for execution of a neural network. An exemplary method can be implemented by a processing unit. The processing unit can include a command parser configured to dispatch commands and computing tasks and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task. Each core can include a convolution unit, a pooling unit, at least one operation unit and a sequencer communicatively coupled with the convolution unit, the pooling unit, and the at least one operation unit and configured to distribute instructions of the dispatched computing task to the convolution unit, the pooling unit, and the at least one operation unit for execution. The method can include: reading, by the convolution unit, data from a local memory of the at least one operation unit; performing, by the convolution unit, a convolution operation on the data to generate a feature map; and performing, by the pooling unit, a pooling operation on the feature map.
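
The data flow in the abstract (a task is dispatched to a core, the sequencer routes its instructions to units, convolution produces a feature map, pooling consumes it) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `Core`, `dispatch`, `conv2d`, `max_pool2d`, the `CONV`/`POOL` opcodes and the tuple instruction format are all hypothetical, max pooling is an assumed pooling type, and the local memory is modelled as a single dictionary on the core rather than as part of an operation unit.

```python
from dataclasses import dataclass, field


def conv2d(image, kernel):
    """Valid, stride-1 2D cross-correlation (the usual CNN 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size):
    """Non-overlapping max pooling; partial windows at the edges are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


@dataclass
class Core:
    """Hypothetical model of one core: a local memory plus a sequencer."""
    local_memory: dict = field(default_factory=dict)

    def sequencer(self, instructions):
        """Route each instruction of a dispatched task to the matching unit."""
        for opcode, src, dst, arg in instructions:
            if opcode == "CONV":    # convolution unit: read data, write feature map
                self.local_memory[dst] = conv2d(self.local_memory[src], arg)
            elif opcode == "POOL":  # pooling unit: read feature map, write result
                self.local_memory[dst] = max_pool2d(self.local_memory[src], arg)
            else:
                raise ValueError(f"unknown opcode: {opcode}")


def dispatch(core, task):
    """Stand-in for the command parser: hand one computing task to one core."""
    core.sequencer(task)


image = [
    [1, 2, 3, 4, 5],
    [0, 1, 0, 1, 0],
    [2, 2, 2, 2, 2],
    [5, 4, 3, 2, 1],
    [1, 0, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

core = Core(local_memory={"input": image})
task = [("CONV", "input", "fmap", kernel), ("POOL", "fmap", "pooled", 2)]
dispatch(core, task)
print(core.local_memory["pooled"])
```

The model runs the two steps strictly in sequence; the claims also describe pooling running at least partly in parallel with convolution, which this sketch does not attempt to show.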

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by a processing unit, the processing unit comprising a command parser having circuitry configured to dispatch commands and computing tasks and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task, each core comprising a convolution unit, a pooling unit, at least one operation unit and a sequencer communicatively coupled with the convolution unit, the pooling unit, and the at least one operation unit and having circuitry configured to distribute instructions of the dispatched computing task to the convolution unit, the pooling unit, and the at least one operation unit for execution, the sequencer further comprising circuitry configured to modify the instructions of the core; the method comprising: reading, by the convolution unit, data from a local memory of the at least one operation unit; performing, by the convolution unit, a convolution operation on the data to generate a feature map; and performing, by the pooling unit, a pooling operation on the feature map.

2. The method according to claim 1, wherein the at least one operation unit comprises: a matrix multiplication data path (DP) and an element-wise operation (EWOP) unit, and the method further comprises: performing, by the matrix multiplication DP, a matrix multiplication operation on convolution data from the convolution unit to generate intermediate data; and performing, by the EWOP unit, an EWOP to generate the feature map based on the intermediate data.

3. The method according to claim 1, wherein the pooling unit comprises an interpolation unit and a pooling data path, and the method further comprises: interpolating, by the interpolation unit, the feature map; and performing, by the pooling data path, a pooling operation on the interpolated feature map.

4. The method according to claim 3, further comprising: determining, by the pooling unit, a region of interest on the feature map.

5. The method according to claim 1, further comprising: monitoring, by the sequencer, execution of a neural network task; and parallelizing, by the sequencer, sub-tasks of the neural network task.

6. The method according to claim 1, wherein each core further comprises a direct memory access (DMA) unit, and the method further comprises: transferring, by the DMA unit, data within each core and among the at least one core; and inputting or outputting, by the DMA unit, data in parallel with computation of the convolution unit, the pooling unit, or the at least one operation unit.

7. The method according to claim 6, further comprising: transforming, by the DMA unit, data between forms of an image and a matrix.

8. The method according to claim 1, further comprising: performing, by the pooling unit, the pooling operation at least partly in parallel the convolution operation of the convolution unit.

9. The method according to claim 1, wherein each core further comprises a scalar unit and a scalar register file, and the method further comprises: performing, by the scalar unit, a scalar operation; and writing, by the scalar unit, a result of the scalar operation in the scalar register file.

10. A non-transitory computer-readable storage medium storing a set of instructions that is executable by at least one processing unit to cause the computer to perform a method, the processing unit comprising a command parser having circuitry configured to dispatch commands and computing tasks and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task, each core comprising a convolution unit, a pooling unit, at least one operation unit and a sequencer communicatively coupled with the convolution unit, the pooling unit, and the at least one operation unit and having circuitry configured to distribute instructions of the dispatched computing task to the convolution unit, the pooling unit, and the at least one operation unit for execution, the sequencer further comprising circuitry configured to modify the instructions of the core; the method comprising: reading, by the convolution unit, data from a local memory of the at least one operation unit; performing, by the convolution unit, a convolution operation on the data to generate a feature map; and performing, by the pooling unit, a pooling operation on the feature map.

11. The non-transitory computer-readable storage medium according to claim 10, wherein the at least one operation unit comprises: a matrix multiplication data path (DP) and an element-wise operation (EWOP) unit, and the set of instructions is executable by at least one processing unit to cause the computer to perform: performing, by the matrix multiplication DP, a matrix multiplication operation on convolution data from the convolution unit to generate intermediate data; and performing, by the EWOP unit, an EWOP to generate the feature map based on the intermediate data.

12. The non-transitory computer-readable storage medium according to claim 10, wherein the set of instructions is executable by at least one processing unit to cause the computer to perform: monitoring, by the sequencer, execution of a neural network task; and parallelizing, by the sequencer, sub-tasks of the neural network task.

13. The non-transitory computer-readable storage medium according to claim 10, wherein each core further comprises a direct memory access (DMA) unit, and the set of instructions is executable by at least one processing unit to cause the computer to perform: transferring, by the DMA unit, data within each core and among the at least one core; and inputting or outputting, by the DMA unit, data in parallel with computation of the convolution unit, the pooling unit, or the at least one operation unit.

14. The non-transitory computer-readable storage medium according to claim 10, wherein the set of instructions is executable by at least one processing unit to cause the computer to perform: performing, by the pooling unit, the pooling operation at least partly in parallel the convolution operation of the convolution unit.

15. A processing unit, comprising: a command parser having circuitry configured to dispatch commands and computing tasks; and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task, each core comprising: a convolution unit having circuitry configured, by a convolution instruction, to perform a convolution operation to generate a feature map; a pooling unit having circuitry configured, by a pooling instruction, to perform a pooling operation on the feature map; at least one operation unit having circuitry configured to process data; and a sequencer communicatively coupled with the convolution unit, the pooling unit, and the at least one operation unit, and having circuitry configured to distribute instructions of the dispatched computing task to the convolution unit, the pooling unit, and the at least one operation unit for execution; wherein the sequencer further comprises circuitry configured to modify the instructions of the core.

16. The processing unit according to claim 15, wherein the at least one operation unit comprises: a local memory for storing data; a matrix multiplication data path (DP) having circuitry configured, by a matrix multiplication instruction, to perform a matrix multiplication operation; and an element-wise operation (EWOP) unit having circuitry configured, by a vector instruction, to perform an EWOP.

17. The
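
Claims 2 and 7 describe transforming data between image and matrix forms, a matrix multiplication that yields intermediate data, and an element-wise operation that yields the feature map. One common way to realize that sequence in software is the im2col approach sketched below. This is an illustration under assumptions, not the claimed circuitry: the claims do not name im2col or any particular element-wise operation, so `im2col`, `matmul`, `relu` and `conv_as_matmul` are hypothetical names, and ReLU is chosen only as a familiar example of an EWOP.

```python
def im2col(image, kh, kw):
    """Image -> matrix: unroll every kh x kw patch into one row."""
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [image[r + i][c + j] for i in range(kh) for j in range(kw)]
        for r in range(out_h) for c in range(out_w)
    ]


def matmul(a, b):
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(matrix):
    """Element-wise operation (assumed here to be ReLU)."""
    return [[max(0, v) for v in row] for row in matrix]


def conv_as_matmul(image, kernels):
    """Convolve one image with several equally sized kernels via im2col."""
    kh, kw = len(kernels[0]), len(kernels[0][0])
    patches = im2col(image, kh, kw)            # (out_h * out_w) x (kh * kw)
    weights = [[k[i][j] for k in kernels]      # (kh * kw) x number of kernels
               for i in range(kh) for j in range(kw)]
    intermediate = matmul(patches, weights)    # matrix multiplication step
    return relu(intermediate)                  # element-wise step -> feature map


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [
    [[-1, 0], [0, 1]],   # bottom-right minus top-left
    [[0, 1], [-1, 0]],   # top-right minus bottom-left
]
feature_map = conv_as_matmul(image, kernels)
print(feature_map)
```

Each row of the result corresponds to one output position and each column to one kernel; reshaping the rows back into an `out_h` by `out_w` grid would be the matrix-to-image direction of the transform.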

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06F17/153 · Primary

    Multidimensional correlation or convolution · CPC title

  • Program control block organisation · CPC title

  • with variable priority · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11768911B2 cover?
The present disclosure relates to methods and apparatuses for execution of a neural network. An exemplary method can be implemented by a processing unit. The processing unit can include a command parser configured to dispatch commands and computing tasks and at least one core communicatively coupled with the command parser and configured to process the dispatched computing task. Each core can i…
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
Primary CPC classification G06F17/153. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).