Data reuse and efficient processing scheme in executing convolutional neural network

US11663446B2 · US · B2

Patent metadata
Publication number: US-11663446-B2
Application number: US-202016734792-A
Country: US
Kind code: B2
Filing date: Jan 6, 2020
Priority date: Jan 6, 2020
Publication date: May 30, 2023
Grant date: May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to a device for executing a convolutional neural network operation. The device comprises a first memory, a processing array comprising a plurality of processing strings, and a controller. The controller can be configured to fetch one or more batches of data into the first memory, regroup the one or more batches of data into multiple work items, wherein a first work item partially overlaps one or more work items among the multiple work items, and broadcast the multiple work items to the processing array, wherein the first work item is transferred to two or more processing strings of the processing array.
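
As a rough illustration of the regrouping and broadcast steps described in the abstract, the following Python sketch splits a one-dimensional batch into overlapping work items and delivers each item to more than one processing string. The function names `regroup` and `broadcast`, the window size, the stride, and the round-robin delivery are all assumptions made for illustration; none of them come from the patent text, and the hardware scheme it discloses may differ.

```python
from typing import Dict, List


def regroup(batch: List[int], item_size: int, stride: int) -> List[List[int]]:
    """Split a batch into work items; a stride below item_size makes neighbours overlap."""
    return [batch[i:i + item_size]
            for i in range(0, len(batch) - item_size + 1, stride)]


def broadcast(work_items: List[List[int]], num_strings: int,
              fanout: int) -> Dict[int, List[List[int]]]:
    """Deliver each work item to `fanout` processing strings (assumed round-robin)."""
    inbox: Dict[int, List[List[int]]] = {s: [] for s in range(num_strings)}
    for idx, item in enumerate(work_items):
        for k in range(fanout):
            inbox[(idx + k) % num_strings].append(item)
    return inbox


# Hypothetical data: six elements, work items of three elements, stride of one.
batch = [1, 2, 3, 4, 5, 6]
items = regroup(batch, item_size=3, stride=1)
inbox = broadcast(items, num_strings=4, fanout=2)
```

Because the stride is smaller than the work item size, adjacent work items share elements, so data fetched once into the first memory is reused across several work items instead of being fetched again.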

First claim

Opening claim text (preview).

What is claimed is:

1. A device for executing a convolutional neural network operation, comprising: a first memory; a processing array comprising a plurality of processing strings; and a controller configured to: fetch one or more batches of data into the first memory; regroup the fetched one or more batches of data into multiple work items, wherein a first work item partially overlaps one or more work items among the multiple work items; broadcast the multiple work items to the processing array, wherein the first work item is transferred to two or more processing strings of the processing array; and deallocate a portion of the one or more batches of data when the portion of the one or more batches of data is determined not to be used in a predetermined time period.

2. The device of claim 1, wherein the plurality of processing strings are classified into a plurality of subsets and the first work item is transferred to a first processing string in each of the plurality of subsets.

3. The device of claim 2, further comprising a second memory storing a plurality of filters of which number corresponds to a number of the subsets.

4. The device of claim 1, wherein each of the processing strings includes a multiplier and an accumulator.

5. The device of claim 3, wherein each of the processing strings includes a multiplier and an accumulator, and wherein the processing array includes an element-wise operation processor in each of the plurality of subsets.

6. The device of claim 1, wherein the controller is further configured to: traverse the one or more batches of data in the first memory to determine a size of the one or more batches of data covers a predetermined data size corresponding to a size of each of the multiple work items.

7. The device of claim 6, wherein the controller is further configured to: fetch an additional batch of data into the first memory when the size of the one or more batches of data is determined not to cover a predetermined data size corresponding to the size of each of the multiple work items.

8. The device of claim 1, wherein each of the multiple work items has a first data size, the one or more batches of data has a plurality of channels, and each channel has a second data size covering the first data size.

9. A method for executing a convolutional neural network operation, comprising: fetching one or more batches of data in a first memory; regrouping the fetched one or more batches of data into multiple work items, wherein a first work item partially overlaps one or more work items among the multiple work items; broadcasting the multiple work items to a processing array comprising a plurality of processing strings, wherein the first work item is transferred to two or more processing strings of the processing array; and deallocating a portion of the one or more batches of data when the portion of the one or more batches of data is determined not to be used in a predetermined time period.

10. The method of claim 9, wherein the plurality of processing strings are classified into a plurality of subsets and the first work item is transferred to a first processing string in each of the plurality of subsets.

11. The method of claim 10, further comprising: transferring a plurality of filters to the processing array, wherein a number of the plurality of filters corresponds to a number of the plurality of subsets and each of the plurality of filter is transferred to a corresponding subset among the plurality of subsets.

12. The method of claim 9, further comprising: performing a multiplication operation on the first work item in the two or more processing strings in parallel.

13. The method of claim 12, further comprising: performing an addition operation on multiplication results in the two or more processing strings in parallel.

14. The method of claim 9, further comprising: traversing the one or more batches of data in the first memory to determine a size of the one or more batches of data covers a predetermined data size corresponding to a size of each of the multiple work items.

15. The method of claim 14, further comprising: fetching an additional batch of data into the first memory when the size of the one or more batches of data is determined not to cover a predetermined data size corresponding to the size of each of the multiple work items.

16. The method of claim 9, further comprising: generating a plurality of outputs by the plurality of processing strings in parallel.

17. A non-transitory computer readable storage medium storing a set of instructions that are executable by at least one processor of a computing device to cause the computing device to perform a method for executing a convolutional neural network operation, the method comprising: fetching one or more batches of data in a first memory; regrouping the fetched one or more batches of data into multiple work items, wherein a first work item partially overlaps one or more work items among the multiple work items; broadcasting the multiple work items to a processing array comprising a plurality of processing strings, wherein the first work item is transferred to two or more processing strings of the processing array; and deallocating a portion of the one or more batches of data when the portion of the one or more batches of data is determined not to be used in a predetermined time period.

18. The computer readable storage medium of claim 17, wherein the plurality of processing strings are classified into a plurality of subsets and the first work item is transferred to a first processing string in each of the plurality of subsets.

19. The computer readable storage medium of claim 18, wherein the set of instructions that are executable by at least one processor of the computing device to cause the computing device to further perform: transferring a plurality of filters to the processing array, wherein a number of the plurality of filters corresponds to a number of the plurality of subsets and each of the plurality of filter is transferred to a corresponding subset among the plurality of subsets.

20. The computer readable storage medium of claim 17, wherein the set of instructions that are executable by at least one processor of the computing device to cause the computing device to further perform: performing a multiplication operation on the first work item in the two or more processing strings in parallel.

21. The computer readable storage medium of claim 20, wherein the set of instructions that are executable by at least one processor of the computing device to cause the computing device to further perform: performing an addition operation on multiplication results in the two or more processing strings in parallel.

22. The computer readable storage medium of claim 17, wherein the set of instructions that are executable by at least one processor of the computing device to cause the computing device to further perform: traversing the one or more batches of data in the first memory to determine a size of the one or more batches of data covers a predetermined data size corresponding to a size of each of the multiple work items.

23. The computer readable storage medium of claim 22, wherein the set of instructions that are executable by at least one processor of the computing device to cause the computing device to further perform: fetching an additional batch of data into the first memory when the size of the one
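
The arrangement in claims 2 to 7 (processing strings grouped into subsets, one filter per subset, a multiplier and an accumulator in each string, and a coverage check before regrouping) can be loosely sketched in Python as follows. This is a software analogy under stated assumptions, not the claimed hardware: `mac`, `run_array`, and `covers` are hypothetical names, the sample values are made up, and the coverage check is assumed to be a simple element-count comparison.

```python
from typing import List


def mac(work_item: List[float], filt: List[float]) -> float:
    """One processing string: multiply element pairs, then accumulate the products."""
    acc = 0.0
    for x, w in zip(work_item, filt):
        acc += x * w
    return acc


def run_array(work_items: List[List[float]],
              filters: List[List[float]]) -> List[List[float]]:
    """One subset per filter; string j of every subset receives work item j."""
    return [[mac(item, filt) for item in work_items] for filt in filters]


def covers(buffered_elements: int, item_size: int) -> bool:
    """Assumed coverage check: is enough data buffered to form one work item?"""
    return buffered_elements >= item_size


# Hypothetical values: two overlapping work items and two filters (two subsets).
work_items = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
filters = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
outputs = run_array(work_items, filters)
```

Here `outputs[s][j]` is the result from string `j` of subset `s`, so every work item is consumed once per filter. When `covers` returns `False`, a controller following claim 7 would fetch an additional batch before regrouping.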

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11663446B2 cover?
The present disclosure relates to a device for executing a convolutional neural network operation. The device comprises a first memory, a processing array comprising a plurality of processing strings, and a controller. The controller can be configured to fetch one or more batches of data into the first memory, regroup the one or more batches of data into multiple work items, wherein a first wor…
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 30, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).