Device and method for processing convolution operation using kernel

US11675997B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11675997-B2
Application number  US-201816163772-A
Country             US
Kind code           B2
Filing date         Oct 18, 2018
Priority date       Nov 14, 2017
Publication date    Jun 13, 2023
Grant date          Jun 13, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided are a method and apparatus for processing a convolution operation in a neural network. The apparatus may include a memory, and a processor configured to read, from the memory, one of divided blocks of input data stored in a memory; generate an output block by performing the convolution operation on the one of the divided blocks with a kernel; generate a feature map by using the output block, and write the feature map to the memory.

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for processing a convolution operation in a neural network, the apparatus comprising: a memory; and a processor configured to: read, from the memory, a divided block among a plurality of divided blocks of input data stored in the memory, wherein a plurality of different addresses in the memory are assigned to the plurality of divided blocks, respectively, perform the convolution operation on the divided block by applying a kernel to the divided block to generate respective output values corresponding to inner pixels inside the divided block and outer pixels that are included in adjacent blocks of the divided block, without reading values of the outer pixels from the memory to provide a conflict-free access to the memory, generate an output block by using the respective output values, generate a feature map by using the output block, and write the feature map to the memory.

2. The apparatus of claim 1, wherein a size of the output block is larger than a size of the one of the divided blocks.

3. The apparatus of claim 1, wherein a size of the output block varies according to a size of the kernel.

4. The apparatus of claim 1, wherein addresses respectively corresponding to the divided blocks are assigned with respect to the divided blocks, and wherein the divided blocks are respectively stored in a plurality of banks of the memory and are accessible by the addresses.

5. The apparatus of claim 4, wherein the processor is further configured to perform the conflict-free access to one of the plurality of banks with reference to an address of the one of the divided blocks and read data of the one of the divided blocks from the one of the plurality of banks, based on the conflict-free access.

6. The apparatus of claim 1, wherein the processor is further configured to execute a code temporarily storing kernel information to prevent a stack overflow when performing the convolution operation.

7. The apparatus of claim 1, further comprising a buffer, wherein the processor is further configured to accumulate the output block and other outputs previously stored in the buffer by writing the output block to the buffer and generate the feature map based on results accumulated in the buffer.

8. The apparatus of claim 7, wherein the processor is further configured to convert data of a vertical form of the output block into data of a horizontal form and write the converted data of the horizontal form to the buffer.

9. The apparatus of claim 7, wherein the processor is further configured to perform accumulation using address information of data stored in the buffer and tag information indicating block type information.

10. A method of processing a convolution operation in a neural network, the method comprising: reading, from a memory, a divided block among a plurality of divided blocks of input data stored in the memory, wherein a plurality of different addresses in the memory are assigned to the plurality of divided blocks, respectively; performing the convolution operation on the divided block by applying a kernel to the divided block to generate respective output values corresponding to inner pixels inside the divided block and outer pixels that are included in adjacent blocks of the divided block, without reading values of the outer pixels from the memory to provide a conflict-free access to the memory; generating an output block by using the respective output values; generating, via a processor, a feature map by using the output block; and writing the feature map to the memory.

11. The method of claim 10, wherein a size of the output block is larger than a size of the one of the divided blocks.

12. The method of claim 10, wherein a size of the output block varies according to a size of the kernel.

13. The method of claim 10, wherein addresses respectively corresponding to the divided blocks are assigned with respect to the divided blocks, and wherein the divided blocks are respectively stored in a plurality of banks of the memory and are accessible by the addresses.

14. The method of claim 13, wherein the reading comprises: performing the conflict-free access to one of the plurality of banks with reference to an address of the one of the divided blocks; and reading data of the one of the divided blocks from the one of the plurality of banks, based on the conflict-free access.

15. The method of claim 10, further comprising: executing a code temporarily storing kernel information to prevent a stack overflow when performing the convolution operation.

16. The method of claim 10, wherein the generating of the feature map comprises: accumulating the output block and other outputs previously stored in a buffer by writing the output block to the buffer; generating the feature map based on results accumulated in the buffer.

17. The method of claim 16, wherein the accumulating comprises converting data of a vertical form of the output block into data of a horizontal form and writing the converted data of the horizontal form to the buffer.

18. A non-transitory computer-readable recording medium having recorded thereon a program for performing, via a processor, operations comprising: reading, from a memory, a divided block among a plurality of divided blocks of input data stored in the memory, wherein a plurality of different addresses in the memory are assigned to the plurality of divided blocks, respectively; performing the convolution operation on the divided block by applying a kernel to the divided block to generate respective output values corresponding to inner pixels inside the divided block and outer pixels that are included in adjacent blocks of the divided block, without reading values of the outer pixels from the memory to provide a conflict-free access to the memory; generating an output block by using the respective output values; generating, via a processor, a feature map by using the output block; and writing the feature map to the memory.
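
Illustrative sketch (not part of the patent text). One way to read claims 1, 2, 3 and 7 is as an overlap-add scheme: each divided block is read on its own, the kernel is applied to produce an output block that extends past the block's edges by (kernel size - 1), and the output blocks are accumulated in a buffer so that contributions to pixels in adjacent blocks add up without those neighbours ever being read. The Python below is a minimal sketch under that assumption. The function names, the square block size, and the use of a mathematical "full" convolution are all hypothetical choices made for illustration; the patent's hardware addressing, banks, and tag information are not modelled.

```python
def full_conv_block(block, kernel):
    """Apply the kernel to one block only (hypothetical helper).

    Every input pixel scatters its weighted value to a kh x kw
    neighbourhood, so the output block is larger than the input block:
    (bh + kh - 1) x (bw + kw - 1). The extra border holds partial sums
    for "outer" pixels that belong to adjacent blocks.
    """
    bh, bw = len(block), len(block[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * (bw + kw - 1) for _ in range(bh + kh - 1)]
    for i in range(bh):
        for j in range(bw):
            for u in range(kh):
                for v in range(kw):
                    out[i + u][j + v] += block[i][j] * kernel[u][v]
    return out


def blockwise_conv(image, kernel, block_size):
    """Convolve block by block, accumulating output blocks in a buffer.

    Assumption: the buffer plays the role of the accumulation buffer in
    claim 7, and block_size is an illustrative parameter.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    buffer = [[0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for top in range(0, h, block_size):
        for left in range(0, w, block_size):
            # Read exactly one divided block; neighbours are never touched.
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            out_block = full_conv_block(block, kernel)
            # Accumulate the (larger) output block into the shared buffer.
            for r, row in enumerate(out_block):
                for c, val in enumerate(row):
                    buffer[top + r][left + c] += val
    return buffer


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]]
    # Block-wise accumulation matches convolving the whole image at once.
    assert blockwise_conv(image, kernel, 2) == full_conv_block(image, kernel)
```

Because convolution is linear, summing the per-block outputs gives the same result as processing the whole input at once. In this sketch the output block grows with the kernel (a 2x2 block with a 3x3 kernel yields a 4x4 output block), which matches the size relationships stated in claims 2 and 3.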

Assignees

Inventors

Classifications

  • G06N3/063 · Primary

    using electronic means · CPC title

  • Correlation function computation {including computation of convolution operations (arithmetic circuits for sum of products per se, e.g. multiply-accumulators G06F7/5443; digital filters, e.g. FIR, IIR, adaptive filters H03H17/00)} · CPC title

  • with look ahead addressing means · CPC title

  • Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication (G06F12/08 takes precedence) · CPC title

  • Forward inferencing; Production systems · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11675997B2 cover?
Provided are a method and apparatus for processing a convolution operation in a neural network. The apparatus may include a memory, and a processor configured to read, from the memory, one of divided blocks of input data stored in a memory; generate an output block by performing the convolution operation on the one of the divided blocks with a kernel; generate a feature map by using the output …
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 13, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).