Method and Apparatus for Implementing Layers on a Convolutional Neural Network Accelerator
US-2017103299-A1 · Apr 13, 2017 · US
US10360470B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10360470-B2 |
| Application number | US-201815910005-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 2, 2018 |
| Priority date | Oct 10, 2016 |
| Publication date | Jul 23, 2019 |
| Grant date | Jul 23, 2019 |
Methods and systems of replacing operations of depthwise separable filters with first and second replacement convolutional layers are disclosed. Depthwise separable filters contain a combination of a depthwise convolutional layer followed by a pointwise convolutional layer, with an input of P feature maps and an output of Q feature maps. The first replacement convolutional layer contains P×P of 3×3 filter kernels. It is formed by placing each of the P×1 of 3×3 filter kernels of the depthwise convolutional layer on the respective P diagonal locations and zero-value 3×3 filter kernels in all off-diagonal locations. The second replacement convolutional layer contains Q×P of 3×3 filter kernels. It is formed by placing the Q×P of 1×1 filter coefficients of the pointwise convolutional layer in the center position of the respective Q×P of 3×3 filter kernels and numerical value zero in the eight perimeter positions.
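The equivalence described in the abstract can be checked numerically. The following is a minimal pure-Python sketch, not the patented circuit. It assumes stride 1, zero padding of one pixel (so every layer keeps the H×W size), and cross-correlation as used in common CNN frameworks. The helper names (`conv3x3`, `depthwise_separable`, `first_replacement`, `second_replacement`, `zero_kernel`) are hypothetical. The sketch builds both replacement layers from random depthwise (P×1 of 3×3) and pointwise (Q×P of 1×1) weights. It then compares two ordinary 3×3 layers against the depthwise separable reference.

```python
import random


def zero_kernel():
    """A fresh 3x3 kernel of zeros (fresh lists avoid aliasing between kernels)."""
    return [[0.0] * 3 for _ in range(3)]


def conv3x3(x, w):
    """Ordinary 3x3 layer (assumed: stride 1, zero padding 1, cross-correlation).

    x: C_in feature maps, each an H x W list of lists.
    w: C_out x C_in grid of 3x3 kernels; output map r sums w[r][c] applied to x[c].
    """
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for kernels in w:
        y = [[0.0] * wd for _ in range(h)]
        for c, k in enumerate(kernels):
            for i in range(h):
                for j in range(wd):
                    for di in range(3):
                        for dj in range(3):
                            ii, jj = i + di - 1, j + dj - 1
                            if 0 <= ii < h and 0 <= jj < wd:  # zero padding
                                y[i][j] += k[di][dj] * x[c][ii][jj]
        out.append(y)
    return out


def depthwise_separable(x, dw, pw):
    """Reference: depthwise 3x3 (one kernel per input map), then pointwise 1x1."""
    d = [conv3x3([x[p]], [[dw[p]]])[0] for p in range(len(x))]
    h, wd = len(x[0]), len(x[0][0])
    return [[[sum(pw[q][p] * d[p][i][j] for p in range(len(d)))
              for j in range(wd)] for i in range(h)] for q in range(len(pw))]


def first_replacement(dw):
    """P x P kernels: depthwise kernel p on diagonal (p, p), zero kernels elsewhere."""
    P = len(dw)
    return [[dw[r] if r == c else zero_kernel() for c in range(P)] for r in range(P)]


def second_replacement(pw):
    """Q x P kernels: each 1x1 coefficient in the center, zeros in the 8 perimeter slots."""
    kernels = []
    for row in pw:
        out_row = []
        for coef in row:
            k = zero_kernel()
            k[1][1] = coef
            out_row.append(k)
        kernels.append(out_row)
    return kernels


def rnd():
    return random.uniform(-1.0, 1.0)


random.seed(0)
P, Q, H, W = 3, 4, 5, 6
x = [[[rnd() for _ in range(W)] for _ in range(H)] for _ in range(P)]
dw = [[[rnd() for _ in range(3)] for _ in range(3)] for _ in range(P)]  # P x 1 of 3x3
pw = [[rnd() for _ in range(P)] for _ in range(Q)]                      # Q x P of 1x1

ref = depthwise_separable(x, dw, pw)
out = conv3x3(conv3x3(x, first_replacement(dw)), second_replacement(pw))
max_err = max(abs(ref[q][i][j] - out[q][i][j])
              for q in range(Q) for i in range(H) for j in range(W))
print(f"max abs difference: {max_err:.3e}")
```

The script prints the largest absolute difference between the two paths, which should be negligible. The zero-padding assumption matters for the second replacement. With one pixel of padding, a 3×3 kernel that is nonzero only at its center reproduces a 1×1 convolution exactly. Without padding, the 3×3 output would lose a one-pixel border.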
What is claimed is:

1. A digital integrated circuit for feature extraction comprising: a plurality of cellular neural networks (CNN) processing engines operatively coupled to at least one input/output data bus, the plurality of CNN processing engines being connected in a loop with a clock-skew circuit, each CNN processing engine comprising: a CNN processing block configured for simultaneously obtaining convolution operations results using input data and pre-trained filter coefficients of a plurality of convolutional layers, the convolutional layers containing first and second replacement convolutional layers for performing equivalent operations of depthwise separable filters that include a combination of a depthwise convolutional layer followed by a pointwise convolutional layer, wherein the depthwise convolutional layer contains P×1 of 3×3 filter kernels with an input containing P feature maps and an output containing Q feature maps, and wherein the first replacement convolutional layer contains P×P of 3×3 filter kernels formed by placing each of said P×1 of 3×3 filter kernels of the depthwise convolutional layer on respective P diagonal locations, and zero-value 3×3 filter kernels in all off-diagonal locations, where P and Q are positive integers; a first set of memory buffers operatively coupling to the CNN processing block for storing the input data; and a second set of memory buffers operatively coupling to the CNN processing block for storing the pre-trained filter coefficients.

2. The digital integrated circuit of claim 1, wherein each of the zero-value 3×3 filter kernels contains numerical value zero in all nine positions.

3. A digital integrated circuit for feature extraction comprising: a plurality of cellular neural networks (CNN) processing engines operatively coupled to at least one input/output data bus, the plurality of CNN processing engines being connected in a loop with a clock-skew circuit, each CNN processing engine comprising: a CNN processing block configured for simultaneously obtaining convolution operations results using input data and pre-trained filter coefficients of a plurality of convolutional layers, the convolutional layers containing first and second replacement convolutional layers for performing equivalent operations of depthwise separable filters that include a combination of a depthwise convolutional layer followed by a pointwise convolutional layer, wherein the depthwise convolutional layer contains P×1 of 3×3 filter kernels with an input containing P feature maps and an output containing Q feature maps, and the pointwise convolutional layer contains Q×P of 1×1 filter coefficients, and wherein the second replacement convolutional layer contains Q×P of 3×3 filter kernels formed by placing the Q×P of 1×1 filter coefficients of the pointwise convolutional layer in center position of the respective Q×P of 3×3 filter kernels, and numerical value zero in eight perimeter positions, where P and Q are positive integers; a first set of memory buffers operatively coupling to the CNN processing block for storing the input data; and a second set of memory buffers operatively coupling to the CNN processing block for storing the pre-trained filter coefficients.

4. The digital integrated circuit of claim 1, wherein the depthwise separable convolutional layer is used in MobileNet.

5. The digital integrated circuit of claim 1, wherein the CNN processing block is further configured for performing operations of activation and pooling.
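Counting filter coefficients makes the trade-off of the claimed replacement explicit. The counts below follow directly from the kernel arrangements in claims 1 to 3, with the zero entries included. The claims do not say whether the circuit stores or skips the zero entries. Treat the replacement totals as the size of the full kernel arrays, not as a measured hardware cost.

$$
\begin{aligned}
\text{depthwise separable (original):}\quad & 9P + QP \\
\text{first replacement layer:}\quad & 9P^2 \quad (\text{of which } 9P(P-1) \text{ are zero}) \\
\text{second replacement layer:}\quad & 9QP \quad (\text{of which } 8QP \text{ are zero}) \\
\text{replacement total:}\quad & 9P^2 + 9QP = 9P(P+Q)
\end{aligned}
$$

Take the illustrative values P = 32 and Q = 64. The original layers hold 288 + 2048 = 2336 coefficients. The two replacement layers hold 9 · 32 · 96 = 27648. This cost is the price of expressing both layers as uniform 3×3 convolutions.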
using electronic means · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Combinations of networks · CPC title
Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation · CPC title
Physics · mapped topic