Heterogeneous architecture for depthwise-separable convolution based neural network computation acceleration

US12430544B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12430544-B2
Application number: US-202117460584-A
Country: US
Kind code: B2
Filing date: Aug 30, 2021
Priority date: Aug 30, 2021
Publication date: Sep 30, 2025
Grant date: Sep 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention relates to a method and a system for performing depthwise separable convolution on input data in a convolutional neural network. The invention utilizes a heterogeneous architecture with a number of MAC arrays, including 1D MAC arrays and 2D MAC arrays, with Winograd conversion logic to perform depthwise separable convolution. The depthwise separable convolution uses fewer weight parameters and thus fewer multiplications while it obtains the same computation results as the traditional convolution.
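The saving in weights and multiplications mentioned in the abstract can be illustrated with a simple count. The sketch below is not taken from the patent: the function names, the stride-1 "same"-padding assumption, and the layer shape (56 x 56 feature map, 64 input channels, 128 output channels, 3 x 3 kernel) are hypothetical values chosen only for illustration. It counts one depthwise stage followed by one pointwise stage.

```python
def standard_conv_cost(h, w, c_in, c_out, k):
    """Weights and multiplications for a standard k x k convolution.

    Assumes stride 1 and 'same' padding, so the output is h x w.
    """
    weights = k * k * c_in * c_out
    mults = h * w * weights
    return weights, mults


def depthwise_separable_cost(h, w, c_in, c_out, k):
    """Weights and multiplications for a depthwise stage plus a pointwise stage."""
    dw_weights = k * k * c_in      # one k x k filter per input channel
    pw_weights = c_in * c_out      # 1 x 1 convolution that mixes channels
    weights = dw_weights + pw_weights
    mults = h * w * weights
    return weights, mults


# Hypothetical layer shape, chosen only for illustration.
std_weights, std_mults = standard_conv_cost(56, 56, 64, 128, 3)
sep_weights, sep_mults = depthwise_separable_cost(56, 56, 64, 128, 3)
ratio = sep_mults / std_mults      # equals 1/c_out + 1/k**2

print(f"standard:  {std_weights} weights, {std_mults} multiplications")
print(f"separable: {sep_weights} weights, {sep_mults} multiplications")
print(f"ratio:     {ratio:.4f}")
```

Under these assumptions the separable form needs roughly 12% of the multiplications of the standard form, and the ratio depends only on the kernel size and the number of output channels.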

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for achieving high utilization of a neural network based computation using depthwise-separable convolution, wherein the method comprising: performing a point-wise convolution with a two dimensional MAC array on an input data for generating a first output within a spatial domain, wherein the first output is distributed and stored in a pipeline buffer; performing a depth-wise convolution with a one dimensional MAC array on the first output for generating a second output within a Winograd domain; performing a point-wise convolution with the two dimensional MAC array on the second output for generating a final output within the spatial domain from the Winograd domain; splitting the final output into a plurality of units by a processing unit; stripping one or more units of the plurality of units by the processing unit, wherein stripping the one or more units allow processing depthwise-separable convolution by a single DDR load and a single DDR store limits access to the DDR; and accumulating the one or more units of the plurality of units for computing the depthwise-separable convolution.

2. The method in accordance with claim 1, wherein processing of the first output to the second output from the point-wise convolution to the depth-wise convolution is performed by using a number of buffers.

3. The method in accordance with claim 2, wherein the number of buffers form a pseudo pipeline.

4. The method in accordance with claim 1, wherein the conversion from the spatial domain to the Winograd domain is performed by an adder tree structure.

5. The method in accordance with claim 4, wherein the conversion from the Winograd domain to the spatial domain is performed by an adder tree structure.

6. The method in accordance to claim 4, wherein the adder tree structure supports different kernel sizes.

7. The method in accordance with claim 1, wherein the neural network architecture is a heterogeneous architecture.

8. The method in accordance with claim 1, wherein the depthwise-separable convolution reduces computation complexity and power demand.

9. A heterogeneous architecture for depthwise-separable convolution based neural network computation acceleration, wherein the heterogeneous architecture comprising: a plurality of MAC arrays to perform depthwise-separable convolution, wherein the depthwise-separable convolution, further wherein the plurality of MAC arrays comprising: one or more two-dimensional MAC-arrays for performing a point-wise convolution in a spatial domain, wherein the one or more two-dimensional MAC-arrays performs the point-wise convolution on an input data to generates a first output, wherein the first output is distributed and stored in a pipeline buffer; and one or more one-dimensional MAC-arrays for performing a depthwise convolution in a Winograd domain, wherein the one or more one-dimensional MAC-arrays performs the Winograd convolution on the first output to generate a second output, further wherein the one or more two-dimensional MAC-arrays performs the point-wise convolution on the second output with an adder tree structure to generate a final output; a processing unit, wherein the processing unit comprising: a splitting unit, wherein the splitting unit splits the final output into a plurality of tiles; a stripping unit, wherein the stripping unit strips one or more units of the plurality of tiles, wherein stripping the one or more units allow processing depthwise-separable convolution by a single DDR load and a single DDR store limits access to the DDR; and an accumulator, wherein the accumulator accumulates the one or more units of the plurality of tiles for computing the depthwise-separable convolution.

10. A computer program product comprising a non-transitory computer useable medium having computer program logic for enabling at least one processor in a computer system for performing a high utilization of a neural network based computation using depthwise-separable convolution, said computer program logic comprising: performing a point-wise convolution with a two dimensional MAC array on an input data for generating a first output within a spatial domain, wherein the first output is distributed and stored in a pipeline buffer; performing a depth-wise convolution with a one dimensional MAC array on the first output for generating a second output within a Winograd domain; performing a point-wise convolution with the two dimensional MAC array on the second output for generating a final output within the spatial domain from the Winograd domain; splitting the final output into a plurality of units by a processing unit; stripping one or more units of the plurality of units by the processing unit, wherein stripping the one or more units allow processing depthwise-separable convolution by a single DDR load and a single DDR store limits access to the DDR; and accumulating the one or more units of the plurality of units for computing the depthwise-separable convolution.
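The claims place the depth-wise step in a Winograd domain and state that the conversions to and from that domain are performed by an adder tree structure. The page does not give the tile size or the transform matrices used, so the following is only a sketch under an assumption: it uses the textbook one-dimensional Winograd F(2,3) transform (two outputs of a 3-tap filter from a 4-sample tile). The function names `winograd_f23` and `direct_f23` are hypothetical. The point of the example is that the input and output transforms need only additions and subtractions, while the element-wise product needs 4 multiplications instead of the 6 used by the direct form.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap 1D filter over a 4-sample tile via Winograd F(2,3).

    Assumption: F(2,3) is a textbook variant, not confirmed as the patent's choice.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Input transform (spatial -> Winograd domain): additions/subtractions only.
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)
    # Kernel transform: can be computed once per kernel, ahead of time.
    u = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Element-wise product in the Winograd domain: 4 multiplications.
    m = [ui * vi for ui, vi in zip(u, v)]
    # Output transform (Winograd -> spatial domain): additions/subtractions only.
    return (m[0] + m[1] + m[2], m[1] - m[2] - m[3])


def direct_f23(d, g):
    """Reference result computed directly with 6 multiplications."""
    return (
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    )


tile = (1, 2, 3, 4)      # illustrative input samples from one channel
kernel = (2, 4, 6)       # illustrative 3-tap depthwise kernel
print(winograd_f23(tile, kernel))
print(direct_f23(tile, kernel))
```

In a depthwise convolution each channel has its own kernel, so a routine like this would run independently per channel; the pointwise (1 x 1) stages that mix channels are plain multiply-accumulate operations in the spatial domain.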

Assignees

Black Sesame International Holding Ltd, Black Sesame Technologies Inc

Inventors

Classifications

  • Activation functions · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12430544B2 cover?
The present invention relates to a method and a system for performing depthwise separable convolution on an input data in a convolutional neural network. The invention utilizes a heterogeneous architecture with a number of MAC arrays including 1D MAC arrays and 2D MAC arrays with a Winograd conversion logic to perform depthwise separable convolution. The depthwise separable convolution uses les…
Who is the assignee on this patent?
Black Sesame International Holding Ltd, Black Sesame Technologies Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).