Configurable convolution engine

US9858636B1 · US · B1

Patent metadata
Field: Value
Publication number: US-9858636-B1
Application number: US-201615198478-A
Country: US
Kind code: B1
Filing date: Jun 30, 2016
Priority date: Jun 30, 2016
Publication date: Jan 2, 2018
Grant date: Jan 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a configurable convolution engine that receives configuration information to perform convolution or its variant operations on streaming input data of various formats. To process streaming input data, input data of multiple channels are received and stored in an input buffer circuit in an interleaved manner. Data values of the interleaved input data are retrieved and forwarded to multiplier circuits where multiplication with a corresponding filter element of a kernel is performed. Varying number of kernels with different sizes and sparsity can also be used for the convolution operations.
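
The interleaved storage and per-channel retrieval described in the abstract can be illustrated with a minimal software sketch. This is not the patented circuit: the function names (deinterleave, convolve_1d, convolve_interleaved), the one-dimensional layout, and the sample values are all assumptions chosen for illustration, and the sliding multiply-accumulate is written without flipping the kernel.

```python
def deinterleave(buffer, num_channels, channel):
    """Return the values of one channel from an interleaved buffer.

    Hypothetical helper: takes every num_channels-th value starting at
    index `channel`, so values of the other channels are skipped.
    """
    return buffer[channel::num_channels]


def convolve_1d(values, kernel):
    """Sliding multiply-accumulate over fully overlapping positions only."""
    k = len(kernel)
    return [
        sum(values[i + j] * kernel[j] for j in range(k))
        for i in range(len(values) - k + 1)
    ]


def convolve_interleaved(buffer, num_channels, kernel):
    """Process one channel per pass, loosely mirroring one channel per cycle."""
    return [
        convolve_1d(deinterleave(buffer, num_channels, c), kernel)
        for c in range(num_channels)
    ]


# Two channels stored as a0 b0 a1 b1 a2 b2 a3 b3 (assumed sample data).
buffer = [1, 10, 2, 20, 3, 30, 4, 40]
outputs = convolve_interleaved(buffer, 2, [1, 1])
print(outputs)  # [[3, 5, 7], [30, 50, 70]]
```

Each pass reads only the values of the selected channel, which is the software analogue of retrieving one channel and skipping the other in a given cycle.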

First claim

Opening claim text (preview).

What is claimed is:

1. A convolution engine, comprising: an input buffer circuit configured to receive and store data values of a plurality of channels of input data in an interleaved manner, the plurality of channels of input data including at least a first channel of input data and a second channel of input data interleaved with the first channel of input data; a datapath switch circuit configured to retrieve, from the input buffer circuit, data values of the first channel of input data and skip data values of the second channel of input data in a cycle, and retrieve the data values of the second channel of input data and skip the data values of the first channel of input data in another cycle; a filter switch circuit configured to retrieve filter elements of at least one kernel for performing a convolution operation; and a computation core circuit configured to: receive the data values from the datapath switch and the retrieved filter elements from the filter switch; multiply each of the data values with a corresponding filter element to obtain multiplied values; and process subsets of multiplied values to obtain output values.

2. The convolution engine of claim 1, further comprising an output buffer configured to store each of the output values in a predefined memory location of the output buffer.

3. The convolution engine of claim 2, wherein the output values of a plurality of output channels are interleaved in the output buffer.

4. The convolution engine of claim 1, further comprising a filter storage configured to store filter elements for performing the convolution operation, the filter switch circuit retrieving the filter elements from the filter storage.

5. The convolution engine of claim 1, wherein operations of the datapath switch circuit and the filter switch circuit are defined by configuration information received by the convolution engine.

6. The convolution engine of claim 5, wherein the configuration information comprises: (i) step values for defining distances between center data values in columns or rows of data values stored in the input buffer circuit; and (ii) sparse values indicating sparsity of the filter elements in the at least one kernel.

7. The convolution engine of claim 6, wherein data values corresponding to filter elements in the at least one kernel to be disregarded are not retrieved by the datapath switch circuit.

8. The convolution engine of claim 1, wherein the input buffer circuit stores (i) in a predetermined row and a column of memory location, first data values for a first subset of bits of a data unit in the input data and (ii) in the row and another column adjacent to the column storing the first subset of bits, a second data value for a second subset of bits of the data unit.

9. The convolution engine of claim 8, wherein the first subset of the bits is most significant bits of the data, and the second subset of the bits is least significant bits of the data.

10. The convolution engine of claim 1, further comprising a post-processing circuit configured to perform a post-convolution operation on the output values to generate an output of the convolution engine.

11. The convolution engine of claim 10, wherein the post-convolution operation comprises normalized cross correlation.

12. The convolution engine of claim 1, wherein the process performed on the subsets of multiplied values by the computation core includes one of (i) accumulating of the subsets of multiplied values to obtain an output value or (ii) selecting one of the multiplied values as an output value according to a criteria.

13. The convolution engine of claim 1, wherein one or more of the input buffer circuit, the datapath switch circuit, the filter switch circuit and the computation core circuit are configured to operate in a patch mode.

14. A method of performing convolution, comprising: storing interleaved data values of a plurality of channels of input data in an input buffer circuit, the plurality of channels of input data including at least a first channel of input data and a second channel of input data interleaved with the first channel of input data; retrieving, by a datapath switch circuit and from the input buffer circuit, data values of the first channel of input data and skipping data values of the second channel of input data in a cycle, and retrieving the data values of the second channel of input data and skipping the data values of the first channel of input data in another cycle; retrieving, by a filter switch circuit, filter elements of at least one kernel for performing a convolution operation; receiving, by a computation core circuit, the data values from the datapath switch and the retrieved filter elements from the filter switch; multiplying, by the computation core circuit, each of the data values with a corresponding filter element to obtain multiplied values; and processing, by the computation core circuit, subsets of multiplied values to obtain output values.

15. The method of claim 14, further comprising storing each of the output values in a predefined memory location of an output buffer.

16. The method of claim 15, wherein the output values of a plurality of output channels are interleaved in the output buffer.

17. The method of claim 14, further comprising storing, in a filter storage, the filter elements for performing the convolution operation, and wherein the filter switch circuit retrieves the filter elements from the filter storage.

18. The method of claim 14, further comprising receiving configuration information by the convolution engine, the configuration information defining operations of the datapath switch circuit and the filter switch circuit.

19. The method of claim 18, wherein the configuration information comprises: (i) step values for defining distances between center data values in columns or rows of data values stored in the input buffer circuit; and (ii) sparse values indicating sparsity of the filter elements in the at least one kernel.

20. The method of claim 19, wherein data values corresponding to filter elements in the at least one kernel to be disregarded are not retrieved by the datapath switch circuit.

21. The method of claim 14, wherein storing the interleaved data values in the input buffer circuit comprises (i) storing, in a predetermined row and a column of memory location, first data values for a first subset of bits of a data unit in the input data and (ii) storing, in the row and another column adjacent to the column storing the first subset of bits, a second data value for a second subset of bits of the data unit.

22. The method of claim 21, wherein the first subset of the bits is most significant bits of the data, and the second subset of the bits is least significant bits of the data.

23. The method of claim 14, further comprising performing a post-convolution operation on the output values, by a post-processing circuit, to generate an output of the convolution engine.

24. The method of claim 23, wherein the post-convolution operation comprises normalized cross correlation.

25. The method of claim 14, wherein processing the subsets of multiplied values includes one of (i) accumulating of the subsets of multiplied values to obtain an output value or (ii) selecting one of the multiplied values as an output value according to a criteria.

26. An image signal processor, comprising: an input buf
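
The configuration options named in the dependent claims (step values, sparse values, and accumulating versus selecting) can be sketched in software as follows. This is a hedged illustration only: the function name sparse_strided_conv and its parameters are hypothetical, "step" is read here as the distance between successive window positions, and "sparse" is read as the spacing between the input values that are actually sampled. The patent preview does not confirm these exact semantics.

```python
def sparse_strided_conv(image, kernel, step=1, sparse=1, reduce=sum):
    """Windowed multiply-then-reduce over a 2D list of rows.

    Assumed semantics (illustrative, not taken from the patent text):
      step   -- distance between successive window positions
      sparse -- spacing between sampled input values (1 means dense)
      reduce -- sum to accumulate products, max to select one of them
    """
    kh, kw = len(kernel), len(kernel[0])
    span_h = (kh - 1) * sparse + 1  # input rows covered by one window
    span_w = (kw - 1) * sparse + 1  # input columns covered by one window
    out = []
    for r in range(0, len(image) - span_h + 1, step):
        row = []
        for c in range(0, len(image[0]) - span_w + 1, step):
            # Only sampled positions are read; values in between are
            # never retrieved, similar in spirit to claims 7 and 20.
            products = [
                image[r + i * sparse][c + j * sparse] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            ]
            row.append(reduce(products))
        out.append(row)
    return out


# 5x5 test image whose value at (r, c) is r * 5 + c (assumed sample data).
image = [[r * 5 + c for c in range(5)] for r in range(5)]
kernel = [[1, 1], [1, 1]]

accumulated = sparse_strided_conv(image, kernel, step=2, sparse=2, reduce=sum)
selected = sparse_strided_conv(image, kernel, step=2, sparse=2, reduce=max)
print(accumulated)  # [[24, 32], [64, 72]]
print(selected)     # [[12, 14], [22, 24]]
```

Passing sum corresponds to the accumulating option and passing max to one possible selection criterion; other criteria could be supplied through the same reduce parameter.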

Assignees

Apple Inc

Inventors

Classifications

  • G06T1/20 (Primary)

    Processor architectures; Processor configuration, e.g. pipelining

  • using local operators

  • G06F17/153 (Primary)

    Multidimensional correlation or convolution

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 2, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).