US10489680B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10489680-B2 |
| Application number | US-201715724142-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 3, 2017 |
| Priority date | Oct 4, 2016 |
| Publication date | Nov 26, 2019 |
| Grant date | Nov 26, 2019 |
Official abstract text for this publication.
Systems and methods for efficient implementation of a convolutional layer of a convolutional neural network are disclosed. In one aspect, weight values of kernels in a kernel stack of a convolutional layer can be reordered into a tile layout with tiles of runnels. Pixel values of input activation maps of the convolutional layer can be reordered into an interleaved layout comprising a plurality of clusters of input activation map pixels. The output activation maps can be determined using the clusters of the input activation map pixels and kernels tile by tile.
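The interleaved layout described in the abstract can be illustrated with a small sketch. It assumes the basic layout stores each input activation map as its own row-major 2D array, and that a "cluster" is the group of pixels sharing one (row, column) location across all maps. The function names and the exact loop order (maps innermost, then width, then height) are illustrative assumptions, not taken from the patent text.

```python
def interleave_activation_maps(maps):
    """Reorder maps[m][y][x] (basic layout) into a flat interleaved list.

    Assumption: the pixels at one (y, x) location across all maps form a
    cluster and are stored contiguously.
    """
    num_maps = len(maps)
    height = len(maps[0])
    width = len(maps[0][0])
    flat = []
    for y in range(height):            # traverse the height dimension
        for x in range(width):         # traverse the width dimension
            for m in range(num_maps):  # traverse the number-of-maps dimension
                flat.append(maps[m][y][x])
    return flat


def deinterleave_activation_maps(flat, num_maps, height, width):
    """Inverse reorder: interleaved flat list back to maps[m][y][x]."""
    maps = [[[None] * width for _ in range(height)] for _ in range(num_maps)]
    for i, value in enumerate(flat):
        m = i % num_maps
        x = (i // num_maps) % width
        y = i // (num_maps * width)
        maps[m][y][x] = value
    return maps
```

For two 2×2 maps `[[1, 2], [3, 4]]` and `[[5, 6], [7, 8]]`, the interleaved order is `[1, 5, 2, 6, 3, 7, 4, 8]`: each adjacent pair is one cluster. Keeping the pixels of a cluster contiguous is what lets a vector processor load them together.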
Opening claim text (preview).
What is claimed is:

1. A system for executing a convolutional neural network (CNN), the system comprising: non-transitory memory configured to store: a convolutional layer of a convolutional neural network, wherein the convolutional layer comprises kernels in a kernel stack, wherein the kernels of the kernel stack are in a basic kernel layout, wherein weight values of the kernels of the kernel stack are reordered from the basic kernel layout into a tile kernel layout comprising a plurality of kernel tiles, wherein a kernel tile comprises a plurality of kernel runnels, and wherein a kernel runnel comprises a number of the weight values of the kernels of the kernel stack; and a hardware processor in communication with the non-transitory memory, the hardware processor programmed by executable instructions to: receive input activation maps of the convolutional layer, wherein the input activation maps are in a basic input activation map layout; reorder pixel values of the input activation maps from the basic input activation map layout into an interleaved input activation map layout comprising a plurality of clusters of input activation map pixels; and determine output activation maps of the convolutional layer from the plurality of kernel tiles and the plurality of clusters of input activation map pixels, wherein the output activation maps are in an interleaved output activation map layout comprising a plurality of clusters output activation map pixels.
2. The system of claim 1, wherein the weight values of the kernels of the kernel stack are reordered from the basic kernel layout into the tile kernel layout by, iteratively: traversing along a width dimension of the kernel stack; traversing along a height dimension of the kernel stack; traversing along a width dimension of a kernel of the kernel stack; and traversing along a height dimension of the kernel of the kernel stack.
3. The system of claim 1, wherein a first kernel runnel of the kernel tile corresponds a first kernel stack width boundary, and wherein a last kernel runnel of the kernel tile corresponds to a second kernel stack width boundary subsequent of the first kernel stack width boundary.
4. The system of claim 1, wherein to reorder the pixel values of the input activation maps from the basic input activation map layout into the interleaved input activation map layout, the hardware processor is programmed to, iteratively: traverse along a dimension of a number of input activation maps; traverse along a width dimension of an input activation map; and traverse along a height dimension of input activation map.
5. The system of claim 1, wherein the hardware processor is programmed to: reorder pixel values of the output activation maps from the interleaved output activation map layout into a basic output activation map layout.
6. The system of claim 5, wherein to reorder the pixel values of the output activation maps from the interleaved output activation map into the basic output activation map layout, the hardware processor is programmed to, iteratively: traversing along a width dimension of the interleaved output activation map; and traversing along a height dimension of the interleaved output activation map.
7. The system of claim 1, wherein to determine the output activation maps of the convolutional layer from the plurality of kernel tiles and the plurality of clusters of input activation map pixels, the hardware processor is programmed to: perform fused-multiply-add operations tile by tile on the plurality of kernel tiles and the plurality of clusters of input activation map pixels.
8. The system of claim 7, wherein to perform the fused-multiply-add operations tile by tile on the plurality of kernel tiles and the plurality of clusters of input activation map pixels comprises, iteratively: for each output activation map pixel: set a value of the output activation map pixel to a value of zero; and for each kernel runnel of each kernel tile of the plurality of the kernel tiles, perform a fused-multiply-add operation on the each kernel runnel, an input activation map pixel corresponding to the kernel runnel and the output activation map pixel, and the output activation map pixel.
9. The system of claim 7, wherein to perform the fused-multiply-add operations tile by tile on the plurality of kernel tiles and the plurality of clusters of input activation map pixels, the hardware processor is programmed to, iteratively: for each output activation map pixel: set a value of the output activation map pixel to a value of zero; and for each kernel runnel of each kernel tile of the plurality of the kernel tiles, perform a fused-multiply-add operation on the each kernel runnel, at least one input activation map pixel corresponding to the kernel runnel and the output activation map pixel, and the output activation map pixel.
10. The system of claim 9, wherein the at least one input activation map pixel comprises two input activation map pixels.
11. The system of claim 1, wherein a size of the kernel runnel in bits and a size of the input activation map runnel in bits are the same.
12. The system of any claim 11, wherein the size of the kernel runnel in bits and a size of the output activation map runnel in bits are the same.
13. The system of claim 11, wherein the size of the kernel runnel in bits and a size of a register of the hardware processor in bits are the same.
14. The system of claim 13, wherein the size of the register is 128 bits.
15. The system of claim 1, wherein the hardware processor comprises a single instruction, multiple data processor.
16. The system of claim 15, wherein the single instruction, multiple data processor comprises a vector processor.
17. The system of claim 1, wherein the kernels of the kernel stack in the basic kernel layout are arranged in a plurality of kernel stack channels, wherein a number of the plurality of kernel stack channels and a number of the input activation maps are the same, and wherein a number of kernels of a kernel stack channel and a number of the output activation maps are the same.
18. The system of claim 1, wherein a kernel stack width of the kernel stack and a number of the output activation maps are the same.
19. The system of claim 1, wherein the kernels of the kernel stack in the basic kernel layout are arranged in a plurality of kernel stack filter banks, wherein a number of the plurality of kernel stack filter banks and a number of the output activation maps are the same, and wherein a number of kernels of a kernel stack filter bank and a number of the input activation maps are the same.
20. The system of claim 1, wherein a kernel stack height of the kernel stack and a number of the output activation maps are the same.
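One possible reading of the tile kernel layout and the tile-by-tile fused-multiply-add loop in the claims above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the kernel stack is indexed as `stack[c][o][ky][kx]` with `c` running over input maps (stack height) and `o` over output maps (stack width); a runnel holds four weights (as four 32-bit values would fill a 128-bit register); a tile covers one group of four adjacent output maps; and the convolution is stride 1 with no padding. All function names are hypothetical, and a Python list stands in for a SIMD register.

```python
RUNNEL = 4  # assumption: four 32-bit weights per 128-bit register


def tile_kernels(stack, runnel=RUNNEL):
    """Reorder stack[c][o][ky][kx] into tiles of runnels.

    Each runnel is (c, ky, kx, weights), where weights holds `runnel`
    values taken along the kernel stack width (adjacent output maps).
    """
    n_in, n_out = len(stack), len(stack[0])
    kh, kw = len(stack[0][0]), len(stack[0][0][0])
    assert n_out % runnel == 0, "sketch assumes no partial runnels"
    tiles = []
    for o0 in range(0, n_out, runnel):      # one tile per output-map group
        tile = []
        for ky in range(kh):                # kernel height
            for kx in range(kw):            # kernel width
                for c in range(n_in):       # kernel stack height
                    weights = [stack[c][o0 + i][ky][kx] for i in range(runnel)]
                    tile.append((c, ky, kx, weights))
        tiles.append(tile)
    return tiles


def conv_tiled(inputs, tiles, runnel=RUNNEL):
    """Convolve interleaved inputs[y][x][c] tile by tile; returns out[y][x][o]."""
    kh = 1 + max(ky for _, ky, _, _ in tiles[0])
    kw = 1 + max(kx for _, _, kx, _ in tiles[0])
    out_h, out_w = len(inputs) - kh + 1, len(inputs[0]) - kw + 1
    n_out = len(tiles) * runnel
    out = [[[0.0] * n_out for _ in range(out_w)] for _ in range(out_h)]
    for t, tile in enumerate(tiles):
        for y in range(out_h):
            for x in range(out_w):
                acc = [0.0] * runnel        # output pixels start at zero
                for c, ky, kx, weights in tile:
                    pixel = inputs[y + ky][x + kx][c]
                    # emulated FMA: acc = acc + weights * broadcast(pixel)
                    acc = [a + w * pixel for a, w in zip(acc, weights)]
                out[y][x][t * runnel:(t + 1) * runnel] = acc
    return out


def conv_direct(inputs, stack):
    """Reference convolution without any reordering, for comparison."""
    n_in, n_out = len(stack), len(stack[0])
    kh, kw = len(stack[0][0]), len(stack[0][0][0])
    out_h, out_w = len(inputs) - kh + 1, len(inputs[0]) - kw + 1
    return [[[sum(stack[c][o][ky][kx] * inputs[y + ky][x + kx][c]
                  for c in range(n_in)
                  for ky in range(kh)
                  for kx in range(kw))
              for o in range(n_out)]
             for x in range(out_w)]
            for y in range(out_h)]
```

In this sketch each runnel is multiplied by a single broadcast input pixel and accumulated into one register-sized group of output pixels, so the inner loop maps onto one vector fused-multiply-add per runnel. Because the output is written as `out[y][x][o]`, it is already in an interleaved form of the kind the first claim describes.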
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
using neural networks · CPC title
Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters · CPC title
Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram · CPC title
Backpropagation, e.g. using gradient descent · CPC title