Data-optimized neural network traversal

US10417555B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10417555-B2
Application number  US-201615148627-A
Country             US
Kind code           B2
Filing date         May 6, 2016
Priority date       May 29, 2015
Publication date    Sep 17, 2019
Grant date          Sep 17, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Executing a neural network includes generating an output tile of a first layer of the neural network by processing an input tile to the first layer and storing the output tile of the first layer in an internal memory of a processor. An output tile of a second layer of the neural network can be generated using the processor by processing the output tile of the first layer stored in the internal memory.
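
The traversal described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes one-dimensional signals, two stride-1 convolution layers, and a plain Python list standing in for the processor's internal memory. The function names (`conv1d_valid`, `run_layer_by_layer`, `run_tiled`) are hypothetical and chosen only for this example.

```python
def conv1d_valid(signal, kernel):
    """Plain 'valid' 1-D correlation: output length is len(signal) - len(kernel) + 1."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def run_layer_by_layer(x, k1, k2):
    """Baseline: the whole layer-1 output is materialized before layer 2 starts."""
    return conv1d_valid(conv1d_valid(x, k1), k2)


def run_tiled(x, k1, k2, tile_width):
    """Fused traversal: each layer-1 output tile is consumed by layer 2 right away."""
    # Extra input samples a tile needs so that both layers can produce
    # tile_width valid outputs (assumes stride 1 and no padding).
    halo = (len(k1) - 1) + (len(k2) - 1)
    out_len = len(x) - halo
    result = []
    for start in range(0, out_len, tile_width):
        width = min(tile_width, out_len - start)
        input_tile = x[start:start + width + halo]
        # Stand-in for the internal memory: a small local buffer that holds
        # only this tile of the layer-1 output, never the full layer.
        layer1_tile = conv1d_valid(input_tile, k1)
        result.extend(conv1d_valid(layer1_tile, k2))
    return result


if __name__ == "__main__":
    x = list(range(10))
    k1, k2 = [1, 0, -1], [1, 2, 1]
    print(run_tiled(x, k1, k2, tile_width=3) == run_layer_by_layer(x, k1, k2))
```

Both functions return the same values; the tiled version only ever holds a short slice of the intermediate layer, which is the property the abstract relies on to keep intermediate data out of external memory.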

First claim

Opening claim text (preview).

What is claimed is:

1. A method of executing a neural network by a processor that includes a neural network engine circuit having an internal memory, comprising: generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, a size of the first output tile is based at least in part on a size of the internal memory; storing the first output tile of the first layer in the internal memory; and processing, by the neural network engine circuit, and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and a rectangular intersection of the frustums determines at least in part the size of the first output tile; generating, using the neural network engine circuit, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.

2. The method of claim 1, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.

3. The method of claim 1, wherein the neural network is partitioned into a plurality of frustums, wherein each frustum is processed independently.

4. The method of claim 3, wherein the processor comprises a plurality of compute units that are configured to process the plurality of frustums in parallel.

5. The method of claim 1, wherein the first layer and the second layer are feature extraction layers configured to process a plurality of images to generate a plurality of output feature maps, the method further comprising: processing the plurality of output feature maps for the plurality of images through a feature classification layer of the neural network in batch.

6. The method of claim 5, wherein the processing the plurality of output feature maps of the plurality of images through the feature classification layer comprises: loading a first plurality of weights of the feature classification layer from an external memory into the internal memory of the processor; and processing each of the plurality of output feature maps using the first plurality of weights of the feature classification layer prior to loading, from the external memory, a second plurality of weights of the feature classification layer or weights of a next feature classification layer.

7. The method of claim 6, further comprising: responsive to the processing of each of the plurality of output feature maps using the first plurality of weights of the feature classification layer, loading the second plurality of weights of the feature classification layer into the internal memory; wherein the second plurality of weights for the feature classification layer overwrite the first plurality of weights for the feature classification layer.

8. An apparatus comprising a processor configured to execute a neural network, the processor comprising: an internal memory; a first compute unit coupled to the internal memory and configured to perform executable operations including: generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory; storing the first output tile of the first layer in an internal memory of a processor; and processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and wherein a rectangular intersection of the frustums determines at least in part the size of the first output tile; generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.

9. The apparatus of claim 8, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.

10. The apparatus of claim 8, wherein the neural network is partitioned into a plurality of frustums, wherein each frustum is processed independently.

11. The apparatus of claim 10, wherein the processor comprises a plurality of compute units that are configured to process the plurality of frustums in parallel.

12. The apparatus of claim 8, wherein the first layer and the second layer are feature extraction layers configured to process a plurality of images to generate a plurality of output feature maps, wherein the first compute unit is configured to initiate executable operations further comprising: processing the plurality of output feature maps for the plurality of images through a feature classification layer of the neural network in batch.

13. The apparatus of claim 12, further comprising: an external memory coupled to the first compute unit; wherein the processing the plurality of output feature maps for the plurality of images through the feature classification layer comprises loading a first plurality of weights of the feature classification layer from the external memory into the internal memory and processing each of the plurality of output feature maps using the first plurality of weights of the feature classification layer prior to loading, from the external memory, a second plurality of weights for the feature classification layer or weights of a next feature classification layer.

14. The apparatus of claim 13, wherein the first compute unit is programmed to initiate executable operations further comprising: responsive to the processing of each of the plurality of output feature maps using the first plurality of weights of the feature classification layer, loading the second plurality of weights of the feature classification layer into the internal memory; wherein the second plurality of weights for the feature classification layer overwrite the first plurality of weights for the feature classification layer.

15. A computer program product comprising a computer readable storage medium having program code stored thereon for executing a neural network, the program code executable by a processor to perform operations comprising: generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory; storing the first output tile of the first layer in an internal memory of a processor; and processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into at least one frustum and a rectangular intersection of the frustums determines at least in part the size of the first output tile; generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.

16. The computer program product of claim 15, wherein each tile comprises of a portion of each feature map of a plurality feature maps such that the tile have a three-dimensional profile that includes a height, a width and a number of feature maps.

17. The computer program product of c
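
The independent claims tie the output tile size to the size of the internal memory and describe adjacent layers grouped into frustums. The sketch below shows one way such a sizing rule could work. The sizing formula is an assumption made for illustration, not the rule the patent specifies: it assumes square tiles, stride-1 convolutions without padding or pooling, and a conservative budget in which every tile of the frustum is resident at once. All names and the `bytes_per_value` default are hypothetical, and the "rectangular intersection of the frustums" recited in the claims is not modeled.

```python
def frustum_tile_sides(output_side, kernel_sizes):
    """Side length of the tile at each level of a frustum, input level first.

    Assumes square tiles and stride-1 'valid' convolutions, so each layer
    shrinks the tile side by (kernel size - 1).
    """
    sides = [output_side]
    for k in reversed(kernel_sizes):
        sides.append(sides[-1] + k - 1)
    sides.reverse()
    return sides


def frustum_bytes(output_side, kernel_sizes, feature_maps, bytes_per_value=2):
    """Bytes needed to hold every tile of the frustum at once.

    feature_maps[i] is the number of feature maps at level i, input first,
    so it has one more entry than kernel_sizes.
    """
    sides = frustum_tile_sides(output_side, kernel_sizes)
    return sum(s * s * c * bytes_per_value for s, c in zip(sides, feature_maps))


def largest_output_tile(memory_bytes, kernel_sizes, feature_maps, max_side=1024):
    """Largest output tile side whose frustum fits in memory, or 0 if none fits."""
    best = 0
    for side in range(1, max_side + 1):
        # The footprint grows with the side, so stop at the first overflow.
        if frustum_bytes(side, kernel_sizes, feature_maps) > memory_bytes:
            break
        best = side
    return best


if __name__ == "__main__":
    kernels = [3, 3]         # two 3x3 convolution layers
    maps = [3, 16, 16]       # input channels, then feature maps per layer
    print(frustum_tile_sides(4, kernels))
    print(largest_output_tile(2048, kernels, maps))
```

The tile sides widen toward the input, which is the frustum shape the claims refer to: a small output tile depends on a larger region of each earlier layer.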

Assignees

  • Samsung Electronics Co Ltd

Inventors

Classifications

  • Combinations of networks · CPC title

  • G06N3/10 · Primary

    Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • Memory management · CPC title

  • Physics · mapped topic

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10417555B2 cover?
Executing a neural network includes generating an output tile of a first layer of the neural network by processing an input tile to the first layer and storing the output tile of the first layer in an internal memory of a processor. An output tile of a second layer of the neural network can be generated using the processor by processing the output tile of the first layer stored in the internal …
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 17, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).