Hardware-efficient deep convolutional neural networks
US-9904874-B2 · Feb 27, 2018 · US
US10467795B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10467795-B2 |
| Application number | US-201715482724-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 8, 2017 |
| Priority date | Apr 8, 2017 |
| Publication date | Nov 5, 2019 |
| Grant date | Nov 5, 2019 |
Official abstract text for this publication.
In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
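To make "apply computations in the sub-graph in the frequency domain" concrete, the following is a minimal sketch of the underlying identity only: a convolution in the spatial domain equals a pointwise product of spectra. It is not the patented apparatus. It assumes a 1-D signal, a naive O(n^2) transform in place of a fast Fourier transform, and hypothetical helper names (`dft`, `conv_direct`, `conv_frequency`) that do not appear in the patent.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(acc / n if inverse else acc)
    return out


def conv_direct(signal, kernel):
    """Full linear convolution computed in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out


def conv_frequency(signal, kernel):
    """The same convolution via pointwise multiplication of spectra."""
    n = len(signal) + len(kernel) - 1
    # Zero-pad both operands so circular convolution equals linear convolution.
    a = dft(list(signal) + [0.0] * (n - len(signal)))
    b = dft(list(kernel) + [0.0] * (n - len(kernel)))
    product = [p * q for p, q in zip(a, b)]
    return [v.real for v in dft(product, inverse=True)]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -1.0, 0.25]
direct = conv_direct(signal, kernel)
freq = conv_frequency(signal, kernel)
print(direct)
print([round(v, 9) for v in freq])
```

Both paths produce the same values up to floating-point rounding. In practice a fast transform would replace `dft`, which is where any cost advantage of the frequency domain would come from.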
Opening claim text (preview).
What is claimed is:
1. A general purpose graphic processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.
2. The general purpose graphic processor of claim 1, the processing circuitry to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
3. The general purpose graphic processor of claim 2, wherein: the selection is implemented at run time.
4. The general purpose graphic processor of claim 2, the processing circuitry to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
5. The general purpose graphic processor of claim 4, the processor to: merge the results of the convolutional computations.
6. The general purpose graphics processor of claim 1, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
7. An electronic device, comprising: a general purpose graphics processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources; and a memory communicatively coupled to the general purpose graphics processor.
8. The electronic device of claim 7, the processor to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
9. The electronic device of claim 8, wherein: the selection is implemented at run time.
10. The electronic device of claim 8, the processor to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
11. The electronic device of claim 10, the processor to: merge the results of the convolutional computations.
12. The electronic device of claim 7, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
13. One or more non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: receive, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources, data representing a convolutional neural network; determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.
14. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
15. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
16. The one or more non-transitory computer-readable medium of claim 15, wherein: the selection is implemented at run time.
17. The one or more non-transitory computer-readable medium of claim 15, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
18. The one or more non-transitory computer-readable medium of claim 17, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: merge the results of the convolutional computations.
19. A computer-implemented method comprising: receiving, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second
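
The claims tie integer precision to a predicted level of activation sparsity and describe updating that prediction on-line during inference, but the preview text does not say how either step is computed. The sketch below is one possible reading, under loudly stated assumptions: sparsity is measured as the fraction of zero activations, the prediction is updated with an exponential moving average, and the bit widths and thresholds in `select_bits` are entirely hypothetical. None of the function names or constants come from the patent.

```python
def activation_sparsity(activations):
    """Fraction of activations that are exactly zero (e.g. after a ReLU)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a == 0) / len(activations)


def update_prediction(predicted, observed, rate=0.25):
    """Blend a new observation into the running prediction.

    Exponential moving average; `rate` is a hypothetical smoothing factor.
    """
    return (1.0 - rate) * predicted + rate * observed


def select_bits(predicted_sparsity):
    """Hypothetical policy mapping predicted sparsity to an integer width.

    The thresholds and widths are assumptions chosen for illustration.
    """
    if predicted_sparsity >= 0.75:
        return 4
    if predicted_sparsity >= 0.5:
        return 8
    return 16


# One inference step for a single layer.
relu_out = [0, 0, 3, 0, 0, 0, 7, 0]   # six of eight activations are zero
observed = activation_sparsity(relu_out)
predicted = update_prediction(0.5, observed)
bits = select_bits(predicted)
print(observed, predicted, bits)
```

Here the observed sparsity of 0.75 moves a prior prediction of 0.5 to 0.5625, which this illustrative policy maps to 8-bit integers. A real design would have to choose the policy from accuracy and throughput measurements.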
using electronic means · CPC title
Combinations of networks · CPC title
General purpose rendering architectures · CPC title
Learning methods · CPC title
Memory management · CPC title