Sub-graph in frequency domain and dynamic selection of convolution implementation on a GPU

US10467795B2 · US · B2

Patent metadata
Publication number: US-10467795-B2
Application number: US-201715482724-A
Country: US
Kind code: B2
Filing date: Apr 8, 2017
Priority date: Apr 8, 2017
Publication date: Nov 5, 2019
Grant date: Nov 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
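The idea of keeping a whole sub-graph in the frequency domain can be illustrated with a small sketch. It assumes the sub-graph is a chain of purely linear 1-D convolutions (no nonlinearity between them), which is an assumption made here for illustration and not something the abstract states. The helper names (`dft`, `pad`, `subgraph_direct`, `subgraph_frequency`) are hypothetical, and a naive O(n^2) DFT stands in for a real FFT. The point shown is that the chain needs only one forward transform of the input and one inverse transform at the end, with pointwise multiplications in between.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def pad(x, n):
    """Zero-pad x to length n so circular convolution equals linear convolution."""
    return list(x) + [0.0] * (n - len(x))

def conv_direct(signal, kernel):
    """Full linear convolution computed in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out

def subgraph_direct(signal, kernels):
    """Reference: apply each convolution of the chain in the spatial domain."""
    for kernel in kernels:
        signal = conv_direct(signal, kernel)
    return signal

def subgraph_frequency(signal, kernels):
    """Run the whole chain in the frequency domain with a single inverse transform."""
    n = len(signal) + sum(len(k) - 1 for k in kernels)
    spectrum = dft(pad(signal, n))          # one forward transform of the input
    for kernel in kernels:
        kernel_spectrum = dft(pad(kernel, n))
        # Convolution theorem: convolution becomes a pointwise product.
        spectrum = [a * b for a, b in zip(spectrum, kernel_spectrum)]
    return [v.real for v in dft(spectrum, inverse=True)]  # one inverse transform

signal = [1.0, 2.0, 3.0, 4.0]
kernels = [[1.0, 0.0, -1.0], [0.5, 0.5]]
direct = subgraph_direct(signal, kernels)
freq = subgraph_frequency(signal, kernels)
```

Both paths produce the same values up to floating-point rounding. In a real network, any layer that is not linear would end the frequency-domain sub-graph and force a transform back to the spatial domain.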

First claim

Opening claim text (preview).

What is claimed is:
1. A general purpose graphic processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.
2. The general purpose graphic processor of claim 1, the processing circuitry to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
3. The general purpose graphic processor of claim 2, wherein: the selection is implemented at run time.
4. The general purpose graphic processor of claim 2, the processing circuitry to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
5. The general purpose graphic processor of claim 4, the processor to: merge the results of the convolutional computations.
6. The general purpose graphics processor of claim 1, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
7. An electronic device, comprising: a general purpose graphics processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources; and a memory communicatively coupled to the general purpose graphics processor.
8. The electronic device of claim 7, the processor to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
9. The electronic device of claim 8, wherein: the selection is implemented at run time.
10. The electronic device of claim 8, the processor to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
11. The electronic device of claim 10, the processor to: merge the results of the convolutional computations.
12. The electronic device of claim 7, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
13. One or more non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: receive, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources, data representing a convolutional neural network; determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.
14. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.
15. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.
16. The one or more non-transitory computer-readable medium of claim 15, wherein: the selection is implemented at run time.
17. The one or more non-transitory computer-readable medium of claim 15, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.
18. The one or more non-transitory computer-readable medium of claim 17, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: merge the results of the convolutional computations.
19. A computer-implemented method comprising: receiving, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second
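Claims 2 and 3 describe selecting a convolution implementation at run time by running a short comparison for each convolution. The sketch below is one possible reading of that step, not the patented method: it assumes the comparison metric is wall-clock time over a few repeats, which the claims do not specify, and it uses two hypothetical candidates, a direct spatial-domain convolution and a frequency-domain one built on a naive DFT. The names `select_implementation`, `CANDIDATES`, and `layers` are illustrative only.

```python
import cmath
import time

def conv_direct(signal, kernel):
    """Full linear convolution computed in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i + j] += s * w
    return out

def _dft(x, inverse=False):
    """Naive O(n^2) DFT, used here only as a stand-in for a real FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def conv_frequency(signal, kernel):
    """Same convolution via a pointwise product of zero-padded spectra."""
    n = len(signal) + len(kernel) - 1
    s = _dft(list(signal) + [0.0] * (n - len(signal)))
    k = _dft(list(kernel) + [0.0] * (n - len(kernel)))
    return [v.real for v in _dft([a * b for a, b in zip(s, k)], inverse=True)]

# Hypothetical registry of interchangeable implementations.
CANDIDATES = {"direct": conv_direct, "frequency": conv_frequency}

def select_implementation(candidates, signal, kernel, repeats=3):
    """Time each candidate briefly on the actual shapes; return the fastest one's name."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(signal, kernel)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

# One (input, kernel) pair per convolution in a toy network.
layers = [([1.0] * 32, [0.25] * 3), ([1.0] * 32, [0.1] * 9)]

# Run the short comparison once per convolution, then execute with the chosen plan.
plan = [select_implementation(CANDIDATES, s, k) for s, k in layers]
outputs = [CANDIDATES[name](s, k) for name, (s, k) in zip(plan, layers)]
```

With the naive DFT used here the direct path will usually be chosen. With a real FFT and larger kernels the balance can shift, which is the kind of shape-dependent trade-off a per-convolution comparison is meant to capture.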

Assignees

Intel Corp

Inventors

Classifications

  • G06N3/063 (Primary)

    using electronic means · CPC title

  • Combinations of networks · CPC title

  • G06T15/005 (Primary)

    General purpose rendering architectures · CPC title

  • Learning methods · CPC title

  • Memory management · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10467795B2 cover?
In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).