Sub-graph in frequency domain and dynamic selection of convolution implementation on a GPU

US10467795B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10467795-B2
Application number	US-201715482724-A
Country	US
Kind code	B2
Filing date	Apr 8, 2017
Priority date	Apr 8, 2017
Publication date	Nov 5, 2019
Grant date	Nov 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
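As an illustration of what "executed in a frequency domain" can mean for a convolution, the following is a minimal sketch based on the convolution theorem: a linear convolution equals the inverse transform of the pointwise product of the zero-padded inputs' transforms. It is not the patented apparatus. The function names (`dft`, `conv_direct`, `conv_frequency`) and the sample data are assumptions made for this example, and a naive O(n^2) DFT stands in for the FFT a practical implementation would use.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/n normalisation.
        out.append(acc / n if inverse else acc)
    return out


def conv_direct(signal, kernel):
    """Full linear convolution computed directly in the spatial domain."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def conv_frequency(signal, kernel):
    """Same convolution computed as a pointwise product of spectra."""
    n = len(signal) + len(kernel) - 1
    # Zero-pad both inputs so the circular convolution implied by the DFT
    # matches the linear convolution.
    a = list(signal) + [0.0] * (n - len(signal))
    b = list(kernel) + [0.0] * (n - len(kernel))
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [v.real for v in dft(product, inverse=True)]


# Hypothetical sample data, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0]
direct = conv_direct(signal, kernel)   # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
freq = conv_frequency(signal, kernel)  # equal to `direct` up to rounding error
```

The two results agree only up to floating-point rounding, so any comparison between them should use a tolerance rather than exact equality.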

First claim

Opening claim text (preview).

What is claimed is:

1. A general purpose graphic processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.

2. The general purpose graphic processor of claim 1, the processing circuitry to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.

3. The general purpose graphic processor of claim 2, wherein: the selection is implemented at run time.

4. The general purpose graphic processor of claim 2, the processing circuitry to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.

5. The general purpose graphic processor of claim 4, the processor to: merge the results of the convolutional computations.

6. The general purpose graphics processor of claim 1, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.

7. An electronic device, comprising: a general purpose graphics processor comprising: a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources; and a processing circuitry to: determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources; and a memory communicatively coupled to the general purpose graphics processor.

8. The electronic device of claim 7, the processor to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.

9. The electronic device of claim 8, wherein: the selection is implemented at run time.

10. The electronic device of claim 8, the processor to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.

11. The electronic device of claim 10, the processor to: merge the results of the convolutional computations.

12. The electronic device of claim 7, the processing circuitry to update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.

13. One or more non-transitory computer-readable medium comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: receive, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second type of execution unit having a second set of execution resources, different from the first set of execution resources, data representing a convolutional neural network; determine a complete sub-graph of a convolutional neural network that can be executed in a frequency domain; generate a predicted level of activation sparsity for one or more layers of the convolutional neural network; and apply convolutional computations in the sub-graph in the frequency domain; wherein the convolutional computations are performed at variable levels of integer precisions based at least in part on the predicted level of activation sparsity of a layer in the convolutional neural network using the first set of execution resources, while internal computations are performed in a baseline precision level of 8-bits or 16-bits using the second set of execution resources.

14. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: update the predicted level of activation sparsity for the one or more layers of the convolutional neural network on-line when the convolution neural network is operated in inference mode.

15. The one or more non-transitory computer-readable medium of claim 13, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: dynamically select a convolutional implementation based at least in part on running a short comparison for each convolution in the network; and expose one or more embedded cast operations in a load/store instruction to support loading data for the convolutional computations in a variable integer precision.

16. The one or more non-transitory computer-readable medium of claim 15, wherein: the selection is implemented at run time.

17. The one or more non-transitory computer-readable medium of claim 15, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: divide a neural network into a plurality of tiles; and apply convolutional computations to the plurality of tiles.

18. The one or more non-transitory computer-readable medium of claim 17, comprising one or more instructions that when executed on at least one processor configure the at least one processor to perform one or more operations to: merge the results of the convolutional computations.

19. A computer-implemented method comprising: receiving, in a general purpose graphics processor comprising a plurality of execution units comprising at least a first type of execution unit having a first set of execution resources and a second
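
Claims 2 through 5 recite selecting a convolution implementation by running a short comparison at run time, dividing the work into tiles, and merging the results. A rough sketch of that flow follows, under loud assumptions: two plain CPU functions stand in for GPU convolution implementations, the "short comparison" is modelled as a wall-clock timing on a small sample, and the tiling is applied to a 1-D input signal and merged by overlap-add (the claims speak of dividing a neural network into tiles, which this sketch does not attempt). Every name here (`conv_direct`, `conv_tiled`, `pick_implementation`, `candidates`) is hypothetical and does not come from the patent.

```python
import time


def conv_direct(signal, kernel):
    """Full linear convolution over the whole input at once."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def conv_tiled(signal, kernel, tile=4):
    """Split the input into tiles, convolve each tile, merge by overlap-add."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for start in range(0, len(signal), tile):
        part = conv_direct(signal[start:start + tile], kernel)
        # Each tile's output overlaps the next tile's by len(kernel) - 1
        # samples; summing the overlapping samples merges the results.
        for offset, value in enumerate(part):
            out[start + offset] += value
    return out


def pick_implementation(candidates, sample_signal, kernel, repeats=3):
    """Time every candidate on a short sample and return the fastest name."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(sample_signal, kernel)
        elapsed = time.perf_counter() - t0
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


# Hypothetical candidates and data, chosen only for illustration.
candidates = {"direct": conv_direct, "tiled": conv_tiled}
signal = [float(i % 5) for i in range(32)]
kernel = [1.0, 2.0, 1.0]

# Run the short comparison on a small prefix, then use the winner in full.
chosen = pick_implementation(candidates, signal[:8], kernel)
result = candidates[chosen](signal, kernel)
```

Which candidate wins depends on the machine and on timing noise, so only the result, not the choice, is reproducible; both candidates compute the same convolution.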

Assignees

Intel Corp

Inventors

Classifications

G06N3/063 (Primary): using electronic means
G06N3/045: Combinations of networks
G06T15/005 (Primary): General purpose rendering architectures
G06N3/08: Learning methods
G06T1/60: Memory management

Patent family

Related publications grouped by family.

View patent family 61800244

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10467795B2 cover? In an example, an apparatus comprises a plurality of execution units; and logic, at least partially including hardware logic, to determine a sub-graph of a network that can be executed in a frequency domain and apply computations in the sub-graph in the frequency domain. Other embodiments are also disclosed and claimed.
Who is the assignee on this patent? Intel Corp
What technology area does this patent fall under? Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published? Publication date Nov 5, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Hardware-efficient deep convolutional neural networks

Generating high resolution images from low resolution images for semiconductor applications

Data-optimized neural network traversal

Performing multi-convolution operations in a parallel processing system

Weight-shifting mechanism for convolutional neural networks

Vehicle vision system with display
