What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date December 1, 2020 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Multiply-accumulate “0” data gating

US10853035B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10853035-B2
Application number	US-202016833128-A
Country	US
Kind code	B2
Filing date	Mar 27, 2020
Priority date	Apr 28, 2017
Publication date	Dec 1, 2020
Grant date	Dec 1, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
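As a rough illustration of the gating idea in the abstract, the Python sketch below models a multiply-accumulate loop in software. It skips both the multiply and the accumulate whenever either input is exactly zero, and it counts how many operations were gated. This is not Intel's circuit: the function name `gated_mac` is hypothetical, and the counter stands in for the power that real hardware would save by holding the units idle.

```python
from typing import Sequence, Tuple


def gated_mac(weights: Sequence[float], activations: Sequence[float]) -> Tuple[float, int]:
    """Hypothetical software model of a zero-gated multiply-accumulate.

    Returns (accumulated sum, number of gated multiply/accumulate steps).
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0.0
    gated = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            # Model of gating: the product is known to be zero, so the
            # multiply and the accumulate are both skipped for this pair.
            gated += 1
            continue
        acc += w * a
    return acc, gated
```

For example, `gated_mac([0.5, 0.0, 2.0], [4.0, 3.0, 0.0])` performs only one real multiply (0.5 × 4.0) and returns `(2.0, 2)`.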

First claim

Full claim text (claims 1 to 20).

The invention claimed is:

1. An apparatus comprising: an instruction cache to receive graphics processing instructions; a general-purpose graphics processing compute block comprising a plurality of graphics processing cores to perform operations to execute the graphics processing instructions; and processing circuitry to: receive, into at least one of a multiply unit or an accumulate unit, a first inference weight and a second inference weight from a layer of a convolutional neural network; and negate the second inference weight, bypass the multiply unit, and use a shift register when at least one of the first inference weight or the second inference weight are within a threshold value of a power of two, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

2. The apparatus of claim 1, further comprising processing circuitry to: bypass the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of a one, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

3. The apparatus of claim 2, further comprising processing circuitry to: negate the second operand and bypass the multiplier when at least one of the first operand or the second operand is a negative one; and treat at least one of the first operand or the second operand as a zero when a value of the first operand or the second operand is within a threshold of zero.

4. The apparatus of claim 2, further comprising a thread scheduler comprising logic, at least partially including hardware logic, to: break an input vector into a plurality of segments; and perform a dot product using the plurality of segments.

5. The apparatus of claim 1, further comprising processing circuitry to: produce no output from the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of zero.

6. The apparatus of claim 5, further comprising processing circuitry to: bypass the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of a one, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

7. The apparatus of claim 1, wherein the multiply unit comprises processing circuitry to: receive the first inference weight and a first indicator of a first number of valid bits in the first inference weight; receive the second inference weight, and a second indicator of a second number of valid bits in the second inference weight; and multiply only the valid bits in the first inference weight and valid bits in the second inference weight.

8. The apparatus of claim 1, further comprising processing circuitry to: produce no output when at least one of the first inference weight or the second inference weight is within a threshold value of zero.

9. The apparatus of claim 1, further comprising processing circuitry to: bypass the multiply unit and use the shift register when at least one of the first operand or the second operand is a power of two; and treat at least one of the first operand or the second operand as a power of two when a value of the first operand or the second operand is within a threshold of a power of two.

10. The apparatus of claim 1, wherein the plurality of execution units are on a single integrated circuit.

11. A method, comprising: receiving graphics processing instructions in an instruction cache of a general purpose graphics processor: performing operations to execute the graphics processing instructions; and receiving, into at least one of a multiply unit or an accumulate unit of the general purpose graphics processor, a first inference weight and a second inference weight from a layer of a convolutional neural network; and negating the second inference weight, bypassing the multiply unit, and using a shift register when at least one of the first inference weight or the second inference weight are within a threshold value of a power of two, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

12. The method of claim 11, further comprising: bypassing the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of a one, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

13. The method of claim 12, further comprising: negating the second operand and bypass the multiplier when at least one of the first operand or the second operand is a negative one.

14. The method of claim 12, further comprising: breaking an input vector into a plurality of segments; and performing a dot product using the plurality of segments.

15. The method of claim 11, further comprising processing: producing no output from the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of zero.

16. The method of claim 15, further comprising: bypassing the multiply unit when at least one of the first inference weight or the second inference weight is within a threshold value of a one, such that the multiply unit performs no operations on the first inference weight or the second inference weight.

17. The method of claim 11, further comprising: receiving the first inference weight and a first indicator of a first number of valid bits in the first inference weight; receiving the second inference weight, and a second indicator of a second number of valid bits in the second inference weight; and multiplying only the valid bits in the first inference weight and valid bits in the second inference weight.

18. The method of claim 11, further comprising: producing no output when at least one of the first inference weight or the second inference weight is within a threshold value of zero.

19. The method of claim 11, further comprising: bypassing the multiply unit and use a shift register when at least one of the first operand or the second operand is a power of two; and treating at least one of the first operand or the second operand as a zero when a value of the first operand or the second operand is within a threshold of zero.

20. The method of claim 11, further comprising processing circuitry to: treat at least one of the first operand or the second operand as a power of two when a value of the first operand or the second operand is within a threshold of a power of two.
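The bypass paths recited in claims 1 to 3, 5 and 9 can be modeled in software as a decision made on one operand before any multiply happens. The sketch below is a hypothetical model, not the claimed circuitry. The function name `bypass_multiply`, the absolute `threshold` of 0.05, and the assumption that the other operand is an integer are all illustrative choices. The integer assumption is what lets a left shift stand in for multiplication by 2**k. For brevity it tests only the first operand, whereas the claims cover either one. Snapping a value that is merely near 0, ±1 or 2**k to that exact value is an approximation, so the result can differ slightly from the true product.

```python
import math
from typing import Tuple


def bypass_multiply(a: float, b: int, threshold: float = 0.05) -> Tuple[int, str]:
    """Hypothetical model: return (result, path) for a * b, avoiding the
    multiplier when `a` is within `threshold` of 0, +/-1, or +/-2**k (k >= 1).
    """
    mag = abs(a)
    sign = -1 if a < 0 else 1

    # Near zero: treat `a` as 0, so the multiplier produces no output.
    if mag <= threshold:
        return 0, "zero"

    # Near +1 or -1: pass `b` through, negating it for -1, with no multiply.
    if abs(mag - 1.0) <= threshold:
        return sign * b, "pass"

    # Near a power of two 2**k (k >= 1): replace the multiply with a left shift.
    k = round(math.log2(mag))
    if k >= 1 and abs(mag - 2.0 ** k) <= threshold:
        return sign * (b << k), "shift"

    # Otherwise use the real multiplier (rounded to an int in this model).
    return round(a * b), "multiply"
```

For example, `bypass_multiply(3.98, 5)` takes the shift path and returns `(20, "shift")`, which is 5 shifted left by 2. By contrast, `bypass_multiply(3.0, 5)` is not near a power of two and falls back to the multiplier, returning `(15, "multiply")`.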

Assignees

Intel Corp

Inventors

Classifications

Code	Primary	CPC title
G06N3/045	No	Combinations of networks
G06T1/20	Yes	Processor architectures; Processor configuration, e.g. pipelining
G06F9/5066	No	Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (mapping at compile time, see G06F8/451)
G06F7/5332	Yes	by skipping over strings of zeroes or ones, e.g. using the Booth Algorithm
G06N20/00	No	Machine learning

Patent family

Related publications grouped by family.

Patent family ID: 61827533

Related patents

Low-power architecture for sparse neural network

Reducing power consumption in a fused multiply-add (FMA) unit responsive to input data values

Performing multi-convolution operations in a parallel processing system
