What technology area does this patent fall under?

Primary CPC classification G06N3/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Oct 9, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Data compaction and memory bandwidth reduction for sparse neural networks

US10096134B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10096134-B2
Application number	US-201715422359-A
Country	US
Kind code	B2
Filing date	Feb 1, 2017
Priority date	Feb 1, 2017
Publication date	Oct 9, 2018
Grant date	Oct 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer program product, and system for sparse convolutional neural networks that improves efficiency is described. Multi-bit data for input to a processing element is received at a compaction engine. The multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data, where the single bit signal indicates that the multi-bit data equals zero. A compacted data sequence for input to a processing element is received by a memory interface. The compacted data sequence is transmitted from the memory interface to an expansion engine. Non-zero values are extracted from the compacted data sequence and zeros are inserted between the non-zero values by the expansion engine to generate an expanded data sequence that is output to the processing element.
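The zero-signalling idea in the abstract can be illustrated in software. This is a minimal sketch, not the patented hardware design: the function names `zero_flag` and `gated_mac` are hypothetical, the threshold comparison on magnitude is an assumption, and skipping a multiplication in Python only stands in for keeping a hardware multiplier idle.

```python
from typing import Sequence, Tuple


def zero_flag(value: float, threshold: float = 0.0) -> bool:
    """Single-bit signal: True when the value should be treated as zero.

    Assumption: values whose magnitude is below `threshold` are also
    flagged as zero. The abstract itself only speaks of data equal to zero.
    """
    return value == 0 or abs(value) < threshold


def gated_mac(weights: Sequence[float],
              activations: Sequence[float],
              threshold: float = 0.0) -> Tuple[float, int]:
    """Multiply-accumulate that skips the multiply when either flag is set.

    Returns the accumulated sum and the number of multiplications
    actually performed.
    """
    acc = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if zero_flag(w, threshold) or zero_flag(a, threshold):
            # The product is known to be zero, so the operands are never
            # read and no multiplication is issued.
            continue
        acc += w * a
        multiplies += 1
    return acc, multiplies


total, used = gated_mac([0, 2, 3, 0], [5, 0, 4, 7])
print(total, used)
```

In this example only one of the four weight/activation pairs has two non-zero operands, so a single multiplication is performed.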

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving, at a compaction engine, multi-bit data for input to a processing element; determining, by the compaction engine, that the multi-bit data equals zero; and transmitting a single bit signal from the compaction engine to the processing element in lieu of the multi-bit data, wherein the single bit signal indicates that the multi-bit data equals zero.
2. The method of claim 1, wherein determining that the multi-bit data equals zero comprises determining that the multi-bit data is less than a pre-determined threshold value.
3. The method of claim 1, further comprising generating a zero product value at an output of a multiplier while preventing signal switching within the multiplier.
4. The method of claim 3, further comprising disabling a clock signal configured to store the inputs to the multiplier in a plurality of registers.
5. The method of claim 3, further comprising, after storing the zero product, disabling a clock signal configured to store the output of the multiplier.
6. The method of claim 1, further comprising disabling an input register configured to store the multi-bit data, wherein the input register includes outputs that are coupled to a multiplier input.
7. The method of claim 1, wherein the multi-bit data represents a weight.
8. The method of claim 1, wherein the multi-bit data represents an input activation.
9. A system, comprising: a deep learning accelerator configured to: receive, at a compaction engine, multi-bit data for input to a processing element; determine that the multi-bit data equals zero; and transmit a single bit signal from the compaction engine to the processing element in lieu of the multi-bit data, wherein the single bit signal indicates that the multi-bit data equals zero.
10. A method, comprising: receiving, at a memory interface, a compacted data sequence for input to a processing element; transmitting the compacted data sequence from the memory interface to an expansion engine; extracting, by the expansion engine, non-zero values from the compacted data sequence; inserting zeros between the non-zero values to generate an expanded data sequence; and outputting the expanded data sequence from the expansion engine to the processing element.
11. The method of claim 10, wherein a number of zeros to insert between each of the non-zero values is encoded in the compacted data sequence.
12. The method of claim 10, wherein a bitmask indicates a position in the expanded data sequence for each non-zero value of the non-zero values.
13. The method of claim 10, wherein the outputting comprises broadcasting the expanded data sequence to the processing element and additional processing elements.
14. The method of claim 10, further comprising: receiving, by a compaction engine, a data sequence including zero values and non-zero values generated by the processing element; encoding positions of each non-zero value into the compacted data sequence; determining counts of zeros between the non-zero values; and inserting the counts and non-zero values into the compacted data sequence.
15. The method of claim 14, further comprising, identifying values generated by the processing element that are less than a pre-determined threshold value as zero values.
16. The method of claim 10, wherein the expanded data sequence comprises a set of weights.
17. The method of claim 10, wherein the expanded data sequence comprises a set of activations.
18. The method of claim 10, further comprising transmitting a single bit signal to the processing element that indicates whether a value in the expanded data sequence is a zero value or non-zero value.
19. The method of claim 18, further comprising, based on the single bit signal, generating a zero product value at an output of a multiplier while preventing signal switching within the multiplier for each zero value in the expanded data sequence.
20. A system, comprising: a memory storing a compacted data sequence; and a deep learning accelerator configured to: receive the compacted data sequence from the memory for input to a processing element; and transmit the compacted data sequence from the memory interface to an expansion engine, wherein the expansion engine: extracts non-zero values from the compacted data sequence and inserts zeros between the non-zero values to generate an expanded data sequence that is output by the expansion engine to the processing element.
21. A system, comprising: a memory storing a compacted data sequence; and a deep learning accelerator configured to: receive the compacted data sequence from the memory for input to a processing element; and transmit the compacted data sequence from the memory interface to an expansion engine, wherein the expansion engine: determines that the multi-bit data encoded in the compacted data sequence equals zero; and transmits a single bit signal to the processing element in lieu of the multi-bit data, wherein the single bit signal indicates that the multi-bit data equals zero.
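The compaction and expansion steps recited in claims 10 to 15 can be sketched as a software round trip. This is an illustration only: the claims do not fix a storage layout, so the `(zeros_before, value)` pair format, the separate trailing-zero count, and the names `compact`, `expand`, and `expand_bitmask` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, float]  # (number of zeros preceding the value, non-zero value)


def compact(seq: Sequence[float], threshold: float = 0.0) -> Tuple[List[Pair], int]:
    """Encode a sequence as zero counts plus non-zero values.

    Values whose magnitude is below `threshold` are treated as zero
    (an assumption modelled on the thresholding in claim 15).
    Returns the pairs and the count of zeros after the last non-zero value.
    """
    pairs: List[Pair] = []
    run = 0
    for v in seq:
        if v == 0 or abs(v) < threshold:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run


def expand(pairs: Sequence[Pair], trailing: int) -> List[float]:
    """Rebuild the expanded sequence by re-inserting the counted zeros."""
    out: List[float] = []
    for zeros, v in pairs:
        out.extend([0] * zeros)
        out.append(v)
    out.extend([0] * trailing)
    return out


def expand_bitmask(mask: Sequence[int], nonzeros: Sequence[float]) -> List[float]:
    """Alternative layout: a bitmask marks where each non-zero value goes."""
    values = iter(nonzeros)
    return [next(values) if bit else 0 for bit in mask]


seq = [0, 0, 3, 0, 5, 0, 0]
pairs, trailing = compact(seq)
print(pairs, trailing)
print(expand(pairs, trailing) == seq)
print(expand_bitmask([0, 0, 1, 0, 1, 0, 0], [3, 5]) == seq)
```

Both layouts carry only the non-zero values plus position information. Which one is smaller depends on how sparse the data is and on the bit widths chosen for counts and values, which this sketch does not model.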

Assignees

Nvidia Corp

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G06N3/063 · Primary
using electronic means · CPC title
G06T9/002 · Primary
using neural networks · CPC title
G06T1/60
Memory management · CPC title
G06T1/20
Processor architectures; Processor configuration, e.g. pipelining · CPC title

Patent family

Related publications grouped by family.

View patent family 62980002

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10096134B2 cover?: A method, computer program product, and system for sparse convolutional neural networks that improves efficiency is described. Multi-bit data for input to a processing element is received at a compaction engine. The multi-bit data is determined to equal zero and a single bit signal is transmitted from the memory interface to the processing element in lieu of the multi-bit data, where the single…
Who is the assignee on this patent?: Nvidia Corp