Exploiting input data sparsity in neural network compute units

US9818059B1 · US · B1

Patent metadata
Field               Value
Publication number  US-9818059-B1
Application number  US-201715465774-A
Country             US
Kind code           B1
Filing date         Mar 22, 2017
Priority date       Oct 27, 2016
Publication date    Nov 14, 2017
Grant date          Nov 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method includes receiving, by a computing device, input activations and determining, by a controller of the computing device, whether each of the input activations has either a zero value or a non-zero value. The method further includes storing, in a memory bank of the computing device, at least one of the input activations. Storing the at least one input activation includes generating an index comprising one or more memory address locations that have input activation values that are non-zero values. The method still further includes providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array. The activations are provided, at least in part, from a memory address location associated with the index.
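
The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented hardware design: the list `memory_bank`, the functions `store_with_index` and `provide_to_bus`, and the choice to store every activation (the abstract only requires storing at least one) are all assumptions made for the example.

```python
from typing import Iterator, List, Sequence, Tuple


def store_with_index(activations: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Store activations and build an index of non-zero address locations.

    Assumption: a Python list stands in for the memory bank, and the list
    position plays the role of the memory address.
    """
    memory_bank = list(activations)
    # The index records only addresses whose stored value is non-zero.
    index = [addr for addr, value in enumerate(memory_bank) if value != 0]
    return memory_bank, index


def provide_to_bus(memory_bank: List[float], index: List[int]) -> Iterator[Tuple[int, float]]:
    """Yield (address, value) pairs for indexed addresses only.

    Zero-valued activations are never placed on the (hypothetical) data bus.
    """
    for addr in index:
        yield addr, memory_bank[addr]


memory_bank, index = store_with_index([0.0, 1.5, 0.0, 0.0, -2.0, 3.0])
bus_traffic = list(provide_to_bus(memory_bank, index))
print(index)        # addresses holding non-zero activations
print(bus_traffic)  # what the compute units would see
```

With six input activations of which three are zero, only three values reach the bus, which is the source of the savings the abstract describes.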

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for performing computations for a neural network having a plurality of neural network layers, the method comprising: receiving, by a computing device, a plurality of input activations for processing by a neural network layer of the plurality of neural network layers; determining, by a controller of the computing device, whether each of the plurality of input activations, for processing by the neural network layer, has a zero value or a non-zero value; storing, in a memory bank of the computing device, at least one of the plurality of input activations; generating, by the controller, an index identifying only memory address locations in the memory bank that store non-zero input activation values for processing by the neural network layer; and providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array of the computing device, wherein the at least one input activation is provided, at least in part, from a memory address location of the index.

2. The method of claim 1, wherein the index is generated based on a bitmap comprising a plurality of bits and, wherein each bit of the bitmap indicates at least one of a non-zero input activation value or a zero input activation value.

3. The method of claim 1, further including, providing a first input activation that has a non-zero value to perform, by at least one unit of the computational array, a neural network inference computation using the non-zero input activation value, and subsequently providing a second input activation that has a zero value, and preventing, in at least one unit of the computational array, a neural network inference computation that would otherwise be performed using the zero input activation value.

4. The method of claim 3, wherein preventing the neural network inference computation that would otherwise be performed using the zero input activation value occurs in response to the controller determining that the second input activation is provided from a memory address location that is not identified in the index.

5. The method of claim 3, further including, detecting, by the controller, that the second input activation is provided from a memory address location that is not identified in the index, and, in response to the detecting, providing a control signal to at least one unit of the computational array to prevent a multiply operation associated with the zero input activation value.

6. The method of claim 1, wherein the method further comprises: mapping, by the controller and to a first unit of the computational array, a first portion of a tensor computation that uses a first input activation; and mapping, by the controller and to a second unit of the computational array that differs from the first unit, a second portion of the tensor computation that also uses the first input activation.

7. The method of claim 1, further comprising, sequentially providing a single input activation onto the data bus, the single input activation being accessed and selected from memory address locations in the memory bank that are identified using the index.

8. The method of claim 1, wherein providing the at least one input activation onto the data bus comprises, not providing input activations that have a zero value.

9. One or more non-transitory machine-readable storage devices storing instructions for performing computations for a neural network having a plurality of neural network layers, where the instructions are executable by one or more processing devices to cause performance of operations comprising: receiving, by a computing device, a plurality of input activations for processing by a neural network layer of the plurality of neural network layers; determining, by a controller of the computing device, whether each of the plurality of input activations, for processing by the neural network layer, has a zero value or a non-zero value; storing, in a memory bank of the computing device, at least one of the plurality of input activations; generating, by the controller, an index identifying only memory address locations in the memory bank that store non-zero input activation values for processing by the neural network layer; and providing, by the controller and from the memory bank, at least one input activation onto a data bus that is accessible by one or more units of a computational array of the computing device, wherein the at least one input activation is provided, at least in part, from a memory address location of the index.

10. The machine-readable storage devices of claim 9, wherein the index is generated based on a bitmap comprising a plurality of bits and, wherein each bit of the bitmap indicates at least one of a non-zero input activation value or a zero input activation value.

11. The machine-readable storage devices of claim 9, where the operations further comprise: providing a first input activation that has a non-zero value to perform, by at least one unit of the computational array, a neural network inference computation using the non-zero input activation value, and subsequently providing a second input activation that has a zero value, and preventing, in at least one unit of the computational array, a neural network inference computation that would otherwise be performed using the zero input activation value.

12. The machine-readable storage devices of claim 11, wherein preventing the neural network inference computation that would otherwise be performed using the zero input activation value occurs in response to the controller determining that the second input activation is provided from a memory address location that is not identified in the index.

13. The machine-readable storage devices of claim 11, further including, detecting, by the controller, that the second input activation is provided from a memory address location that is not associated with the index, and, in response to detecting, providing a control signal to at least one unit of the computational array to prevent a multiply operation associated with the zero input activation value.

14. The machine-readable storage devices of claim 9, wherein the operations further comprise: mapping, by the controller and to a first unit of the computational array, a first portion of a tensor computation that uses a first input activation; and mapping, by the controller and to a second unit of the computational array that differs from the first unit, a second portion of the tensor computation that also uses the first input activation.

15. An electronic system comprising: a controller located in a computing device, the controller including one or more processing devices; and one or more non-transitory machine-readable storage devices for storing instructions that are executable by the one or more processing devices to cause performance of operations comprising: receiving, by the computing device, a plurality of input activations for processing by a neural network layer of the plurality of neural network layers; determining, by the controller, whether each of the plurality of input activations, for processing by the neural network layer, has a zero value or a non-zero value; storing, in a memory bank of the computing device, at least one of the plurality of input activations; generating, by the controller, an index identifying only memory address locations in the memory bank that store non-zero input activation values for processing by the neural network layer; and providing, by the controller and from the memory bank,
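
The dependent claims add three ideas that a short sketch may make easier to follow: deriving the index from a bitmap (claims 2 and 10), suppressing the multiply when an activation comes from an address outside the index (claims 3 to 5 and 11 to 13), and sending the same activation to more than one unit (claims 6 and 14). The Python below is an illustrative software model only. The class `ComputeUnit`, the `skip` flag standing in for the control signal, and the per-unit weight lists are hypothetical names and simplifications, not elements taken from the patent.

```python
from typing import List, Sequence


def bitmap_from_activations(activations: Sequence[float]) -> List[int]:
    """One bit per activation: 1 for a non-zero value, 0 for a zero value."""
    return [1 if value != 0 else 0 for value in activations]


def index_from_bitmap(bitmap: Sequence[int]) -> List[int]:
    """Addresses whose bitmap bit is set, i.e. the non-zero locations."""
    return [addr for addr, bit in enumerate(bitmap) if bit]


class ComputeUnit:
    """Hypothetical stand-in for one unit of the computational array."""

    def __init__(self, weights: Sequence[float]) -> None:
        self.weights = list(weights)
        self.accumulator = 0.0
        self.multiplies = 0  # counts multiply operations actually performed

    def step(self, addr: int, activation: float, skip: bool) -> None:
        # `skip` models the control signal that prevents the multiply.
        if skip:
            return
        self.accumulator += self.weights[addr] * activation
        self.multiplies += 1


def run_layer(activations: Sequence[float], units: Sequence[ComputeUnit]) -> List[float]:
    indexed = set(index_from_bitmap(bitmap_from_activations(activations)))
    for addr, value in enumerate(activations):
        skip = addr not in indexed  # address not identified in the index
        for unit in units:          # same activation goes to every unit
            unit.step(addr, value, skip)
    return [unit.accumulator for unit in units]


activations = [0.0, 2.0, 0.0, 3.0]
units = [ComputeUnit([1.0, 1.0, 1.0, 1.0]), ComputeUnit([0.5, -1.0, 4.0, 2.0])]
outputs = run_layer(activations, units)
print(outputs, [unit.multiplies for unit in units])
```

In this model each unit performs two multiplies instead of four, because the two zero-valued activations trigger the skip path. The sketch visits every address and skips at the unit; the variant in claims 7 and 8, where zero values never reach the bus at all, would iterate over the index instead.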

Assignees

Google Inc

Inventors

Classifications

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • Arithmetic instructions · CPC title

  • Operand accessing · CPC title

  • Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9818059B1 cover?
A computer-implemented method includes receiving, by a computing device, input activations and determining, by a controller of the computing device, whether each of the input activations has either a zero value or a non-zero value. The method further includes storing, in a memory bank of the computing device, at least one of the input activations. Storing the at least one input activation inclu…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 14, 2017 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).