Neural Network Processor

US2016342891A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016342891-A1
Application number  US-201514844524-A
Country             US
Kind code           A1
Filing date         Sep 3, 2015
Priority date       May 21, 2015
Publication date    Nov 24, 2016
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.
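
The division of labor in the abstract can be illustrated with a minimal software sketch. It is not the patented circuit: the function names `matrix_unit`, `vector_unit`, and `run_network` are hypothetical, ReLU is an assumed activation function (the abstract does not name one), and feeding each layer's activated values into the next layer is an assumption made here for illustration.

```python
def matrix_unit(weights, activations):
    """Return one accumulated value per output column.

    weights[i][j] is the weight applied to input i for output j, so
    accumulated[j] = sum over i of activations[i] * weights[i][j].
    """
    return [sum(a * w for a, w in zip(activations, column))
            for column in zip(*weights)]


def relu(value):
    """Assumed activation function; the abstract leaves the choice open."""
    return max(0, value)


def vector_unit(accumulated, activation_fn=relu):
    """Apply the activation function to each accumulated value."""
    return [activation_fn(v) for v in accumulated]


def run_network(layers, inputs):
    """Run every layer: matrix unit first, then vector unit.

    Each layer is a weight matrix. Passing one layer's activated values
    on as the next layer's activation inputs is an assumption of this sketch.
    """
    values = inputs
    for weights in layers:
        values = vector_unit(matrix_unit(weights, values))
    return values


if __name__ == "__main__":
    two_layers = [[[1, -1], [2, 0]], [[1], [1]]]
    print(run_network(two_layers, [1, 2]))
```

For the two-layer example, the first layer accumulates to 5 and -1, the assumed ReLU clamps the negative value to 0, and the second layer sums the two activated values.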

First claim

Opening claim text (preview).

What is claimed is:

1. A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, generate a plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and a vector computation unit communicatively coupled to the matrix computation unit and configured to, for each of the plurality of neural network layers: apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.

2. The circuit of claim 1, further comprising: a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit, where the unified buffer is configured to receive and store output from the vector computation unit, and the unified buffer is configured to send the received output as input to the matrix computation unit.

3. The circuit of claim 2, further comprising: a sequencer configured to receive instructions from a host device and generate a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; and a direct memory access engine communicatively coupled to the unified buffer and the sequencer, where the direct memory access engine is configured to send the plurality of activation inputs to the unified buffer, where the unified buffer is configured to send the plurality of activation inputs to the matrix computation unit, and where the direct memory access engine is configured to read result data from the unified buffer.

4. The circuit of claim 3, further comprising: a memory unit configured to send the plurality of weight inputs to the matrix computation unit, and where the direct memory access engine is configured to send the plurality of weight inputs to the memory unit.

5. The circuit of claim 1, where the matrix computation unit is configured as a two dimensional systolic array comprising a plurality of cells.

6. The circuit of claim 5, where the two dimensional systolic array is a square array.

7. The circuit of claim 5, where the plurality of weight inputs is shifted through a first plurality of cells along a first dimension of the systolic array, and where the plurality of activation inputs is shifted through a second plurality of cells along a second dimension of the systolic array.

8. The circuit of claim 7, where, for a given layer in the plurality of layers, a count of the plurality of activation inputs is greater than a size of the second dimension of the systolic array, and where the systolic array is configured to: divide the plurality of activation inputs into portions, where each portion has a size less than or equal to the size of the second dimension; generating, for each portion, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.

9. The circuit of claim 7, where, for a given layer in the plurality of layers, a count of the plurality of weight inputs is greater than a size of the first dimension of the systolic array, and where the systolic array is configured to: divide the plurality of weight inputs into portions, where each portion has a size less than or equal to the size of the first dimension; generating, for each portion, a respective portion of accumulated values; and combining each portion of accumulated values to generate a vector of accumulated values for the given layer.

10. The circuit of claim 7, where each cell in the plurality of cells comprises: a weight register configured to store a weight input; an activation register configured to store an activation input and configured to send the activation input to another activation register in a first adjacent cell along the second dimension; a sum-in register configured to store a previously summed value; multiplication circuitry communicatively coupled to the weight register and the activation register, where the multiplication circuitry is configured to output a product of the weight input and the activation input; and summation circuitry communicatively coupled to the multiplication circuitry and the sum-in register, where the summation circuitry is configured to output a sum of the product and the previously summed value, and where the summation circuitry is configured to send the sum to another sum-in register in a second adjacent cell along the first dimension.

11. The circuit of claim 10, where one or more cells in the plurality of cells are each configured to store the respective sum in a respective accumulator unit, where the respective sum is an accumulated value.

12. The circuit of claim 7, where the first dimension of the systolic array corresponds to columns of the systolic array, and where the second dimension of the systolic array corresponds to rows of the systolic array.

13. The circuit of claim 1, where the vector computation unit normalizes each activated value to generate a plurality of normalized values.

14. The circuit of claim 1, where the vector computation unit pools one or more activated values to generate a plurality of pooled values.

15. A method for performing neural network computations for a neural network comprising a plurality of neural network layers using a circuit comprising a matrix computation unit and a vector computation unit coupled to the matrix computation unit, the method comprising, for each of the plurality of neural network layers: providing a plurality of weight inputs and a plurality of activation inputs for the neural network layer to the matrix computation unit; generating, using the matrix computation unit, a plurality of accumulated values, wherein the matrix computation unit is configured to receive the plurality of weight inputs and the plurality of activation inputs for the neural network layer and generate the plurality of accumulated values based on the plurality of weight inputs and the plurality of activation inputs; and generating, using the vector computation unit, a plurality of activated values for the neural network layer, wherein the matrix computation unit is configured to apply an activation function to each accumulated value generated by the matrix computation unit to generate a plurality of activated values for the neural network layer.

16. The method of claim 15, further comprising: receiving, by a unified buffer communicatively coupled to the matrix computation unit and the vector computation unit; storing output from the vector computation unit at the unified buffer; sending, from the unified buffer, the received output as input to the matrix computation unit.

17. The method of claim 16, further comprising: receiving, at a sequencer, instructions from a host device and generating a plurality of control signals from the instructions, where the plurality of control signals control dataflow through the circuit; sending, from a direct memory access engine communicatively coupled to the unified buffer and the sequencer, the plurality of activation inputs to the unified buffer; sending, from the unified buffer, the plurality of activation inputs to the matrix computation unit; and reading, at the direct me
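
The cell behavior in claims 7 and 10 to 12, and the portioning in claim 8, can be made concrete with a small cycle-by-cycle simulation. This is a rough sketch under stated assumptions, not the claimed hardware: the names `systolic_matvec` and `tiled_matvec` are hypothetical, weights are treated as already loaded into each cell's weight register (the shifting of weights along the first dimension is omitted), activations are assumed to enter each row one cycle later than the row above, and "combining" portions is assumed to mean element-wise addition.

```python
def systolic_matvec(weights, activations):
    """Simulate a 2-D systolic array with preloaded weights, cycle by cycle.

    weights[r][c] sits in the weight register of cell (r, c).
    activations[r] enters row r at its left edge on cycle r (assumed skew).
    Returns one accumulated value per column.
    """
    rows, cols = len(weights), len(weights[0])
    act = [[0] * cols for _ in range(rows)]    # activation registers
    psum = [[0] * cols for _ in range(rows)]   # sum each cell sent onward last cycle
    accumulators = [0] * cols                  # one accumulator per column (assumed)

    for cycle in range(rows + cols - 1):
        next_act = [[0] * cols for _ in range(rows)]
        next_psum = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                # Activations move along a row, from the adjacent cell's register.
                if c == 0:
                    a = activations[r] if cycle == r else 0
                else:
                    a = act[r][c - 1]
                # Sum-in comes from the adjacent cell one step up the column.
                sum_in = psum[r - 1][c] if r > 0 else 0
                next_act[r][c] = a
                next_psum[r][c] = sum_in + weights[r][c] * a
        act, psum = next_act, next_psum
        # The last cell of each column stores its sum in the accumulator.
        for c in range(cols):
            accumulators[c] += psum[rows - 1][c]
    return accumulators


def tiled_matvec(weights, activations, array_rows):
    """Handle more activation inputs than the array has rows.

    Splits the inputs into portions of at most array_rows, runs each
    portion through the array, and adds the partial results (assumed
    meaning of "combining").
    """
    totals = [0] * len(weights[0])
    for start in range(0, len(activations), array_rows):
        part = systolic_matvec(weights[start:start + array_rows],
                               activations[start:start + array_rows])
        totals = [t + p for t, p in zip(totals, part)]
    return totals


if __name__ == "__main__":
    w = [[1, 2], [3, 4], [5, 6]]
    print(systolic_matvec(w, [1, 2, 3]))
    print(tiled_matvec(w, [1, 2, 3], array_rows=2))
```

Both calls produce the same vector as a plain matrix-vector product, which is the point of the sketch: the skewed timing makes each partial sum meet the matching activation as it moves down a column, and portioning changes only how many passes are needed.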

Assignees

Google Inc

Inventors

Classifications

  • Systolic arrays · CPC title

  • Inference or reasoning models · CPC title

  • G06N3/063 (Primary)

    using electronic means · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016342891A1 cover?
A circuit for performing neural network computations for a neural network comprising a plurality of neural network layers, the circuit comprising: a matrix computation unit configured to, for each of the plurality of neural network layers: receive a plurality of weight inputs and a plurality of activation inputs for the neural network layer, and generate a plurality of accumulated values based …
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 24, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).