Deep neural network processing on hardware accelerators with stacked memory

US10540588B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10540588-B2
Application number: US-201514754344-A
Country: US
Kind code: B2
Filing date: Jun 29, 2015
Priority date: Jun 29, 2015
Publication date: Jan 21, 2020
Grant date: Jan 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method is provided for processing on an acceleration component a deep neural network. The method includes configuring the acceleration component to perform forward propagation and backpropagation stages of the deep neural network. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW.
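The two thresholds in the abstract can be related by simple arithmetic: dividing a bandwidth by a power efficiency gives the memory power implied at those figures. The following is a minimal worked sketch, assuming decimal units (1 GB = 1000 MB), which the abstract does not specify; the function name `memory_power_mw` is illustrative and does not come from the patent.

```python
def memory_power_mw(bandwidth_gb_per_s: float, efficiency_mb_per_s_per_mw: float) -> float:
    """Power in mW needed to deliver a bandwidth at a given power efficiency."""
    # Assumption: decimal units, so 1 GB/sec = 1000 MB/sec.
    bandwidth_mb_per_s = bandwidth_gb_per_s * 1000.0
    return bandwidth_mb_per_s / efficiency_mb_per_s_per_mw


# Evaluate at exactly the two floor values quoted in the abstract.
power_mw = memory_power_mw(50.0, 20.0)
print(f"{power_mw:.0f} mW ({power_mw / 1000:.1f} W)")  # 2500 mW (2.5 W)
```

At exactly the two floor values, the memory stack would draw about 2.5 W. A stack that exceeds the efficiency floor draws less than that at the same bandwidth.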

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for processing a deep neural network, the method comprising: configuring an acceleration component to perform forward propagation and backpropagation stages of the deep neural network, the acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package, the acceleration component comprising multiple discrete neural engine processing units, each neural engine processing unit comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions, the configuring comprising: storing at least one of: weights, input activations or errors in the memory stack; assigning, to individual ones of the neural engine processing units, a portion of the weights; and streaming portions of the weights, input activations or errors from the memory stack to the input buffer memory of respective ones of the neural engine processing units, the input buffer memory of each of the respective ones of the neural engine processing units individually comprising at least one of: a weights input memory into which the portions of the weights are stored; an activations input memory into which the portions of the input activations are stored; or an error input memory into which the portions of the errors are stored; wherein at least one of the weights input memory, the activations input memory or the error input memory of one neural engine processing unit is communicationally coupled to a corresponding one of the weights input memory, the activations input memory or the error input memory of a preceding neural engine processing unit and to a corresponding one of the weights input memory, the activations input memory or the error input memory of a subsequent neural engine processing unit.

2. The method of claim 1, wherein the acceleration component comprises one or more of a field-programmable gate array device, a massively parallel processor array device, a graphics processing unit, and an application-specific integrated circuit.

3. The method of claim 1, wherein the memory stack comprises two or more vertically stacked memory die.

4. The method of claim 1, wherein the acceleration component further comprises an interposer, and the acceleration component die and the memory stack are disposed on the interposer.

5. The method of claim 1, wherein the memory stack is disposed above the acceleration component die.

6. The method of claim 1, wherein the assigning, to the individual ones of the multiple discrete neural engine processing units, the portion of the weights comprises tiling a matrix multiplication independently across the multiple discrete neural engine processing units such that the multiple discrete neural engine processing units perform the matrix multiplication in parallel, the matrix multiplication including a matrix comprising the weights.

7. The method of claim 1, wherein the configuring further comprises exchanging the input activations in a shift-register-like fashion across the individual ones of the multiple discrete neural engine processing units, the individual ones of the multiple discrete neural engine processing units operating in a synchronous manner.

8. A system for processing a deep neural network, the system comprising: an acceleration component that performs forward propagation and backpropagation stages of the deep neural network, the acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package; and multiple discrete neural engine processing units on the acceleration component die, the multiple discrete neural engine processing units each comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions; wherein the memory stack has stored thereon at least one of: weights, input activations or errors; wherein individual ones of the multiple discrete neural engine processing units are assigned a portion of the weights; and wherein portions of the weights, input activations or errors are streamed from the memory stack to the input buffer memory of respective ones of the neural engine processing units, the input buffer memory of each of the respective ones of the neural engine processing units individually comprising at least one of: a weights input memory into which the portions of the weights are stored; an activations input memory into which the portions of the input activations are stored; or an error input memory into which the portions of the errors are stored; wherein at least one of the weights input memory, the activations input memory or the error input memory of one neural engine processing unit is communicationally coupled to a corresponding one of the weights input memory, the activations input memory or the error input memory of a preceding neural engine processing unit and to a corresponding one of the weights input memory, the activations input memory or the error input memory of a subsequent neural engine processing unit.

9. The system of claim 8, wherein the acceleration component comprises one or more of a field-programmable gate array device, a massively parallel processor array device, a graphics processing unit, and an application-specific integrated circuit.

10. The system of claim 8, wherein the memory stack comprises two or more vertically stacked memory die.

11. The system of claim 8, wherein the acceleration component further comprises an interposer, and the acceleration component die and the memory stack are disposed on the interposer.

12. The system of claim 8, wherein the memory stack is disposed above the acceleration component die.

13. The system of claim 8, wherein the assigning, to the individual ones of the multiple discrete neural engine processing units, the portion of the weights comprises tiling a matrix multiplication independently across the multiple discrete neural engine processing units such that the multiple discrete neural engine processing units perform the matrix multiplication in parallel, the matrix multiplication including a matrix comprising the weights.

14. The system of claim 8, wherein the configuring further comprises exchanging the input activations in a shift-register-like fashion across the individual ones of the multiple discrete neural engine processing units, the individual ones of the multiple discrete neural engine processing units operating in a synchronous manner.

15. A system for processing a deep neural network, the system comprising: an acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package; multiple discrete neural engine processing units on the acceleration component die, the multiple discrete neural engine processing units each comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions; and a plurality of DRAM channels on the memory stack, wherein individual ones of the DRAM channels are coupled to individual ones of the multiple discrete neural engine processing units, wherein the memory stack has stored thereon at least one of: weights, input activations or errors; wherein individual ones of the multiple discrete neural engine processing units are assigned a portion of the weights by storing the weights in the individual ones of the DRAM channels that are coupled to the individual ones of the multiple discrete neural engine processing units that are
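Claims 6 and 7 (mirrored in claims 13 and 14) describe tiling a matrix multiplication across the neural engine processing units and exchanging input activations between neighbouring units in a shift-register-like fashion. The sketch below is one way to picture how those two ideas could combine for a single matrix-vector product. It is a software model in plain Python, not the patent's implementation: the function name `ring_matvec`, the row-wise tiling, and the ring topology in which the last engine hands its chunk back to the first are all assumptions made for illustration.

```python
def ring_matvec(weights, activations, num_engines):
    """Multiply weights by activations using row tiles and a ring of activation chunks."""
    rows, cols = len(weights), len(activations)
    assert rows % num_engines == 0 and cols % num_engines == 0
    tile_rows, chunk = rows // num_engines, cols // num_engines

    # Each engine is assigned a contiguous block of weight rows (its portion of the weights).
    row_tiles = [weights[e * tile_rows:(e + 1) * tile_rows] for e in range(num_engines)]
    # Engine e starts with activation chunk e in its activations input buffer.
    held = [(e, activations[e * chunk:(e + 1) * chunk]) for e in range(num_engines)]
    partial = [[0.0] * tile_rows for _ in range(num_engines)]

    for _ in range(num_engines):  # one synchronous step per activation chunk
        for e in range(num_engines):  # hardware engines would do this in parallel
            chunk_id, values = held[e]
            base = chunk_id * chunk  # first weight column matching this chunk
            for r, row in enumerate(row_tiles[e]):
                partial[e][r] += sum(row[base + k] * values[k] for k in range(chunk))
        # Shift-register-like exchange: each engine takes the chunk of the preceding engine.
        held = [held[(e - 1) % num_engines] for e in range(num_engines)]

    # Concatenate the per-engine results in row order.
    return [y for tile in partial for y in tile]


W = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 0, -1, 2]
print(ring_matvec(W, x, 2))  # [6.0, 14.0, 22.0, 30.0]
```

In this model each engine reads every activation chunk exactly once, and only from its neighbour, so the full activation vector never has to be broadcast to all engines at the same time.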

Assignees

Inventors

Classifications

  • Extracting rules from data · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • System on board, i.e. computer system on one or more PCB, e.g. motherboards, daughterboards or blades · CPC title

  • using electronic means · CPC title

  • Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10540588B2 cover?
A method is provided for processing on an acceleration component a deep neural network. The method includes configuring the acceleration component to perform forward propagation and backpropagation stages of the deep neural network. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack has a memory bandwi…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).