Dynamically partitioning workload in a deep neural network module to reduce power consumption

US11476869B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11476869-B2
Application number  US-201815953330-A
Country             US
Kind code           B2
Filing date         Apr 13, 2018
Priority date       Apr 17, 2017
Publication date    Oct 18, 2022
Grant date          Oct 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
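
The flow described in the abstract (partition the workload, give each partition to a group, sum the partial outputs, then power down) can be sketched in a few lines of Python. This is an illustrative sketch only, not the patented implementation. The names `partition_workload` and `run_groups`, the even-split policy, the one-worker-per-group simplification, and modelling the per-partition work as a dot product are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def partition_workload(length: int, num_neurons: int) -> List[Tuple[int, int]]:
    """Split indices [0, length) into contiguous, near-equal slices.

    Assumption: an even split into min(length, num_neurons) slices, so that
    as many neurons as possible have work to do at the same time.
    """
    if length <= 0 or num_neurons <= 0:
        raise ValueError("length and num_neurons must be positive")
    parts = min(length, num_neurons)
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` slices get one more item
        bounds.append((start, start + size))
        start += size
    return bounds


def run_groups(inputs: Sequence[float], weights: Sequence[float], num_neurons: int):
    """Process each partition, sum the partial outputs, then mark groups as off."""
    bounds = partition_workload(len(inputs), num_neurons)
    partials = []
    for lo, hi in bounds:
        # Hypothetical per-group work: a dot product over the assigned slice.
        partials.append(sum(x * w for x, w in zip(inputs[lo:hi], weights[lo:hi])))
    output = sum(partials)               # combine partial output values
    powered_on = [False] * len(bounds)   # model power-down after completion
    return output, partials, powered_on
```

For example, `run_groups([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, 3)` splits eight items into slices of three, three, and two, and the three partial sums add up to the same value a single worker would produce.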

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network processor, comprising: a plurality of neurons; and a group partitioner and scheduler configured to: divide a workload for the neural network processor into a plurality of partitions based on a quantity of the plurality of neurons, and assign a group of the neurons to each of the plurality of partitions to maximize a total number of the plurality of neurons that simultaneously process the workload while reducing power consumption; and wherein the neurons within each group of neurons are configured to: process the workload in an assigned partition to generate a partial output value by performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload.

2. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.

3. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.

4. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.

5. The neural network processor of claim 1, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.

6. The neural network processor of claim 1, wherein the plurality of neurons are powered down following generation of the output values for the workload.

7. A neural network processor, comprising: a buffer storing an input volume and a weight volume; a plurality of neurons; and a group partitioner and scheduler configured to partition the input volume and the weight volume into a plurality of partitions based on a quantity of the plurality of neurons, and assign a group of the neurons to each of the plurality of partitions to maximize a total number of the plurality of neurons that simultaneously process a workload while reducing power consumption; and wherein the neurons within each group of neurons are configured to: process the workload in an assigned partition to generate a partial output value by performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload.

8. The neural network processor of claim 7, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.

9. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.

10. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.

11. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.

12. The neural network processor of claim 7, wherein the plurality of neurons are powered down following generation of the output values for the workload.

13. A computer-implemented method, comprising: dividing a workload for a neural network processor into a plurality of partitions based on a quantity of neurons of the neural network processor; assigning a group of neurons of the neural network processor to each of the plurality of partitions to maximize a total number of the group of neurons that simultaneously process the workload while reducing power consumption; processing, by way of the group of neurons, the workload in an assigned partition to generate a partial output value by: performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and summing partial output values generated by the neurons in each group of neurons to generate an output value for the workload.

14. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.

15. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.

16. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.

17. The computer-implemented method of claim 13, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.

18. The computer-implemented method of claim 13, further comprising powering down the group of neurons following generation of the output values for the workload.
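
The claims recite a strided two-dimensional convolution whose workload can be partitioned along the depth dimension, with each group producing a partial output that is summed into the final value. The sketch below shows why that summation is valid: convolution is linear across channels, so per-channel-group partial results add up to the unpartitioned result. It is a minimal illustration under stated assumptions, not the patent's hardware design. The function names, the `[channel][row][col]` list layout, the no-padding window rule, and the round-robin channel assignment are all hypothetical choices made for the example.

```python
from typing import List

Volume = List[List[List[float]]]  # assumed layout: [channel][row][col]


def conv2d(inputs: Volume, kernel: Volume, stride_h: int, stride_w: int) -> List[List[float]]:
    """Multi-channel 2D convolution without padding (no kernel flip, as is usual in DNNs)."""
    in_h, in_w = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(kernel[0]), len(kernel[0][0])
    out_h = (in_h - k_h) // stride_h + 1
    out_w = (in_w - k_w) // stride_w + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for c in range(len(inputs)):
        for i in range(out_h):
            for j in range(out_w):
                # Windows overlap whenever the stride is smaller than the kernel size.
                for di in range(k_h):
                    for dj in range(k_w):
                        out[i][j] += (
                            inputs[c][i * stride_h + di][j * stride_w + dj] * kernel[c][di][dj]
                        )
    return out


def conv2d_depth_partitioned(
    inputs: Volume, kernel: Volume, stride_h: int, stride_w: int, num_groups: int
) -> List[List[float]]:
    """Split the channels across groups, convolve each slice, and sum the partial outputs."""
    channels = len(inputs)
    num_groups = max(1, min(num_groups, channels))  # never create an empty group
    partials = []
    for g in range(num_groups):
        idx = range(g, channels, num_groups)  # assumed round-robin channel assignment
        partials.append(
            conv2d([inputs[c] for c in idx], [kernel[c] for c in idx], stride_h, stride_w)
        )
    out_h, out_w = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(out_w)] for i in range(out_h)]
```

Partitioning along height or width, as in the other dependent claims, would instead split the output rows or columns among groups; in that case each group needs the input rows or columns that its windows overlap, and the results are stitched together rather than summed.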

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Analogue means · CPC title

  • Combinations of networks · CPC title

  • G06N3/04 (Primary)

    Architecture, e.g. interconnection topology · CPC title

  • Virtual address space management · CPC title

  • Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11476869B2 cover?
A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partit…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 18, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).