Split packet transmission DMA engine
US-9990307-B1 · Jun 5, 2018 · US
US11476869B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11476869-B2 |
| Application number | US-201815953330-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 13, 2018 |
| Priority date | Apr 17, 2017 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
Abstract
A deep neural network (DNN) module is disclosed that can dynamically partition neuron workload to reduce power consumption. The DNN module includes neurons and a group partitioner and scheduler unit. The group partitioner and scheduler unit divides a workload for the neurons into partitions in order to maximize the number of neurons that can simultaneously process the workload. The group partitioner and scheduler unit then assigns a group of neurons to each of the partitions. The groups of neurons in the DNN module process the workload in their assigned partition to generate a partial output value. The neurons in each group can then sum their partial output values to generate a final output value for the workload. The neurons can be powered down once the groups of neurons have completed processing their assigned workload to reduce power consumption.
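The abstract's central idea is to split one convolution workload into partitions, have each group of neurons produce a partial output value, and sum the partials. That idea can be sketched in plain Python. This is a minimal, hypothetical illustration, not the patented hardware. `choose_partitions`, `split_depth`, `conv_at`, and `conv2d_partitioned` are invented names. The partition-count heuristic is an assumption: keep splitting the depth until every neuron has an output position and depth slice to work on. Each "neuron" is modeled as one call that computes a partial dot product.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [channel][row][col]


def choose_partitions(num_neurons: int, outputs_per_pass: int, depth: int) -> int:
    """Hypothetical heuristic (not from the patent): if there are fewer output
    positions than neurons, split the depth so outputs_per_pass * k neurons
    can work at once. Never create more partitions than channels."""
    if outputs_per_pass >= num_neurons:
        return 1
    return max(1, min(depth, num_neurons // outputs_per_pass))


def split_depth(depth: int, parts: int) -> List[range]:
    """Split channel indices 0..depth-1 into `parts` contiguous ranges."""
    base, extra = divmod(depth, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def conv_at(inp: Volume, kernel: Volume, channels: range,
            row: int, col: int, stride: int) -> float:
    """Partial output value: one kernel window, restricted to `channels`."""
    kh, kw = len(kernel[0]), len(kernel[0][0])
    total = 0.0
    for c in channels:
        for i in range(kh):
            for j in range(kw):
                total += inp[c][row * stride + i][col * stride + j] * kernel[c][i][j]
    return total


def conv2d_partitioned(inp: Volume, kernel: Volume, stride: int,
                       num_neurons: int) -> List[List[float]]:
    """Valid (unpadded) strided convolution computed as summed depth partitions.
    Power-down of idle neurons is not modeled."""
    depth, h, w = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernel[0]), len(kernel[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    parts = choose_partitions(num_neurons, out_h * out_w, depth)
    groups = split_depth(depth, parts)
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Each depth partition yields a partial value; summing gives the output.
            partials = [conv_at(inp, kernel, g, r, c, stride) for g in groups]
            out[r][c] = sum(partials)
    return out
```

Summing the partials reproduces the unpartitioned result because each convolution output is a sum over channels. Splitting the depth dimension only regroups that sum, which corresponds to the depth-wise partitioning of claims 2, 9, and 14.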
Claims
What is claimed is:
1. A neural network processor, comprising: a plurality of neurons; and a group partitioner and scheduler configured to: divide a workload for the neural network processor into a plurality of partitions based on a quantity of the plurality of neurons, and assign a group of the neurons to each of the plurality of partitions to maximize a total number of the plurality of neurons that simultaneously process the workload while reducing power consumption; and wherein the neurons within each group of neurons are configured to: process the workload in an assigned partition to generate a partial output value by performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload.
2. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.
3. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.
4. The neural network processor of claim 1, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.
5. The neural network processor of claim 1, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.
6. The neural network processor of claim 1, wherein the plurality of neurons are powered down following generation of the output values for the workload.
7. A neural network processor, comprising: a buffer storing an input volume and a weight volume; a plurality of neurons; and a group partitioner and scheduler configured to partition the input volume and the weight volume into a plurality of partitions based on a quantity of the plurality of neurons, and assign a group of the neurons to each of the plurality of partitions to maximize a total number of the plurality of neurons that simultaneously process a workload while reducing power consumption; and wherein the neurons within each group of neurons are configured to: process the workload in an assigned partition to generate a partial output value by performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and sum partial output values generated by the neurons in each group of neurons to generate an output value for the workload.
8. The neural network processor of claim 7, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.
9. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.
10. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.
11. The neural network processor of claim 7, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.
12. The neural network processor of claim 7, wherein the plurality of neurons are powered down following generation of the output values for the workload.
13. A computer-implemented method, comprising: dividing a workload for a neural network processor into a plurality of partitions based on a quantity of neurons of the neural network processor; assigning a group of neurons of the neural network processor to each of the plurality of partitions to maximize a total number of the group of neurons that simultaneously process the workload while reducing power consumption; processing, by way of the group of neurons, the workload in an assigned partition to generate a partial output value by: performing a convolution operation on a partition containing a portion of an input volume and a portion of a weight volume where the partition comprises an input frame defined by a set of kernels, a number of channels per kernel, a height, and a width, and performing the convolution operation on overlapping intervals defined by strides in two dimensions; and summing partial output values generated by the neurons in each group of neurons to generate an output value for the workload.
14. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the depth dimension.
15. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the height dimension.
16. The computer-implemented method of claim 13, wherein the workload comprises an input volume and a weight volume having height, width, and depth dimensions, and wherein the workload is partitioned along the width dimension.
17. The computer-implemented method of claim 13, wherein the workload is divided into a plurality of partitions such that the number of neurons that can simultaneously process the workload is maximized.
18. The computer-implemented method of claim 13, further comprising powering down the group of neurons following generation of the output values for the workload.
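Claims 3, 10, and 15 describe partitioning along the height dimension, and claims 4, 11, and 16 describe partitioning along the width dimension. The convolution runs over overlapping intervals set by the stride, so a spatial partition must read a slightly larger slice of the input than the rows it produces. The following is a hypothetical, single-channel sketch of a height split, not the patented scheduler. `conv2d`, `height_bands`, and `conv2d_height_partitioned` are invented names, and the convolution uses the unflipped (cross-correlation) form common in DNN frameworks.

```python
from typing import List, Tuple

Grid = List[List[float]]


def conv2d(inp: Grid, kernel: Grid, stride: int) -> Grid:
    """Single-channel valid convolution (cross-correlation form, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(inp) - kh) // stride + 1
    out_w = (len(inp[0]) - kw) // stride + 1
    return [
        [
            sum(inp[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def height_bands(out_h: int, parts: int, kh: int,
                 stride: int) -> List[Tuple[int, int, int, int]]:
    """Hypothetical helper: split output rows into `parts` bands and return
    (out_start, out_end, in_start, in_end) per band (end indices exclusive).
    Adjacent input bands overlap by kh - stride rows when kh > stride."""
    base, extra = divmod(out_h, parts)
    bands, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        if size == 0:
            continue  # more parts than output rows: skip empty bands
        end = start + size
        bands.append((start, end, start * stride, (end - 1) * stride + kh))
        start = end
    return bands


def conv2d_height_partitioned(inp: Grid, kernel: Grid, stride: int,
                              parts: int) -> Grid:
    """Each band (one neuron group in this sketch) convolves only its input slice."""
    kh = len(kernel)
    out_h = (len(inp) - kh) // stride + 1
    out: Grid = []
    for _, _, in_start, in_end in height_bands(out_h, parts, kh, stride):
        out.extend(conv2d(inp[in_start:in_end], kernel, stride))
    return out
```

Unlike the depth split, the band outputs here are disjoint rows that are concatenated rather than summed. A width split works the same way on columns.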
CPC classification titles:
Analogue means
Combinations of networks
Architecture, e.g. interconnection topology
Virtual address space management
Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons