Dynamic minibatch sizes
US-2020226424-A1 · Jul 16, 2020 · US
US11354573B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11354573-B2 |
| Application number | US-201916362945-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 25, 2019 |
| Priority date | Mar 25, 2019 |
| Publication date | Jun 7, 2022 |
| Grant date | Jun 7, 2022 |
Official abstract text for this publication.
A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
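The two ideas in the abstract, sizing a sub-minibatch so it fits in on-chip memory and pushing each sub-minibatch through a sequence of layers before starting the next, can be sketched as follows. This is a minimal illustration, not the patent's implementation: the linear per-sample footprint model, the function names `max_fitting_size` and `schedule`, the layer names, and the byte figures are all assumptions.

```python
from typing import List, Tuple


def max_fitting_size(minibatch: int, bytes_per_sample: int, on_chip_bytes: int) -> int:
    """Largest sub-minibatch size (<= minibatch) whose data fits on-chip.

    Assumes the footprint grows linearly with the sample count
    (a hypothetical model chosen for this sketch).
    """
    if bytes_per_sample <= 0:
        raise ValueError("bytes_per_sample must be positive")
    fit = on_chip_bytes // bytes_per_sample
    if fit < 1:
        raise ValueError("not even one sample fits in on-chip memory")
    return min(minibatch, fit)


def schedule(minibatch: int, sub_size: int, layers: List[str]) -> List[Tuple[str, int, int]]:
    """Return work items as (layer, first_sample, last_sample_exclusive).

    Each chunk runs through the whole layer sequence before the next chunk
    starts, so a layer begins as soon as the previous layer has finished
    that chunk rather than the full minibatch.
    """
    steps = []
    for start in range(0, minibatch, sub_size):
        end = min(start + sub_size, minibatch)
        for layer in layers:
            steps.append((layer, start, end))
    return steps


# Illustrative numbers only: 64 samples, 300 kB per sample, 2 MB on-chip.
sub = max_fitting_size(minibatch=64, bytes_per_sample=300_000, on_chip_bytes=2_000_000)
plan = schedule(64, sub, ["conv1", "conv2"])
```

With these numbers the sub-minibatch holds 6 samples, so the two-layer sequence is repeated 11 times (the last pass covers the remaining 4 samples). When the minibatch size is an exact multiple of the resized size, the repeat count is the minibatch size divided by the resized size.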
Opening claim text (preview).
What is claimed is:
1. A method comprising: dynamically resizing a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
2. The method of claim 1, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.
3. The method of claim 1, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.
4. The method of claim 1, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.
5. The method of claim 1, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by the size of the resized minibatch times.
6. The method of claim 1, wherein the neural network is a deep neural network.
7. The method of claim 1, wherein the on-chip memory is a hardware accelerator's on-chip memory.
8. A system comprising: at least one hardware processor; a memory device coupled with the hardware processor; the at least one hardware processor operable to at least dynamically resize a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
9. The system of claim 8, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.
10. The system of claim 8, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.
11. The system of claim 8, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.
12. The system of claim 8, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by resized minibatch size times.
13. The system of claim 8, wherein the neural network is a deep neural network.
14. The system of claim 8, wherein the on-chip memory is a hardware accelerator's on-chip memory.
15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: dynamically resize a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
16. The computer program product of claim 15, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.
17. The computer program product of claim 15, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.
18. The computer program product of claim 15, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.
19. The computer program product of claim 15, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by resized minibatch size times.
20. The computer program product of claim 15, wherein the minibatch is resized so that the previous layer can store all of the previous layer's outputs on the on-chip memory for the next layer's execution to access locally without needing to access external memory external to the on-chip memory.
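The independent claims name three factors that drive the choice of size: utilization gained from bandwidth saved by holding the resized minibatch on-chip, utilization lost because weight and kernel reuse applies only to the resized minibatch, and the cost of a forced write to external memory when adjacent layers use different sizes. The claims do not state a formula, so the sketch below is only one possible way to weigh them. The additive score, the `LayerEstimate` fields, the tie-breaking rule, and every number are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LayerEstimate:
    """Hypothetical per-layer estimates for one candidate size."""
    bandwidth_gain: float     # utilization gained by keeping the chunk on-chip
    reuse_loss: float         # utilization lost: weights reused over a chunk only
    forced_write_cost: float  # cost of spilling output if the next layer's size differs


def net_benefit(est: LayerEstimate, size_differs_from_next: bool) -> float:
    """Combine the three factors; the additive form is an assumption."""
    penalty = est.forced_write_cost if size_differs_from_next else 0.0
    return est.bandwidth_gain - est.reuse_loss - penalty


def pick_size(candidates: Dict[int, LayerEstimate], next_layer_size: int) -> int:
    """Return the candidate size with the highest net benefit (ties: larger size)."""
    return max(
        candidates,
        key=lambda s: (net_benefit(candidates[s], s != next_layer_size), s),
    )


# Made-up estimates for three candidate sizes of a single layer.
candidates = {
    4: LayerEstimate(bandwidth_gain=0.30, reuse_loss=0.10, forced_write_cost=0.15),
    8: LayerEstimate(bandwidth_gain=0.25, reuse_loss=0.05, forced_write_cost=0.15),
    16: LayerEstimate(bandwidth_gain=0.10, reuse_loss=0.02, forced_write_cost=0.15),
}
best = pick_size(candidates, next_layer_size=8)
```

In this example size 8 wins because matching the next layer's size avoids the forced-write penalty; if the next layer used size 4 instead, size 4 would score highest under the same made-up estimates.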
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
using electronic means · CPC title
Architecture, e.g. interconnection topology · CPC title