Dynamically resizing minibatch in neural network execution

US11354573B2 · US · B2

Patent metadata
Publication number: US-11354573-B2
Application number: US-201916362945-A
Country: US
Kind code: B2
Filing date: Mar 25, 2019
Priority date: Mar 25, 2019
Publication date: Jun 7, 2022
Grant date: Jun 7, 2022

Abstract

Official abstract text for this publication.

A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.
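The execution order the abstract describes can be illustrated with a minimal sketch. It is not the patented implementation: the function `run_layers_in_sub_minibatches`, the toy layers `scale` and `shift`, and the use of Python lists in place of on-chip buffers are all assumptions made for illustration. The point is only the loop order: each resized minibatch passes through the whole sequence of layers before the next one starts, so a layer never waits for the full minibatch to finish in the previous layer.

```python
from typing import Callable, List, Sequence

# Hypothetical layer type: maps a list of samples to a list of outputs.
Layer = Callable[[List[float]], List[float]]


def run_layers_in_sub_minibatches(
    minibatch: Sequence[float],
    layers: Sequence[Layer],
    sub_size: int,
) -> List[float]:
    """Push each resized minibatch through every layer before starting the next."""
    if sub_size <= 0:
        raise ValueError("sub_size must be positive")
    outputs: List[float] = []
    for start in range(0, len(minibatch), sub_size):
        # One resized minibatch; assumed small enough to stay in on-chip memory.
        chunk = list(minibatch[start:start + sub_size])
        for layer in layers:
            # The next layer starts as soon as this chunk leaves the previous layer.
            chunk = layer(chunk)
        outputs.extend(chunk)
    return outputs


def scale(samples: List[float]) -> List[float]:
    """Toy stand-in for a layer: doubles every sample."""
    return [2.0 * x for x in samples]


def shift(samples: List[float]) -> List[float]:
    """Toy stand-in for a layer: adds one to every sample."""
    return [x + 1.0 for x in samples]


result = run_layers_in_sub_minibatches([1.0, 2.0, 3.0, 4.0], [scale, shift], sub_size=2)
print(result)
```

With a full minibatch of four samples and a resized minibatch of two, the two-layer sequence runs twice, once per resized minibatch.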

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: dynamically resizing a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.

2. The method of claim 1, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.

3. The method of claim 1, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.

4. The method of claim 1, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.

5. The method of claim 1, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by the size of the resized minibatch times.

6. The method of claim 1, wherein the neural network is a deep neural network.

7. The method of claim 1, wherein the on-chip memory is a hardware accelerator's on-chip memory.

8. A system comprising: at least one hardware processor; a memory device coupled with the hardware processor; the at least one hardware processor operable to at least dynamically resize a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.

9. The system of claim 8, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.

10. The system of claim 8, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.

11. The system of claim 8, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.

12. The system of claim 8, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by resized minibatch size times.

13. The system of claim 8, wherein the neural network is a deep neural network.

14. The system of claim 8, wherein the on-chip memory is a hardware accelerator's on-chip memory.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a device to cause the device to: dynamically resize a minibatch in a neural network execution, wherein a size of the minibatch is configured such that the minibatch fits within on-chip memory, wherein the size is further configured based on a potential improvement in utilization due to bandwidth saved from holding a resized minibatch on-chip, a potential drop in utilization due to weight and kernel reuse applicable only to the resized minibatch and not the minibatch in entirety, and a cost of forced write of a neural network layer's output to an external memory responsive to adjacent layers using different minibatch sizes, wherein the size of the minibatch is resized for a sequence of layers in the neural network execution, wherein a next layer's execution commences responsive to the resized minibatch being completed in a previous layer without having to wait for all of the minibatch to be completed in the previous layer.

16. The computer program product of claim 15, wherein the size of the minibatch is resized to a maximum value that fits within the on-chip memory.

17. The computer program product of claim 15, wherein the size of the minibatch is resized differently for a different layer in the neural network execution.

18. The computer program product of claim 15, wherein the next layer's execution accesses the resized minibatch residing on the on-chip memory.

19. The computer program product of claim 15, wherein the previous layer and the next layer are layers of a sequence of layers, wherein processing of the previous layer and the next layer are repeated for minibatch size divided by resized minibatch size times.

20. The computer program product of claim 15, wherein the minibatch is resized so that the previous layer can store all of the previous layer's outputs on the on-chip memory for the next layer's execution to access locally without needing to access external memory external to the on-chip memory.
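The sizing rule in claims 2, 9, and 16 (resize to the maximum value that fits on-chip) and the repeat count in claims 5, 12, and 19 (minibatch size divided by resized minibatch size) can be worked through as a small sketch. The function names, the per-sample byte footprint, and the memory capacity below are hypothetical values chosen for illustration. Rounding the repeat count up when the sizes do not divide evenly is also an assumption, since the claims state a plain division. The utilization and forced-write trade-offs from the independent claims are not modeled here.

```python
import math


def largest_fitting_size(minibatch_size: int, per_sample_bytes: int, on_chip_bytes: int) -> int:
    """Largest resized-minibatch size whose data fits on-chip, capped at the full minibatch."""
    if minibatch_size <= 0 or per_sample_bytes <= 0:
        raise ValueError("minibatch_size and per_sample_bytes must be positive")
    fit = on_chip_bytes // per_sample_bytes  # whole samples that fit on-chip
    if fit < 1:
        raise ValueError("not even one sample fits in on-chip memory")
    return min(minibatch_size, fit)


def repeat_count(minibatch_size: int, resized_size: int) -> int:
    """Times the layer sequence runs: minibatch size / resized size, rounded up (assumption)."""
    return math.ceil(minibatch_size / resized_size)


# Hypothetical numbers: 64 samples, 300,000 bytes per sample, 2,000,000 bytes on-chip.
size = largest_fitting_size(64, 300_000, 2_000_000)
repeats = repeat_count(64, size)
print(size, repeats)
```

With these assumed numbers, six samples fit on-chip at once, so the layer sequence would be repeated eleven times to cover all 64 samples.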

Assignees

IBM

Inventors

Classifications

  • Combinations of networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • using electronic means · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11354573B2 cover?
A minibatch in a neural network execution may be dynamically resized based on on-chip memory. For example, a size of the minibatch is configured such that the minibatch fits within on-chip memory. The size of the minibatch may be resized for a sequence of layers in the neural network execution. A next layer's execution can commence responsive to the resized minibatch being completed in a previo…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 7, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).