Methods of operating a graphics processing unit (GPU) to train a deep neural network using a GPU local memory and related articles of manufacture

US11599798B2 · US · B2

Patent metadata
Publication number: US-11599798-B2
Application number: US-202016819840-A
Country: US
Kind code: B2
Filing date: Mar 16, 2020
Priority date: Mar 18, 2019
Publication date: Mar 7, 2023
Grant date: Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method operating a Graphics Processing Unit (GPU) memory can be provided by accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN. A sub-batch size of the samples can be defined that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples. Instructions configured to train the DNN using the sub-batch size can be defined so that an accuracy of the DNN trained using the sub-batch size is about equal to an accuracy of the DNN trained using the specified batch size of the samples.
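As a rough illustration of the idea in the abstract, the Python sketch below picks a sub-batch size that fits a memory budget, then accumulates gradients over the sub-batches so that the result matches the gradient of the full batch. The linear model, the byte counts, and the function names (choose_sub_batch_size, summed_gradient, batch_gradient) are assumptions made for this example. They are not taken from the patent, which defines the sub-batch size through its own relationship.

```python
def choose_sub_batch_size(batch_size, bytes_per_sample, fixed_bytes, available_bytes):
    """Largest sub-batch size in [1, batch_size] that fits the memory budget.

    Assumes (for illustration only) that memory use grows linearly with the
    number of samples on top of a fixed cost for weights and workspace.
    """
    if fixed_bytes + batch_size * bytes_per_sample <= available_bytes:
        return batch_size  # the whole batch fits, so no split is needed
    largest = (available_bytes - fixed_bytes) // bytes_per_sample
    if largest < 1:
        raise MemoryError("not even one sample fits in the available memory")
    return min(batch_size, largest)


def summed_gradient(weights, samples):
    """Sum (not mean) of squared-error gradients for a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for x, y in samples:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return grad


def batch_gradient(weights, batch, sub_batch_size):
    """Mean gradient over the batch, computed one sub-batch at a time."""
    total = [0.0] * len(weights)
    for start in range(0, len(batch), sub_batch_size):
        part = summed_gradient(weights, batch[start:start + sub_batch_size])
        total = [t + p for t, p in zip(total, part)]
    # Dividing by the full batch size (not the sub-batch size) is what keeps
    # the accumulated result equal to the full-batch gradient.
    return [t / len(batch) for t in total]


# Hypothetical batch of four (features, target) samples.
batch = [([1.0, 2.0], 3.0), ([2.0, 0.5], 1.0), ([0.0, 1.0], 2.0), ([3.0, 1.0], 0.0)]
weights = [0.5, -0.25]

sub = choose_sub_batch_size(batch_size=4, bytes_per_sample=100,
                            fixed_bytes=50, available_bytes=300)
full = batch_gradient(weights, batch, sub_batch_size=len(batch))
split = batch_gradient(weights, batch, sub_batch_size=sub)
print(sub, full, split)
```

Because the per-sample gradients are summed before the single division by the batch size, splitting the batch changes only how much data is resident at once, not the update that is applied.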

First claim

Opening claim text (preview).

What is claimed:

1. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the method further comprises: prior to training the DNN, determining a static schedule of off-loading data to a host and data prefetching from the host for tasks to be used during training of the DNN based on a simulation of training the DNN; and applying the static schedule of off-loading data and data prefetching during the training of the DNN.

2. The method of claim 1 wherein generating the instructions comprises generating the instructions configured to accumulate all errors generated from training the DNN using the sub-batch size of the samples to complete the specified batch size of the samples to provide an error for the specified batch size.

3. The method of claim 1 wherein the sub-batch size of the samples is defined so that any 15% of consecutive tasks occurring in topographical order in a task flow data graph representing training of the DNN can be stored in the sub-batch size of the samples.

4. The method of claim 1 wherein sub-batch size of the samples greater than or equal to 1 and less than or equal to the specified batch size of samples.

5. The method of claim 1 wherein determining the static schedule comprises simulating execution of all tasks in a task flow data graph representing training of the DNN in topological order.

6. The method of claim 5 further comprising: selecting a convolution kernel for use by the GPU to train the DNN from among a plurality of convolution kernels based on a combination of a performance associated with each of the plurality of convolution kernels offset by an off-loading factor for each of the plurality of convolution kernels and a prefetch delay for each of the plurality of convolution kernels for a given size for each of the sub-batch size.

7. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the sub-batch size of the samples is defined using the following relationship:

\max_{1 \le t \le |T| - \alpha |T| + 1} \{ \sum_{k=t}^{t + \alpha |T| - 1} [ \sum_{d(b) \in I( …
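The static schedule described in claims 1 and 5 (simulate the tasks in topological order before training, then replay the recorded off-load and prefetch decisions) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the task names, output sizes, memory capacity, the function name build_static_schedule, and the eviction rule (off-load the resident output whose next use is farthest away) are all hypothetical choices made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_static_schedule(sizes, deps, capacity):
    """Simulate one training pass and record off-load / prefetch actions.

    sizes:    {task: bytes its output occupies on the GPU} (hypothetical)
    deps:     {task: set of tasks whose outputs it reads}
    capacity: bytes of GPU memory available for task outputs
    Returns a list of (action, tensor_owner, at_task) tuples.
    """
    graph = {name: set(deps.get(name, ())) for name in sizes}
    order = list(TopologicalSorter(graph).static_order())
    step_of = {name: i for i, name in enumerate(order)}

    # Steps at which each task's output is read, in execution order.
    uses = {name: [] for name in sizes}
    for consumer in order:
        for producer in graph[consumer]:
            uses[producer].append(step_of[consumer])

    resident, offloaded, schedule = {}, set(), []
    for step, task in enumerate(order):
        # Free outputs that neither this task nor any later task reads.
        for name in [n for n in resident if all(u < step for u in uses[n])]:
            del resident[name]

        pinned = graph[task] | {task}  # must be on the GPU during this task
        for name in sorted(graph[task]):
            if name in offloaded:
                schedule.append(("prefetch", name, task))
                offloaded.remove(name)
                resident[name] = sizes[name]
        resident[task] = sizes[task]

        # Over budget: off-load the unpinned output needed farthest in the future.
        while sum(resident.values()) > capacity:
            candidates = [n for n in resident if n not in pinned]
            if not candidates:
                raise MemoryError(f"task {task!r} does not fit in {capacity} bytes")
            victim = max(candidates,
                         key=lambda n: min(u for u in uses[n] if u > step))
            schedule.append(("offload", victim, task))
            del resident[victim]
            offloaded.add(victim)
    return schedule


# Hypothetical three-layer forward/backward pass; sizes are in arbitrary units.
sizes = {"fwd1": 4, "fwd2": 4, "fwd3": 4, "loss": 1,
         "bwd3": 4, "bwd2": 4, "bwd1": 4}
deps = {"fwd2": {"fwd1"}, "fwd3": {"fwd2"}, "loss": {"fwd3"},
        "bwd3": {"loss", "fwd3"}, "bwd2": {"bwd3", "fwd2"},
        "bwd1": {"bwd2", "fwd1"}}

schedule = build_static_schedule(sizes, deps, capacity=12)
for action in schedule:
    print(action)
```

Because the schedule is computed once from the simulated task order, the training loop only has to replay the recorded actions instead of making memory decisions at run time.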

Assignees

  • Univ Notre Dame Du Lac

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • G06T1/20 · Primary

    Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title

  • using electronic means · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599798B2 cover?
A method operating a Graphics Processing Unit (GPU) memory can be provided by accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN. A sub-batch size of the samples can be defined that is less than or equal to the spe…
Who is the assignee on this patent?
Univ Notre Dame Du Lac
What technology area does this patent fall under?
Primary CPC classification G06T1/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).