What technology area does this patent fall under?

Primary CPC classification G06T1/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 7, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Methods of operating a graphics processing unit (GPU) to train a deep neural network using a GPU local memory and related articles of manufacture

Patent metadata
Field	Value
Publication number	US-11599798-B2
Application number	US-202016819840-A
Country	US
Kind code	B2
Filing date	Mar 16, 2020
Priority date	Mar 18, 2019
Publication date	Mar 7, 2023
Grant date	Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method operating a Graphics Processing Unit (GPU) memory can be provided by accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN. A sub-batch size of the samples can be defined that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples. Instructions configured to train the DNN using the sub-batch size can be defined so that an accuracy of the DNN trained using the sub-batch size is about equal to an accuracy of the DNN trained using the specified batch size of the samples.
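The sub-batch idea in the abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical linear memory model (a fixed cost plus a per-sample cost) to pick the sub-batch size, and a toy one-parameter least-squares model to show that gradients accumulated over sub-batches match the gradient of the full batch. The function names and the memory model are illustrative assumptions and are not taken from the patent.

```python
def choose_sub_batch_size(batch_size, bytes_per_sample, fixed_bytes, available_bytes):
    """Return the largest sub-batch size in [1, batch_size] that fits in memory.

    Assumed (hypothetical) memory model:
        footprint(n) = fixed_bytes + n * bytes_per_sample
    """
    if fixed_bytes + bytes_per_sample > available_bytes:
        raise MemoryError("even a single sample does not fit in the available memory")
    fit = (available_bytes - fixed_bytes) // bytes_per_sample
    return min(batch_size, fit)


def grad_sum(w, samples):
    """Summed gradient of 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in samples)


def full_batch_grad(w, samples):
    """Mean gradient computed over the whole batch in one pass."""
    return grad_sum(w, samples) / len(samples)


def accumulated_grad(w, samples, sub_batch_size):
    """Mean gradient computed by accumulating sub-batch gradients.

    Each sub-batch contributes its summed gradient; the total is divided by
    the full batch size once, so the result matches full_batch_grad.
    """
    total = 0.0
    for start in range(0, len(samples), sub_batch_size):
        total += grad_sum(w, samples[start:start + sub_batch_size])
    return total / len(samples)
```

For example, with a specified batch size of 8, a per-sample cost of 100 bytes, a fixed cost of 200 bytes, and 650 bytes available, `choose_sub_batch_size(8, 100, 200, 650)` returns 4. Because the accumulated sum is normalised by the full batch size rather than per sub-batch, the weight update is the same as for the specified batch size in this toy model.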

First claim

Opening claim text (preview).

What is claimed:
1. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the method further comprises: prior to training the DNN, determining a static schedule of off-loading data to a host and data prefetching from the host for tasks to be used during training of the DNN based on a simulation of training the DNN; and applying the static schedule of off-loading data and data prefetching during the training of the DNN.
2. The method of claim 1 wherein generating the instructions comprises generating the instructions configured to accumulate all errors generated from training the DNN using the sub-batch size of the samples to complete the specified batch size of the samples to provide an error for the specified batch size.
3. The method of claim 1 wherein the sub-batch size of the samples is defined so that any 15% of consecutive tasks occurring in topographical order in a task flow data graph representing training of the DNN can be stored in the sub-batch size of the samples.
4. The method of claim 1 wherein sub-batch size of the samples greater than or equal to 1 and less than or equal to the specified batch size of samples.
5. The method of claim 1 wherein determining the static schedule comprises simulating execution of all tasks in a task flow data graph representing training of the DNN in topological order.
6. The method of claim 5 further comprising: selecting a convolution kernel for use by the GPU to train the DNN from among a plurality of convolution kernels based on a combination of a performance associated with each of the plurality of convolution kernels offset by an off-loading factor for each of the plurality of convolution kernels and a prefetch delay for each of the plurality of convolution kernels for a given size for each of the sub-batch size.
7. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the sub-batch size of the samples is defined using the following relationship:
\max_{1 \le t \le |T| - \alpha|T| + 1} \{ \sum_{k=t}^{t + \alpha|T| - 1} [ \sum_{d(b) \in I( …
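The static schedule described in claims 1 and 5 (simulate the tasks in topological order before training, then replay the resulting off-load and prefetch decisions) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: the task format, the `simulate_schedule` name, the eviction heuristic (evict the tensor whose next use is farthest away, dropping dead tensors first), and issuing each prefetch at the step that needs the data are all hypothetical choices made for the example.

```python
def simulate_schedule(tasks, capacity):
    """Simulate tasks in topological order and return a static schedule.

    tasks: list of (name, output_bytes, input_names), already in topological
           order; each task produces one tensor named after the task.
    capacity: bytes of local memory assumed available (hypothetical model).
    Returns a list of (step, action, tensor), action in {"offload", "prefetch"}.
    """
    size = {name: nbytes for name, nbytes, _ in tasks}

    # Record at which steps each tensor is read.
    uses = {name: [] for name in size}
    for step, (_, _, inputs) in enumerate(tasks):
        for dep in inputs:
            uses[dep].append(step)

    def next_use(tensor, step):
        later = [s for s in uses[tensor] if s > step]
        return later[0] if later else None

    def eviction_key(tensor, step):
        # Dead tensors (no later use) are evicted first, then farthest next use.
        nxt = next_use(tensor, step)
        return (nxt is None, nxt or 0)

    resident, offloaded, schedule = set(), set(), []
    for step, (name, _, inputs) in enumerate(tasks):
        needed = set(inputs) | {name}
        to_fetch = [dep for dep in inputs if dep in offloaded]

        # Make room for this step's working set before loading anything.
        while sum(size[t] for t in resident | needed) > capacity:
            candidates = sorted(resident - needed)
            if not candidates:
                raise MemoryError(f"task {name!r} does not fit in {capacity} bytes")
            victim = max(candidates, key=lambda t: eviction_key(t, step))
            resident.discard(victim)
            if next_use(victim, step) is not None:
                # Still needed later: copy to the host instead of dropping it.
                offloaded.add(victim)
                schedule.append((step, "offload", victim))

        for dep in to_fetch:
            offloaded.discard(dep)
            schedule.append((step, "prefetch", dep))
        resident |= needed

    return schedule
```

Because the schedule is computed once from the simulation, a training loop could simply replay the listed actions at the matching steps rather than deciding at run time. A real implementation would likely issue prefetches earlier so that transfers overlap with computation; this sketch omits that for brevity.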

Assignees

Univ Notre Dame Du Lac

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06T1/20 (Primary)
Processor architectures; Processor configuration, e.g. pipelining · CPC title
G06N3/10
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
G06N3/063
using electronic means · CPC title

Patent family

Related publications grouped by family.

View patent family 72514609

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11599798B2 cover?: A method operating a Graphics Processing Unit (GPU) memory can be provided by accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN. A sub-batch size of the samples can be defined that is less than or equal to the spe…
Who is the assignee on this patent?: Univ Notre Dame Du Lac