Coordinated heterogeneous processing of training data for deep neural networks
US-2019311257-A1 · Oct 10, 2019 · US
US11599798B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11599798-B2 |
| Application number | US-202016819840-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 16, 2020 |
| Priority date | Mar 18, 2019 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A method operating a Graphics Processing Unit (GPU) memory can be provided by accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN. A sub-batch size of the samples can be defined that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples. Instructions configured to train the DNN using the sub-batch size can be defined so that an accuracy of the DNN trained using the sub-batch size is about equal to an accuracy of the DNN trained using the specified batch size of the samples.
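The abstract's two ideas, picking a sub-batch size that fits in local GPU memory and keeping the result equivalent to training with the full batch, can be illustrated with a small sketch. This is not the patented method. The linear memory model (`fixed_bytes + bytes_per_sample * n`), the function names, and the toy squared-error model are all assumptions made for illustration. The accumulation step mirrors the idea in claim 2 of accumulating errors over sub-batches until the specified batch is complete.

```python
def choose_sub_batch_size(batch_size, bytes_per_sample, fixed_bytes, available_bytes):
    """Largest sub-batch size in [1, batch_size] that fits the memory budget.

    Assumes (hypothetically) that memory use grows linearly with sample count.
    """
    if fixed_bytes + bytes_per_sample > available_bytes:
        raise MemoryError("even a single sample does not fit in GPU memory")
    fits = (available_bytes - fixed_bytes) // bytes_per_sample
    return min(batch_size, fits)


def grad_sum(w, xs, ys):
    """Sum of per-sample gradients of 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def accumulated_grad(w, xs, ys, sub_batch_size):
    """Mean gradient over the full batch, computed one sub-batch at a time."""
    total = 0.0
    for start in range(0, len(xs), sub_batch_size):
        stop = start + sub_batch_size
        total += grad_sum(w, xs[start:stop], ys[start:stop])  # accumulate errors
    return total / len(xs)  # normalize by the specified batch size, not the sub-batch size
```

Because the per-sample gradients are summed before dividing by the full batch size, the accumulated value matches the full-batch gradient up to floating-point rounding, which is why the weight update need not change when the batch is split.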
Opening claim text (preview).
What is claimed:
1. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the method further comprises: prior to training the DNN, determining a static schedule of off-loading data to a host and data prefetching from the host for tasks to be used during training of the DNN based on a simulation of training the DNN; and applying the static schedule of off-loading data and data prefetching during the training of the DNN.
2. The method of claim 1 wherein generating the instructions comprises generating the instructions configured to accumulate all errors generated from training the DNN using the sub-batch size of the samples to complete the specified batch size of the samples to provide an error for the specified batch size.
3. The method of claim 1 wherein the sub-batch size of the samples is defined so that any 15% of consecutive tasks occurring in topographical order in a task flow data graph representing training of the DNN can be stored in the sub-batch size of the samples.
4. The method of claim 1 wherein sub-batch size of the samples greater than or equal to 1 and less than or equal to the specified batch size of samples.
5. The method of claim 1 wherein determining the static schedule comprises simulating execution of all tasks in a task flow data graph representing training of the DNN in topological order.
6. The method of claim 5 further comprising: selecting a convolution kernel for use by the GPU to train the DNN from among a plurality of convolution kernels based on a combination of a performance associated with each of the plurality of convolution kernels offset by an off-loading factor for each of the plurality of convolution kernels and a prefetch delay for each of the plurality of convolution kernels for a given size for each of the sub-batch size.
7. A method operating a Graphics Processing Unit (GPU), the method comprising: accessing specified training parameters used to train a Deep Neural Network (DNN) using a GPU with a local GPU memory, the specified training parameters including at least a specified batch size of samples configured to train the DNN; defining a sub-batch size of the samples that is less than or equal to the specified batch size of samples in response to determining that an available size of the local GPU memory is insufficient to store all data associated with training the DNN using one batch of the samples; and generating instructions configured to train the DNN using the sub-batch size so that an accuracy of the DNN trained using the sub-batch size is equal to an accuracy of the DNN trained using the specified batch size of the samples, wherein the sub-batch size of the samples is defined using the following relationship: $\max_{1 \le t \le |T| - \alpha|T| + 1} \{ \sum_{k=t}^{t+\alpha|T|-1} [ \sum_{d(b) \in I($ …
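Claims 1 and 5 describe deriving a static off-load and prefetch schedule before training by simulating the task graph in topological order. The sketch below shows one way such a simulation could look. It is an illustration under stated assumptions, not the patent's algorithm: the function name, the per-task output sizes, the single memory `capacity`, and the eviction rule (off-load the tensor whose next use is furthest away) are all hypothetical. A real schedule would also issue prefetches ahead of the consuming task to hide transfer latency, whereas this sketch issues them at the consuming step.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def build_static_schedule(sizes, deps, capacity):
    """Simulate training tasks in topological order and record memory moves.

    sizes: {task: bytes of the tensor it produces} (hypothetical cost model)
    deps:  {task: set of tasks whose outputs it consumes}
    Returns (order, schedule), where schedule holds (step, action, tensor) tuples.
    """
    order = list(TopologicalSorter(deps).static_order())
    uses = {name: [] for name in sizes}  # steps at which each tensor is read
    for step, task in enumerate(order):
        for producer in deps.get(task, ()):
            uses[producer].append(step)

    resident, offloaded, schedule = {}, set(), []
    for step, task in enumerate(order):
        inputs = deps.get(task, set())
        for name in sorted(inputs):  # bring back inputs that were off-loaded
            if name in offloaded:
                offloaded.discard(name)
                resident[name] = sizes[name]
                schedule.append((step, "prefetch", name))
        resident[task] = sizes[task]
        while sum(resident.values()) > capacity:
            # Only tensors not touched by the current task may leave the GPU.
            candidates = [n for n in resident if n != task and n not in inputs]
            if not candidates:
                raise MemoryError(f"task {task!r} does not fit in {capacity} bytes")
            victim = max(candidates, key=lambda n: min(s for s in uses[n] if s > step))
            del resident[victim]
            offloaded.add(victim)
            schedule.append((step, "offload", victim))
        for name in list(resident):  # free tensors that are never read again
            if not any(s > step for s in uses[name]):
                del resident[name]
    return order, schedule


sizes = {"a": 2, "b": 4, "c": 4, "d": 4, "e": 2}
deps = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"}, "e": {"d", "a"}}
order, schedule = build_static_schedule(sizes, deps, capacity=8)
```

In this toy graph, tensor `a` is produced early and read again only by the last task, so the simulation off-loads it when memory runs short and prefetches it when it is needed again. Because the schedule is fixed before training starts, it can be replayed unchanged on every iteration.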
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Processor architectures; Processor configuration, e.g. pipelining · CPC title
Interfaces, programming languages or software development kits, e.g. for simulating neural networks · CPC title
using electronic means · CPC title