Deep learning model embodiments and training embodiments for faster training

US11144790B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11144790-B2
Application number	US-201916600148-A
Country	US
Kind code	B2
Filing date	Oct 11, 2019
Priority date	Oct 11, 2019
Publication date	Oct 12, 2021
Grant date	Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Presented herein are embodiments of a training deep learning models. In one or more embodiments, a compact deep learning model comprises fewer layers, which require fewer floating-point operations (FLOPs). Presented herein are also embodiments of a new learning rate function, which can adaptively change the learning rate between two linear functions. In one or more embodiments, combinations of half-precision floating point format training together with larger batch size in the training process may also be employed to aid the training process.
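
Claims 3 and 4 on this page describe the learning rate function as piecewise linear in the epoch number: a steep linear rise from zero or near zero to a peak, followed by a shallower linear fall to near zero. A minimal Python sketch of such a schedule follows. The function name `piecewise_linear_lr`, its parameters, and the example settings (peak at epoch 5 of 30, peak rate 0.4) are illustrative assumptions and are not taken from the patent.

```python
def piecewise_linear_lr(epoch, peak_epoch, total_epochs, peak_lr):
    """Learning rate for a (possibly fractional) epoch on a two-segment schedule.

    All parameter names and values are hypothetical; the patent text shown
    here gives only the shape of the function, not concrete numbers.
    """
    # The rise slope is peak_lr / peak_epoch and the fall slope is
    # peak_lr / (total_epochs - peak_epoch), so the rise is steeper
    # exactly when the peak comes before the midpoint of training.
    if not 0 < peak_epoch < total_epochs / 2:
        raise ValueError("peak_epoch must be positive and before the midpoint")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch <= peak_epoch:
        # First linear section: climb from zero to the peak.
        return peak_lr * epoch / peak_epoch
    # Second linear section: fall from the peak to zero, clamped at zero
    # if training runs past total_epochs.
    return max(peak_lr * (total_epochs - epoch) / (total_epochs - peak_epoch), 0.0)


# Hypothetical settings: peak at epoch 5 of 30 with a peak rate of 0.4.
for epoch in (0, 5, 17.5, 30):
    print(epoch, piecewise_linear_lr(epoch, peak_epoch=5, total_epochs=30, peak_lr=0.4))
```

The sketch ends the second section at exactly zero for simplicity, whereas the claims only require "near zero". Requiring the peak before the midpoint is one simple way to make the rising slope larger in magnitude than the falling slope.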

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training an image classification model, the method comprising: forming one or more batches comprising images and their corresponding labels, the images and their corresponding labels being selected from one or more training datasets in which each image has a corresponding label: repeating, for each training epoch until a stop condition is reached, a set of steps comprising: inputting a batch into the image classification model, the image classification model comprising: a convolution module comprising a convolution with a set of filters, a batch normalization operation, and an activation operation; a first residual module comprising at least two convolution modules separated by a max pooling layer, in which each convolution module has its own set of filters; a second residual module comprising at least two convolution modules separated by a max pooling layer, in which each convolution module has its own set of filters; and a fully connected layer that receives an input obtained from an output of the second residual module; determining a loss for the image classification model given the predicted output for the batch; and updating one or more parameters of the image classification model using the loss.

2. The computer-implemented method of claim 1 further comprising: determining a learning rate for each training epoch.

3. The computer-implemented method of claim 2 wherein the step of determining a learning rate for each training epoch comprises: using a piecewise linear function that relates training epoch number to learning rate to determine the learning rate for a training epoch.

4. The computer-implemented method of claim 3 wherein the piecewise linear function comprises: a first linear section in which learning rate increases linearly from zero or near zero to a peak point as training epoch increases; and a second linear section in which learning rate decreases linearly from a peak point to near zero as training epoch increases, wherein the magnitude of the slope of the first linear section is larger than the magnitude of the slope of the second linear section.

5. The computer-implemented method of claim 1 wherein at least one of the residual modules comprise an increasing number of filters to increase feature representation of the image classification model.

6. The computer-implemented method of claim 1 wherein at least one of the first residual module and the second residual module further comprises two convolution modules after the max pooling layer.

7. The computer-implemented method of claim 1 further wherein the number of filters for a convolution is matched to processor unit parallel capabilities of a system used to train the image classification model.

8. The computer-implemented method of claim 1 wherein the number of images selected for a batch is determined such that a memory requirement of the batch is less than a memory limit of a processor unit used to train the image classification model.

9. A system for training an image classification model, the system comprising: one or more processors; and a non-transitory computer-readable medium or media comprising one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising: forming one or more batches comprising images and their corresponding labels, the images and their corresponding labels being selected from one or more training datasets in which each image has a corresponding label: repeating, for each training epoch until a stop condition is reached, a set of steps comprising: inputting a batch into the image classification model, the image classification model comprising: a convolution module comprising a convolution with a set of filters, a batch normalization operation, and an activation operation; a first residual module comprising at least two convolution modules separated by a max pooling layer, in which each convolution module has its own set of filters; a second residual module comprising at least two convolution modules separated by a max pooling layer, in which each convolution module has its own set of filters; and a fully connected layer that receives an input obtained from an output of the second residual module; determining a loss for the image classification model given the predicted output for the batch; and updating one or more parameters of the image classification model using the loss.

10. The system of claim 9 wherein the non-transitory computer-readable medium or media further comprises one or more sets of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising determining a learning rate for each training epoch.

11. The system of claim 10 wherein the step of determining a learning rate for each training epoch comprises: using a piecewise linear function that relates training epoch number to learning rate to determine the learning rate for a training epoch.

12. The system of claim 11 wherein the piecewise linear function comprises: a first linear section in which learning rate increases linearly from zero or near zero to a peak point as training epoch increases; and a second linear section in which learning rate decreases linearly from a peak point to near zero as training epoch increases, wherein the magnitude of the slope of the first linear section is larger than the magnitude of the slope of the second linear section.

13. The system of claim 9 wherein at least one of the residual modules comprise an increasing number of filters to increase feature representation of the image classification model.

14. The system of claim 9 wherein at least one of the first residual module and the second residual module further comprises two convolution modules after the max pooling layer.

15. The system of claim 9 wherein the number of images selected for a batch is determined such that a memory requirement of the batch is less than a memory limit of the at least one processor used to train the image classification model.

16. A computer-implemented method for classifying an image, the method comprising: inputting an input image into a classification model, the classification model comprising: a convolution module comprising a convolution with a set of filters, a batch normalization operation, and an activation operation; a first residual module comprising at least two convolution modules each with its own set of filters separated by a max pooling layer; a second residual module comprising at least two convolution modules separated by a max pooling layer; and a fully connected layer; and outputting a classification label for the input image.

17. The computer-implemented method of claim 16 wherein at least one of the first residual module and the second residual module further comprises two convolution modules after the max pooling layer.

18. The computer-implemented method of claim 16 wherein at least one of the first residual module and the second residual module further comprises: combining an output of the max pooling layer with an output of the last convolution module of the residual module.

19. The computer-implemented method of claim 16 further wherein at least some of the residual modules comprise an increasing number of filters to increase feature representation of the model.

20. The computer-implemented method of claim 16 wherein the number of filters for a convolutio…
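
The layer order recited in claims 1, 6, and 18 can be made concrete by tracing tensor shapes through it. The Python sketch below does only that shape bookkeeping and performs no actual convolution. The 3x3 "same" convolutions, the 2x2 pooling window, the element-wise addition used for the "combining" step, the flatten before the fully connected layer, the 32x32 RGB input, and the filter counts (64, 128, 256) are all assumptions chosen for illustration; none of them is stated in the claim text shown here.

```python
def conv_module(shape, filters):
    """Shape after convolution + batch norm + activation.

    Assumes stride 1 and 'same' padding, so only the channel count changes.
    """
    _, height, width = shape
    return (filters, height, width)


def max_pool(shape, size=2):
    """Shape after non-overlapping max pooling (assumed 2x2 window)."""
    channels, height, width = shape
    return (channels, height // size, width // size)


def residual_module(shape, filters):
    """Conv module, max pool, then two conv modules after the pool.

    The pooled tensor is assumed to be added element-wise to the output of
    the last conv module, which requires both to have the same shape.
    """
    pooled = max_pool(conv_module(shape, filters))
    branch = conv_module(conv_module(pooled, filters), filters)
    if branch != pooled:
        raise ValueError("shapes must match for the element-wise combination")
    return branch


def trace_model(input_shape, stem_filters, residual_filters, num_classes):
    """Return (layer name, output shape) pairs for the recited layer order."""
    shape = conv_module(input_shape, stem_filters)
    trace = [("convolution module", shape)]
    for index, filters in enumerate(residual_filters, start=1):
        shape = residual_module(shape, filters)
        trace.append((f"residual module {index}", shape))
    channels, height, width = shape
    trace.append(("flatten", (channels * height * width,)))
    trace.append(("fully connected", (num_classes,)))
    return trace


# Hypothetical configuration: 32x32 RGB input, 10 output classes.
for name, shape in trace_model((3, 32, 32), 64, (128, 256), 10):
    print(f"{name:20s} {shape}")
```

With these assumed settings each residual module doubles the filter count and halves the spatial size, which is one reading of the "increasing number of filters" language in claims 5, 13, and 19.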

Assignees

Baidu USA LLC

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06F18/24
Classification techniques · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06F18/241
relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

Patent family

Related publications grouped by family.

View patent family 75346022

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11144790B2 cover?
Presented herein are embodiments of a training deep learning models. In one or more embodiments, a compact deep learning model comprises fewer layers, which require fewer floating-point operations (FLOPs). Presented herein are also embodiments of a new learning rate function, which can adaptively change the learning rate between two linear functions. In one or more embodiments, combinations of …
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 12, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Hand pose estimation

Room layout estimation methods and techniques

Computationally-efficient quaternion-based machine-learning system

Structure defect detection using machine learning algorithms
