Transforming convolutional neural networks for visual sequence learning

US11049018B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11049018-B2
Application number	US-201815880472-A
Country	US
Kind code	B2
Filing date	Jan 25, 2018
Priority date	Jun 23, 2017
Publication date	Jun 29, 2021
Grant date	Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
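The steps in the abstract can be illustrated with a minimal sketch. It assumes a fully-connected layer, a plain (ungated) recurrent layer, a ReLU activation, and zeros as the "initial values" for the hidden-to-hidden weights; the abstract specifies none of these details. The names `make_recurrent_layer`, `step`, and `run_sequence` are hypothetical and are not taken from the patent.

```python
def make_recurrent_layer(feedforward_weights, feedforward_bias, hidden_init=0.0):
    """Build a plain recurrent layer from a trained fully-connected layer.

    Assumption: hidden-to-hidden weights start at `hidden_init` (zero here);
    the abstract only says they are set to "initial values".
    """
    n_hidden = len(feedforward_weights)
    # Reuse the trained feedforward weights as input-to-hidden weights.
    w_ih = [row[:] for row in feedforward_weights]
    # Hidden-to-hidden weights have no trained counterpart, so initialize them.
    w_hh = [[hidden_init] * n_hidden for _ in range(n_hidden)]
    return {"w_ih": w_ih, "w_hh": w_hh, "bias": feedforward_bias[:]}


def step(layer, x, h_prev):
    """One time step: h_t = relu(W_ih x_t + W_hh h_{t-1} + b). ReLU is assumed."""
    h = []
    for i, (row_ih, row_hh) in enumerate(zip(layer["w_ih"], layer["w_hh"])):
        pre = (
            sum(w * v for w, v in zip(row_ih, x))
            + sum(w * v for w, v in zip(row_hh, h_prev))
            + layer["bias"][i]
        )
        h.append(max(0.0, pre))
    return h


def run_sequence(layer, frames):
    """Process a sequence of per-frame feature vectors, carrying hidden state."""
    h = [0.0] * len(layer["w_hh"])
    outputs = []
    for x in frames:
        h = step(layer, x, h)
        outputs.append(h)
    return outputs
```

Under the zero-initialization assumption, the transformed layer initially reproduces the per-frame output of the original feedforward layer. Temporal behavior would only emerge once the hidden-to-hidden weights are adjusted during training, as claims 2 and 3 describe.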

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method, comprising: replacing a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; setting hidden-to-hidden weights of the recurrent layer to initial values; and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
2. The method of claim 1, prior to processing the video image data, further comprising: processing input video image data included in a training dataset by the visual sequence learning neural network model to generate output data; comparing the output data to target output data included in the training dataset to produce comparison results; and adjusting the hidden-to-hidden weights based on the comparison results.
3. The method of claim 2, further comprising adjusting the input-to-hidden weights based on the comparison results.
4. The method of claim 2, wherein the training dataset is configured for sequential face alignment and the video image data is color data.
5. The method of claim 2, wherein the training dataset is configured for dynamic hand gesture recognition and the video image data is color data and depth data.
6. The method of claim 2, wherein the training dataset is configured for action recognition and the video image data is color data and optical flow data.
7. The method of claim 1, wherein the non-recurrent layer is a fully-connected layer.
8. The method of claim 1, wherein the non-recurrent layer is a convolutional layer.
9. The method of claim 1, wherein the transforming comprises computing values of parameters for multiple input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.
10. The method of claim 1, wherein the transforming comprises computing values of parameters for a unified input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.
11. The method of claim 1, wherein the replacing comprises selecting the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer.
12. The method of claim 11, wherein fewer activation values for the neurons in the recurrent layer are distributed between 0.1 and 0.9 than are distributed outside of 0.1 and 0.9 within a range 0.0 to 1.0.
13. A system, comprising: a memory storing video image data; and a parallel processing unit that is coupled to the memory and configured to: replace a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transform feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; set hidden-to-hidden weights of the recurrent layer to initial values; and process the video image data by the visual sequence learning neural network model to generate classification or regression output data.
14. The system of claim 13, wherein the parallel processing unit is further configured, prior to processing the video image data, to: process input video image data included in a training dataset by the visual sequence learning neural network model to generate output data; compare the output data to target output data included in the training dataset to produce comparison results; and adjust the hidden-to-hidden weights based on the comparison results.
15. The system of claim 14, wherein the parallel processing unit is further configured to adjust the input-to-hidden weights based on the comparison results.
16. The system of claim 13, wherein the parallel processing unit is further configured to compute values for multiple input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.
17. The system of claim 13, wherein the parallel processing unit is further configured to compute values for a unified input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.
18. The system of claim 13, wherein the parallel processing unit is further configured to select the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer to transform the feedforward weights.
19. A non-transitory computer-readable media storing computer instructions for visual sequence learning that, when executed by a processor, cause the processor to perform the steps of: replacing a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; setting hidden-to-hidden weights of the recurrent layer to initial values; and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
20. The non-transitory computer-readable media of claim 19, wherein the replacing comprises selecting the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer.
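Claim 12 states a countable condition on activation values, which can be sketched as a simple check. This is an illustration only: the function name `is_mostly_saturated` is hypothetical, and treating the 0.1 and 0.9 boundaries as inclusive is an assumption, since the claim does not say how boundary values are counted.

```python
def is_mostly_saturated(activations, low=0.1, high=0.9):
    """Return True when fewer activations lie in [low, high] than outside it.

    Only values within the range 0.0 to 1.0 are counted, following claim 12.
    Assumption: the boundaries `low` and `high` count as "between".
    """
    in_range = [a for a in activations if 0.0 <= a <= 1.0]
    middle = sum(1 for a in in_range if low <= a <= high)
    outside = len(in_range) - middle
    return middle < outside
```

For example, `is_mostly_saturated([0.0, 0.05, 0.95, 1.0, 0.5])` counts one value in the middle band and four outside it, so the condition holds.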

Assignees

Nvidia Corp

Inventors

Classifications

CPC code	CPC title
G06N3/048	Activation functions
G06N3/045	Combinations of networks
G06N3/044	Recurrent networks, e.g. Hopfield networks
G06F18/24	Classification techniques
G06V10/764	using classification, e.g. of video objects

Patent family

Related publications grouped by family.

Patent family ID: 64692635

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11049018B2 cover?: A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden…
Who is the assignee on this patent?: Nvidia Corp
What technology area does this patent fall under?: Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 29, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Generating simulated images from input images for semiconductor applications

Online detection and classification of dynamic gestures with recurrent convolutional neural networks

Method and system for joint training of hybrid neural networks for acoustic modeling in automatic speech recognition

Systems and methods for video paragraph captioning using hierarchical recurrent neural networks

Face detection

Image recognition method