Transforming convolutional neural networks for visual sequence learning

US11049018B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11049018-B2
Application number: US-201815880472-A
Country: US
Kind code: B2
Filing date: Jun 25, 2018
Priority date: Jun 23, 2017
Publication date: Jun 29, 2021
Grant date: Jun 29, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer. The method also includes the steps of setting hidden-to-hidden weights of the recurrent layer to initial values and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.
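The layer replacement described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes a fully-connected layer with a ReLU activation, a plain (ungated) recurrent layer, and zero as the "initial values" for the hidden-to-hidden weights. The names `fc_forward`, `RecurrentFromFC`, `w_ih`, and `w_hh` are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def fc_forward(w: Matrix, b: Vector, x: Vector) -> Vector:
    """The original non-recurrent (fully-connected) layer: relu(W x + b)."""
    return relu([a + c for a, c in zip(matvec(w, x), b)])


class RecurrentFromFC:
    """Recurrent layer built from a trained fully-connected layer.

    h_t = relu(W_ih x_t + W_hh h_{t-1} + b)
    """

    def __init__(self, w_fc: Matrix, b_fc: Vector) -> None:
        n = len(w_fc)
        # Feedforward weights become the input-to-hidden weights.
        self.w_ih = [row[:] for row in w_fc]
        self.b = b_fc[:]
        # Assumption: zeros as the initial hidden-to-hidden values.
        self.w_hh = [[0.0] * n for _ in range(n)]

    def forward(self, frames: List[Vector]) -> List[Vector]:
        """Process a sequence of per-frame feature vectors."""
        h = [0.0] * len(self.b)
        outputs = []
        for x in frames:
            pre = [
                i + r + c
                for i, r, c in zip(matvec(self.w_ih, x), matvec(self.w_hh, h), self.b)
            ]
            h = relu(pre)
            outputs.append(h)
        return outputs


# Usage: with zero hidden-to-hidden weights the recurrent layer reproduces
# the trained feedforward layer frame by frame before any fine-tuning.
w_fc = [[1.0, -2.0], [0.5, 0.5]]
b_fc = [0.0, 1.0]
frames = [[1.0, 0.0], [0.0, 1.0]]
rnn = RecurrentFromFC(w_fc, b_fc)
recurrent_out = rnn.forward(frames)
feedforward_out = [fc_forward(w_fc, b_fc, x) for x in frames]
```

Under the zero-initialization assumption, the transformed model starts from the behavior of the trained network, and later training can adjust `w_hh` (and optionally `w_ih`) to pick up temporal structure.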

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: replacing a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; setting hidden-to-hidden weights of the recurrent layer to initial values; and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.

2. The method of claim 1, prior to processing the video image data, further comprising: processing input video image data included in a training dataset by the visual sequence learning neural network model to generate output data; comparing the output data to target output data included in the training dataset to produce comparison results; and adjusting the hidden-to-hidden weights based on the comparison results.

3. The method of claim 2, further comprising adjusting the input-to-hidden weights based on the comparison results.

4. The method of claim 2, wherein the training dataset is configured for sequential face alignment and the video image data is color data.

5. The method of claim 2, wherein the training dataset is configured for dynamic hand gesture recognition and the video image data is color data and depth data.

6. The method of claim 2, wherein the training dataset is configured for action recognition and the video image data is color data and optical flow data.

7. The method of claim 1, wherein the non-recurrent layer is a fully-connected layer.

8. The method of claim 1, wherein the non-recurrent layer is a convolutional layer.

9. The method of claim 1, wherein the transforming comprises computing values of parameters for multiple input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.

10. The method of claim 1, wherein the transforming comprises computing values of parameters for a unified input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.

11. The method of claim 1, wherein the replacing comprises selecting the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer.

12. The method of claim 11, wherein fewer activation values for the neurons in the recurrent layer are distributed between 0.1 and 0.9 than are distributed outside of 0.1 and 0.9 within a range 0.0 to 1.0.

13. A system, comprising: a memory storing video image data; and a parallel processing unit that is coupled to the memory and configured to: replace a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transform feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; set hidden-to-hidden weights of the recurrent layer to initial values; and process the video image data by the visual sequence learning neural network model to generate classification or regression output data.

14. The system of claim 13, wherein the parallel processing unit is further configured, prior to processing the video image data, to: process input video image data included in a training dataset by the visual sequence learning neural network model to generate output data; compare the output data to target output data included in the training dataset to produce comparison results; and adjust the hidden-to-hidden weights based on the comparison results.

15. The system of claim 14, wherein the parallel processing unit is further configured to adjust the input-to-hidden weights based on the comparison results.

16. The system of claim 13, wherein the parallel processing unit is further configured to compute values for multiple input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.

17. The system of claim 13, wherein the parallel processing unit is further configured to compute values for a unified input-to-hidden state corresponding to multiple gating functions of the recurrent layer using the feedforward weights.

18. The system of claim 13, wherein the parallel processing unit is further configured to select the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer to transform the feedforward weights.

19. A non-transitory computer-readable media storing computer instructions for visual sequence learning that, when executed by a processor, cause the processor to perform the steps of: replacing a non-recurrent layer within a trained neural network model with a recurrent layer to produce a visual sequence learning neural network model; transforming feedforward weights for the non-recurrent layer into input-to-hidden weights of the recurrent layer to produce a transformed recurrent layer; setting hidden-to-hidden weights of the recurrent layer to initial values; and processing video image data by the visual sequence learning neural network model to generate classification or regression output data.

20. The non-transitory computer-readable media of claim 19, wherein the replacing comprises selecting the non-recurrent layer based on a distribution of activation values for neurons in the transformed recurrent layer.
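The activation-distribution condition in claim 12 can be illustrated with a small counting check. This is only a sketch of the comparison the claim states: the function name `mostly_saturated`, the treatment of the endpoints 0.1 and 0.9 as "outside", and the sample values are assumptions, and the page does not say how candidate layers are ranked against one another.

```python
from typing import Iterable


def mostly_saturated(
    activations: Iterable[float], low: float = 0.1, high: float = 0.9
) -> bool:
    """Return True if fewer values fall strictly between `low` and `high`
    than fall outside that band, for values in the range 0.0 to 1.0."""
    inside = 0
    outside = 0
    for a in activations:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"activation {a} is outside the range 0.0 to 1.0")
        if low < a < high:
            inside += 1
        else:
            # Assumption: the endpoints themselves count as "outside".
            outside += 1
    return inside < outside


# Hypothetical gate activations for two candidate layers.
layer_a = [0.0, 0.05, 0.95, 1.0, 0.5]   # one value in the middle band, four outside
layer_b = [0.2, 0.5, 0.8, 0.95]         # three values in the middle band, one outside
a_ok = mostly_saturated(layer_a)
b_ok = mostly_saturated(layer_b)
```

In this illustration, `layer_a` satisfies the claim 12 condition and `layer_b` does not.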

Assignees

Nvidia Corp

Inventors

Classifications

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Classification techniques · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11049018B2 cover?
A method, computer readable medium, and system are disclosed for visual sequence learning using neural networks. The method includes the steps of replacing a non-recurrent layer within a trained convolutional neural network model with a recurrent layer to produce a visual sequence learning neural network model and transforming feedforward weights for the non-recurrent layer into input-to-hidden…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/082. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 29, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).