Recurrent networks with motion-based attention for video understanding

US10049279B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10049279-B2
Application number	US-201615267621-A
Country	US
Kind code	B2
Filing date	Sep 16, 2016
Priority date	Mar 11, 2016
Publication date	Aug 14, 2018
Grant date	Aug 14, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of predicting action labels for a video stream includes receiving the video stream and calculating an optical flow of consecutive frames of the video stream. An attention map is generated from the current frame of the video stream and the calculated optical flow. An action label is predicted for the current frame based on the optical flow, a previous hidden state and the attention map.

First claim

Opening claim text (preview).

What is claimed is:
1. A method of predicting action labels for a video stream, comprising: receiving the video stream; calculating an optical flow of a current frame and a next frame of the video stream; generating an attention map from the current frame of the video stream, a first previous hidden state from a first layer of an artificial neural network, a second previous hidden state from a second layer of the artificial neural network, and the calculated optical flow; and predicting an action label for the current frame based on the optical flow, the second previous hidden state, and the attention map.
2. The method of claim 1, further comprising: calculating a two-dimensional (2D) or three-dimensional (3D) feature map from the current frame of the video stream and the attention map; and predicting a second action label for the next frame based on the optical flow, the 2D or 3D feature map, and the attention map.
3. The method of claim 2, in which the 2D or 3D feature map is based on one or more of a frame appearance, the optical flow, a spectrogram image, or semantic segmentation.
4. The method of claim 1, further comprising predicting the action label with a recurrent neural network (RNN).
5. The method of claim 4, in which the RNN comprises a long short-term memory (LSTM) network.
6. An apparatus for predicting action labels for a video stream, comprising: a memory; and at least one processor coupled to the memory, the at least one processor configured: to receive the video stream; to calculate an optical flow of a current frame and a next frame of the video stream; to generate an attention map from the current frame of the video stream, a first previous hidden state from a first layer of an artificial neural network, a second previous hidden state from a second layer of the artificial neural network, and the calculated optical flow; and to predict an action label for the current frame based on the optical flow, the second previous hidden state, and the attention map.
7. The apparatus of claim 6, in which the at least one processor is further configured: to calculate a two-dimensional (2D) or three-dimensional (3D) feature map from the current frame of the video stream and the attention map; and to predict a second action label for the next frame based on the optical flow, the 2D or 3D feature map, and the attention map.
8. The apparatus of claim 7, in which the 2D or 3D feature map is based on one or more of a frame appearance, the optical flow, a spectrogram image, or semantic segmentation.
9. The apparatus of claim 6, in which the at least one processor is further configured to predict the action label with a recurrent neural network (RNN).
10. The apparatus of claim 9, in which the RNN comprises a long short-term memory (LSTM) network.
11. An apparatus for predicting action labels for a video stream, comprising: means for receiving the video stream; means for calculating an optical flow of a current frame and a next frame of the video stream; means for generating an attention map from the current frame of the video stream, a first previous hidden state from a first layer of an artificial neural network, a second previous hidden state from a second layer of the artificial neural network, and the calculated optical flow; and means for predicting an action label for the current frame based on the optical flow, the second previous hidden state, and the attention map.
12. The apparatus of claim 11, further comprising: means for calculating a two-dimensional (2D) or three-dimensional (3D) feature map from the current frame of the video stream and the attention map; and means for predicting a second action label for the next frame based on the optical flow, the 2D or 3D feature map, and the attention map.
13. The apparatus of claim 12, in which the 2D or 3D feature map is based on one or more of a frame appearance, the optical flow, a spectrogram image, or semantic segmentation.
14. The apparatus of claim 11, further comprising means for predicting the action label with a recurrent neural network (RNN).
15. The apparatus of claim 14, in which the RNN comprises a long short-term memory (LSTM) network.
16. A non-transitory computer-readable medium having program code recorded thereon for predicting action labels for a video stream, the program code being executed by a processor and comprising: program code to receive the video stream; program code to calculate an optical flow of a current frame and a next frame of the video stream; program code to generate an attention map from the current frame of the video stream, a first previous hidden state from a first layer of an artificial neural network, a second previous hidden state from a second layer of the artificial neural network, and the calculated optical flow; and program code to predict an action label for the current frame based on the optical flow, the second previous hidden state, and the attention map.
17. The non-transitory computer-readable medium of claim 16, further comprising: program code to calculate a two-dimensional (2D) or three-dimensional (3D) feature map from the current frame of the video stream and the attention map; and program code to predict a second action label for the next frame based on the optical flow, the 2D or 3D feature map, and the attention map.
18. The non-transitory computer-readable medium of claim 17, in which the 2D or 3D feature map is based on one or more of a frame appearance, the optical flow, a spectrogram image, or semantic segmentation.
19. The non-transitory computer-readable medium of claim 16, further comprising program code to predict the action label with a recurrent neural network (RNN).
20. The non-transitory computer-readable medium of claim 19, in which the RNN comprises a long short-term memory (LSTM) network.
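
Illustrative sketch (not part of the patent text). The data flow recited in claim 1 can be pictured as one recurrent time step. The code below is a minimal sketch under stated assumptions, not the patented implementation: it uses additive (tanh) attention scoring, a plain tanh recurrent cell instead of an LSTM, random numbers in place of trained weights, and per-location feature and flow vectors that are assumed to have been computed already. All names (`attention_step`, `Wa`, `Wf`, `U1`, `U2`, `v`, `Wh`, `Wo`) are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols):
    """Random weights; a hypothetical stand-in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def attention_step(feats, flows, h1_prev, h2_prev, p):
    """One time step: build an attention map, then predict a label.

    feats: K appearance vectors (length D), one per location of the current frame
    flows: K optical-flow vectors (length 2), one per location
    h1_prev, h2_prev: previous hidden states of the first and second layer
    """
    # The hidden-state term is the same for every location.
    shared = [a + b for a, b in zip(matvec(p["U1"], h1_prev),
                                    matvec(p["U2"], h2_prev))]
    scores = []
    for feat, flow in zip(feats, flows):
        pre = [s + a + b for s, a, b in zip(shared,
                                            matvec(p["Wa"], feat),
                                            matvec(p["Wf"], flow))]
        scores.append(sum(w * math.tanh(x) for w, x in zip(p["v"], pre)))
    attn = softmax(scores)  # attention map over the K locations

    # Attention-weighted summaries of appearance and motion.
    ctx = [sum(a * f[d] for a, f in zip(attn, feats))
           for d in range(len(feats[0]))]
    motion = [sum(a * f[d] for a, f in zip(attn, flows)) for d in range(2)]

    # Assumption: simple tanh cell for the second layer (the claims also cover LSTM).
    h2 = [math.tanh(x) for x in matvec(p["Wh"], ctx + motion + h2_prev)]
    probs = softmax(matvec(p["Wo"], h2))
    label = max(range(len(probs)), key=probs.__getitem__)
    return attn, h2, probs, label


# Toy sizes: K locations, D feature dims, H hidden units, A attention units, C labels.
rng = random.Random(0)
K, D, H, A, C = 4, 3, 5, 6, 3
params = {
    "Wa": rand_matrix(rng, A, D),
    "Wf": rand_matrix(rng, A, 2),
    "U1": rand_matrix(rng, A, H),
    "U2": rand_matrix(rng, A, H),
    "v": [rng.uniform(-0.5, 0.5) for _ in range(A)],
    "Wh": rand_matrix(rng, H, D + 2 + H),
    "Wo": rand_matrix(rng, C, H),
}
feats = [[rng.random() for _ in range(D)] for _ in range(K)]
flows = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(K)]
h1_prev = [rng.uniform(-1, 1) for _ in range(H)]
h2_prev = [rng.uniform(-1, 1) for _ in range(H)]

attn, h2, probs, label = attention_step(feats, flows, h1_prev, h2_prev, params)
print("attention map:", [round(a, 3) for a in attn])
print("predicted label index:", label)
```

In a video loop, the returned `h2` would be passed back in as `h2_prev` for the next frame. The first-layer update that would produce the next `h1_prev` is omitted here to keep the sketch short.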

Assignees

Qualcomm Inc

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V20/41
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
G06V40/23 · Primary
Recognition of whole body movements, e.g. for sport training · CPC title
G06N3/044 · Primary
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/045
Combinations of networks · CPC title
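
Illustrative sketch (not part of the patent record). The classification symbols above follow the usual CPC layout: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The snippet below is a minimal sketch of splitting a symbol into those parts; the names `CpcSymbol`, `parse_cpc`, and `SECTION_NAMES` are hypothetical, the section names are abbreviated, and the regular expression is a simplification that does not validate against the official scheme.

```python
import re
from typing import NamedTuple

# Abbreviated names for CPC sections A-H (section Y is accepted but not named here).
SECTION_NAMES = {
    "A": "Human necessities",
    "B": "Performing operations; transporting",
    "C": "Chemistry; metallurgy",
    "D": "Textiles; paper",
    "E": "Fixed constructions",
    "F": "Mechanical engineering",
    "G": "Physics",
    "H": "Electricity",
}


class CpcSymbol(NamedTuple):
    section: str     # e.g. "G"
    cpc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06V"
    main_group: str  # e.g. "40"
    subgroup: str    # e.g. "23"


# Simplified pattern: section, 2-digit class, subclass letter, main group / subgroup.
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol such as 'G06V40/23' into its hierarchical parts."""
    match = _CPC_RE.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, cls, sub, main, subgroup = match.groups()
    return CpcSymbol(section, section + cls, section + cls + sub, main, subgroup)


for code in ["G06V10/82", "G06V20/41", "G06V40/23", "G06N3/044", "G06N3/045"]:
    parsed = parse_cpc(code)
    print(code, "->", parsed.subclass, parsed.main_group, parsed.subgroup,
          "|", SECTION_NAMES.get(parsed.section, "unknown section"))
```

All five symbols on this page start with section G, which is why the page maps the patent to the Physics technology area.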

Patent family

Related publications grouped by family.

View patent family 59786876

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10049279B2 cover? A method of predicting action labels for a video stream includes receiving the video stream and calculating an optical flow of consecutive frames of the video stream. An attention map is generated from the current frame of the video stream and the calculated optical flow. An action label is predicted for the current frame based on the optical flow, a previous hidden state and the attention map.
Who is the assignee on this patent? Qualcomm Inc
What technology area does this patent fall under? The primary CPC classification is G06V40/23. The mapped technology area is Physics.
When was this patent published? It was published on Aug 14, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? This page lists 2 related publications (citations in our corpus or others sharing the same primary CPC).