Who is the assignee on this patent?

Mitsubishi Electric Res Laboratories Inc

What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 26, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for detecting actions in videos

US10242266B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10242266-B2
Application number	US-201615058264-A
Country	US
Kind code	B2
Filing date	Mar 2, 2016
Priority date	Mar 2, 2016
Publication date	Mar 26, 2019
Grant date	Mar 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system detects actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectories and cropped images for one or more images in the chunk are produced using the bounding box. Then, the cropped trajectories and cropped images are passed to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
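To make the first steps of the abstract concrete, here is a minimal sketch of partitioning a video into chunks and cropping each image with a bounding box. It is an illustration under stated assumptions, not the patented implementation: the names `partition_into_chunks` and `crop`, the `(top, left, height, width)` box layout, the non-overlapping chunking, and the dropping of leftover frames are all hypothetical choices that the abstract does not specify.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]        # rows of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]  # (top, left, height, width): hypothetical layout


def partition_into_chunks(frames: Sequence[Image], chunk_len: int) -> List[List[Image]]:
    """Split a frame sequence into consecutive, non-overlapping chunks.

    Assumption: trailing frames that do not fill a whole chunk are dropped.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    n_full = len(frames) // chunk_len
    return [list(frames[i * chunk_len:(i + 1) * chunk_len]) for i in range(n_full)]


def crop(image: Image, box: Box) -> Image:
    """Return the pixels of `image` that fall inside `box`."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


# Toy video: 7 frames of 4x4 pixels; every pixel stores its frame index.
video = [[[float(t)] * 4 for _ in range(4)] for t in range(7)]

# Two full chunks of three frames each; the seventh frame is dropped.
chunks = partition_into_chunks(video, chunk_len=3)

# Crop every frame of every chunk with the same fixed 2x2 box.
cropped = [[crop(frame, (1, 1, 2, 2)) for frame in chunk] for chunk in chunks]
```

In the method described above, the box would come from the object tracker and could differ per chunk, and the cropped images would be fed to the network together with the cropped trajectories.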

First claim

Opening claim text (preview).

We claim:
1. A method for detecting actions of an object in a scene, comprising steps: acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks; tracking the object in the video, and for each object and each chunk of the video, further comprising: determining trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers, wherein the steps are performed in a processor.
2. The method of claim 1, wherein the convolutional neural network layers operate on multiple streams, including the cropped trajectories and the cropped images as well as trajectories and images that have an entire spatial extent of the video.
3. The method of claim 2, wherein the recurrent neural network layers include bi-directional Long Short-Term Memory LSTM cells.
4. The method of claim 1, wherein the recurrent neural network layers include Long Short-Term Memory (LSTM) cells.
5. The method of claim 1, wherein the trajectories are encoded as pixel trajectories.
6. The method of claim 1, wherein the trajectories are encodes as stacked optical flow.
7. The method of claim 1, wherein the tracking includes selecting a bounding box that maximizes a magnitude of a stacked optical flow inside the bounding box.
8. The method of claim 7, wherein the tracking further comprises: updating a location of the bounding box if a magnitude of the stacked optical flow inside the bounding box is greater than a threshold.
9. The method of claim 1, wherein K is 3.
10. The method of claim 1, wherein a motion pattern for each pixel is determined using a 1×2K convolutional kernel.
11. The method of claim 1, wherein the method is used for fine-grained action detection in the video.
12. The method of claim 1, wherein the method includes training the RNN prior to the detecting.
13. The method of claim 1, wherein the RNN has been previously trained.
14. The method of claim 1, wherein the detecting comprises temporal action detection.
15. The method of claim 1, wherein the detecting comprises spatio-temporal action detection.
16. The method of claim 1, wherein the video is initially acquired in some form other than a sequence of images, and is converted to a sequence of images.
17. The method of claim 1, in which the object is one of a person, a robot or an industrial robot.
18. A system for detecting actions of an object in a scene, comprising: an input interface to acquire a video of the scene as a sequence of images from a video camera, wherein each image includes pixels, wherein the video is partitioned into chunks; and a processor configured to track the object in the video, and for each object and each chunk of the video further comprising: determine trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; use the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and pass the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers.
19. A method for detecting actions of an object in a scene, wherein the detecting includes detecting spatio-temporal action detection, comprising steps: acquiring a video of the scene as a sequence of images via an input interface, wherein each image includes pixels, wherein the video is partitioned into chunks; tracking the object in the video, and for each object and each chunk of the video, further comprising: determining trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers, the CNN layers operate on multiple streams, including the cropped trajectories and the cropped images as well as trajectories and images that have an entire spatial extent of the video, wherein the steps are performed in a processor in communication with the input interface.
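
The tracking step in claims 7 and 8 can be illustrated with a small sketch. This is one possible reading and not the patentee's implementation. It assumes a per-pixel flow-magnitude map `mag` has already been computed from the stacked optical flow, that "magnitude inside the bounding box" means the sum over the box, that the box size is fixed, and that the threshold test applies to the best candidate box. The function names and the `(top, left, height, width)` layout are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width): hypothetical layout
FlowMap = List[List[float]]      # per-pixel flow magnitude, assumed precomputed


def flow_magnitude_in_box(mag: FlowMap, box: Box) -> float:
    """Sum the flow magnitude over all pixels inside `box`."""
    top, left, height, width = box
    return sum(sum(row[left:left + width]) for row in mag[top:top + height])


def select_box(mag: FlowMap, height: int, width: int) -> Tuple[Box, float]:
    """Exhaustively search for the fixed-size box with the largest summed magnitude."""
    rows, cols = len(mag), len(mag[0])
    if height > rows or width > cols:
        raise ValueError("box does not fit inside the flow map")
    best_box, best_score = (0, 0, height, width), float("-inf")
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            box = (top, left, height, width)
            score = flow_magnitude_in_box(mag, box)
            if score > best_score:  # strict comparison keeps the first box on ties
                best_box, best_score = box, score
    return best_box, best_score


def update_box(prev_box: Box, mag: FlowMap, threshold: float) -> Box:
    """Move the box only when the best candidate's motion exceeds `threshold`."""
    _, _, height, width = prev_box
    candidate, score = select_box(mag, height, width)
    return candidate if score > threshold else prev_box


# Toy 4x4 magnitude map with motion concentrated in the lower-right corner.
mag = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
best, score = select_box(mag, height=2, width=2)         # ((2, 2, 2, 2), 7.0)
moved = update_box((0, 0, 2, 2), mag, threshold=3.0)     # box follows the motion
stayed = update_box((0, 0, 2, 2), mag, threshold=10.0)   # weak motion: box is kept
```

Keeping the previous box when motion is weak prevents the tracker from jumping to noise when the object is nearly still. The exhaustive search is only for clarity; a summed-area table would make each box score a constant-time lookup.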

Assignees

Mitsubishi Electric Res Laboratories Inc

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V20/52
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title

Patent family

Related publications grouped by family.

View patent family 58228512

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10242266B2 cover?: A method and system detects actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectorie…