Method and system for detecting actions in videos

US10242266B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10242266-B2
Application number: US-201615058264-A
Country: US
Kind code: B2
Filing date: Mar 2, 2016
Priority date: Mar 2, 2016
Publication date: Mar 26, 2019
Grant date: Mar 26, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and system detects actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectories and cropped images for one or more images in the chunk are produced using the bounding box. Then, the cropped trajectories and cropped images are passed to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
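The first two steps of the abstract, partitioning the video into chunks and cropping images with the tracked bounding box, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the function names `partition_into_chunks` and `crop`, the `(top, left, height, width)` box layout, the use of non-overlapping chunks, and the choice to drop a short trailing remainder are not stated in the abstract.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]          # rows of pixel values (grayscale for brevity)
Box = Tuple[int, int, int, int]    # (top, left, height, width): hypothetical layout


def partition_into_chunks(frames: Sequence[Image], chunk_len: int) -> List[List[Image]]:
    """Split a frame sequence into consecutive, non-overlapping chunks.

    Assumption: a trailing remainder shorter than chunk_len is dropped.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    n_full = len(frames) // chunk_len
    return [list(frames[i * chunk_len:(i + 1) * chunk_len]) for i in range(n_full)]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by the bounding box."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


# Usage: 15 tiny 4x4 frames, chunks of 7 frames, one fixed box per frame.
frames = [[[f * 100 + r * 10 + c for c in range(4)] for r in range(4)]
          for f in range(15)]
chunks = partition_into_chunks(frames, chunk_len=7)
cropped_first_chunk = [crop(img, (1, 1, 2, 2)) for img in chunks[0]]
```

In the method described above, the box would come from the object tracker and could differ per chunk; the same cropping would also be applied to the trajectory data, not only to the images.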

First claim

Opening claim text (preview).

We claim:

1. A method for detecting actions of an object in a scene, comprising steps: acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks; tracking the object in the video, and for each object and each chunk of the video, further comprising: determining trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers, wherein the steps are performed in a processor.

2. The method of claim 1, wherein the convolutional neural network layers operate on multiple streams, including the cropped trajectories and the cropped images as well as trajectories and images that have an entire spatial extent of the video.

3. The method of claim 2, wherein the recurrent neural network layers include bi-directional Long Short-Term Memory LSTM cells.

4. The method of claim 1, wherein the recurrent neural network layers include Long Short-Term Memory (LSTM) cells.

5. The method of claim 1, wherein the trajectories are encoded as pixel trajectories.

6. The method of claim 1, wherein the trajectories are encodes as stacked optical flow.

7. The method of claim 1, wherein the tracking includes selecting a bounding box that maximizes a magnitude of a stacked optical flow inside the bounding box.

8. The method of claim 7, wherein the tracking further comprises: updating a location of the bounding box if a magnitude of the stacked optical flow inside the bounding box is greater than a threshold.

9. The method of claim 1, wherein K is 3.

10. The method of claim 1, wherein a motion pattern for each pixel is determined using a 1×2K convolutional kernel.

11. The method of claim 1, wherein the method is used for fine-grained action detection in the video.

12. The method of claim 1, wherein the method includes training the RNN prior to the detecting.

13. The method of claim 1, wherein the RNN has been previously trained.

14. The method of claim 1, wherein the detecting comprises temporal action detection.

15. The method of claim 1, wherein the detecting comprises spatio-temporal action detection.

16. The method of claim 1, wherein the video is initially acquired in some form other than a sequence of images, and is converted to a sequence of images.

17. The method of claim 1, in which the object is one of a person, a robot or an industrial robot.

18. A system for detecting actions of an object in a scene, comprising: an input interface to acquire a video of the scene as a sequence of images from a video camera, wherein each image includes pixels, wherein the video is partitioned into chunks; and a processor configured to track the object in the video, and for each object and each chunk of the video further comprising: determine trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; use the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and pass the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers.

19. A method for detecting actions of an object in a scene, wherein the detecting includes detecting spatio-temporal action detection, comprising steps: acquiring a video of the scene as a sequence of images via an input interface, wherein each image includes pixels, wherein the video is partitioned into chunks; tracking the object in the video, and for each object and each chunk of the video, further comprising: determining trajectories of the pixels within a bounding box located over the object, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images; using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the RNN includes convolutional neural network (CNN) layers and one or more recurrent neural network layers, the CNN layers operate on multiple streams, including the cropped trajectories and the cropped images as well as trajectories and images that have an entire spatial extent of the video, wherein the steps are performed in a processor in communication with the input interface.
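Three ideas from the claims lend themselves to a small worked sketch: the K previous and K subsequent images around the central image (claims 1 and 9), the 1×2K kernel applied per pixel (claim 10), and the bounding-box selection and threshold-gated update (claims 7 and 8). The code below is a hedged illustration, not the patented implementation. All function names, the `(top, left, height, width)` box layout, the fixed box size, the exhaustive search, the use of a summed per-pixel magnitude as the "magnitude inside the bounding box", and the reading of claim 8 as a test on the candidate box are assumptions.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width): hypothetical layout
Grid = Sequence[Sequence[float]]  # per-pixel magnitude of the stacked optical flow


def neighbour_offsets(k: int) -> List[int]:
    """Frame offsets from the central image: K previous, then K subsequent."""
    return [d for d in range(-k, k + 1) if d != 0]


def motion_response(displacements: Sequence[float], kernel: Sequence[float]) -> float:
    """Apply a 1 x 2K kernel to one pixel's 2K displacements (one flow component)."""
    if len(displacements) != len(kernel):
        raise ValueError("kernel length must equal the number of displacements (2K)")
    return sum(w * d for w, d in zip(kernel, displacements))


def box_magnitude(flow_mag: Grid, box: Box) -> float:
    """Sum the flow magnitude over the pixels inside the box (assumed measure)."""
    top, left, height, width = box
    return sum(sum(row[left:left + width]) for row in flow_mag[top:top + height])


def best_box(flow_mag: Grid, height: int, width: int) -> Box:
    """Exhaustively pick the fixed-size box with the largest summed magnitude."""
    rows, cols = len(flow_mag), len(flow_mag[0])
    candidates = [(t, l, height, width)
                  for t in range(rows - height + 1)
                  for l in range(cols - width + 1)]
    return max(candidates, key=lambda b: box_magnitude(flow_mag, b))


def update_box(current: Box, flow_mag: Grid, threshold: float) -> Box:
    """Move the box only if the best candidate's magnitude exceeds the threshold."""
    candidate = best_box(flow_mag, current[2], current[3])
    return candidate if box_magnitude(flow_mag, candidate) > threshold else current


# Usage: K = 3 as in claim 9, and a 4x4 magnitude map with motion at the right.
offsets = neighbour_offsets(3)
flow_mag = [[0, 0, 0, 0],
            [0, 0, 5, 5],
            [0, 0, 5, 5],
            [0, 0, 0, 0]]
moved = update_box((0, 0, 2, 2), flow_mag, threshold=10.0)
held = update_box((0, 0, 2, 2), flow_mag, threshold=25.0)
```

In a trained network the kernel weights would be learned rather than supplied by hand, and the threshold gate keeps the box in place when there is too little motion to relocate it reliably.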

Assignees

Mitsubishi Electric Res Laboratories Inc

Inventors

Classifications

  • using neural networks · CPC title

  • Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10242266B2 cover?
A method and system detects actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectorie…
Who is the assignee on this patent?
Mitsubishi Electric Res Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 26, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).