What technology area does this patent fall under?

Primary CPC classification G06K9/00751. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 4, 2018 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Devices, systems, and methods for generating a temporal-adaptive representation for video-event classification

US10068138B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10068138-B2
Application number	US-201615189996-A
Country	US
Kind code	B2
Filing date	Jun 22, 2016
Priority date	Sep 17, 2015
Publication date	Sep 4, 2018
Grant date	Sep 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Devices, systems, and methods for computer recognition of action in video obtain frame-level feature sets of visual features that were extracted from respective frames of a video, wherein the respective frame-level feature set of a frame includes the respective visual features that were extracted from the frame; generate first-level feature sets, wherein each first-level feature set is generated by pooling the visual features from two or more frame-level feature sets, and wherein each first-level feature set includes pooled features; and generate second-level feature sets, wherein each second-level feature set is generated by pooling the pooled features in two or more first-level feature sets, wherein each second-level feature set includes pooled features.
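The two-stage pooling described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the function names (`pool_sets`, `pool_level`, `temporal_pyramid`), the sliding window of two, the stride of one, and the choice of max pooling are all assumptions made for illustration. The claims mention minimum, maximum, or average pooling, so `op` can be swapped for `min` or the hypothetical `mean` helper.

```python
from typing import Callable, List, Sequence

FeatureSet = List[float]


def mean(values: Sequence[float]) -> float:
    """Average pooling operator (hypothetical helper, usable as `op`)."""
    values = list(values)
    return sum(values) / len(values)


def pool_sets(sets: Sequence[FeatureSet], op: Callable = max) -> FeatureSet:
    """Pool several feature sets element-wise into one pooled feature set."""
    return [op(column) for column in zip(*sets)]


def pool_level(level: Sequence[FeatureSet], window: int = 2,
               stride: int = 1, op: Callable = max) -> List[FeatureSet]:
    """Slide a window over temporally ordered feature sets and pool each group.

    Window and stride are illustrative assumptions; temporal order is kept.
    """
    if len(level) < window:
        raise ValueError("need at least `window` feature sets to pool")
    return [pool_sets(level[i:i + window], op)
            for i in range(0, len(level) - window + 1, stride)]


def temporal_pyramid(frame_sets: Sequence[FeatureSet], num_levels: int = 2,
                     window: int = 2, stride: int = 1,
                     op: Callable = max) -> List[List[FeatureSet]]:
    """Return [frame level, first pooled level, second pooled level, ...]."""
    levels = [list(frame_sets)]
    for _ in range(num_levels):
        levels.append(pool_level(levels[-1], window, stride, op))
    return levels


# Toy frame-level feature sets for four frames, two visual features each.
frames = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
levels = temporal_pyramid(frames, num_levels=2, window=2, stride=1, op=max)
print(levels[1])  # first-level sets, each pooled from two adjacent frames
print(levels[2])  # second-level sets, each pooled from two first-level sets
```

Each higher level summarizes a longer stretch of the video, which is one way to read the "temporal scale" language in the claims. Overlapping windows are a design choice of this sketch, not something the abstract requires.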

First claim

Opening claim text (preview).

What is claimed is:
1. A device comprising: one or more computer-readable media; and one or more processors that are coupled to the computer-readable media and that are configured to cause the device to obtain frame-level feature sets of visual features that were extracted from respective frames of a video, wherein the respective frame-level feature set of a frame includes the respective visual features that were extracted from the frame, generate first-pooled-level feature sets, wherein each first-pooled-level feature set is generated by pooling the visual features from two or more frame-level feature sets, wherein each first-pooled-level feature set includes pooled features, and wherein a pooled feature in the first-pooled-level feature sets is generated by pooling two or more visual features, generate second-pooled-level feature sets, wherein each second-pooled-level feature set is generated by pooling the pooled features in two or more first-pooled-level feature sets, wherein each second-pooled-level feature set includes pooled features, and wherein a pooled feature in the second-pooled-level feature sets is generated by pooling two or more pooled features in the first-pooled-level feature sets, obtain trajectory features that were extracted from the video; fuse the trajectory features with at least some of the pooled features in the first-pooled-level feature sets and the pooled features in the second-pooled-level feature sets, thereby generating fused features; train classifiers based on the fused features; obtain a test video; and classify the test video using the trained classifiers.
2. The device of claim 1, wherein the frames of the video are arranged in a temporal order, and wherein the frame-level feature sets, the first-pooled-level feature sets, and the second-pooled-level feature sets maintain the temporal order.
3. The device of claim 2, wherein pooling the visual features from two or more frame-level feature sets includes pooling the respective frame-level feature sets of frames that are adjacent to each other in the temporal order.
4. The device of claim 1, wherein each first-pooled-level feature set is generated by pooling the visual features from only two frame-level feature sets.
5. The device of claim 1, wherein each first-pooled-level feature set is generated by pooling the visual features from three or more frame-level feature sets.
6. The device of claim 1, wherein the one or more processors are further configured to cause the device to generate third-pooled-level feature sets, wherein each third-pooled-level feature set is generated by pooling the pooled features in two or more second-pooled-level feature sets, wherein each third-pooled-level feature sets includes pooled features.
7. The device of claim 1, wherein the first-pooled-level feature sets describe the frames of the video in a first temporal scale, the second-pooled-level feature sets describe the frames of the video in a second temporal scale, and the first temporal scale is different from the second temporal scale.
8. The device of claim 7, where the frame-level feature sets describe the frames of the video in a third temporal scale that is different from both the first temporal scale and the second temporal scale.
9. The device of claim 1, wherein the pooling of two or more visual features uses minimum pooling, maximum pooling, or average pooling.
10. A method comprising: obtaining frame-level feature sets of visual features that were extracted from respective frames of a video, wherein the respective frame-level feature set of each frame includes the respective visual features that were extracted from the frame; pooling the visual features from a first group of two or more frame-level feature sets, thereby generating a first first-level feature set, wherein the first first-level feature set includes pooled features, and wherein at least some of the pooled features in the first first-level feature set were each generated by pooling two or more respective visual features; pooling the visual features from a second group of two or more frame-level feature sets, thereby generating a second first-level feature set, wherein the second first-level feature set includes pooled features, wherein the second group of two or more frame-level feature sets includes a least one feature set that is not included in the first group of two or more frame-level feature sets, and wherein at least some of the pooled features in the second first-level feature set were each generated by pooling two or more respective visual features; pooling the pooled features in the first first-level feature set and the second first-level feature set, thereby generating a first second-level feature set, wherein the first second-level feature set includes pooled features that were pooled from the pooled features in first first-level feature set and from the pooled features in the second first-level feature set, and wherein at least some of the pooled features in the first second-level feature set were each generated by pooling at least one respective pooled feature in the first first-level feature set with at least one respective pooled feature in the second first-level feature set; training first classifiers based on one or more of the pooled features in the first first-level feature set, on the pooled features in the second first-level feature set, and on the pooled features in the first second-level feature set; obtaining trajectory features that were extracted from the video; training second classifiers based on the trajectory features; generating combined classifiers based on the first classifiers and the second classifiers; obtaining a second video; and classifying the second video using the combined classifiers.
11. The method of claim 10, wherein the pooling is average pooling or max pooling.
12. The method of claim 10, wherein training the first classifiers is further based on the visual features in the frame-level feature sets.
13. One or more computer-readable storage media storing instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations comprising: obtaining frame-level feature sets of visual features that were extracted from respective frames of a video, wherein the respective frame-level feature set of each frame includes the respective visual features that were extracted from the frame; pooling the visual features from a first group of two or more frame-level feature sets, thereby generating a first lower-pooled-level feature set, wherein the first lower-pooled-level feature set includes pooled features that were aggregated from the respective visual features of different frames, and wherein at least some of the pooled features in the first lower-pooled-level feature set were each generated by pooling two or more visual features into a single pooled feature; pooling the visual features from a second group of two or more frame-level feature sets, thereby generating a second lower-pooled-level feature set, wherein the second lower-pooled-level feature set includes pooled features that were aggregated from the respective visual features of different frames, and wherein at least some of the pooled features in the second lower-pooled-level feature set were each generated by pooling two or more visual features into a single pooled feature; pooling the pooled features in the first lower-pooled-level feature set and the second lower-pooled-level feature set, thereby generating a higher-pooled-level feature set, wherein the higher-pooled-level feature set includes pooled features that were aggregated from t
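
Claim 1 fuses trajectory features with pooled features before training classifiers. The sketch below shows one possible reading of that step, under loud assumptions: fusion is done by plain concatenation, and a nearest-centroid rule stands in for the classifier. The claim text shown here specifies neither, and the names `fuse`, `train_centroids`, and `classify` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def fuse(pooled_levels: Sequence[Sequence[Vector]], trajectory: Vector) -> Vector:
    """Concatenate pooled features from every level with trajectory features.

    Concatenation is an assumption; the claim only says the features are fused.
    """
    fused: Vector = []
    for level in pooled_levels:
        for feature_set in level:
            fused.extend(feature_set)
    fused.extend(trajectory)
    return fused


def train_centroids(vectors: Sequence[Vector],
                    labels: Sequence[str]) -> Dict[str, Vector]:
    """Stand-in classifier training: one mean vector per event label."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(centroids: Dict[str, Vector], vec: Vector) -> str:
    """Assign the label whose centroid is nearest in squared Euclidean distance."""
    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sqdist(centroids[label], vec))


# Toy training videos: one first-level set and one second-level set each,
# plus a single trajectory feature. All values are made up for illustration.
train_vectors = [
    fuse([[[1.0, 2.0]], [[3.0]]], [0.5]),
    fuse([[[9.0, 8.0]], [[7.0]]], [5.0]),
]
model = train_centroids(train_vectors, ["walk", "run"])

test_vector = fuse([[[1.5, 2.0]], [[2.5]]], [1.0])
predicted = classify(model, test_vector)
print(predicted)
```

Claim 10 describes a different arrangement, in which separate classifiers are trained on pooled features and on trajectory features and then combined. This sketch covers only the fuse-then-train variant of claim 1.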

Assignees

Canon Kk

Inventors

Classifications

G06F16/783
using metadata automatically derived from the content · CPC title
G06F17/30784
Physics · mapped topic
G06K9/00751 · Primary
Physics · mapped topic
G06V20/47 · Primary
Detecting features for summarising video content · CPC title

Patent family

Related publications grouped by family.

View patent family 58282504

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10068138B2 cover?: Devices, systems, and methods for computer recognition of action in video obtain frame-level feature sets of visual features that were extracted from respective frames of a video, wherein the respective frame-level feature set of a frame includes the respective visual features that were extracted from the frame; generate first-level feature sets, wherein each first-level feature set is generate…
Who is the assignee on this patent?: Canon Kk