Time domain action detecting methods and system, electronic devices, and computer storage medium

US2019138798A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2019138798-A1
Application number	US-201816234897-A
Country	US
Kind code	A1
Filing date	Dec 28, 2018
Priority date	Apr 20, 2017
Publication date	May 9, 2019
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Time domain action detecting methods and systems, electronic devices, and computer storage medium are provided. The method includes: obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval; separately extracting action features of at least two video segments in candidate segments, where the candidate segments comprise the video segment corresponding to the time domain interval and the adjacent segments thereof; pooling the action features of the at least two video segments in the candidate segments, to obtain a global feature of the video segment corresponding to the time domain interval; and determining, based on the global feature, an action integrity score of the video segment corresponding to the time domain interval. The embodiments of the present disclosure help accurately determine whether a time domain interval comprises an integral action instance, and improve the accuracy of action integrity identification.
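The pooling and scoring steps described in the abstract can be illustrated with a minimal sketch. It assumes that per-segment action features are already available as plain lists of floats, that pooling is a simple mean, and that the integrity score comes from a linear scorer followed by a sigmoid. None of these choices is stated in the abstract; the function names, weights, and sample values are hypothetical.

```python
import math
from typing import List, Sequence

Feature = List[float]


def mean_pool(features: Sequence[Feature]) -> Feature:
    """Average a non-empty sequence of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def global_feature(before: Sequence[Feature],
                   interval: Sequence[Feature],
                   after: Sequence[Feature]) -> Feature:
    """Pool each group of candidate segments and concatenate the results.

    Assumption: the adjacent segments before and after the interval are
    pooled separately from the interval itself, so the global feature
    keeps the temporal order (before, interval, after).
    """
    return mean_pool(before) + mean_pool(interval) + mean_pool(after)


def integrity_score(feature: Feature,
                    weights: Sequence[float],
                    bias: float = 0.0) -> float:
    """Hypothetical scorer: linear layer plus sigmoid, giving a value in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy features: strong action response inside the interval, weak outside it.
before = [[0.0, 0.2], [0.2, 0.0]]
interval = [[1.0, 0.8], [0.8, 1.0]]
after = [[0.1, 0.1]]

g = global_feature(before, interval, after)
# Hand-picked weights that reward activity inside the interval and
# penalise activity in the adjacent segments (illustrative only).
score = integrity_score(g, weights=[-1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
print(len(g), round(score, 3))
```

In this toy setup a high score means the action appears to start and end within the interval. In a trained system the weights would be learned rather than hand-picked.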

First claim

Opening claim text (preview).

1. A time domain action detecting method, the method comprising: obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval; separately extracting action features of at least two video segments in candidate segments, wherein the candidate segments comprise a video segment corresponding to the time domain interval and the adjacent segments thereof; pooling the action features of the at least two video segments, including a first video segment and a second video segment, in the candidate segments, to obtain a global feature of the video segment corresponding to the time domain interval; and determining, based on the global feature, an action integrity score of the video segment corresponding to the time domain interval.

2. The method according to claim 1, wherein the at least one adjacent segment comprises: at least one of a first adjacent segment in the video with a time sequence located in front of the time domain interval, or a second adjacent segment in the video with a time sequence located behind the time domain interval; and the first adjacent segment and the second adjacent segment respectively comprise at least one video segment.

3. The method according to claim 1, wherein the obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval comprises: performing actionness estimation separately on at least one video segment in the video, to obtain a time sequence actionness sequence; performing action position prediction based on the time sequence actionness sequence, to obtain the time domain interval in the video with an action instance, the time domain interval comprising a start time and an end time; and extracting, from the video, at least one of the first adjacent segment before the time domain interval or the second adjacent segment after the time domain interval.

4. The method according to claim 3, wherein the performing actionness estimation separately on at least one video segment in the video, to obtain a time sequence actionness sequence comprises: for any video segment in the video separately: extracting an image frame as an original image, and performing actionness estimation on the original image, to obtain a first actionness value; extracting a light stream of the any video segment, merging obtained light stream field pictures, to obtain a spliced light stream field image, and performing actionness estimation on the spliced light stream field image, to obtain a second actionness value; obtaining an actionness value of the any video segment from the first actionness value and the second actionness value; and forming the time sequence actionness sequence by the actionness values of all video segments based on a time sequence relation.

5. The method according to claim 4, wherein after the obtaining the actionness value of any video segment, the method further comprises: normalizing the actionness value of the any video segment, to obtain a normalized actionness value; and the time sequence actionness sequence comprising: a time sequence actionness sequence formed by the normalized actionness value.

6. The method according to claim 1, the method further comprising: obtaining, based on the action feature of the video segment corresponding to the time domain interval, a category score of at least one action category of the video segment corresponding to the time domain interval; and determining, according to the category score of the at least one action category of the video segment corresponding to the time domain interval, a detected action category of the video segment corresponding to the time domain interval.

7. The method according to claim 6, the method further comprising: outputting the time domain interval and the detected action category of the video segment corresponding to the time domain interval.

8. The method according to claim 6, wherein the obtaining, based on an action feature of the video segment corresponding to the time domain interval, a category score of at least one action category of the video segment corresponding to the time domain interval comprises: separately obtaining, based on the action feature of the at least one action category of the video segment corresponding to the time domain interval, a score of the at least one video segment corresponding to the time domain interval separately belonging to the at least one action category; and summing scores of the at least one video segment corresponding to the time domain interval separately belonging to the same action category, to obtain the category score of the at least one action category of the video segment corresponding to the time domain interval.

9. The method according to claim 1, wherein the pooling the action features of the at least two video segments in the candidate segments comprises: performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments.

10. The method according to claim 9, wherein after the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: merging pooling features obtained after the time domain pyramid-typed pooling.

11. The method according to claim 10, wherein before the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: presetting a value of a number K of pooling layers to be 1; the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments comprising: for any first to-be-pooled segment with a value of a preset partition part number B_K to be 1, obtaining the pooling feature of the any first to-be-pooled segment from the action feature of the at least one video segment in the any first to-be-pooled segment; for any second to-be-pooled segment with the value of the preset partition part number B_K to be greater than 1, segmenting all video segments in the any second to-be-pooled segment into B_K parts, obtaining the pooling feature of a corresponding part separately from the action features of each part of the video segments in the B_K parts, and merging the pooling features of the B_K parts, to obtain the pooling feature of the any second to-be-pooled segment; and the first to-be-pooled segment comprising the video segment corresponding to the time domain interval, any one or more of the first adjacent segment and the second adjacent segment; the second to-be-pooled segment comprising other to-be-pooled segments in the candidate segments except the first to-be-pooled segment.

12. The method according to claim 10, wherein before the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: presetting a value of a number K of pooling layers to be greater than 1; the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments comprising: separately for a k-th pooling layer: for any first to-be-pooled segment with a value of a preset partition part number B_K to be 1, obtaining the pooling feature of the any first to-be-pooled segment at the k-th layer from the action feature of the at least one video segment in the any first to-be-pooled segment; for any second to-be-pooled seg
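
The time domain pyramid-typed pooling in claims 9 to 12 can be sketched roughly as follows. This is an illustrative reading of the claim preview, not the patent's implementation. It assumes mean pooling within each part, contiguous near-equal partitioning, and a hypothetical configuration in which the interval is pooled at two levels (1 part and 2 parts) while each adjacent segment is pooled as a single part. All function and parameter names are invented for the example.

```python
from typing import List, Sequence

Feature = List[float]


def mean_pool(features: Sequence[Feature]) -> Feature:
    """Average a non-empty sequence of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def split_parts(features: Sequence[Feature], parts: int) -> List[Sequence[Feature]]:
    """Split features into `parts` contiguous, near-equal, non-empty chunks."""
    n = len(features)
    if parts < 1 or n < parts:
        raise ValueError("need at least one video segment per part")
    # Integer bounds are strictly increasing whenever n >= parts.
    bounds = [i * n // parts for i in range(parts + 1)]
    return [features[bounds[i]:bounds[i + 1]] for i in range(parts)]


def pyramid_pool(features: Sequence[Feature],
                 parts_per_level: Sequence[int]) -> Feature:
    """Pool one to-be-pooled segment at every pyramid level and merge.

    parts_per_level[k] plays the role of the partition part number at
    level k: 1 pools the whole segment, B > 1 pools B parts separately.
    """
    pooled: Feature = []
    for parts in parts_per_level:
        for chunk in split_parts(features, parts):
            pooled += mean_pool(chunk)
    return pooled


def structured_global_feature(before: Sequence[Feature],
                              interval: Sequence[Feature],
                              after: Sequence[Feature],
                              interval_levels: Sequence[int] = (1, 2),
                              context_levels: Sequence[int] = (1,)) -> Feature:
    """Merge pyramid-pooled features of the adjacent segments and the interval."""
    return (pyramid_pool(before, context_levels)
            + pyramid_pool(interval, interval_levels)
            + pyramid_pool(after, context_levels))


# Toy 2-dimensional features for four interval segments and three context segments.
before = [[0.0, 0.0], [0.0, 0.0]]
interval = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
after = [[0.0, 0.0]]

feature = structured_global_feature(before, interval, after)
print(len(feature))
```

With these settings the merged feature has 2 + (1 + 2) × 2 + 2 = 10 components. The two-part level keeps the first and second halves of the interval apart, which plain mean pooling over the whole interval would blur together.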

Assignees

Beijing Sensetime Tech Development Co Ltd

Inventors

Classifications

G06V10/82
using neural networks · CPC title
G06V40/20
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
G06N3/048
Activation functions · CPC title
G06N3/045 (Primary)
Combinations of networks · CPC title
G06F18/217
Validation; Performance evaluation; Active pattern learning techniques · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 62656586

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019138798A1 cover? Time domain action detecting methods and systems, electronic devices, and computer storage medium are provided. The method includes: obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval; separately extracting action features of at least two video segments in candidate segments, where the candidate segments comprise vi…
Who is the assignee on this patent? Beijing Sensetime Tech Development Co Ltd
What technology area does this patent fall under? Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published? May 9, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).