US2019138798A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019138798-A1 |
| Application number | US-201816234897-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 28, 2018 |
| Priority date | Apr 20, 2017 |
| Publication date | May 9, 2019 |
| Grant date | — |
Official abstract text for this publication.
Time domain action detecting methods and systems, electronic devices, and computer storage media are provided. The method includes: obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval; separately extracting action features of at least two video segments in candidate segments, where the candidate segments comprise the video segment corresponding to the time domain interval and the adjacent segments thereof; pooling the action features of the at least two video segments in the candidate segments, to obtain a global feature of the video segment corresponding to the time domain interval; and determining, based on the global feature, an action integrity score of the video segment corresponding to the time domain interval. The embodiments of the present disclosure help to accurately determine whether a time domain interval comprises a complete action instance, and improve the accuracy of action integrity identification.
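The flow described in the abstract (pool per-segment action features, form a global feature, score integrity) can be illustrated with a minimal sketch. The abstract does not specify the pooling operator or the scoring function, so mean pooling, concatenation, and a linear scorer with a sigmoid are all assumptions here, and every name (`mean_pool`, `global_feature`, `integrity_score`, `weights`) is hypothetical.

```python
import math


def mean_pool(features):
    """Average a list of equal-length feature vectors element-wise (assumed pooling operator)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def global_feature(before, interval, after):
    """Pool each group of candidate segments separately, then concatenate the results."""
    pooled = []
    for group in (before, interval, after):
        pooled.extend(mean_pool(group))
    return pooled


def integrity_score(feature, weights, bias=0.0):
    """Hypothetical scorer: linear combination squashed to (0, 1) by a sigmoid."""
    z = sum(w * x for w, x in zip(weights, feature)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy 2-dimensional action features for each video segment (made-up values).
before = [[0.1, 0.0], [0.2, 0.1]]                  # adjacent segment preceding the interval
interval = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]    # segments inside the time domain interval
after = [[0.1, 0.2]]                               # adjacent segment following the interval

g = global_feature(before, interval, after)

# Illustrative weights: reward activity inside the interval, penalize it outside.
weights = [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
score = integrity_score(g, weights)
print(len(g), round(score, 3))
```

Including the adjacent segments in the global feature is what lets a scorer distinguish a complete action from a fragment: strong features just outside the interval suggest the action extends beyond it.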
Opening claim text (preview).
1. A time domain action detecting method, the method comprising: obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval; separately extracting action features of at least two video segments in candidate segments, wherein the candidate segments comprise a video segment corresponding to the time domain interval and the adjacent segments thereof; pooling the action features of the at least two video segments, including a first video segment and a second video segment, in the candidate segments, to obtain a global feature of the video segment corresponding to the time domain interval; and determining, based on the global feature, an action integrity score of the video segment corresponding to the time domain interval.
2. The method according to claim 1, wherein the at least one adjacent segment comprises: at least one of a first adjacent segment in the video with a time sequence located in front of the time domain interval, or a second adjacent segment in the video with a time sequence located behind the time domain interval; and the first adjacent segment and the second adjacent segment respectively comprise at least one video segment.
3. The method according to claim 1, wherein the obtaining a time domain interval in a video with an action instance and at least one adjacent segment in the time domain interval comprises: performing actionness estimation separately on at least one video segment in the video, to obtain a time sequence actionness sequence; performing action position prediction based on the time sequence actionness sequence, to obtain the time domain interval in the video with an action instance, the time domain interval comprising a start time and an end time; and extracting, from the video, at least one of the first adjacent segment before the time domain interval or the second adjacent segment after the time domain interval.
4. The method according to claim 3, wherein the performing actionness estimation separately on at least one video segment in the video, to obtain a time sequence actionness sequence comprises: for any video segment in the video separately: extracting an image frame as an original image, and performing actionness estimation on the original image, to obtain a first actionness value; extracting a light stream of the any video segment, merging obtained light stream field pictures, to obtain a spliced light stream field image, and performing actionness estimation on the spliced light stream field image, to obtain a second actionness value; obtaining an actionness value of the any video segment from the first actionness value and the second actionness value; and forming the time sequence actionness sequence by the actionness values of all video segments based on a time sequence relation.
5. The method according to claim 4, wherein after the obtaining the actionness value of any video segment, the method further comprises: normalizing the actionness value of the any video segment, to obtain a normalized actionness value; and the time sequence actionness sequence comprising: a time sequence actionness sequence formed by the normalized actionness value.
6. The method according to claim 1, the method further comprising: obtaining, based on the action feature of the video segment corresponding to the time domain interval, a category score of at least one action category of the video segment corresponding to the time domain interval; and determining, according to the category score of the at least one action category of the video segment corresponding to the time domain interval, a detected action category of the video segment corresponding to the time domain interval.
7. The method according to claim 6, the method further comprising: outputting the time domain interval and the detected action category of the video segment corresponding to the time domain interval.
8. The method according to claim 6, wherein the obtaining, based on an action feature of the video segment corresponding to the time domain interval, a category score of at least one action category of the video segment corresponding to the time domain interval comprises: separately obtaining, based on the action feature of the at least one action category of the video segment corresponding to the time domain interval, a score of the at least one video segment corresponding to the time domain interval separately belonging to the at least one action category; and summing scores of the at least one video segment corresponding to the time domain interval separately belonging to the same action category, to obtain the category score of the at least one action category of the video segment corresponding to the time domain interval.
9. The method according to claim 1, wherein the pooling the action features of the at least two video segments in the candidate segments comprises: performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments.
10. The method according to claim 9, wherein after the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: merging pooling features obtained after the time domain pyramid-typed pooling.
11. The method according to claim 10, wherein before the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: presetting a value of a number K of pooling layers to be 1; the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments comprising: for any first to-be-pooled segment with a value of a preset partition part number B_K to be 1, obtaining the pooling feature of the any first to-be-pooled segment from the action feature of the at least one video segment in the any first to-be-pooled segment; for any second to-be-pooled segment with the value of the preset partition part number B_K to be greater than 1, segmenting all video segments in the any second to-be-pooled segment into B_K parts, obtaining the pooling feature of a corresponding part separately from the action features of each part of the video segments in the B_K parts, and merging the pooling features of the B_K parts, to obtain the pooling feature of the any second to-be-pooled segment; and the first to-be-pooled segment comprising the video segment corresponding to the time domain interval, any one or more of the first adjacent segment and the second adjacent segment; the second to-be-pooled segment comprising other to-be-pooled segments in the candidate segments except the first to-be-pooled segment.
12. The method according to claim 10, wherein before the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments, the method further comprises: presetting a value of a number K of pooling layers to be greater than 1; the performing time domain pyramid-typed pooling processing on the action features of the at least two video segments in the candidate segments comprising: separately for a k-th pooling layer: for any first to-be-pooled segment with a value of a preset partition part number B_K to be 1, obtaining the pooling feature of the any first to-be-pooled segment at the k-th layer from the action feature of the at least one video segment in the any first to-be-pooled segment; for any second to-be-pooled seg
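The time domain pyramid-typed pooling recited in claims 9 to 12 can be sketched roughly as follows: for each pooling layer, the segments of a to-be-pooled segment are split into a preset number of parts, each part is pooled, and the per-part results are merged. The claims do not name the pooling operator or the merge operation, so mean pooling and concatenation are assumptions, and `split_parts`, `pyramid_pool`, and `parts_per_layer` are hypothetical names.

```python
def mean_pool(features):
    """Average a list of equal-length feature vectors element-wise (assumed pooling operator)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def split_parts(features, parts):
    """Split a list of per-segment features into `parts` contiguous, non-empty chunks."""
    n = len(features)
    if parts < 1 or parts > n:
        raise ValueError("parts must be between 1 and the number of segments")
    bounds = [(i * n) // parts for i in range(parts + 1)]
    return [features[bounds[i]:bounds[i + 1]] for i in range(parts)]


def pyramid_pool(features, parts_per_layer):
    """For each layer, pool every part and merge (concatenate, by assumption) the results."""
    pooled = []
    for parts in parts_per_layer:          # one entry per pooling layer
        for chunk in split_parts(features, parts):
            pooled.extend(mean_pool(chunk))
    return pooled


# Four video segments inside the interval, each with a toy 2-dimensional action feature.
interval = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]

# Two layers: the first pools the whole interval (1 part), the second pools two halves.
out = pyramid_pool(interval, [1, 2])
print(out)  # [4.0, 3.0, 2.0, 1.0, 6.0, 5.0]
```

With a single layer and one part, this reduces to plain pooling over the segment, which corresponds to the partition-number-equals-1 case in claim 11; coarser and finer layers together keep some ordering information that a single average would discard.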
using neural networks · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Activation functions · CPC title
Combinations of networks · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title