Computer vision system, computer vision method, computer vision program, and learning method
US-2024320956-A1 · Sep 26, 2024 · US
US2023351752A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023351752-A1 |
| Application number | US-202017768815-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 19, 2020 |
| Priority date | Nov 1, 2019 |
| Publication date | Nov 2, 2023 |
| Grant date | — |
Official abstract text for this publication.
Various implementations of the subject matter relate to moment localization in a media stream. In some implementations, a two-dimensional temporal feature map representing a plurality of moments within a media stream is extracted from the media stream, wherein the two-dimensional temporal feature map comprises a first dimension representing a start of a respective one of the plurality of moments and a second dimension representing an end of a respective one of the plurality of moments. A correlation between the plurality of moments and an action in the media stream is determined based on the two-dimensional temporal feature map.
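To make the data structure in the abstract concrete, here is a minimal sketch of a two-dimensional temporal feature map, in which the row index is the start clip of a moment and the column index is its end clip. It assumes the media stream has already been cut into clips with one feature vector each, and it pools clip features with an element-wise maximum; the pooling choice, the function name `build_2d_temporal_map`, and the toy feature values are assumptions of this sketch, not details stated in the publication.

```python
from typing import List, Optional

Vector = List[float]


def build_2d_temporal_map(clip_features: List[Vector]) -> List[List[Optional[Vector]]]:
    """Return an N x N grid where cell [i][j] holds the feature of the moment
    that starts at clip i and ends at clip j (inclusive).

    Cells with i > j are not valid moments and are left as None.
    Max pooling over the clips of a moment is an assumption of this sketch.
    """
    n = len(clip_features)
    grid: List[List[Optional[Vector]]] = [[None] * n for _ in range(n)]
    for start in range(n):
        pooled = list(clip_features[start])  # moment covering a single clip
        grid[start][start] = list(pooled)
        for end in range(start + 1, n):
            # Extend the moment by one clip and update the running maximum.
            pooled = [max(a, b) for a, b in zip(pooled, clip_features[end])]
            grid[start][end] = list(pooled)
    return grid


# Toy input: 4 clips, each with a 2-dimensional feature vector.
clips = [[1.0, 0.0], [0.5, 2.0], [3.0, 1.0], [0.0, 0.5]]
tmap = build_2d_temporal_map(clips)
```

Only the upper triangle of the grid (start no later than end) holds valid moments, so neighbouring cells correspond to moments that overlap or adjoin in time.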
Opening claim text (preview).
1. A computer-implemented method, comprising: extracting, from a media stream, a two-dimensional temporal feature map representing a plurality of moments within the media stream, wherein the two-dimensional temporal feature map comprises a first dimension representing a start of a respective one of the plurality of moments and a second dimension representing an end of a respective one of the plurality of moments; and determining, based on the two-dimensional temporal feature map, a correlation between the plurality of moments and an action in the media stream.
2. The method of claim 1, wherein extracting the two-dimensional temporal feature map comprises: segmenting the media stream into a plurality of clips; extracting features of respective ones of the plurality of clips to obtain a feature map of the media stream; and extracting, from features of one or more clips corresponding to a moment of the plurality of moments in the feature map of the media stream, features of this moment as a part of the two-dimensional temporal feature map.
3. The method of claim 1, wherein determining the correlation comprises: sampling the plurality of moments at respective sample rates to determine a plurality of candidate moments, wherein the sample rates are adaptively adjusted based on lengths of respective ones of the plurality of moments; and determining a correlation between the plurality of candidate moments and the action in the media stream.
4. The method of claim 3, wherein the sample rates are configured to decrease as the lengths of the respective moments increase.
5. The method of claim 1, wherein determining the correlation comprises: applying a convolutional layer to the two-dimensional temporal feature map to obtain a further feature map having a same dimension as the two-dimensional temporal feature map; and determining, based on the further feature map, scores of correlation between the plurality of moments and the action in the media stream.
6. The method of claim 5, wherein the convolutional layer comprises a dilated convolution and strides of the dilated convolution are configured to increase as lengths of the respective moments increase.
7. The method of claim 1, wherein determining the correlation comprises: in response to receiving a query for a particular action in the media stream, extracting a feature vector of the query; and determining the correlation based on the feature vector of the query and the two-dimensional temporal feature map.
8. The method of claim 7, wherein determining the correlation comprises: fusing the feature vector of the query and the two-dimensional temporal feature map to generate a further two-dimensional temporal feature map having a same dimension as the two-dimensional temporal feature map; and determining, based on the further two-dimensional temporal feature map, the correlation between the plurality of moments and the particular action.
9. The method of claim 8, wherein fusing the feature vector of the query and the two-dimensional temporal feature map comprises: generating the further two-dimensional temporal feature map by applying a Hadamard product to the feature vector of the query and the two-dimensional temporal feature map.
10. The method of claim 7, wherein the query comprises a natural language query.
11. The method of claim 1, wherein the media stream comprises an untrimmed media stream.
12. A device comprising: a processing unit; and a memory coupled to the processing unit and having instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts comprising: extracting, from a media stream, a two-dimensional temporal feature map representing a plurality of moments within the media stream, wherein the two-dimensional temporal feature map comprises a first dimension representing a start of a respective one of the plurality of moments and a second dimension representing an end of a respective one of the plurality of moments; and determining, based on the two-dimensional temporal feature map, a correlation between the plurality of moments and an action in the media stream.
13. The device of claim 12, wherein extracting the two-dimensional temporal feature map comprises: segmenting the media stream into a plurality of clips; extracting features of respective ones of the plurality of clips to obtain a feature map of the media stream; and extracting, from features of one or more clips corresponding to a moment of the plurality of moments in the feature map of the media stream, features of this moment as a part of the two-dimensional temporal feature map.
14. The device of claim 12, wherein determining the correlation comprises: sampling the plurality of moments at respective sample rates to determine a plurality of candidate moments, wherein the sample rates are adaptively adjusted based on lengths of respective ones of the plurality of moments; and determining a correlation between the plurality of candidate moments and the action in the media stream.
15. A computer program product stored in a computer storage medium and comprising computer-executable instructions which, when executed by a device, cause the device to perform acts comprising: extracting, from a media stream, a two-dimensional temporal feature map representing a plurality of moments within the media stream, wherein the two-dimensional temporal feature map comprises a first dimension representing a start of a respective one of the plurality of moments and a second dimension representing an end of a respective one of the plurality of moments; and determining, based on the two-dimensional temporal feature map, a correlation between the plurality of moments and an action in the media stream.
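The query fusion recited in claims 7 to 9 can be illustrated with a short sketch: each valid cell of the two-dimensional temporal feature map is multiplied element-wise (Hadamard product) by the query feature vector, which yields a further map of the same shape. The sketch assumes the query vector has already been projected to the same dimension as the moment features. The scoring step, a sigmoid of the summed fused feature, is only a stand-in chosen for illustration; the claims describe convolutional layers for scoring, and the names `hadamard_fuse`, `score_moments`, and all numeric values here are hypothetical.

```python
import math
from typing import List, Optional

Vector = List[float]
Grid = List[List[Optional[Vector]]]


def hadamard_fuse(tmap: Grid, query: Vector) -> Grid:
    """Multiply every valid moment feature element-wise by the query vector.

    The output grid has the same start/end layout as the input grid.
    """
    fused: Grid = []
    for row in tmap:
        fused.append([
            None if cell is None else [f * q for f, q in zip(cell, query)]
            for cell in row
        ])
    return fused


def score_moments(fused: Grid) -> List[List[Optional[float]]]:
    """Assumed stand-in scorer: sigmoid of the summed fused feature."""
    return [
        [None if cell is None else 1.0 / (1.0 + math.exp(-sum(cell))) for cell in row]
        for row in fused
    ]


# Toy 2 x 2 map: cell [i][j] is the moment from clip i to clip j.
toy_map: Grid = [
    [[1.0, 0.0], [1.0, 2.0]],
    [None, [0.5, 2.0]],
]
query_vec = [2.0, -1.0]  # hypothetical query embedding

fused_map = hadamard_fuse(toy_map, query_vec)
scores = score_moments(fused_map)

# Pick the (start, end) pair with the highest score among valid moments.
best = max(
    ((i, j) for i in range(2) for j in range(2) if scores[i][j] is not None),
    key=lambda ij: scores[ij[0]][ij[1]],
)
```

Because the fused map keeps the start/end layout, the index of the highest-scoring cell directly gives the start clip and end clip of the retrieved moment.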
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
using neural networks · CPC title