Methods, systems, devices, media and products for video processing

US12347198B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12347198-B2
Application number: US-202418732477-A
Country: US
Kind code: B2
Filing date: Jun 3, 2024
Priority date: Dec 14, 2021
Publication date: Jul 1, 2025
Grant date: Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to embodiments of the disclosure, a method, system, device, medium and product for video processing are provided. The method includes extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame; determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.
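The four steps in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the patented implementation: the toy `extract_feature_map` stands in for a real backbone network, the instance `query` and `class_prototypes` are hypothetical fixed vectors rather than learned parameters, a plain mean is used for aggregation, and classification is chosen as the example analysis result.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def extract_feature_map(frame):
    """Step 1 (toy stand-in for a backbone): each pixel value v becomes
    the 2-d feature (v, 1 - v)."""
    return [[v, 1.0 - v] for v in frame]

def frame_level_feature(feature_map, query):
    """Step 2: pool the pixel features of one frame, weighting each pixel
    by its similarity to a hypothetical instance query."""
    weights = softmax([dot(query, f) for f in feature_map])
    dim = len(feature_map[0])
    return [sum(w * f[d] for w, f in zip(weights, feature_map)) for d in range(dim)]

def video_level_feature(frame_features):
    """Step 3: aggregate per-frame features across frames (plain mean here)."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def classify(video_feature, class_prototypes):
    """Step 4: one possible analysis result, a probability per category."""
    return softmax([dot(video_feature, p) for p in class_prototypes])

# Two frames with three "pixels" each; bright pixels belong to the instance.
frames = [[0.9, 0.8, 0.1], [0.7, 0.9, 0.2]]
query = [4.0, 0.0]  # hypothetical query that favours bright pixels

feature_maps = [extract_feature_map(f) for f in frames]
frame_feats = [frame_level_feature(fm, query) for fm in feature_maps]
video_feat = video_level_feature(frame_feats)
probs = classify(video_feat, [[1.0, 0.0], [0.0, 1.0]])
print(frame_feats, video_feat, probs)
```

The point of the sketch is the data flow: one feature map per frame, one frame-level feature per frame, a single video-level feature per instance, and an analysis result computed from that single feature.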

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for video processing, comprising: extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: obtaining an initial video-level feature of the video instance; at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps; at each subsequent processing layer after the first processing layer amongst the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the subsequent processing layer based on a plurality of intermediate frame-level features generated in a previous processing layer and the plurality of feature maps; and determining a plurality of intermediate frame-level features generated at a last processing layer of the plurality of processing layers as the plurality of frame-level features; determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.

2. The method of claim 1, wherein determining a plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: determining, using a spatial attention mechanism, a plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps.

3. The method of claim 1, wherein determining the video-level feature of the video instance by aggregating the plurality of frame-level features comprises: determining a plurality of weights for the plurality of frame-level features; and determining the video-level feature of the video instance by weighting the plurality of frame-level features with the plurality of weights.

4. The method of claim 3, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the video-level feature of the video instance by weighting the plurality of frame-level features with the plurality of weights comprises: for each of the plurality of processing layers except for a last processing layer, obtaining a plurality of intermediate frame-level features of the video instance generated at the processing layer; determining a plurality of weights for the plurality of intermediate frame-level features; determining an intermediate video-level feature at the processing layer by weighting the plurality of intermediate frame-level features with the determined weights; and generating the video-level feature of the video instance based on intermediate video-level features determined for each of the processing layers and the intermediate video-level feature obtained by weighting the plurality of frame-level features.

5. The method of claim 3, wherein determining the plurality of weights for the plurality of frame-level features comprises: generating the plurality of weights for the plurality of frame-level features by applying the plurality of frame-level features to a trained weight calculation layer, respectively, wherein the weight calculation layer is trained jointly with a video processing model, and the video processing model is configured to implement the extraction of the plurality of feature maps, the determination of the plurality of frame-level features, the determination of the video-level feature, and the determination of the analysis result.

6. The method of claim 1, wherein determining the analysis result for the video instance in the plurality of frames based at least on the video-level feature comprises: determining an instance segmentation result in the plurality of frames based at least on the video-level feature of the video instance and the plurality of feature maps, the instance segmentation result indicating a pixel part of each of the plurality of frames that presents the video instance.

7. The method of claim 1, wherein determining the analysis result for the video instance in the plurality of frames based at least on the video-level feature comprises: determining an instance classification result of the video instance based on the video-level feature of the video instance, the classification result indicating a probability that the video instance belongs to a predetermined category among a plurality of predetermined categories.

8. The method of claim 6, wherein determining the instance segmentation result in each frame of the plurality of frames further comprises: determining an instance segmentation result in each frame of the plurality of frames further based on the plurality of frame-level features of the video instance in the plurality of frames, respectively.

9. The method of claim 1, further comprising: based on a frame-level feature of the video instance in at least one frame of the plurality of frames, determining boundary box information of the video instance in the at least one frame respectively, the boundary box information indicating a boundary box coordinate of the video instance in the at least one frame.

10. The method of claim 1, wherein a predetermined number of video instances is determined for the video, and for each video instance of the predetermined number of video instances, a determination of a plurality of frame-level features of the video instance in the plurality of frames, a determination of video-level feature of the video instance, and an analysis result for the video instance in the plurality of frames are performed.

11. A system for video processing, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions when executed by the at least one processing unit causes the system to perform acts of: extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: obtaining an initial video-level feature of the video instance; at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps; at each su
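As a reading aid for claims 1, 2, 3 and 6, the following sketch shows one way the claimed data flow could look in code. It is a sketch under stated assumptions, not the patentee's implementation: softmax-based spatial attention, a fixed hypothetical `weight_vector` standing in for the trained weight calculation layer of claim 5, softmax normalisation of the frame weights, and a sigmoid threshold for the mask are all illustrative choices that the claims do not specify.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spatial_attention(query, feature_map):
    """Claim 2 (assumed form): attend over the pixel features of one frame."""
    weights = softmax([dot(query, f) for f in feature_map])
    return [sum(w * f[d] for w, f in zip(weights, feature_map))
            for d in range(len(query))]

def frame_level_features(initial_video_feature, feature_maps, num_layers):
    """Claim 1: the first layer starts from the initial video-level feature;
    each later layer starts from the previous layer's per-frame features.
    The last layer's outputs are the frame-level features."""
    feats = [list(initial_video_feature) for _ in feature_maps]
    for _ in range(num_layers):
        feats = [spatial_attention(q, fm) for q, fm in zip(feats, feature_maps)]
    return feats

def aggregate(frame_features, weight_vector):
    """Claim 3: one weight per frame, then a weighted sum across frames.
    weight_vector is a hypothetical stand-in for a trained layer."""
    weights = softmax([dot(weight_vector, f) for f in frame_features])
    dim = len(frame_features[0])
    video = [sum(w * f[d] for w, f in zip(weights, frame_features))
             for d in range(dim)]
    return video, weights

def segment(video_feature, feature_map, threshold=0.5):
    """Claim 6 (assumed form): a pixel belongs to the instance when the
    sigmoid of its similarity to the video-level feature exceeds threshold."""
    return [1 if 1.0 / (1.0 + math.exp(-dot(video_feature, f))) > threshold else 0
            for f in feature_map]

# Two frames, three pixel features each; positive first component = instance.
feature_maps = [
    [[1.0, 0.2], [0.8, 0.1], [-1.0, 0.3]],
    [[-0.9, 0.2], [1.1, 0.0], [0.9, 0.3]],
]
frame_feats = frame_level_features([1.0, 0.0], feature_maps, num_layers=3)
video_feat, frame_weights = aggregate(frame_feats, weight_vector=[1.0, 0.0])
masks = [segment(video_feat, fm) for fm in feature_maps]
print(masks)
```

Note that a single video-level feature produces the mask in every frame, which is what ties the per-frame results to the same instance across the video.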

Assignees

Beijing Youzhuju Network Tech Co Ltd

Inventors

Classifications

  • using classification, e.g. of video objects

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

  • Labelling scene content, e.g. deriving syntactic or semantic representations

  • Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12347198B2 cover?
It covers a method, system, device, medium and product for video processing. The method extracts a feature map from each of a plurality of video frames, determines frame-level features of a video instance from those feature maps, aggregates the frame-level features into a video-level feature that represents the instance across the frames, and determines an analysis result for the instance based at least on the video-level feature.
Who is the assignee on this patent?
Beijing Youzhuju Network Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V20/46. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 1, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).