Who is the assignee on this patent?

Beijing Youzhuju Network Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V20/46. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC).

Methods, systems, devices, media and products for video processing

US12347198B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12347198-B2
Application number	US-202418732477-A
Country	US
Kind code	B2
Filing date	Jun 3, 2024
Priority date	Dec 14, 2021
Publication date	Jul 1, 2025
Grant date	Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to embodiments of the disclosure, a method, system, device, medium and product for video processing are provided. The method includes extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame; determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.
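
The four steps in the abstract can be illustrated with a toy, dependency-free Python sketch. It is not the patented implementation. The backbone is a pass-through stand-in. The frame-level feature comes from a simple dot-product spatial attention of one instance query over each frame's feature map (claim 2 mentions spatial attention but does not specify its form). The aggregation uses softmax-normalized per-frame scores from a linear scoring vector (claim 3 only requires "weights"). All function names (`extract_feature_maps`, `frame_level_feature`, `aggregate`, `classify`), sizes, and random parameters are hypothetical.

```python
import math
import random

# Hypothetical toy sizes: T frames, HW spatial positions per feature map,
# D channels per position, C predetermined categories.
T, HW, D, C = 3, 4, 2, 3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def extract_feature_maps(frames):
    # Stand-in for a real backbone: each toy "frame" is already a list of
    # HW feature vectors, so it is copied through unchanged.
    return [[list(pos) for pos in frame] for frame in frames]


def frame_level_feature(query, fmap):
    # Spatial attention: score every position against the instance query,
    # then take the attention-weighted sum of the position features.
    return weighted_sum(softmax([dot(query, pos) for pos in fmap]), fmap)


def aggregate(frame_feats, w_score):
    # Weighted aggregation: one scalar score per frame from a hypothetical
    # linear weight layer, normalized over frames with softmax.
    weights = softmax([dot(w_score, f) for f in frame_feats])
    return weighted_sum(weights, frame_feats), weights


def classify(video_feat, w_cls):
    # Instance classification: one probability per predetermined category.
    return softmax([dot(row, video_feat) for row in w_cls])


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(0)
frames = [[rand_vec(D) for _ in range(HW)] for _ in range(T)]
query = rand_vec(D)                      # instance query (random here)
w_score = rand_vec(D)                    # weight calculation layer (random here)
w_cls = [rand_vec(D) for _ in range(C)]  # classifier rows (random here)

fmaps = extract_feature_maps(frames)
frame_feats = [frame_level_feature(query, fm) for fm in fmaps]
video_feat, frame_weights = aggregate(frame_feats, w_score)
probs = classify(video_feat, w_cls)
print([round(p, 3) for p in probs])
```

With random parameters the printed probabilities carry no meaning; in a trained model the query, scoring vector, and classifier would be learned (claim 5 describes the weight calculation layer as trained jointly with the rest of the video processing model).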

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for video processing, comprising: extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: obtaining an initial video-level feature of the video instance; at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps; at each subsequent processing layer after the first processing layer amongst the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the subsequent processing layer based on a plurality of intermediate frame-level features generated in a previous processing layer and the plurality of feature maps; and determining a plurality of intermediate frame-level features generated at a last processing layer of the plurality of processing layers as the plurality of frame-level features; determining a video-level feature of the video instance by aggregating the plurality of frame-level features, the video-level feature representing feature information of the video instance across the plurality of frames; and determining an analysis result for the video instance in the plurality of frames based at least on the video-level feature.

2. The method of claim 1, wherein determining a plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: determining, using a spatial attention mechanism, a plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps.

3. The method of claim 1, wherein determining the video-level feature of the video instance by aggregating the plurality of frame-level features comprises: determining a plurality of weights for the plurality of frame-level features; and determining the video-level feature of the video instance by weighting the plurality of frame-level features with the plurality of weights.

4. The method of claim 3, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the video-level feature of the video instance by weighting the plurality of frame-level features with the plurality of weights comprises: for each of the plurality of processing layers except for a last processing layer, obtaining a plurality of intermediate frame-level features of the video instance generated at the processing layer; determining a plurality of weights for the plurality of intermediate frame-level features; determining an intermediate video-level feature at the processing layer by weighting the plurality of intermediate frame-level features with the determined weights; and generating the video-level feature of the video instance based on intermediate video-level features determined for each of the processing layers and the intermediate video-level feature obtained by weighting the plurality of frame-level features.

5. The method of claim 3, wherein determining the plurality of weights for the plurality of frame-level features comprises: generating the plurality of weights for the plurality of frame-level features by applying the plurality of frame-level features to a trained weight calculation layer, respectively, wherein the weight calculation layer is trained jointly with a video processing model, and the video processing model is configured to implement the extraction of the plurality of feature maps, the determination of the plurality of frame-level features, the determination of the video-level feature, and the determination of the analysis result.

6. The method of claim 1, wherein determining the analysis result for the video instance in the plurality of frames based at least on the video-level feature comprises: determining an instance segmentation result in the plurality of frames based at least on the video-level feature of the video instance and the plurality of feature maps, the instance segmentation result indicating a pixel part of each of the plurality of frames that presents the video instance.

7. The method of claim 1, wherein determining the analysis result for the video instance in the plurality of frames based at least on the video-level feature comprises: determining an instance classification result of the video instance based on the video-level feature of the video instance, the classification result indicating a probability that the video instance belongs to a predetermined category among a plurality of predetermined categories.

8. The method of claim 6, wherein determining the instance segmentation result in each frame of the plurality of frames further comprises: determining an instance segmentation result in each frame of the plurality of frames further based on the plurality of frame-level features of the video instance in the plurality of frames, respectively.

9. The method of claim 1, further comprising: based on a frame-level feature of the video instance in at least one frame of the plurality of frames, determining boundary box information of the video instance in the at least one frame respectively, the boundary box information indicating a boundary box coordinate of the video instance in the at least one frame.

10. The method of claim 1, wherein a predetermined number of video instances is determined for the video, and for each video instance of the predetermined number of video instances, a determination of a plurality of frame-level features of the video instance in the plurality of frames, a determination of video-level feature of the video instance, and an analysis result for the video instance in the plurality of frames are performed.

11. A system for video processing, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, the instructions when executed by the at least one processing unit causes the system to perform acts of: extracting a plurality of feature maps from a plurality of frames of a video respectively; determining a plurality of frame-level features of a video instance in the plurality of frames based on the plurality of feature maps respectively, a frame-level feature in each of the frames representing feature information of the video instance in the frame, wherein the plurality of frame-level features is determined by a plurality of connected processing layers iteratively, and wherein determining the plurality of frame-level features of the video instance in the plurality of frames based on the plurality of feature maps respectively comprises: obtaining an initial video-level feature of the video instance; at a first processing layer of the plurality of processing layers, generating a plurality of intermediate frame-level features of the video instance at the first processing layer based on the initial video-level feature and the plurality of feature maps; at each su…
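
Claims 1, 4, 6 and 8 add an iterative structure. Connected processing layers refine per-frame features starting from an initial video-level feature, per-layer video-level features are combined, and masks are predicted from the video-level and frame-level features together with the feature maps. The toy pure-Python sketch below illustrates that flow under assumptions the claims do not state. Each layer performs a residual dot-product attention update per frame. Each intermediate video-level feature is a softmax-weighted sum over frames using a shared scoring vector. The final video-level feature is a plain average of the per-layer features. The per-frame mask query is the sum of the video-level and frame-level features, thresholded per pixel after a sigmoid. The names `refine_layers`, `layer_video_feature`, and `segment_frame`, and all sizes, are hypothetical.

```python
import math
import random

# Hypothetical toy sizes: T frames, HW positions per feature map,
# D channels, and NUM_LAYERS connected processing layers.
T, HW, D, NUM_LAYERS = 3, 4, 2, 3


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_sum(weights, vectors):
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def attend(query, fmap):
    # Dot-product spatial attention of one query over one frame's positions.
    return weighted_sum(softmax([dot(query, pos) for pos in fmap]), fmap)


def layer_video_feature(frame_feats, w_score):
    # Intermediate video-level feature: softmax-weighted sum over frames.
    return weighted_sum(softmax([dot(w_score, f) for f in frame_feats]), frame_feats)


def refine_layers(init_video_feat, fmaps, w_score):
    # Layer 1 queries every frame with the initial video-level feature;
    # each later layer refines each frame's feature from the previous
    # layer (residual update) against that frame's feature map.
    feats = [list(init_video_feat) for _ in fmaps]
    per_layer_video = []
    for _ in range(NUM_LAYERS):
        feats = [[q + a for q, a in zip(f, attend(f, fm))] for f, fm in zip(feats, fmaps)]
        per_layer_video.append(layer_video_feature(feats, w_score))
    # The last layer's intermediate features are the frame-level features.
    return feats, per_layer_video


def segment_frame(video_feat, frame_feat, fmap, threshold=0.5):
    # Per-pixel mask: dot the combined instance feature with every position.
    mask_query = [v + f for v, f in zip(video_feat, frame_feat)]
    return [sigmoid(dot(mask_query, pos)) > threshold for pos in fmap]


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


random.seed(1)
fmaps = [[rand_vec(D) for _ in range(HW)] for _ in range(T)]
init_video_feat = rand_vec(D)  # initial video-level feature (random here)
w_score = rand_vec(D)          # shared weight calculation layer (random here)

frame_feats, per_layer_video = refine_layers(init_video_feat, fmaps, w_score)
# Final video-level feature: plain average of the per-layer features.
video_feat = [sum(v[d] for v in per_layer_video) / NUM_LAYERS for d in range(D)]
masks = [segment_frame(video_feat, ff, fm) for ff, fm in zip(frame_feats, fmaps)]
print(masks)
```

A classification head on `video_feat` (claim 7) or a box head on each entry of `frame_feats` (claim 9) would attach at the same points. Running the whole pipeline once per instance query corresponds to the predetermined number of instances in claim 10.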

Assignees

Beijing Youzhuju Network Tech Co Ltd


Classifications

G06V10/764
using classification, e.g. of video objects · CPC title
G06V10/7715
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
G06V20/49
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
G06V20/70
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06V10/26
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title

Patent family

Related publications grouped by family.

Patent family: 80486591

