Video cover determining method and device, and storage medium

US12266178B2 · US · B2

Patent metadata
Publication number: US-12266178-B2
Application number: US-202117334971-A
Country: US
Kind code: B2
Filing date: May 31, 2021
Priority date: Nov 6, 2020
Publication date: Apr 1, 2025
Grant date: Apr 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, devices, and a non-transitory computer-readable storage medium are provided for determining a video cover image. The method includes: obtaining a candidate image set containing a plurality of image frames to be processed by determining the plurality of image frames to be processed from a video to be processed, each of the plurality of image frames to be processed containing at least one target object; obtaining a target score of each of the plurality of image frames to be processed by inputting the plurality of image frames to be processed in the candidate image set into a scoring network; and sorting the target scores of the plurality of image frames to be processed according to a set order to obtain a sorting result, and determining a video cover image of the video to be processed from the plurality of image frames to be processed according to the sorting result.
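The pipeline in the abstract has three steps: keep the frames that contain a target object, score each one, and pick the cover from the sorted scores. The Python sketch below illustrates that flow and is not the patented implementation. `Frame`, `contains_target`, and `score_frame` are hypothetical stand-ins for the video decoder, the target-object detector, and the trained scoring network. The sketch also assumes the "set order" is descending, so the highest-scoring frame becomes the cover.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    """Hypothetical decoded frame: a timestamp plus features a scorer can read."""
    timestamp_s: float
    features: dict


def choose_cover(
    frames: List[Frame],
    contains_target: Callable[[Frame], bool],
    score_frame: Callable[[Frame], float],
) -> Optional[Frame]:
    """Return the highest-scoring frame that contains a target object, or None."""
    # Candidate image set: only frames with at least one target object.
    candidates = [f for f in frames if contains_target(f)]
    if not candidates:
        return None
    # Target score per candidate (stand-in for the scoring network).
    scored = [(score_frame(f), f) for f in candidates]
    # Assumed "set order": descending score. Python's sort is stable, so
    # frames with equal scores keep their original (temporal) order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]


# Usage with made-up features: the first frame has no person, so it is skipped.
frames = [
    Frame(0.0, {"people": 0, "quality": 0.9}),
    Frame(2.0, {"people": 1, "quality": 0.6}),
    Frame(4.0, {"people": 2, "quality": 0.8}),
]
cover = choose_cover(
    frames,
    contains_target=lambda f: f.features["people"] > 0,
    score_frame=lambda f: f.features["quality"],
)
print(cover.timestamp_s)  # 4.0
```

Passing the detector and scorer in as callables keeps the selection logic independent of any particular model.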

First claim

Opening claim text (preview).

What is claimed is:

1. A method for determining a video cover image, comprising: obtaining a candidate image set containing a plurality of image frames to be processed by determining the plurality of image frames to be processed from a video to be processed, each of the plurality of image frames to be processed containing at least one target object; inputting each of the plurality of image frames to be processed into an image scoring network to obtain a black edge size of each of the plurality of image frames to be processed, a brightness score of each of the plurality of image frames to be processed, and a definition score of each of the plurality of image frames to be processed, wherein the image scoring network is obtained based on neural network training; weighting the black edge size of each of the plurality of image frames to be processed, the brightness score of each of the plurality of image frames to be processed, and the definition score of each of the plurality of image frames to be processed; summing the weighted black edge size of a picture where each of the plurality of image frames to be processed is located, the weighted brightness score of each of the plurality of image frames to be processed, and the weighted definition score of each of the plurality of image frames to be processed to obtain an image feature score of each of the plurality of image frames to be processed, wherein image features of an image frame to be processed comprise: a black edge, a brightness, and a definition, and the black edge is a black part except picture content in the image frame to be processed; inputting each of the plurality of image frames to be processed into an object scoring network to obtain a number of person images in each of the plurality of image frames to be processed, a location of a person image in the image frame to be processed, a size of the person image, a definition score of the person image, an eye state score of a person in the person image, an expression score of the person in the person image, and a pose score of the person in the person image, wherein the object scoring network is obtained based on neural network training; obtaining an object feature score of each of the plurality of image frames to be processed based on the number of the person images in each of the plurality of image frames to be processed, the location of the person image in the image frame to be processed, the size of the person image, the definition score of the person image, the eye state score of the person in the person image, the expression score of the person in the person image, and the pose score of the person in the person image; inputting each of the plurality of image frames to be processed into an aesthetic scoring network to obtain a composition score of each of the image frames to be processed and a color richness score of each of the plurality of image frames to be processed, wherein the aesthetic scoring network is obtained based on neural network training; obtaining an aesthetic feature score of each of the plurality of image frames to be processed based on the composition score of each of the plurality of image frames to be processed and the color richness score of each of the plurality of image frames to be processed; obtaining the target score of each of the plurality of image frames to be processed based on the image feature score, the object feature score, and the aesthetic feature score; and sorting a plurality of target scores of the plurality of image frames to be processed according to a set order to obtain a sorting result, and determining the video cover image of the video to be processed from the plurality of image frames to be processed according to the sorting result.

2. The method of claim 1, wherein obtaining the target score of each of the plurality of image frames to be processed based on the image feature score, the object feature score, and the aesthetic feature score comprises: obtaining a weighted image feature score, a weighted object feature score, and a weighted aesthetic feature score by respectively weighting the image feature score, the object feature score, and the aesthetic feature score of each of the plurality of image frames to be processed; and obtaining the target score of each of the plurality of image frames to be processed by summing the weighted image feature score, the weighted object feature score, and the weighted aesthetic feature score.

3. The method of claim 1, wherein a number of the plurality of image frames is M, M being a positive integer; and wherein the method further comprises: obtaining N image frames by performing frame extraction on the video to be processed according to a set time interval, wherein determining the plurality of image frames to be processed from the video to be processed comprises: determining the plurality of image frames to be processed containing the target object from the N image frames, N being a positive integer greater than or equal to M.

4. The method of claim 1, further comprising: extracting one or more image frames from the plurality of image frames to be processed by performing matching based on a filtering rule according to a filtering model, wherein the one or more image frames extracted are not matched with image frames to be filtered contained in the filtering rule, and wherein obtaining the target score of each of the plurality of image frames to be processed by inputting the plurality of image frames to be processed in the candidate image set into the scoring network comprises: obtaining the target score of each of the one or more image frames that are not matched with the image frames to be filtered contained in the filtering rule and extracted from the plurality of image frames to be processed by inputting the image frame into the scoring network.

5. A device, comprising: a processor; and a memory configured to store instructions executable by the processor, wherein the processor is configured to execute the instructions to: obtain a candidate image set containing a plurality of image frames to be processed by determining the plurality of image frames to be processed from a video to be processed, each of the plurality of image frames to be processed containing at least one target object; input each of the plurality of image frames to be processed into an image scoring network to obtain a black edge size of each of the plurality of image frames to be processed, a brightness score of each of the plurality of image frames to be processed, and a definition score of each of the plurality of image frames to be processed, wherein the image scoring network is obtained based on neural network training; weight the black edge size of each of the plurality of image frames to be processed, the brightness score of each of the plurality of image frames to be processed, and the definition score of each of the plurality of image frames to be processed; sum the weighted black edge size of a picture where each of the plurality of image frames to be processed is located, the weighted brightness score of each of the plurality of image frames to be processed, and the weighted definition score of each of the plurality of image frames to be processed to obtain an image feature score of each of the plurality of image frames to be processed, wherein image features of an image frame to be processed comprise: a black edge, a brightness, and a definition, and the black edge is a black part except picture content in the image frame to be processed; input each of the plurality of image frames to be processed into an object scoring network to obtain a number of person images in each of the plurality of image frames to be processed, a location of a person image in the image frame to be processed, a size of the person image, a defi…
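Claims 1 and 2 describe a two-level weighted sum. First, black edge size, brightness, and definition are weighted and summed into an image feature score. Second, the image, object, and aesthetic feature scores are weighted and summed into the target score. The Python sketch below shows that arithmetic under stated assumptions:

- The claims disclose no weight values, so every weight here is hypothetical.
- The negative weight on black edge size is an assumption, chosen so that a wider black border lowers the score.
- The object and aesthetic feature scores are also combined by weighted sums. This is an assumption, because the claims only say those scores are obtained "based on" their inputs.
- All inputs are assumed to be pre-normalized to [0, 1].

```python
from typing import Dict

# Hypothetical weights: the claims recite weighting but give no values.
IMAGE_WEIGHTS = {"black_edge": -0.3, "brightness": 0.3, "definition": 0.4}
OBJECT_WEIGHTS = {  # assumed combination of the claim-1 person-related outputs
    "person_count": 0.1, "location": 0.15, "size": 0.15, "definition": 0.2,
    "eye_state": 0.15, "expression": 0.15, "pose": 0.1,
}
AESTHETIC_WEIGHTS = {"composition": 0.6, "color_richness": 0.4}
TARGET_WEIGHTS = {"image": 0.4, "object": 0.4, "aesthetic": 0.2}


def weighted_sum(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Multiply each named value by its weight and add the products."""
    missing = weights.keys() - values.keys()
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(weights[name] * values[name] for name in weights)


def target_score(image: Dict[str, float], obj: Dict[str, float],
                 aesthetic: Dict[str, float]) -> float:
    """Fuse the three feature scores into one target score (claim 2 style)."""
    feature_scores = {
        "image": weighted_sum(image, IMAGE_WEIGHTS),              # claim 1
        "object": weighted_sum(obj, OBJECT_WEIGHTS),              # assumed form
        "aesthetic": weighted_sum(aesthetic, AESTHETIC_WEIGHTS),  # assumed form
    }
    return weighted_sum(feature_scores, TARGET_WEIGHTS)


# Usage: two frames that differ only in black-border size.
perfect_obj = {name: 1.0 for name in OBJECT_WEIGHTS}
perfect_aes = {name: 1.0 for name in AESTHETIC_WEIGHTS}
clean = target_score({"black_edge": 0.0, "brightness": 1.0, "definition": 1.0},
                     perfect_obj, perfect_aes)
bordered = target_score({"black_edge": 0.5, "brightness": 1.0, "definition": 1.0},
                        perfect_obj, perfect_aes)
print(round(clean, 4), round(bordered, 4))  # 0.88 0.82
```

Keeping each level as a named-weight dictionary makes it easy to retune the weights without touching the fusion logic.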
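Claims 3 and 4 narrow the candidate set in two ways:

- Claim 3: frames are extracted at a set time interval (N frames), and only the M ≤ N frames containing the target object are kept.
- Claim 4: frames matching a filtering rule are excluded before scoring.

The Python sketch below shows only that selection logic. `has_target` and `matches_filter_rule` are hypothetical predicates standing in for the object detector and the unspecified filtering model. Plain timestamps stand in for decoded frames so the example runs with the standard library alone.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Timestamps (seconds) for extracting one frame every `interval_s` seconds."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if duration_s <= 0:
        return []
    n = math.ceil(duration_s / interval_s)  # N extracted frames
    # Multiply instead of accumulating to avoid floating-point drift.
    return [k * interval_s for k in range(n)]


def select_frames(frames: Sequence[T],
                  has_target: Callable[[T], bool],
                  matches_filter_rule: Callable[[T], bool]) -> List[T]:
    """Keep frames that contain a target object (claim 3) and do not
    match the filtering rule (claim 4); the result has M <= N items."""
    return [f for f in frames if has_target(f) and not matches_filter_rule(f)]


# Usage: a 10 s video sampled every 2 s gives N = 5 frames.
ts = sample_timestamps(10.0, 2.0)
kept = select_frames(ts,
                     has_target=lambda t: t >= 2.0,          # hypothetical detector
                     matches_filter_rule=lambda t: t == 6.0)  # hypothetical rule
print(kept)  # [2.0, 4.0, 8.0]
```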

Assignees

Beijing Xiaomi Mobile Software Co Ltd

Inventors

Classifications

  • Matching criteria, e.g. proximity measures · CPC title

  • in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames · CPC title

  • Blocking scenes or portions of the received content, e.g. censoring scenes · CPC title

  • involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (arrangements characterised by components specially adapted for monitoring, identification or recognition of video in broadcast systems H04H60/59) · CPC title

  • involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12266178B2 cover?
Methods, devices, and a non-transitory computer-readable storage medium are provided for determining a video cover image. The method includes: obtaining a candidate image set containing a plurality of image frames to be processed by determining the plurality of image frames to be processed from a video to be processed, each of the plurality of image frames to be processed containing at least on…
Who is the assignee on this patent?
Beijing Xiaomi Mobile Software Co Ltd
What technology area does this patent fall under?
The primary CPC classification is H04N21/8549, which falls under CPC section H (Electricity).
When was this patent published?
It was published on April 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page: citations in our corpus, or other publications that share the same primary CPC classification.