Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06V20/41. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 8, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Semi supervised animated character recognition in video

US11270121B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11270121-B2
Application number	US-202016831353-A
Country	US
Kind code	B2
Filing date	Mar 26, 2020
Priority date	Aug 20, 2019
Publication date	Mar 8, 2022
Grant date	Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described herein is directed to a media indexer framework including a character recognition engine that automatically detects and groups instances (or occurrences) of characters in a multi-frame animated media file. More specifically, the character recognition engine automatically detects and groups the instances (or occurrences) of the characters in the multi-frame animated media file such that each group contains images associated with a single character. The character groups are then labeled and used to train an image classification model. Once trained, the image classification model can be applied to subsequent multi-frame animated media files to automatically classifying the animated characters included therein.
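The grouping step described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the abstract does not name a clustering algorithm, so the greedy cosine-similarity grouping below, the function names `cosine_similarity` and `group_proposals`, the `threshold` value, and the toy two-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from a feature extractor applied to each detected character region.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_proposals(embeddings, threshold=0.9):
    """Hypothetical grouping: put each embedding into the most similar existing
    group if its similarity to that group's centroid reaches `threshold`,
    otherwise start a new group. Returns lists of proposal indices."""
    groups = []     # one list of proposal indices per group
    centroids = []  # running mean embedding per group
    for idx, emb in enumerate(embeddings):
        best_group, best_sim = None, threshold
        for g, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best_group, best_sim = g, sim
        if best_group is None:
            groups.append([idx])
            centroids.append(list(emb))
        else:
            n = len(groups[best_group])
            # Update the running mean with the new member.
            centroids[best_group] = [
                (c * n + e) / (n + 1) for c, e in zip(centroids[best_group], emb)
            ]
            groups[best_group].append(idx)
    return groups


# Toy embeddings standing in for features of five character region proposals.
toy_embeddings = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0], [0.02, 0.99], [0.99, 0.01]]
print(group_proposals(toy_embeddings))
```

Each returned group is meant to correspond to a single character; a production system would more likely use an established clustering method and tune the threshold on real data.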

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer readable storage media having a media indexer service stored thereon, the media indexer service comprising: a character recognition engine including program instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: identify keyframes of a multi-frame animated media file; detect character region proposals within the keyframes, wherein each character region proposal comprises a bounding box or subset of a keyframe containing a proposed animated character; determine a similarity between the character region proposals by embedding features of the character region proposals into a feature space; and automatically group the character region proposals into animated character groups based on the similarity, wherein each animated character group is associated with a single animated character of the multi-frame animated media file.

2. The one or more non-transitory computer readable storage media of claim 1, wherein to detect the character region proposals within the keyframes, the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: access a pre-trained object detection model; and process the keyframes using the pre-trained object detection model to identify the character region proposals.

3. The one or more non-transitory computer readable storage media of claim 1, wherein to determine the similarity between the character region proposals, the program instructions, when executed by the one or more processing systems of the computing apparatus, direct the computing apparatus to: for each character region proposal of the character region proposals, extract features of the character region proposal; embed the features of the character region proposal into a feature space; and determine the similarity between the character region proposals by comparing the embedded features within the feature space.

4. The one or more non-transitory computer readable storage media of claim 3, wherein to automatically group the character region proposals into the animated character groups based on the similarity, the program instructions, when executed by the one or more processing systems of the computing apparatus, direct the computing apparatus to: apply a clustering algorithm to identify the animated character groups based on the determined similarity.

5. The one or more non-transitory computer readable storage media of claim 1, wherein the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: identify label information associated with at least one of the animated character groups; and classify the at least one of the animated character groups with the identified label information resulting in at least one annotated animated character group.

6. The one or more non-transitory computer readable storage media of claim 5, wherein to identify the label information associated with the animated character groups, the program instructions, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: present the at least one of the animated character groups to a user in a user interface; and receive, via the user interface, the label information associated with at least one of the animated character groups.

7. The one or more non-transitory computer readable storage media of claim 1, wherein the media indexer service further comprises: an indexing engine including program instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: collect annotated animated character groups; store the annotated animated character groups in a media indexer database; and feed the annotated animated character groups to an image classifier to train an image classification model.

8. The one or more non-transitory computer readable storage media of claim 1, wherein the media indexer service further comprises: an indexing engine including additional instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: determine that a trained image classification model has been specified; automatically recognize, using the trained image classification model, label information associated with at least one of the animated character groups; and classify the at least one of the animated character groups with the recognized label information resulting in at least one annotated animated character group.

9. The one or more non-transitory computer readable storage media of claim 8, wherein the additional instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: index the multi-frame animated media file using the at least one annotated animated character group.

10. The one or more non-transitory computer readable storage media of claim 8, wherein additional instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: perform a smoothing operation to consolidate two or more of the annotated animated character groups into a single annotated animated character group.

11. The one or more non-transitory computer readable storage media of claim 1, wherein the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: automatically detect an animation style of the multi-frame animated media file; and prior to detecting the character region proposals within the keyframes, transforming the keyframes based on the detected animation style.

12. A computer-implemented method for training an image classification model to automatically classifying animated characters in a multi-frame animated media file, the method comprising: detecting character region proposals within keyframes of a multi-frame animated media file, wherein each character region proposal comprises a bounding box or subset of a keyframe containing a proposed animated character; embedding features of the character region proposals into a feature space to determine a similarity between the character region proposals; automatically grouping the character region proposals into animated character groups based on the similarity, wherein each character group is associated with a single animated character of the multi-frame animated media file; classifying at least one of the animated character groups with label information resulting in at least one annotated animated character group; and training an image classification model to automatically classify animated characters in subsequent multi-frame animated media files by feeding the at least one annotated animated character group to an image classifier.

13. The computer-implemented method of claim 12, the method further comprising: indexing the multi-frame animated media file using the at least one of the annotated animated character groups.

14. The computer-implemented method of claim 12, wherein determining the similarity between the character region proposals includes, for each character region proposal of the character region proposals: extracting features of the character region proposal; embedding the features of the character region proposal into a feature space; and determining the similarity between the char
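As a rough illustration of the labeling and training steps recited in claims 5, 7 and 12, the sketch below trains a stand-in classifier from annotated character groups and applies it to a new embedding. The claims refer only to "an image classifier" and "an image classification model" without naming a technique, so the nearest-centroid approach, the function names `train_centroid_classifier` and `classify`, the labels, and the toy embeddings are all hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_centroid_classifier(annotated_groups):
    """Hypothetical training step: `annotated_groups` maps a label to the
    embeddings in that annotated character group; the 'model' is simply one
    centroid per label."""
    return {label: mean_vector(vecs) for label, vecs in annotated_groups.items()}


def classify(model, embedding):
    """Return the label whose centroid is closest (Euclidean) to `embedding`."""
    return min(model, key=lambda label: math.dist(model[label], embedding))


# Toy annotated groups; labels and embeddings are made up for illustration.
annotated = {
    "character_a": [[1.0, 0.0], [0.98, 0.05], [0.99, 0.01]],
    "character_b": [[0.0, 1.0], [0.02, 0.99]],
}
model = train_centroid_classifier(annotated)
print(classify(model, [0.9, 0.1]))
```

A real image classification model would be trained on the cropped character images themselves; the centroid model here only shows how annotated groups could feed a classifier that is later applied to proposals from subsequent media files.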

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06V20/47
Detecting features for summarising video content · CPC title
G06V20/70
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06F18/24765
Rule-based classification · CPC title
G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V20/41 · Primary
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

Patent family

Related publications grouped by family.

View patent family 74646281

External sources


Related patents

Negative sampling algorithm for enhanced image classification

Apparatus, systems, and methods for integrating digital media content

Object detection and tracking delay reduction in video analytics

Training machine learning models

Identifying objects within an image

Systems and methods for reducing a plurality of bounding regions

Method and system for automatic tagging in television using crowd sourcing technique
