Semi supervised animated character recognition in video

US11270121B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11270121-B2
Application number: US-202016831353-A
Country: US
Kind code: B2
Filing date: Mar 26, 2020
Priority date: Aug 20, 2019
Publication date: Mar 8, 2022
Grant date: Mar 8, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology described herein is directed to a media indexer framework including a character recognition engine that automatically detects and groups instances (or occurrences) of characters in a multi-frame animated media file. More specifically, the character recognition engine automatically detects and groups the instances (or occurrences) of the characters in the multi-frame animated media file such that each group contains images associated with a single character. The character groups are then labeled and used to train an image classification model. Once trained, the image classification model can be applied to subsequent multi-frame animated media files to automatically classify the animated characters included therein.
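The last two steps of the abstract (labeled character groups train a model, which then classifies characters in later files) can be illustrated with a minimal sketch. The abstract does not say which classifier is used; a nearest-centroid classifier is swapped in here purely for illustration, and the function names, labels, and two-dimensional feature vectors are all hypothetical.

```python
from math import sqrt


def centroid(vectors):
    """Return the element-wise mean of equal-length feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labeled_groups):
    """Build a model mapping each character label to its group's centroid.

    labeled_groups maps a label to the feature vectors of the images
    in that (already grouped and labeled) character group.
    """
    return {label: centroid(vectors) for label, vectors in labeled_groups.items()}


def classify(model, vector):
    """Return the label whose centroid is closest to the given vector."""
    return min(model, key=lambda label: distance(model[label], vector))


# Hypothetical toy data: two labeled character groups (assumption, not from the patent).
labeled_groups = {
    "rabbit": [[0.9, 0.1], [0.8, 0.2]],
    "duck": [[0.1, 0.9], [0.2, 0.8]],
}
model = train(labeled_groups)

# A feature vector from a "subsequent" media file.
prediction = classify(model, [0.85, 0.05])
```

In practice the feature vectors would come from an image model rather than hand-written numbers, and the classifier would likely be far more expressive; the sketch only shows how labeled groups become training data.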

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer readable storage media having a media indexer service stored thereon, the media indexer service comprising: a character recognition engine including program instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: identify keyframes of a multi-frame animated media file; detect character region proposals within the keyframes, wherein each character region proposal comprises a bounding box or subset of a keyframe containing a proposed animated character; determine a similarity between the character region proposals by embedding features of the character region proposals into a feature space; and automatically group the character region proposals into animated character groups based on the similarity, wherein each animated character group is associated with a single animated character of the multi-frame animated media file.

2. The one or more non-transitory computer readable storage media of claim 1, wherein to detect the character region proposals within the keyframes, the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: access a pre-trained object detection model; and process the keyframes using the pre-trained object detection model to identify the character region proposals.

3. The one or more non-transitory computer readable storage media of claim 1, wherein to determine the similarity between the character region proposals, the program instructions, when executed by the one or more processing systems of the computing apparatus, direct the computing apparatus to: for each character region proposal of the character region proposals, extract features of the character region proposal; embed the features of the character region proposal into a feature space; and determine the similarity between the character region proposals by comparing the embedded features within the feature space.

4. The one or more non-transitory computer readable storage media of claim 3, wherein to automatically group the character region proposals into the animated character groups based on the similarity, the program instructions, when executed by the one or more processing systems of the computing apparatus, direct the computing apparatus to: apply a clustering algorithm to identify the animated character groups based on the determined similarity.

5. The one or more non-transitory computer readable storage media of claim 1, wherein the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: identify label information associated with at least one of the animated character groups; and classify the at least one of the animated character groups with the identified label information resulting in at least one annotated animated character group.

6. The one or more non-transitory computer readable storage media of claim 5, wherein to identify the label information associated with the animated character groups, the program instructions, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: present the at least one of the animated character groups to a user in a user interface; and receive, via the user interface, the label information associated with at least one of the animated character groups.

7. The one or more non-transitory computer readable storage media of claim 1, wherein the media indexer service further comprises: an indexing engine including program instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: collect annotated animated character groups; store the annotated animated character groups in a media indexer database; and feed the annotated animated character groups to an image classifier to train an image classification model.

8. The one or more non-transitory computer readable storage media of claim 1, wherein the media indexer service further comprises: an indexing engine including additional instructions that, when executed by one or more processing systems of a computing apparatus, direct the computing apparatus to: determine that a trained image classification model has been specified; automatically recognize, using the trained image classification model, label information associated with at least one of the animated character groups; and classify the at least one of the animated character groups with the recognized label information resulting in at least one annotated animated character group.

9. The one or more non-transitory computer readable storage media of claim 8, wherein the additional instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: index the multi-frame animated media file using the at least one annotated animated character group.

10. The one or more non-transitory computer readable storage media of claim 8, wherein additional instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: perform a smoothing operation to consolidate two or more of the annotated animated character groups into a single annotated animated character group.

11. The one or more non-transitory computer readable storage media of claim 1, wherein the program instructions, when executed by the one or more processing systems of the computing apparatus, further direct the computing apparatus to: automatically detect an animation style of the multi-frame animated media file; and prior to detecting the character region proposals within the keyframes, transforming the keyframes based on the detected animation style.

12. A computer-implemented method for training an image classification model to automatically classifying animated characters in a multi-frame animated media file, the method comprising: detecting character region proposals within keyframes of a multi-frame animated media file, wherein each character region proposal comprises a bounding box or subset of a keyframe containing a proposed animated character; embedding features of the character region proposals into a feature space to determine a similarity between the character region proposals; automatically grouping the character region proposals into animated character groups based on the similarity, wherein each character group is associated with a single animated character of the multi-frame animated media file; classifying at least one of the animated character groups with label information resulting in at least one annotated animated character group; and training an image classification model to automatically classify animated characters in subsequent multi-frame animated media files by feeding the at least one annotated animated character group to an image classifier.

13. The computer-implemented method of claim 12, the method further comprising: indexing the multi-frame animated media file using the at least one of the annotated animated character groups.

14. The computer-implemented method of claim 12, wherein determining the similarity between the character region proposals includes, for each character region proposal of the character region proposals: extracting features of the character region proposal; embedding the features of the character region proposal into a feature space; and determining the similarity between the char
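The grouping steps recited in claims 1, 3, and 4 (embed features, compare them in the feature space, cluster by similarity) can be sketched as follows. The claims name neither a similarity measure nor a clustering algorithm, so cosine similarity and a simple greedy threshold grouping are assumptions chosen for brevity; the function names, the 0.9 threshold, and the toy embeddings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def group_proposals(embeddings, threshold=0.9):
    """Greedily group embeddings by similarity.

    Each embedding joins the first group whose representative (the group's
    first member) is at least `threshold` similar; otherwise it starts a new
    group. Returns a list of groups, each a list of indices into `embeddings`.
    """
    groups = []
    for index, vector in enumerate(embeddings):
        for group in groups:
            representative = embeddings[group[0]]
            if cosine_similarity(vector, representative) >= threshold:
                group.append(index)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([index])
    return groups


# Hypothetical embeddings for four character region proposals (assumption).
embeddings = [
    [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.0],
]
groups = group_proposals(embeddings)
```

With these toy vectors, proposals 0 and 2 land in one group and proposals 1 and 3 in another. A greedy pass like this depends on input order; a production system would more plausibly use an established clustering algorithm, which is what claim 4 leaves open.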

Assignees

Inventors

Classifications

  • Detecting features for summarising video content · CPC title

  • Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title

  • Rule-based classification · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06V20/41 · Primary

    Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11270121B2 cover?
The technology described herein is directed to a media indexer framework including a character recognition engine that automatically detects and groups instances (or occurrences) of characters in a multi-frame animated media file. More specifically, the character recognition engine automatically detects and groups the instances (or occurrences) of the characters in the multi-frame animated medi…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/41. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 8, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).