Metadata generation for video indexing

US11755643B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11755643-B2
Application number: US-202016921248-A
Country: US
Kind code: B2
Filing date: Jul 6, 2020
Priority date: Jul 6, 2020
Publication date: Sep 12, 2023
Grant date: Sep 12, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video indexing system identifies groups of frames within a video frame sequence captured by a static camera during a same scene. Context metadata is generated for each frame in each group based on an analysis of fewer than all frames in the group. The frames are indexed in a database in association with the generated context metadata.
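The flow described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: frames are toy lists of grayscale values, the grouping rule (mean absolute difference against a threshold) and the `analyze_keyframe` placeholder are hypothetical stand-ins for whatever scene detection and image analysis a real system would use, and the "database" is a plain dictionary.

```python
Frame = list[int]  # toy frame: a flat list of grayscale pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def group_static_scenes(frames: list[Frame], threshold: float = 10.0) -> list[list[int]]:
    """Split frame indices into groups of consecutive, similar-looking frames.

    Assumption: a small difference from the previous frame means "same
    static-camera scene". The source does not specify the grouping rule.
    """
    groups: list[list[int]] = []
    for i, frame in enumerate(frames):
        if groups and mean_abs_diff(frames[i - 1], frame) <= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def analyze_keyframe(frame: Frame) -> dict:
    """Hypothetical placeholder for the expensive image analysis step."""
    label = "bright" if sum(frame) / len(frame) > 127 else "dark"
    return {"scene_label": label}


def index_video(frames: list[Frame]) -> tuple[dict[int, dict], int]:
    """Index every frame, but analyze only one keyframe per group."""
    index: dict[int, dict] = {}
    analyzed = 0
    for group in group_static_scenes(frames):
        metadata = analyze_keyframe(frames[group[0]])  # first frame as keyframe
        analyzed += 1
        for i in group:
            index[i] = metadata  # every frame in the group shares the metadata
    return index, analyzed


frames = [
    [10, 10, 10, 10],
    [12, 10, 11, 10],
    [200, 210, 205, 199],
    [201, 209, 205, 200],
]
index, analyzed = index_video(frames)
print(index, analyzed)
```

In this toy run, four frames are indexed but only two are analyzed, which is the saving the abstract refers to when it says metadata is generated from "fewer than all frames in the group".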

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: analyzing a sequence of video frames to determine whether the sequence was captured by a static camera; responsive to determining that the sequence of frames was captured by the static camera, subjecting the sequence of frames to a first series of processing operations that includes generating context metadata for each frame in the sequence based on an analysis of fewer than all frames in the sequence; and responsive to the determining that a select frame of the sequence of frames was captured by a moving camera, subjecting the select frame to a second series of processing operations for generating the context metadata that is different than the first series of processing operations.
2. The method of claim 1, wherein the second series of processing operations generates context metadata for the select frame based on an image analysis that is limited to the select frame.
3. The method of claim 1, wherein generating context metadata for each frame in the sequence based on the analysis of fewer than all frames in the sequence further comprises: selecting a keyframe from the sequence; generating at least a portion of the context metadata based on an analysis of the keyframe without analyzing other frames of the sequence; and indexing the other frames of the sequence in association with the generated context metadata.
4. The method of claim 3, wherein generating the context metadata further comprises: generating descriptors for multiple objects present in the keyframe.
5. The method of claim 4, wherein generating the context metadata further comprises: generating a scene label generated based on the descriptors generated for the keyframe.
6. The method of claim 3, wherein generating the context metadata further comprises: generating a region of interest (ROI) mask for the keyframe; applying the ROI mask to each of the other frames in the sequence; and omitting an area defined by the ROI mask from subsequent processing operations performed on each of the other frames of the sequence.
7. The method of claim 1, further comprising: responsive to determining that the sequence of frames was captured by a static camera, determining a size of an object present in multiple frames of the sequence by analyzing a position of the object relative to a fixed reference point that appears within each of the multiple frames.
8. The method of claim 1, further comprising: responsive to determining that the sequence of frames was captured by the static camera, executing object tracking logic that assumes a fixed camera frame of reference.
9. A video indexing system comprising: a frame classifier that classifies video frames received as part of a sequence into different static camera scene groups; a context metadata generation engine that: receives a group of frames classified as comprising a same static camera scene group of the different static camera scene groups; and analyzes fewer than all frames in the group to generate context metadata for each frame in the group; and an indexing engine that indexes each frame in the group in a database in association with the generated context metadata.
10. The video indexing system of claim 9, wherein the context metadata generation engine is further adapted to: determine that a select frame of the sequence was captured by a moving camera, based on a classification of the frame classifier; and responsive to the determination, generate context metadata for the select frame based on an image analysis limited to the select frame.
11. The video indexing system of claim 9, wherein the context metadata generation engine is further adapted to: select a keyframe from the group; and generate at least a portion of the context metadata for each frame in the group based on an analysis of the keyframe without analyzing other frames of the group.
12. The video indexing system of claim 11, wherein the context metadata generation engine is adapted to: generate a region of interest (ROI) mask for the keyframe of the group; apply the ROI mask to each frame in the group; and omit an area defined by the ROI mask from subsequent processing operations performed on the frames of the group.
13. The video indexing system of claim 11, wherein the context metadata includes descriptors for multiple objects present in the keyframe.
14. The video indexing system of claim 13, wherein the context metadata includes a scene label generated based on the descriptors generated for the keyframe.
15. The video indexing system of claim 9, further comprising: an object tracker that executes logic to track movements of objects throughout multiple frames of the group, the logic being adapted to implement constraints that assume a fixed frame of reference for a camera capturing the video.
16. One or more tangible computer-readable storage media encoding computer-executable instructions for executing a computer processes comprising: classifying video frames within a sequence into different static camera scene groups; generating context metadata for each frame in a group of the different static camera scene groups, the context metadata being generated based on an analysis of fewer than all frames in the group; and indexing each frame in the group in a database in association with the generated context metadata.
17. The one or more tangible computer-readable storage media of claim 16, wherein generating the context metadata further comprises: selecting a keyframe from the group; and generating at least a portion of the context metadata for each frame in the group based on an analysis of the keyframe without analyzing other frames of the group.
18. The one or more tangible computer-readable storage media of claim 16, wherein the computer process further comprises: determining that a select frame of the sequence was captured by a moving camera; responsive to the determination, generating context metadata for the select frame based on an image analysis limited to the select frame.
19. The one or more tangible computer-readable storage media of claim 17, wherein the computer process further comprises: generating a region of interest (ROI) mask for the keyframe of the group; applying the ROI mask to each frame in the group; and omitting an area defined by the ROI mask from subsequent processing operations performed on the frames of the group.
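The ROI-mask steps recited in claims 6, 12, and 19 (generate a mask on the keyframe, apply it to the other frames, and omit the masked area from later processing) might look roughly like the following sketch. The claims do not say how the mask is derived, so the brightness-threshold rule in `generate_roi_mask` is purely a hypothetical placeholder, as are the function names and the toy 2x2 frames.

```python
Frame = list[list[int]]   # rows of grayscale pixel values
Mask = list[list[bool]]   # True marks a pixel inside the area defined by the mask


def generate_roi_mask(keyframe: Frame, threshold: int = 200) -> Mask:
    """Build a mask from the keyframe only.

    Hypothetical rule for illustration: mask every pixel brighter than
    `threshold`. The source does not specify how the mask is generated.
    """
    return [[pixel > threshold for pixel in row] for row in keyframe]


def unmasked_pixels(frame: Frame, mask: Mask) -> list[tuple[int, int, int]]:
    """Return (row, col, value) for pixels outside the masked area.

    Later processing would run on these pixels only, so the area defined
    by the mask is omitted, as the claims describe.
    """
    return [
        (r, c, value)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if not mask[r][c]
    ]


keyframe = [[255, 10], [255, 20]]
other_frame = [[250, 30], [240, 40]]

mask = generate_roi_mask(keyframe)           # computed once, on the keyframe
remaining = unmasked_pixels(other_frame, mask)  # reused on another frame
print(mask)
print(remaining)
```

Because the camera is static, a mask computed once on the keyframe can plausibly be reused for every other frame in the group, which is why the claims tie the mask to the keyframe rather than recomputing it per frame.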

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G11B27/28 · Primary

    by using information signals recorded by the same method as the main recording {(G11B27/22 takes precedence)} · CPC title

  • G06F16/71 · Primary

    Indexing; Data structures therefor; Storage structures · CPC title

  • using metadata automatically derived from the content · CPC title

  • Detecting features for summarising video content · CPC title

  • using objects detected or recognised in the video content · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11755643B2 cover?
A video indexing system identifies groups of frames within a video frame sequence captured by a static camera during a same scene. Context metadata is generated for each frame in each group based on an analysis of fewer than all frames in the group. The frames are indexed in a database in association with the generated context metadata.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G11B27/28. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 12, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).