Computerized prominent character recognition in videos

US9934423B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9934423-B2
Application number  US-201414445518-A
Country             US
Kind code           B2
Filing date         Jul 29, 2014
Priority date       Jul 29, 2014
Publication date    Apr 3, 2018
Grant date          Apr 3, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for identifying prominent subjects in video content based on feature point extraction are described herein. Video files may be processed to detect faces on video frames and extract feature points from the video frames. Some video frames may include detected faces and extracted feature points and other video frames may not include detected faces. Based on the extracted feature points, faces may be inferred on video frames where no face was detected. The inferring may be based on feature points. Additionally, video frames may be arranged into groups and two or more groups may be merged. The merging may be based on some groups including video frames having overlapping feature points. The resulting groups each may identify a subject. A frequency representing a number of video frames where the subject appears may be determined for calculating a prominence score for each of the identified subjects in the video file.
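
The grouping, merging, and frequency steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `merge_groups` and `frequency`, the dictionary layout, and the use of shared feature-point identifiers as a stand-in for "overlapping feature points" are all assumptions made for illustration. The frequency follows the wording of claim 5 (frames containing the subject divided by total frames in the video file).

```python
from typing import Dict, List, Set

# Hypothetical layout: each group holds the indices of its video frames and
# the identifiers of the feature points seen on those frames.
Group = Dict[str, Set]


def merge_groups(groups: List[Group]) -> List[Group]:
    """Merge groups that share at least one feature-point identifier.

    Assumption: "overlapping feature points" is approximated by shared IDs.
    """
    merged: List[Group] = []
    for group in groups:
        frames = set(group["frames"])
        points = set(group["points"])
        remaining: List[Group] = []
        for other in merged:
            if points & other["points"]:
                # Overlap found: absorb the other group into this one.
                frames |= other["frames"]
                points |= other["points"]
            else:
                remaining.append(other)
        remaining.append({"frames": frames, "points": points})
        merged = remaining
    return merged


def frequency(group: Group, total_frames: int) -> float:
    """Frames in which the subject appears, divided by all frames in the file."""
    return len(group["frames"]) / total_frames


if __name__ == "__main__":
    groups = [
        {"frames": {0, 1}, "points": {"a", "b"}},
        {"frames": {2, 3}, "points": {"b", "c"}},  # shares "b" with the first group
        {"frames": {4}, "points": {"z"}},
    ]
    for subject in merge_groups(groups):
        print(sorted(subject["frames"]), frequency(subject, total_frames=5))
```

Each merged group stands for one subject, so the per-subject frequency can feed a prominence score. The abstract does not give a scoring formula, so none is sketched here.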

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising using one or more computing devices to implement: extracting a first feature point from a first video frame of a video file and a plurality of second feature points from a second video frame of the video file; detecting a portion of a body that identifies a subject in the first video frame, the portion of the body being different than the first feature point; associating the first feature point with the portion of the body in the first video frame; determining whether a spatial distance between a first location of the first feature point within the first video frame and a second location of each of the plurality of second feature points within the second video frame is less than a distance threshold, the distance threshold corresponding to a maximum distance that the portion of the body is estimated to move between the first video frame and the second video frame; inferring the portion of the body in the second video frame based at least in part on a determination that the spatial distance between the first location of the first feature point in the first video frame and the second location of a first one of the plurality of second feature points in the second video frame is less than the distance threshold; associating the first one of the plurality of second feature points with the portion of the body in the second video frame based at least in part on inferring the portion of the body in the second video frame; and arranging the first video frame and the second video frame into a group.

2. The method of claim 1 further comprising, prior to arranging the first video frame and the second video frame into the group: arranging the first video frame into a first group of video frames with at least a third video frame of the video file based at least in part on first similarity data between the first video frame and the third video frame; and arranging the second video frame into a second group of video frames with at least a fourth video frame of the video file based at least in part on second similarity data between the second video frame and the fourth video frame.

3. The method of claim 2 further comprising, prior to arranging the first video frame and the second video frame into the group: comparing first feature points associated with each individual video frame in the first group with second feature points associated with each individual video frame in the second group; and determining that at least one feature point of the first feature points and at least one feature point of the second feature points substantially overlap.

4. The method of claim 1, wherein the group is associated with the subject.

5. The method of claim 4 further comprising, determining a frequency associated with the subject based at least in part on counting a number of video frames including the subject and dividing the number of video frames including the subject by a total number of video frames in the video file.

6. The method of claim 5, wherein the portion of the body is associated with a set of face detail values including at least a size value and a position value associated with the portion of the body.

7. The method of claim 6 further comprising, calculating a prominence score associated with the subject based at least in part on at least one of the size value, the position value, or the frequency associated with the subject.

8. The method of claim 1, further comprising: determining the portion of the body is not detected in the second video frame; wherein the determining whether the spatial distance between the first location of the first feature point in the first video frame and the second location of each of the plurality of second feature points in the second video frame is less than the distance threshold is in response to the portion of the body not being detected; and wherein the inferring the face in the second video frame is in response to the portion of the body not being detected.

9. The method of claim 1, further comprising inferring the portion of the body is not located in the second video frame based at least in part on a determination that the spatial distance between the first location of the first feature point in the first video frame and the second location of each of the plurality of second feature points in the second video frame is not less than the distance threshold.

10. The method of claim 1, further comprising: inferring that at least a second one of the plurality of second feature points in the second video frame is not associated with the portion of the body in the second video frame based at least in part on a determination that the spatial distance between the first location of the first feature point in the first video frame and the second location of at least the second one of the plurality of second feature points in the second video frame is not less than the distance threshold.

11. A system comprising: memory; one or more processors operably coupled to the memory; and one or more modules stored in the memory and executable by the one or more processors, the one or more modules including: a face detection module configured to detect a face associated with a subject in a first video frame in a video file; a feature detection module configured to: extract a first feature point from the first video frame and a plurality of second feature points from a second video frame in the video file, the first feature point and the plurality of second feature points being different than the face; and associate the first feature point with the face in the first video frame; determine whether a spatial distance between a first location of the first feature point within the first video frame and a second location of each of the plurality of second feature points within the second video frame is less than a distance threshold, the distance threshold corresponding to a maximum distance that the face is estimated to move between the first video frame and the second video frame; infer the face in the second video frame based at least in part on a determination that the spatial distance between the first location of the first feature point in the first video frame and the second location of a first one of the plurality of second feature points in the second video frame is less than the distance threshold; and associate the first one of the plurality of second feature points with the face in the second video frame based at least in part on inferring the face in the second video frame; and a grouping module configured to arrange the first video frame and the second video frame into a group based at least in part on a relationship between the first feature point and the first one of the plurality of second feature points, wherein the group associates to the subject.

12. The system of claim 11, further comprising a scoring module configured to determine a prominence score associated with the subject.

13. The system of claim 12, further comprising a post processing module configured to perform post processing operations including at least one of filtering the video file and one or more other video files based at least in part on the prominence score or ranking the video file and the one or more other video files based at least in part on the prominence score.

14. The system of claim 11, wherein the first video frame precedes the second video frame by one or more video frames.

15. The system of claim 11, wherein the first video frame succeeds the second video frame by one or more video frames.

16. The system of claim 11

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9934423B2 cover?
Techniques for identifying prominent subjects in video content based on feature point extraction are described herein. Video files may be processed to detect faces on video frames and extract feature points from the video frames. Some video frames may include detected faces and extracted feature points and other video frames may not include detected faces. Based on the extracted feature points,…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06K9/00281. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 3, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).