Methods and systems for indexing multimedia content

US9785834B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9785834-B2
Application number: US-201514798499-A
Country: US
Kind code: B2
Filing date: Jul 14, 2015
Priority date: Jul 14, 2015
Publication date: Oct 10, 2017
Grant date: Oct 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to embodiments illustrated herein, a method and system is provided for indexing a multimedia content. The method includes extracting, by one or more processors, a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object. Thereafter, a body language information pertaining to the human object is determined from the set of frames by utilizing one or more image processing techniques. Further, an interaction information is determined from the set of frames. The interaction information is indicative of an action performed by the human object on the inanimate object. Thereafter, the multimedia content is indexed in a content database based at least on the body language information and the interaction information.
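
The indexing step described in the abstract can be pictured as an inverted index keyed by feature type and label. The following is a minimal sketch under stated assumptions, not the patented implementation: the class names `ContentFeatures` and `ContentIndex`, the label strings, and the in-memory dictionary standing in for the content database are all hypothetical, and the feature labels are assumed to have already been produced by some upstream image processing step.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentFeatures:
    """Hypothetical per-video features; the label vocabulary is illustrative."""
    content_id: str
    body_language: set[str] = field(default_factory=set)
    interactions: set[str] = field(default_factory=set)


class ContentIndex:
    """Toy in-memory stand-in for the 'content database' (an assumption)."""

    def __init__(self) -> None:
        # Maps (feature_type, label) -> ids of content showing that feature.
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, features: ContentFeatures) -> None:
        """Index one item under each of its body language and interaction labels."""
        for label in features.body_language:
            self._postings[("body_language", label)].add(features.content_id)
        for label in features.interactions:
            self._postings[("interaction", label)].add(features.content_id)

    def search(self, **criteria: str) -> set[str]:
        """Return the ids that match every feature_type=label criterion."""
        result: set[str] | None = None
        for kind, label in criteria.items():
            ids = self._postings.get((kind, label), set())
            result = set(ids) if result is None else result & ids
        return result or set()


index = ContentIndex()
index.add(ContentFeatures("lecture-1", {"eye_contact"}, {"writing_on_board"}))
index.add(ContentFeatures("lecture-2", {"eye_contact"}, {"pointing_at_slide"}))
matches = index.search(body_language="eye_contact", interaction="writing_on_board")
```

Here `matches` contains only the item that carries both labels. Keying postings by feature type keeps body language and interaction labels from colliding, and the same structure could hold the further features named in the claims.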

First claim

Opening claim text (preview).

What is claimed is:
1. A method for indexing multimedia content within an educational environmental, the method comprising: extracting, by one or more processors, a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object; determining, by the one or more processors, a body language information pertaining to the human object from the set of frames by utilizing one or more image processing techniques; determining, by the one or more processors, interaction information from the set of frames, wherein the interaction information is indicative of an action performed by the human object on the inanimate object; and indexing, by the one or more processors, the multimedia content in a content database based at least on the body language information and the interaction information, wherein the indexing of the multimedia content is further based on emotion information, audio characteristics information, discourse rate, one or more concepts in the multimedia content, number of repetitions of the one or more concepts, and personality type of the human object.
2. The method of claim 1, further comprising: analyzing, by the one or more processors, the multimedia content to determine the emotion information by utilizing one or more of one or more image processing techniques, one or more speech/audio processing techniques, or one or more natural language processing techniques, wherein the emotion information is indicative of an emotion presented by the human object in the multimedia content.
3. The method of claim 1 further comprising analyzing, by the one or more processors, an audio content within the multimedia content to determine the audio characteristics information by utilizing one or more speech/audio processing techniques, wherein the audio characteristics information comprises one or more of a speech rate, an accent, a speaking style, a background audio, or a background noise.
4. The method of claim 1, further comprising determining, by the one or more processors, a first textual content from the multimedia content by utilizing one or more text recognition techniques, wherein the first textual content comprises one or more of a textual content located on the inanimate object or a close-captioned text within the multimedia content.
5. The method of claim 4, further comprising: determining, by the one or more processors, a second textual content from an audio content within the multimedia content, by utilizing one or more speech-to-text conversion techniques.
6. The method of claim 5 further comprising determining, by said one or more processors, the discourse rate associated with the multimedia content based on the first textual content and the second textual content.
7. The method of claim 5, further comprising extracting, by the one or more processors, one or more keywords from the first textual content and the second textual content by utilizing one or more natural language processing techniques, wherein the one or more keywords relate to the one or more concepts explained in the multimedia content.
8. The method of claim 7, further comprising determining, by the one or more processors, a number of repetitions of the one or more concepts in the multimedia content.
9. The method of claim 8, wherein the multimedia content is further indexed based the number of repetitions of the one or more concepts in the multimedia content.
10. The method of claim 7, further comprising determining, by the one or more processors, the personality type associated with the human object based on one or more of the body language information, the interaction information, the emotion information indicative of an emotion presented by the human object in the multimedia content, a speech rate of said human object, a speaking style of said human object, or the second textual content determined from an audio content within the multimedia content.
11. The method of claim 1, wherein the multimedia content comprises one or more of an educational lecture, a corporate e-learning module (ELM), or a marketing/promotional video.
12. The method of claim 1, wherein the inanimate object comprises one or more of a presentation slide, a writing board, a poster, a paper, or a prop/model.
13. The method of claim 12, wherein the action performed by the human object on the inanimate object includes one or more of: the human object writing on the inanimate object, the human object pointing towards or touching the inanimate object, the human object holding the inanimate object, the human object scrolling through a textual content on the inanimate object, or the human object modifying or highlighting the textual content on the inanimate object.
14. The method of claim 1, wherein the body language information is determined based on one or more of a hand motions of the human object in the multimedia content, a body motion of the human object, a facial expression/emotion of the human object, a proximity of the human object to a video capturing device utilized for creation of the multimedia content, or an eye contact of the human object towards the video capturing device.
15. A system for indexing a multimedia content within an educational environment, the system comprising: one or more processors configured to: extract a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object; determine a body language information pertaining to the human object from the set of frames by utilizing one or more image processing techniques; determine an interaction information from the set of frames, wherein the interaction information is indicative of an action performed by the human object on the inanimate object; and index the multimedia content in a content database based at least on the body language information and the interaction information, wherein the indexing of the multimedia content is further based on emotion information, audio characteristics information, discourse rate, one or more concepts in the multimedia content, number of repetitions of the one or more concepts, and personality type of the human object.
16. The system of claim 15, wherein the multimedia content includes one or more of an educational lecture, a corporate e-learning module (ELM), or a marketing/promotional video.
17. The system of claim 15, wherein the inanimate object comprises one or more of a presentation slide, a writing board, a poster, a paper, or a prop/model.
18. The system of claim 17, wherein the action performed by the human object on the inanimate object includes one or more of: the human object writing on said inanimate object, the human object pointing towards or touching said inanimate object, the human object holding the inanimate object, the human object scrolling through a textual content on the inanimate object, or the human object modifying or highlighting the textual content on the inanimate object.
19. The system of claim 15, wherein the body language information is determined based on one or more of a hand motions of the human object in the multimedia content, a body motion of the human object, a facial expression/emotion of the human object, a proximity of said human object to a video capturing device utilized for creation of the multimedia content, or an eye contact of said human object towards the video capturing device.
20. A computer program product for use with a computin
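
Claims 6 to 8 refer to a discourse rate and a number of concept repetitions derived from the first textual content (recognized on-screen or captioned text) and the second textual content (speech-to-text output). The claim text shown here does not say how either quantity is computed, so the sketch below rests on assumptions: concepts are single-word keywords matched case-insensitively, and discourse rate is taken to be words per minute over both texts combined. The function names and the `duration_seconds` parameter are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def concept_repetitions(first_text: str, second_text: str,
                        concepts: list[str]) -> dict[str, int]:
    """Count how often each single-word concept appears across both texts."""
    counts = Counter(tokenize(first_text) + tokenize(second_text))
    return {concept: counts[concept.lower()] for concept in concepts}


def discourse_rate(first_text: str, second_text: str,
                   duration_seconds: float) -> float:
    """Assumed definition: total words in both texts per minute of content."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    words = len(tokenize(first_text)) + len(tokenize(second_text))
    return words * 60.0 / duration_seconds


slide_text = "Gradient descent"
transcript = "Gradient descent updates weights. The gradient points uphill."
repetitions = concept_repetitions(slide_text, transcript, ["gradient", "descent"])
rate = discourse_rate(slide_text, transcript, duration_seconds=30)
```

Multi-word concepts would need phrase matching instead of per-token counting, and a real pipeline would likely weight spoken and on-screen text differently. Both refinements are left out to keep the sketch short.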

Assignees

Xerox Corp, Videoken Inc

Inventors

Classifications

  • Physics · mapped topic

  • G06V40/174 · Primary

    Facial expression recognition · CPC title

  • using objects detected or recognised in the video content · CPC title

  • Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9785834B2 cover?
According to embodiments illustrated herein, a method and system is provided for indexing a multimedia content. The method includes extracting, by one or more processors, a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object. Thereafter, a body language information pertaining to the human object is determined from…
Who is the assignee on this patent?
Xerox Corp, Videoken Inc
What technology area does this patent fall under?
Primary CPC classification G06K9/00577. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 10, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).