Performing Automated Tasks Based on Visual Cues
US-2015379347-A1 · Dec 31, 2015 · US
US9785834B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9785834-B2 |
| Application number | US-201514798499-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2015 |
| Priority date | Jul 14, 2015 |
| Publication date | Oct 10, 2017 |
| Grant date | Oct 10, 2017 |
Official abstract text for this publication.
According to embodiments illustrated herein, a method and system is provided for indexing a multimedia content. The method includes extracting, by one or more processors, a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object. Thereafter, a body language information pertaining to the human object is determined from the set of frames by utilizing one or more image processing techniques. Further, an interaction information is determined from the set of frames. The interaction information is indicative of an action performed by the human object on the inanimate object. Thereafter, the multimedia content is indexed in a content database based at least on the body language information and the interaction information.
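The flow described in the abstract (extract frames, derive body language and interaction information, then index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `IndexEntry`, `ContentIndex`, and `index_content` are hypothetical names, the in-memory dictionary stands in for the content database, and the two detector callables are placeholders for the image processing techniques, which the abstract does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class IndexEntry:
    """Hypothetical index record; field names are illustrative only."""
    content_id: str
    body_language: set[str] = field(default_factory=set)
    interactions: set[str] = field(default_factory=set)


class ContentIndex:
    """Toy in-memory stand-in for the 'content database' (an assumption)."""

    def __init__(self) -> None:
        self._entries: dict[str, IndexEntry] = {}

    def add(self, entry: IndexEntry) -> None:
        self._entries[entry.content_id] = entry

    def search(self, body_language: str | None = None,
               interaction: str | None = None) -> list[str]:
        """Return ids of content matching every label that was supplied."""
        hits = []
        for entry in self._entries.values():
            if body_language is not None and body_language not in entry.body_language:
                continue
            if interaction is not None and interaction not in entry.interactions:
                continue
            hits.append(entry.content_id)
        return sorted(hits)


def index_content(index: ContentIndex, content_id: str, frames: Iterable,
                  detect_body_language: Callable, detect_interaction: Callable) -> IndexEntry:
    """Aggregate per-frame labels into one entry and store it in the index.

    The detectors are caller-supplied placeholders; each takes one frame
    and returns an iterable of string labels.
    """
    entry = IndexEntry(content_id)
    for frame in frames:
        entry.body_language.update(detect_body_language(frame))
        entry.interactions.update(detect_interaction(frame))
    index.add(entry)
    return entry
```

With stub detectors, a lecture whose frames yield the labels `hand_motion` and `writing_on_board` could then be retrieved by `index.search(interaction="writing_on_board")`.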
Opening claim text (preview).
What is claimed is:
1. A method for indexing multimedia content within an educational environmental, the method comprising: extracting, by one or more processors, a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object; determining, by the one or more processors, a body language information pertaining to the human object from the set of frames by utilizing one or more image processing techniques; determining, by the one or more processors, interaction information from the set of frames, wherein the interaction information is indicative of an action performed by the human object on the inanimate object; and indexing, by the one or more processors, the multimedia content in a content database based at least on the body language information and the interaction information, wherein the indexing of the multimedia content is further based on emotion information, audio characteristics information, discourse rate, one or more concepts in the multimedia content, number of repetitions of the one or more concepts, and personality type of the human object.
2. The method of claim 1, further comprising: analyzing, by the one or more processors, the multimedia content to determine the emotion information by utilizing one or more of one or more image processing techniques, one or more speech/audio processing techniques, or one or more natural language processing techniques, wherein the emotion information is indicative of an emotion presented by the human object in the multimedia content.
3. The method of claim 1 further comprising analyzing, by the one or more processors, an audio content within the multimedia content to determine the audio characteristics information by utilizing one or more speech/audio processing techniques, wherein the audio characteristics information comprises one or more of a speech rate, an accent, a speaking style, a background audio, or a background noise.
4. The method of claim 1, further comprising determining, by the one or more processors, a first textual content from the multimedia content by utilizing one or more text recognition techniques, wherein the first textual content comprises one or more of a textual content located on the inanimate object or a close-captioned text within the multimedia content.
5. The method of claim 4, further comprising: determining, by the one or more processors, a second textual content from an audio content within the multimedia content, by utilizing one or more speech-to-text conversion techniques.
6. The method of claim 5 further comprising determining, by said one or more processors, the discourse rate associated with the multimedia content based on the first textual content and the second textual content.
7. The method of claim 5, further comprising extracting, by the one or more processors, one or more keywords from the first textual content and the second textual content by utilizing one or more natural language processing techniques, wherein the one or more keywords relate to the one or more concepts explained in the multimedia content.
8. The method of claim 7, further comprising determining, by the one or more processors, a number of repetitions of the one or more concepts in the multimedia content.
9. The method of claim 8, wherein the multimedia content is further indexed based the number of repetitions of the one or more concepts in the multimedia content.
10. The method of claim 7, further comprising determining, by the one or more processors, the personality type associated with the human object based on one or more of the body language information, the interaction information, the emotion information indicative of an emotion presented by the human object in the multimedia content, a speech rate of said human object, a speaking style of said human object, or the second textual content determined from an audio content within the multimedia content.
11. The method of claim 1, wherein the multimedia content comprises one or more of an educational lecture, a corporate e-learning module (ELM), or a marketing/promotional video.
12. The method of claim 1, wherein the inanimate object comprises one or more of a presentation slide, a writing board, a poster, a paper, or a prop/model.
13. The method of claim 12, wherein the action performed by the human object on the inanimate object includes one or more of: the human object writing on the inanimate object, the human object pointing towards or touching the inanimate object, the human object holding the inanimate object, the human object scrolling through a textual content on the inanimate object, or the human object modifying or highlighting the textual content on the inanimate object.
14. The method of claim 1, wherein the body language information is determined based on one or more of a hand motions of the human object in the multimedia content, a body motion of the human object, a facial expression/emotion of the human object, a proximity of the human object to a video capturing device utilized for creation of the multimedia content, or an eye contact of the human object towards the video capturing device.
15. A system for indexing a multimedia content within an educational environment, the system comprising: one or more processors configured to: extract a set of frames from the multimedia content, wherein the set of frames comprises at least one of a human object and an inanimate object; determine a body language information pertaining to the human object from the set of frames by utilizing one or more image processing techniques; determine an interaction information from the set of frames, wherein the interaction information is indicative of an action performed by the human object on the inanimate object; and index the multimedia content in a content database based at least on the body language information and the interaction information, wherein the indexing of the multimedia content is further based on emotion information, audio characteristics information, discourse rate, one or more concepts in the multimedia content, number of repetitions of the one or more concepts, and personality type of the human object.
16. The system of claim 15, wherein the multimedia content includes one or more of an educational lecture, a corporate e-learning module (ELM), or a marketing/promotional video.
17. The system of claim 15, wherein the inanimate object comprises one or more of a presentation slide, a writing board, a poster, a paper, or a prop/model.
18. The system of claim 17, wherein the action performed by the human object on the inanimate object includes one or more of: the human object writing on said inanimate object, the human object pointing towards or touching said inanimate object, the human object holding the inanimate object, the human object scrolling through a textual content on the inanimate object, or the human object modifying or highlighting the textual content on the inanimate object.
19. The system of claim 15, wherein the body language information is determined based on one or more of a hand motions of the human object in the multimedia content, a body motion of the human object, a facial expression/emotion of the human object, a proximity of said human object to a video capturing device utilized for creation of the multimedia content, or an eye contact of said human object towards the video capturing device.
20. A computer program product for use with a computin
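Claims 7 to 9 describe counting how often each concept recurs across the first textual content (on-screen text or captions) and the second textual content (speech-to-text output). The sketch below illustrates one simple way such a count could be computed. It is an assumption-laden example, not the claimed method: `count_concept_repetitions` is a hypothetical name, the concept list is supplied by the caller rather than extracted with natural language processing, and matching is plain case-insensitive whole-phrase search.

```python
import re


def count_concept_repetitions(first_text: str, second_text: str,
                              concepts: list[str]) -> dict[str, int]:
    """Count whole-phrase, case-insensitive occurrences of each concept.

    first_text:  text recognised on the inanimate object or in captions
    second_text: transcript produced from the audio content
    concepts:    caller-supplied keywords (assumed already extracted)
    """
    combined = f"{first_text} {second_text}".lower()
    counts = {}
    for concept in concepts:
        # \b anchors keep 'gradient' from matching inside 'gradients'.
        pattern = r"\b" + re.escape(concept.lower()) + r"\b"
        counts[concept] = len(re.findall(pattern, combined))
    return counts
```

The resulting dictionary could be stored alongside the other index fields so that content can be ranked or filtered by how often a concept is repeated.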
Physics · mapped topic
Facial expression recognition · CPC title
using objects detected or recognised in the video content · CPC title
Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands · CPC title