Methods and systems for processing a multimedia content

US2016118060A1 · US · A1

Patent metadata
Publication number: US-2016118060-A1
Application number: US-201414523966-A
Country: US
Kind code: A1
Filing date: Oct 27, 2014
Priority date: Oct 27, 2014
Publication date: Apr 28, 2016
Grant date: (none listed)


Abstract

Official abstract text for this publication.

The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has associated one or more timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
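The pipeline in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes an upstream speech-to-text step has already produced (word, timestamp, emphasis) triples, because the abstract does not say how the measure of emphasis is computed. It also treats each multimedia snippet as a fixed window of ±2 seconds around an occurrence and merges windows that overlap. `WordOccurrence`, `CloudEntry`, `build_word_cloud`, `snippets_for` and the `pad` parameter are hypothetical names.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class WordOccurrence:
    word: str
    timestamp: float  # seconds from the start of the content
    emphasis: float   # hypothetical emphasis score, assumed supplied upstream


@dataclass
class CloudEntry:
    word: str
    timestamps: list = field(default_factory=list)  # sorted occurrence times
    emphasis: float = 0.0  # mean emphasis over all occurrences (an assumption)


def build_word_cloud(occurrences):
    """Group occurrences by word, keeping every timestamp and an emphasis score."""
    groups = {}
    for occ in occurrences:
        groups.setdefault(occ.word, []).append(occ)
    return {
        word: CloudEntry(
            word=word,
            timestamps=sorted(o.timestamp for o in occs),
            emphasis=mean(o.emphasis for o in occs),
        )
        for word, occs in groups.items()
    }


def snippets_for(cloud, word, duration, pad=2.0):
    """Return (start, end) windows around each occurrence of the selected word.

    Windows are clipped to [0, duration] and overlapping windows are merged.
    """
    intervals = []
    for t in cloud[word].timestamps:
        start, end = max(0.0, t - pad), min(duration, t + pad)
        if intervals and start <= intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals


occs = [
    WordOccurrence("cloud", 10.0, 0.8),
    WordOccurrence("cloud", 11.5, 0.6),
    WordOccurrence("seek", 40.0, 0.3),
]
cloud = build_word_cloud(occs)
print(snippets_for(cloud, "cloud", duration=60.0))  # [(8.0, 13.5)]
```

Selecting "cloud" here yields one merged snippet, because the two occurrences are 1.5 s apart and their windows overlap. Whether adjacent snippets should be merged or kept separate is a design choice the abstract leaves open.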

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing a multimedia content, said method comprising: extracting, by one or more processors, one or more words from at least an audio stream associated with a multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; creating, by said one or more processors, a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words; and presenting, by said one or more processors, one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
2. The method of claim 1, wherein said word cloud is a multidimensional graph that includes at least a first dimension and a second dimension.
3. The method of claim 2, wherein said first dimension corresponds to a measure of a temporal spread of each of said one or more words in said multimedia content.
4. The method of claim 2, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.
5. The method of claim 4, wherein said cumulative temporal occurrence comprises at least one of mean, or median.
6. The method of claim 1 further comprising presenting, by said one or more processors, said word cloud of said one or more words along with said multimedia content to said user.
7. The method of claim 6 further comprising receiving, by said one or more processors, a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.
8. The method of claim 7, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.
9. The method of claim 8 further comprising generating, by said one or more processors, an audio signal corresponding to said word based on said one or more gestures performed by said user.
10. The method of claim 1 further comprising highlighting, by said one or more processors, one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.
11. The method of claim 1 further comprising representing, by said one or more processors, said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.
12. The method of claim 1 further comprising normalizing, by said one or more processors, said one or more words extracted from at least said audio stream associated with said multimedia content by text processing.
13. The method of claim 12, wherein said text processing comprises at least by removing stop words, or by transforming each of said one or more words in said multimedia content to stem form.
14. The method of claim 1 further comprising receiving, by said one or more processors, a second input from said user, wherein said second input corresponds to a selection of a second timestamp on said seek bar associated with said multimedia content, wherein said multimedia content is played from said second timestamp.
15. The method of claim 14 further comprising updating, by said one or more processors, said word cloud based on occurrences of said one or more words in a predefined time-window around said second timestamp.
16. The method of claim 14, wherein said second input comprises at least said one or more gestures performed by said user on said seek bar associated with said multimedia content.
17. The method of claim 1 further comprising changing, by said one or more processors, font size of said one or more words in said word cloud of said one or more words based on a frequency of occurrences of said one or more words in said multimedia content.
18. A system for processing a multimedia content, said system comprising: one or more processors operable to: extract one or more words from at least an audio stream associated with a multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; create a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words; and present one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
19. The system of claim 18, wherein said word cloud is a multidimensional graph that includes at least a first dimension and a second dimension.
20. The system of claim 19, wherein said first dimension corresponds to a measure of a temporal spread of each of said one or more words in said multimedia content.
21. The system of claim 19, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.
22. The system of claim 21, wherein said cumulative temporal occurrence comprises at least one of mean, median, or variance.
23. The system of claim 18, wherein said one or more processors are further operable to present said word cloud of said one or more words in said multimedia content to said user.
24. The system of claim 23, wherein said one or more processors are further operable to receive a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.
25. The system of claim 24, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.
26. The system of claim 18, wherein said one or more processors are further operable to highlight one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.
27. The system of claim 18, wherein said one or more processors are further operable to represent said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.
28. The system of claim 18, wherein said one or more p
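Several dependent claims name quantities without fixing a formula: temporal spread (claims 3 and 20), cumulative temporal occurrence summarized by mean, median or variance (claims 4-5 and 21-22), frequency-driven font size (claim 17), stop-word removal and stemming (claims 12-13), seek-bar highlighting (claim 10) and a time window around a seek position (claim 15). The Python sketch picks one plausible reading of each, assuming timestamps in seconds. Every function name, the stop-word list, the toy suffix-stripping stemmer, the point-size range and the window width are illustrative assumptions, not anything the claims specify.

```python
import re
from statistics import mean, median, pvariance

# Hypothetical, deliberately tiny stop-word list (claim 13 names stop-word
# removal but gives no list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens):
    """Claims 12-13: lowercase, strip non-letters, drop stop words, stem."""
    out = []
    for word, t in tokens:
        w = re.sub(r"[^a-z]", "", word.lower())
        if w and w not in STOP_WORDS:
            out.append((crude_stem(w), t))
    return out


def word_features(tokens):
    """Per-word axes for a two-dimensional word cloud (claims 2-5, 19-22).

    'spread' is one possible temporal-spread measure (last minus first
    occurrence); mean, median and variance summarize cumulative temporal
    occurrence; 'frequency' can drive font size (claim 17).
    """
    by_word = {}
    for w, t in tokens:
        by_word.setdefault(w, []).append(t)
    return {
        w: {
            "spread": max(ts) - min(ts),
            "mean": mean(ts),
            "median": median(ts),
            "variance": pvariance(ts),
            "frequency": len(ts),
        }
        for w, ts in by_word.items()
    }


def font_size(freq, max_freq, min_pt=10, max_pt=36):
    """Claim 17: map frequency linearly onto a hypothetical point-size range."""
    if max_freq <= 1:
        return min_pt
    return min_pt + (max_pt - min_pt) * (freq - 1) / (max_freq - 1)


def seek_bar_portions(intervals, duration):
    """Claim 10: snippet intervals as fractions of the seek bar's length."""
    return [(start / duration, end / duration) for start, end in intervals]


def words_in_window(tokens, seek_t, half_width=30.0):
    """Claim 15: occurrences inside a time window around a seek timestamp."""
    return [(w, t) for w, t in tokens if abs(t - seek_t) <= half_width]


tokens = [("The", 1.0), ("Clouds", 2.0), ("processing", 4.0), ("of", 5.0), ("cloud", 6.0)]
norm = normalize(tokens)
print(norm)  # [('cloud', 2.0), ('process', 4.0), ('cloud', 6.0)]
print(word_features(norm)["cloud"]["variance"])  # 4.0
```

Taking last-minus-first occurrence as the temporal spread is only one option; a production system would also swap `crude_stem` for an established stemmer and use a full stop-word list.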

Assignees

Xerox Corp

Inventors

(not listed on this page)

Classifications

  • Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title

  • G10L15/26 (primary)

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title

  • Using prosody or stress · CPC title

  • Transforming into visible information · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016118060A1 cover?
It covers methods and systems that extract words, each with timestamps, from the audio stream of multimedia content; build a word cloud based on how much emphasis each word receives and when it occurs; and, when a user selects a word in the cloud, present the snippets of the content where that word occurs.
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Its primary CPC classification is G10L15/26 (speech-to-text systems). Mapped technology areas include Physics.
When was this patent published?
It was published on Apr 28, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).