Methods and systems for navigating through multimedia content

US9484032B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9484032-B2
Application number  US-201414523966-A
Country             US
Kind code           B2
Filing date         Oct 27, 2014
Priority date       Oct 27, 2014
Publication date    Nov 1, 2016
Grant date          Nov 1, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has associated one or more timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
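
The flow the abstract describes (words with timestamps, then snippets for a selected word) can be illustrated with a minimal Python sketch. It is not the patented implementation. The transcript format, the function names `index_words` and `snippets_for`, and the padding values around each occurrence are all assumptions made for illustration; the abstract does not specify how a snippet's boundaries are chosen.

```python
from collections import defaultdict

# Hypothetical input: (word, timestamp in seconds) pairs, as a speech-to-text
# step might produce them. The abstract does not specify this format.
TRANSCRIPT = [
    ("entropy", 12.0), ("signal", 15.5), ("entropy", 48.0),
    ("noise", 50.0), ("entropy", 131.5), ("signal", 140.0),
]

def index_words(transcript):
    """Map each lowercased word to the sorted timestamps at which it occurs."""
    index = defaultdict(list)
    for word, ts in transcript:
        index[word.lower()].append(ts)
    return {word: sorted(ts_list) for word, ts_list in index.items()}

def snippets_for(index, word, before=2.0, after=5.0):
    """Return (start, end) playback windows around each occurrence of `word`.

    The `before` and `after` padding values are illustrative assumptions.
    """
    return [(max(0.0, ts - before), ts + after)
            for ts in index.get(word.lower(), [])]

index = index_words(TRANSCRIPT)
print(snippets_for(index, "entropy"))
```

Selecting "entropy" here yields one window per occurrence; a player could seek to each window's start, or highlight the windows on a seek bar.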

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing multimedia content, said method comprising: extracting, by one or more processors, one or more words from at least an audio stream associated with said multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; creating, by said one or more processors, a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words, wherein at least a first dimension of said word cloud corresponds to a measure of temporal spread of each of said one or more words in said multimedia content; and presenting, by said one or more processors, one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

2. The method of claim 1, wherein said word cloud is a multidimensional graph that includes at least said first dimension and a second dimension.

3. The method of claim 2, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

4. The method of claim 3, wherein said cumulative temporal occurrence comprises at least one of mean, or median.

5. The method of claim 1 further comprising presenting, by said one or more processors, said word cloud of said one or more words along with said multimedia content to said user.

6. The method of claim 5 further comprising receiving, by said one or more processors, a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

7. The method of claim 6, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

8. The method of claim 7 further comprising generating, by said one or more processors, an audio signal corresponding to said word based on said one or more gestures performed by said user.

9. The method of claim 1 further comprising highlighting, by said one or more processors, one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

10. The method of claim 1 further comprising representing, by said one or more processors, said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

11. The method of claim 1 further comprising normalizing, by said one or more processors, said one or more words extracted from at least said audio stream associated with said multimedia content by text processing.

12. The method of claim 11, wherein said text processing comprises at least by removing stop words, or by transforming each of said one or more words in said multimedia content to stem form.

13. The method of claim 1 further comprising receiving, by said one or more processors, a second input from said user, wherein said second input corresponds to a selection of a second timestamp on said seek bar associated with said multimedia content, wherein said multimedia content is played from said second timestamp.

14. The method of claim 13 further comprising updating, by said one or more processors, said word cloud based on occurrences of said one or more words in a predefined time-window around said second timestamp.

15. The method of claim 13, wherein said second input comprises at least said one or more gestures performed by said user on said seek bar associated with said multimedia content.

16. The method of claim 1 further comprising changing, by said one or more processors, font size of said one or more words in said word cloud of said one or more words based on a frequency of occurrences of said one or more words in said multimedia content.

17. A system for processing multimedia content, said system comprising: one or more processors operable to: extract one or more words from at least an audio stream associated with said multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; create a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words, wherein at least a first dimension of said word cloud corresponds to a measure of temporal spread of each of said one or more words in said multimedia content; and present one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

18. The system of claim 17, wherein said word cloud is a multidimensional graph that includes at least said first dimension and a second dimension.

19. The system of claim 18, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

20. The system of claim 19, wherein said cumulative temporal occurrence comprises at least one of mean, median, or variance.

21. The system of claim 17, wherein said one or more processors are further operable to present said word cloud of said one or more words in said multimedia content to said user.

22. The system of claim 21, wherein said one or more processors are further operable to receive a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

23. The system of claim 22, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

24. The system of claim 17, wherein said one or more processors are further operable to highlight one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

25. The system of claim 17, wherein said one or more processors are further operable to represent said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

26. The system of claim 17, wherein said one or more processors are further operable to receive a second input from said user, wherein said second input corresponds to a selection of
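
As a rough illustration of the word-cloud dimensions named in the claims, the following Python sketch computes per-word plot attributes from timestamps. Several choices are assumptions, not taken from the claim text: population standard deviation stands in for the "measure of temporal spread", the stop-word list is a tiny placeholder, stemming is omitted, the measure of emphasis is not modeled, and the names `normalize` and `cloud_points` are hypothetical. The optional `window` argument loosely mirrors the time-window update of claim 14.

```python
from statistics import mean, median, pstdev

STOP_WORDS = {"the", "a", "of", "and", "is"}  # tiny illustrative list

# Hypothetical (word, timestamp in seconds) pairs from a speech-to-text step.
SAMPLE = [
    ("The", 1.0), ("entropy", 10.0), ("signal", 20.0),
    ("entropy", 30.0), ("of", 31.0), ("entropy", 50.0),
]

def normalize(transcript):
    """Lowercase words and drop stop words (stemming is omitted here)."""
    return [(w.lower(), ts) for w, ts in transcript
            if w.lower() not in STOP_WORDS]

def cloud_points(transcript, window=None):
    """Compute per-word attributes for a two-dimensional word cloud.

    window: optional (start, end) pair; only occurrences inside it count.
    """
    times = {}
    for word, ts in normalize(transcript):
        if window is None or window[0] <= ts <= window[1]:
            times.setdefault(word, []).append(ts)
    return {
        word: {
            "spread": pstdev(ts_list),  # assumed measure of temporal spread
            "center": mean(ts_list),    # cumulative temporal occurrence (mean)
            "median": median(ts_list),  # alternative named in claims 4 and 20
            "count": len(ts_list),      # could drive font size (claim 16)
        }
        for word, ts_list in times.items()
    }

full = cloud_points(SAMPLE)
windowed = cloud_points(SAMPLE, window=(25.0, 55.0))
print(full["entropy"]["count"], sorted(windowed))
```

A renderer could place each word at (`center`, `spread`) and scale its font by `count`; recomputing with a `window` around the current playback position would refresh the cloud as the user seeks.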

Assignees

Xerox Corp

Inventors

Classifications

  • using audio data · CPC title

  • Transforming into visible information · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Programmed access in sequence to addressed parts of tracks of operating record carriers (access by moving the head G11B3/08, G11B5/54, G11B7/085, G11B21/022; by moving the record carrier G11B15/005, G11B17/005, by driving of both record carrier and head G11B15/1816) · CPC title

  • Word spotting · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9484032B2 cover?
The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has associated one or more timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 1, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).