What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 28, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and systems for processing a multimedia content

US2016118060A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2016118060-A1
Application number	US-201414523966-A
Country	US
Kind code	A1
Filing date	Oct 27, 2014
Priority date	Oct 27, 2014
Publication date	Apr 28, 2016
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has associated one or more timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
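The pipeline in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Xerox's implementation. It assumes a speech recognizer has already produced word occurrences with timestamps, and that each occurrence carries a numeric emphasis score. The classification G10L15/1807 ("using prosody or stress") hints at prosodic cues, but this page does not say how emphasis is measured. `Token`, `build_word_cloud`, `snippets_for`, and the 2-second padding are hypothetical names and values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One recognized word occurrence (hypothetical structure)."""
    word: str
    timestamp: float  # seconds from the start of the media
    emphasis: float   # assumed per-occurrence emphasis score, e.g. from prosody


def build_word_cloud(tokens):
    """Group occurrences by word: total emphasis plus sorted timestamps."""
    cloud = defaultdict(lambda: {"emphasis": 0.0, "timestamps": []})
    for t in tokens:
        entry = cloud[t.word.lower()]
        entry["emphasis"] += t.emphasis
        entry["timestamps"].append(t.timestamp)
    for entry in cloud.values():
        entry["timestamps"].sort()
    return dict(cloud)


def snippets_for(cloud, word, pad=2.0, duration=None):
    """Return merged (start, end) clips around each occurrence of `word`."""
    spans = []
    for ts in cloud.get(word.lower(), {"timestamps": []})["timestamps"]:
        start = max(0.0, ts - pad)
        end = ts + pad if duration is None else min(duration, ts + pad)
        if spans and start <= spans[-1][1]:
            # Overlapping windows are merged into one snippet.
            spans[-1] = (spans[-1][0], max(spans[-1][1], end))
        else:
            spans.append((start, end))
    return spans
```

For example, occurrences of "cloud" at 3.0 s, 5.5 s and 40.0 s in a 42-second clip yield two snippets, (1.0, 7.5) and (38.0, 42.0), because the first two padded windows overlap. Merging keeps nearby mentions from producing duplicate clips; the padding width is an arbitrary choice.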

First claim

Opening claim text (preview).

What is claimed is:

1. A method for processing a multimedia content, said method comprising: extracting, by one or more processors, one or more words from at least an audio stream associated with a multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; creating, by said one or more processors, a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words; and presenting, by said one or more processors, one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

2. The method of claim 1, wherein said word cloud is a multidimensional graph that includes at least a first dimension and a second dimension.

3. The method of claim 2, wherein said first dimension corresponds to a measure of a temporal spread of each of said one or more words in said multimedia content.

4. The method of claim 2, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

5. The method of claim 4, wherein said cumulative temporal occurrence comprises at least one of mean, or median.

6. The method of claim 1 further comprising presenting, by said one or more processors, said word cloud of said one or more words along with said multimedia content to said user.

7. The method of claim 6 further comprising receiving, by said one or more processors, a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

8. The method of claim 7, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

9. The method of claim 8 further comprising generating, by said one or more processors, an audio signal corresponding to said word based on said one or more gestures performed by said user.

10. The method of claim 1 further comprising highlighting, by said one or more processors, one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

11. The method of claim 1 further comprising representing, by said one or more processors, said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

12. The method of claim 1 further comprising normalizing, by said one or more processors, said one or more words extracted from at least said audio stream associated with said multimedia content by text processing.

13. The method of claim 12, wherein said text processing comprises at least by removing stop words, or by transforming each of said one or more words in said multimedia content to stem form.

14. The method of claim 1 further comprising receiving, by said one or more processors, a second input from said user, wherein said second input corresponds to a selection of a second timestamp on said seek bar associated with said multimedia content, wherein said multimedia content is played from said second timestamp.

15. The method of claim 14 further comprising updating, by said one or more processors, said word cloud based on occurrences of said one or more words in a predefined time-window around said second timestamp.

16. The method of claim 14, wherein said second input comprises at least said one or more gestures performed by said user on said seek bar associated with said multimedia content.

17. The method of claim 1 further comprising changing, by said one or more processors, font size of said one or more words in said word cloud of said one or more words based on a frequency of occurrences of said one or more words in said multimedia content.

18. A system for processing a multimedia content, said system comprising: one or more processors operable to: extract one or more words from at least an audio stream associated with a multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; create a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words; and present one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

19. The system of claim 18, wherein said word cloud is a multidimensional graph that includes at least a first dimension and a second dimension.

20. The system of claim 19, wherein said first dimension corresponds to a measure of a temporal spread of each of said one or more words in said multimedia content.

21. The system of claim 19, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

22. The system of claim 21, wherein said cumulative temporal occurrence comprises at least one of mean, median, or variance.

23. The system of claim 18, wherein said one or more processors are further operable to present said word cloud of said one or more words in said multimedia content to said user.

24. The system of claim 23, wherein said one or more processors are further operable to receive a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

25. The system of claim 24, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

26. The system of claim 18, wherein said one or more processors are further operable to highlight one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

27. The system of claim 18, wherein said one or more processors are further operable to represent said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

28. The system of claim 18, wherein said one or more p
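Several dependent claims narrow how the word cloud is built: text normalization by stop-word removal and stemming (claims 12 and 13), a "temporal spread" dimension (claims 3 and 20), a "cumulative temporal occurrence" dimension summarized by mean or median (claims 4, 5, 21 and 22), frequency-based font size (claim 17), re-filtering around a seek position (claim 15), and seek-bar highlighting (claim 10). The sketch below illustrates one way these could be computed. Every specific is an assumption for illustration and is not taken from the claim text: the stop-word list, the toy suffix stripper (standing in for a real stemmer such as Porter's), population standard deviation as the spread measure, linear font scaling, and the hypothetical functions `normalize`, `cloud_coordinates`, `font_size`, `words_near` and `seek_bar_marks`.

```python
import statistics

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}


def normalize(word):
    """Lower-case a word and strip a few suffixes; None for stop words.

    The suffix stripping is a toy stand-in for a real stemmer
    (e.g. Porter), assumed here for illustration only.
    """
    w = word.lower()
    if w in STOP_WORDS:
        return None
    for suffix in ("ing", "es", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def cloud_coordinates(occurrences, center="mean"):
    """Map word -> (temporal spread, central time, frequency).

    `occurrences` maps each word to a non-empty list of timestamps in
    seconds. Spread is the population standard deviation (an assumed
    measure); the central time is the mean or median of the timestamps.
    """
    central = statistics.median if center == "median" else statistics.mean
    return {
        word: (statistics.pstdev(stamps), central(stamps), len(stamps))
        for word, stamps in occurrences.items()
    }


def font_size(freq, max_freq, min_pt=10.0, max_pt=36.0):
    """Scale font size linearly from min_pt (1 occurrence) to max_pt."""
    if max_freq <= 1:
        return max_pt
    return min_pt + (max_pt - min_pt) * (freq - 1) / (max_freq - 1)


def words_near(occurrences, seek_time, half_window=30.0):
    """Keep occurrences within +/- half_window seconds of seek_time."""
    nearby = {}
    for word, stamps in occurrences.items():
        close = [t for t in stamps if abs(t - seek_time) <= half_window]
        if close:
            nearby[word] = close
    return nearby


def seek_bar_marks(spans, duration):
    """Convert (start, end) snippet spans in seconds to seek-bar fractions."""
    return [(start / duration, end / duration) for start, end in spans]
```

A renderer could place each word at (spread, central time) and size it with `font_size`; after a seek, calling `words_near` and recomputing the coordinates gives the updated cloud. Claim 11's color coding of emphasis is left out of this sketch.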

Assignees

Xerox Corp

Inventors

Classifications

G06F3/04842
Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title
G10L15/26 (primary)
Speech to text systems (G10L15/08 takes precedence) · CPC title
G11B27/34
Indicating arrangements {(indicating means incorporated in magazine or cassette G11B23/046 and G11B23/0875; indicating measured values in general G01D)} · CPC title
G10L15/1807
using prosody or stress · CPC title
G10L21/10
Transforming into visible information · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 55792475


Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?: Xerox Corp