Server side hotwording
US-2024412734-A1 · Dec 12, 2024 · US
US9484032B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9484032-B2 |
| Application number | US-201414523966-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 27, 2014 |
| Priority date | Oct 27, 2014 |
| Publication date | Nov 1, 2016 |
| Grant date | Nov 1, 2016 |
Official abstract text for this publication.
The disclosed embodiments illustrate methods and systems for processing multimedia content. The method includes extracting one or more words from an audio stream associated with multimedia content. Each word has associated one or more timestamps indicative of temporal occurrences of said word in said multimedia content. The method further includes creating a word cloud of said one or more words in said multimedia content based on a measure of emphasis laid on each word in said multimedia content and said one or more timestamps associated with said one or more words. The method further includes presenting one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud. Each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.
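The flow described in the abstract (timestamped words, a per-word emphasis measure, and snippets tied to a selected word) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `CloudEntry` type, the `build_cloud` and `snippets_for` functions, the 0-to-1 emphasis scale, and the fixed `padding` around each occurrence are all hypothetical, and speech recognition is assumed to have already produced the `(word, timestamp, emphasis)` tuples.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class CloudEntry:
    word: str
    timestamps: List[float]  # seconds at which the word occurs
    emphasis: float          # average emphasis score (hypothetical 0..1 scale)


def build_cloud(occurrences):
    """Group (word, timestamp, emphasis) tuples into one entry per word."""
    times = defaultdict(list)
    scores = defaultdict(list)
    for word, ts, emph in occurrences:
        times[word].append(ts)
        scores[word].append(emph)
    return {w: CloudEntry(w, sorted(times[w]), mean(scores[w])) for w in times}


def snippets_for(cloud, word, padding=2.0, duration=None):
    """Return (start, end) windows around each occurrence of the selected word.

    padding is an assumed snippet half-length in seconds; duration, if given,
    clamps the end of each window to the length of the content.
    """
    entry = cloud.get(word)
    if entry is None:
        return []
    windows = []
    for ts in entry.timestamps:
        start = max(0.0, ts - padding)
        end = ts + padding if duration is None else min(duration, ts + padding)
        windows.append((start, end))
    return windows


# Hypothetical recognizer output for a short lecture clip.
occurrences = [
    ("gradient", 12.0, 0.9),
    ("descent", 12.5, 0.4),
    ("gradient", 95.0, 0.5),
]
cloud = build_cloud(occurrences)
print(snippets_for(cloud, "gradient"))  # [(10.0, 14.0), (93.0, 97.0)]
```

Selecting "gradient" yields one playback window per occurrence, which is the sense in which each snippet "corresponds to" the timestamps of the chosen word.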
Opening claim text (preview).
What is claimed is:

1. A method for processing multimedia content, said method comprising: extracting, by one or more processors, one or more words from at least an audio stream associated with said multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; creating, by said one or more processors, a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words, wherein at least a first dimension of said word cloud corresponds to a measure of temporal spread of each of said one or more words in said multimedia content; and presenting, by said one or more processors, one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

2. The method of claim 1, wherein said word cloud is a multidimensional graph that includes at least said first dimension and a second dimension.

3. The method of claim 2, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

4. The method of claim 3, wherein said cumulative temporal occurrence comprises at least one of mean, or median.

5. The method of claim 1 further comprising presenting, by said one or more processors, said word cloud of said one or more words along with said multimedia content to said user.

6. The method of claim 5 further comprising receiving, by said one or more processors, a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

7. The method of claim 6, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

8. The method of claim 7 further comprising generating, by said one or more processors, an audio signal corresponding to said word based on said one or more gestures performed by said user.

9. The method of claim 1 further comprising highlighting, by said one or more processors, one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

10. The method of claim 1 further comprising representing, by said one or more processors, said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

11. The method of claim 1 further comprising normalizing, by said one or more processors, said one or more words extracted from at least said audio stream associated with said multimedia content by text processing.

12. The method of claim 11, wherein said text processing comprises at least by removing stop words, or by transforming each of said one or more words in said multimedia content to stem form.

13. The method of claim 1 further comprising receiving, by said one or more processors, a second input from said user, wherein said second input corresponds to a selection of a second timestamp on said seek bar associated with said multimedia content, wherein said multimedia content is played from said second timestamp.

14. The method of claim 13 further comprising updating, by said one or more processors, said word cloud based on occurrences of said one or more words in a predefined time-window around said second timestamp.

15. The method of claim 13, wherein said second input comprises at least said one or more gestures performed by said user on said seek bar associated with said multimedia content.

16. The method of claim 1 further comprising changing, by said one or more processors, font size of said one or more words in said word cloud of said one or more words based on a frequency of occurrences of said one or more words in said multimedia content.

17. A system for processing multimedia content, said system comprising: one or more processors operable to: extract one or more words from at least an audio stream associated with said multimedia content, wherein each of said one or more words has associated one or more timestamps indicative of temporal occurrences of each of said one or more words in said multimedia content; create a word cloud of said one or more words in said multimedia content based at least on a measure of emphasis laid on each of said one or more words in said multimedia content and said one or more timestamps associated with said one or more words, wherein at least a first dimension of said word cloud corresponds to a measure of temporal spread of each of said one or more words in said multimedia content; and present one or more multimedia snippets, of said multimedia content, associated with a word selected by a user from said word cloud, wherein each of said one or more multimedia snippets corresponds to said one or more timestamps associated with occurrences of said word in said multimedia content.

18. The system of claim 17, wherein said word cloud is a multidimensional graph that includes at least said first dimension and a second dimension.

19. The system of claim 18, wherein said second dimension corresponds to a measure of a cumulative temporal occurrence of each word in said one or more words, wherein said cumulative temporal occurrence of each word is determined based on said one or more timestamps associated with occurrences of said each word in said multimedia content.

20. The system of claim 19, wherein said cumulative temporal occurrence comprises at least one of mean, median, or variance.

21. The system of claim 17, wherein said one or more processors are further operable to present said word cloud of said one or more words in said multimedia content to said user.

22. The system of claim 21, wherein said one or more processors are further operable to receive a first input from said user based on said presentation of said word cloud of said one or more words, wherein said first input corresponds to a selection of said word from said word cloud of said one or more words.

23. The system of claim 22, wherein said first input comprises one or more gestures performed by said user on said word in said word cloud.

24. The system of claim 17, wherein said one or more processors are further operable to highlight one or more portions of a seek bar associated with said multimedia content, wherein said one or more portions correspond to said one or more multimedia snippets.

25. The system of claim 17, wherein said one or more processors are further operable to represent said measure of emphasis laid on each of said one or more words in said multimedia content by colors in said word cloud.

26. The system of claim 17, wherein said one or more processors are further operable to receive a second input from said user, wherein said second input corresponds to a selection of
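Several dependent claims describe computations that are easy to picture concretely: normalizing words by removing stop words (claims 11 and 12), a first dimension for temporal spread and a second for cumulative temporal occurrence (claims 1 to 4), font size driven by frequency (claim 16), and rebuilding the cloud from a time window around a seek position (claim 14). The sketch below is a hedged illustration only. The function names `normalize` and `layout`, the tiny stop-word list, the choice of max minus min as the spread measure, and the linear font formula are assumptions made for this example, and stemming is left out.

```python
from statistics import mean

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to"}


def normalize(words):
    """Lowercase each (word, timestamp) pair and drop stop words.

    Stemming, the other text-processing option named in claim 12, is omitted.
    """
    return [(w.lower(), ts) for w, ts in words if w.lower() not in STOP_WORDS]


def layout(words, window=None, center_fn=mean):
    """Map each word to (spread, central_time, font_size).

    window: optional (center, half_width) in seconds; when given, only
    occurrences within half_width of center are counted, approximating a
    cloud update around a seek position.
    center_fn: aggregate over timestamps; mean by default, and
    statistics.median could be passed instead.
    """
    by_word = {}
    for w, ts in words:
        if window is not None and abs(ts - window[0]) > window[1]:
            continue
        by_word.setdefault(w, []).append(ts)

    result = {}
    for w, stamps in by_word.items():
        spread = max(stamps) - min(stamps)  # assumed measure of temporal spread
        central = center_fn(stamps)         # cumulative temporal occurrence
        font = 10 + 4 * len(stamps)         # hypothetical linear font scaling
        result[w] = (spread, central, font)
    return result


words = [("The", 0.0), ("model", 10.0), ("model", 30.0), ("loss", 20.0)]
clean = normalize(words)
print(layout(clean))
# {'model': (20.0, 20.0, 18), 'loss': (0.0, 20.0, 14)}
print(layout(clean, window=(28.0, 5.0)))
# {'model': (0.0, 30.0, 14)}
```

In the windowed call only the occurrence of "model" at 30 seconds falls within 5 seconds of the seek position, so the cloud shrinks to that single word with the smallest font size.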
using audio data · CPC title
Transforming into visible information · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Programmed access in sequence to addressed parts of tracks of operating record carriers (access by moving the head G11B3/08, G11B5/54, G11B7/085, G11B21/022; by moving the record carrier G11B15/005, G11B17/005, by driving of both record carrier and head G11B15/1816) · CPC title
Word spotting · CPC title