Method and apparatus for providing user interface for video retrieval
US-12045281-B2 · Jul 23, 2024 · US
US9483557B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9483557-B2 |
| Application number | US-201113040640-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 4, 2011 |
| Priority date | Mar 4, 2011 |
| Publication date | Nov 1, 2016 |
| Grant date | Nov 1, 2016 |
Official abstract text for this publication.
In various embodiments, a transcript that represents a media file is created. Keyword candidates that may represent topics and/or content associated with the media content are then extracted from the transcript. Furthermore, a keyword set may be generated for the media content utilizing a mutual information criteria. In other embodiments, one or more queries may be generated based at least in part on the transcript, and a plurality of web documents may be retrieved based at least in part on the one or more queries. Additional keyword candidates may be extracted from each web document and then ranked. A subset of the keyword candidates may then be selected to form a keyword set associated with the media content.
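The first two stages described in the abstract (extracting keyword candidates from a transcript, then forming queries from them) can be illustrated with a minimal sketch. The abstract does not say how candidates are chosen or how queries are built, so the frequency heuristic, the stopword list, and the names `STOPWORDS`, `extract_candidates`, and `build_queries` are all assumptions made for illustration, not the patented method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "for", "on", "that", "this", "with", "it", "as"}


def extract_candidates(transcript: str, top_k: int = 5) -> list[str]:
    """Return up to top_k frequent non-stopword tokens from the transcript.

    Term frequency is only a stand-in heuristic; the source does not
    specify the extraction rule.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(top_k)]


def build_queries(candidates: list[str], phrase_len: int = 2) -> list[str]:
    """Join consecutive candidates into multi-word query strings."""
    last_start = len(candidates) - phrase_len + 1
    return [" ".join(candidates[i:i + phrase_len]) for i in range(max(0, last_start))]


transcript = ("The solar panel converts sunlight. Solar panel efficiency "
              "depends on sunlight and panel angle.")
candidates = extract_candidates(transcript, top_k=3)
queries = build_queries(candidates)
```

In this sketch the queries would then be submitted to a search engine of the implementer's choice; that step is omitted because the source names no particular engine or API.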
Opening claim text (preview).
The invention claimed is:

1. A method comprising: generating a transcript from media content using automatic speech recognition; extracting transcript keyword candidates from the transcript once the transcript is generated from the media content; determining that a first subset of the extracted transcript keyword candidates relates to a primary topic of the media content and a second subset of the extracted transcript keyword candidates relates to a second topic of the media content, wherein the primary topic is different from the second topic; generating web queries from the transcript keyword candidates and submitting the web queries to a search engine; accumulating results from the web queries to form a set of web documents and extracting web document keyword candidates from the set of web documents; determining mutual information criteria for individual ones of the extracted transcript keyword candidates based at least in part on a number of documents from the set of web documents that contain a co-occurrence of an extracted transcript keyword candidate with a web document keyword candidate, and a total number of documents of the set of web documents; ranking, based at least in part on the mutual information criteria, the extracted transcript keyword candidates to generate ranked transcript keyword candidates, wherein the first subset of the extracted transcript keyword candidates related to the primary topic of the media content are ranked higher than the second subset of the extracted transcript keyword candidates related to the second topic of the media content; selecting one or more of the extracted transcript keyword candidates based at least in part on the ranked transcript keyword candidates to form a keyword set; and associating the keyword set with the media content.

2. A method as recited in claim 1, further comprising associating the keyword set with the media content such that the keyword set is presented with the media content, the keyword set including keywords that represent the primary topic and the second topic associated with the media content.

3. A method as recited in claim 1, wherein the keyword set includes one or more other words or phrases in addition to words or phrases included in the transcript.

4. A method as recited in claim 1, further comprising presenting the keyword set to a user before rendering the media content.

5. A method as recited in claim 1, wherein selecting the one or more of the extracted transcript keyword candidates further comprises identifying a predetermined number of top-ranked transcript keyword candidates from the extracted transcript keyword candidates and associating the top-ranked transcript keyword candidates with the media content.

6. A method as recited in claim 1, further comprising indexing the media content based at least in part on the selected one or more extracted transcript keyword candidates included in the keyword set.

7. A method comprising: generating a transcript from media content using automatic speech recognition; extracting transcript keyword candidates from the transcript once the transcript is generated from the media content; generating web queries from the extracted transcript keyword candidates and submitting the web queries to a search engine; accumulating results from the web queries to form a set of web documents and extracting web document keyword candidates from the set of web documents; determining mutual information criteria for individual ones of the extracted transcript keyword candidates based at least in part on a number of documents from the set of web documents that contain a co-occurrence of an extracted transcript keyword candidate with an extracted web document keyword candidate, and a total number of documents of the set of web documents; ranking, based at least in part on the mutual information criteria, the extracted transcript keyword candidates to generate ranked extracted transcript keyword candidates; and selecting one or more of the extracted transcript keyword candidates based at least in part on the ranked extracted transcript keyword candidates to form a keyword set; and associating the keyword set with the media content.

8. A method as recited in claim 7, wherein the media content does not include any associated keywords prior to the transcript keyword candidates being extracted.

9. A method as recited in claim 7, wherein associating the keyword set with the media content comprises presenting the keyword set with the media content, the keyword set including keywords that represent one or more topics associated with the media content.

10. A method as recited in claim 7, wherein ranking the extracted transcript keyword candidates further comprises ranking the extracted transcript keyword candidates based on respective relevance of the extracted transcript keyword candidates with respect to the media content.

11. A method as recited in claim 7, further comprising pruning at least one of the selected one or more transcript keyword candidates from the keyword set based at least in part on whether the at least one of the selected one or more transcript keyword candidates is below a relatedness threshold.

12. A system comprising: one or more processors; one or more storage devices storing modules that are executable by the one or more processors, the modules including: a speech recognizer component that generates a transcript based on speech and non-speech included in media content; an extraction component that extracts transcript keyword candidates from the transcript once the transcript is generated from the media content by the speech recognizer component; a keyword collection component that: accumulates search results from web queries formed from the extracted transcript keyword candidates submitted to a search engine; forms a set of web documents from the search results; and extracts web document keyword candidates from the set of web documents; and a keyword selector component that: determines mutual information criteria for individual ones of the extracted transcript keyword candidates based at least in part on a number of documents from the set of web documents that contain a co-occurrence of an extracted transcript keyword candidate with a web document keyword candidate, and a total number of documents of the set of web documents; ranks the extracted transcript keyword candidates, based at least in part on the mutual information criteria, to form a ranking; and selects one or more of the extracted transcript keyword candidates based at least in part on the ranking to form a keyword set that is to be associated with the media content.

13. A system as recited in claim 12, wherein: at least the web queries include a phrase including multiple words from the extracted transcript keyword candidates; and the extracted transcript keyword candidates are identified based at least in part on meta information associated with the web documents or text in a body of the web documents.

14. A system as recited in claim 13, wherein the meta information associated with the web documents comprises manually generated keyword lists, and the manually generated keyword lists are used as a constraint for selecting the one or more of the extracted transcript keyword candidates.
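The claims base the ranking on two document counts: how many retrieved web documents contain a transcript candidate together with a web document candidate, and how many documents were retrieved in total. The sketch below shows one way such counts could feed a ranking. It assumes pointwise mutual information, clipped at zero and averaged over the web candidates; the claims do not give an exact formula, so that choice, along with the names `mutual_information_scores` and `select_keywords` and the toy documents, is hypothetical. The `top_n` and `threshold` parameters loosely mirror the "predetermined number of top-ranked" candidates of claim 5 and the "relatedness threshold" of claim 11.

```python
import math


def mutual_information_scores(transcript_cands, web_cands, documents):
    """Score each transcript candidate from document co-occurrence counts.

    documents is a list of sets of terms, one set per retrieved web document.
    For a pair (t, w): PMI = log((n_tw * N) / (n_t * n_w)), where n_tw is the
    number of documents containing both terms and N is the total number of
    documents. This formula is an assumption of the sketch.
    """
    total = len(documents)

    def doc_count(*terms):
        return sum(1 for doc in documents if all(t in doc for t in terms))

    scores = {}
    for t in transcript_cands:
        n_t = doc_count(t)
        pmi_sum = 0.0
        for w in web_cands:
            if w == t:
                continue
            n_w = doc_count(w)
            n_tw = doc_count(t, w)
            if n_t and n_w and n_tw:
                # Clip negative PMI so unrelated pairs do not lower the score.
                pmi_sum += max(0.0, math.log((n_tw * total) / (n_t * n_w)))
        scores[t] = pmi_sum / max(1, len(web_cands))
    return scores


def select_keywords(scores, top_n=3, threshold=0.0):
    """Keep the top_n ranked candidates whose score exceeds the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [k for k in ranked[:top_n] if scores[k] > threshold]


# Toy data: each set stands in for the terms of one retrieved web document.
documents = [
    {"solar", "panel", "photovoltaic"},
    {"solar", "panel", "energy"},
    {"weather", "energy"},
    {"weather", "forecast"},
]
scores = mutual_information_scores(
    ["solar", "weather", "today"],
    ["photovoltaic", "panel", "forecast"],
    documents,
)
keyword_set = select_keywords(scores, top_n=3)
```

With these toy documents, "today" never co-occurs with any web candidate, so its score is zero and the threshold prunes it from the keyword set.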
Indexing; Web crawling techniques · CPC title
using original textual content or text extracted from visual content or transcript of audio data · CPC title
Physics · mapped topic