US9280709B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9280709-B2 |
| Application number | US-201113814170-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 2, 2011 |
| Priority date | Aug 11, 2010 |
| Publication date | Mar 8, 2016 |
| Grant date | Mar 8, 2016 |
Official abstract text for this publication.
The present invention relates to an information processing device, an information processing method, and a program capable of easily adding an annotation to content. A feature amount extracting unit 21 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content (for example, a text of a caption) as a text feature amount of the description text. A model learning unit 22 learns an annotation model, which is a multi-stream HMM, by using an annotation sequence for annotation, which is a multi-stream including the image feature amount of each frame and the text feature amount. The present invention may be applied when adding the annotation to the content such as a television broadcast program, for example.
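As an illustration of the text feature amount described in the abstract, the following is a minimal sketch, not the patented implementation. It assumes captions arrive as `(time_in_seconds, text)` pairs and that a fixed-length window is shifted at regular intervals, with the words inside each window treated as one document. The function name, the input format, and the window and step lengths are all hypothetical choices made for this example.

```python
from collections import Counter


def caption_word_frequencies(captions, window_sec=10.0, step_sec=5.0):
    """Return one normalized word-frequency dict per window position.

    captions: list of (time_in_seconds, text) pairs. This input format,
    and the default window/step lengths, are assumptions for illustration.
    """
    if not captions:
        return []
    end_time = max(t for t, _ in captions)
    features = []
    start = 0.0
    while start <= end_time:
        # Treat all caption words displayed inside the window as one document.
        words = [
            w.lower()
            for t, text in captions
            if start <= t < start + window_sec
            for w in text.split()
        ]
        counts = Counter(words)
        total = sum(counts.values())
        # Normalize counts so that each window yields a multinomial distribution.
        features.append({w: c / total for w, c in counts.items()} if total else {})
        start += step_sec
    return features


captions = [(0.0, "red car"), (3.0, "red bus"), (12.0, "blue sky")]
features = caption_word_frequencies(captions)
```

Overlapping windows (a step shorter than the window) mean a caption can contribute to more than one document, which smooths the text feature over time.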
Opening claim text (preview).
The invention claimed is:

1. An information processing device, comprising: one or more processors configured to: extract an image feature amount of each frame of an image of learning content; extract word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text; learn an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount and obtain an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.

2. The information processing device according to claim 1, wherein the learning content includes a text of a caption, and the description text is the text of the caption included in the learning content.

3. The information processing device according to claim 2, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, and extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount.

4. The information processing device according to claim 2, wherein the one or more processors are configured to add an annotation to target content by using the annotation model.

5. The information processing device according to claim 4, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; and select a word with a highest frequency in the multinomial distribution observed in a state corresponding to a target frame out of states of the maximum likelihood state sequence as the annotation to be added to the target frame.

6. The information processing device according to claim 2, wherein the one or more processors are configured to search a keyword frame from target content from which the keyword frame, which is a frame with a predetermined keyword, is to be searched by using the annotation model.

7. The information processing device according to claim 6, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; and select, when a frequency of the predetermined keyword is highest in the multinomial distribution observed in a state corresponding to a target frame of the target content out of states of the maximum likelihood state sequence, the target frame as the keyword frame.

8. The information processing device according to claim 2, wherein the one or more processors are configured to display an annotation to be added to a frame of target content to which the annotation is to be added by using the annotation model.

9. The information processing device according to claim 8, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a state corresponding to each frame of the target content by obtaining a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; obtain the annotation to be added to the frame corresponding to the state based on the multinomial distribution; and display the annotation to be added to the each frame of the target content corresponding to each state of the annotation model.

10. The information processing device according to claim 9, wherein the one or more processors are configured to: obtain the inter-state distance from the one state to the another state of the annotation model based on state transition probability from the one state to the another state; obtain a state coordinate, which is a coordinate of a position of a state on the model map; display the model map, on which the corresponding state is arranged at the state coordinate; and display a representative image, which represents the frame corresponding to each state of the annotation model, and the annotation to be added to the frame corresponding to each state of the annotation model on the model map.

11. The information processing device according to claim 2, wherein the one or more processors are configured to: perform dimension reduction to reduce a dimension of the image feature amount and the text feature amount; and learn the annotation model by using the multi-stream, including the image feature amount and the text feature amount after the dimension reduction, as the annotation sequence.

12. The information processing device according to claim 11, wherein the one or more processors are configured to: obtain basis space data of a basis space for an image which has a dimension lower than a dimension of the image feature amount for mapping the image feature amount; perform the dimension reduction of the image feature amount based on the basis space data of the basis space; obtain basis space data of a basis space for text of which dimension is lower than a dimension of the text feature amount for mapping the text feature amount; and perform the dimension reduction of the text feature amount based on the basis space data of the basis space for text.

13. The information processing device according to claim 12, wherein the one or more processors are configured to: obtain a code book used for vector quantization as the basis space data of the basis space for image by using the image feature amount; and obtain a code representing a centroid vector as the image feature amount after the dimension reduction by performing the vector quantization of the image feature amount by using the code book.

14. The information processing device according to claim 12, wherein one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract a freq
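
The annotation step recited in claim 5 (obtain a maximum likelihood state sequence, then pick the most frequent word of the state assigned to each frame) can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed multi-stream implementation: the image feature of each frame is assumed to be already vector-quantized to a discrete code, the HMM is a plain discrete-observation model over those codes, and the per-state word distributions are stored separately. All function names and the toy parameters are hypothetical.

```python
import math


def _log(p):
    """Log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for a discrete-observation HMM (log domain)."""
    n_states = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev, pointers, score = score, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this frame.
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def annotate_frames(image_codes, start_p, trans_p, emit_p, word_dist):
    """Annotate each frame with the most frequent word of its decoded state."""
    path = viterbi(image_codes, start_p, trans_p, emit_p)
    return [max(word_dist[s], key=word_dist[s].get) for s in path]


# Toy two-state model; every number below is made up for illustration.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.8, 0.2], [0.2, 0.8]]  # P(image code | state)
word_dist = [{"beach": 0.7, "sea": 0.3}, {"city": 0.6, "car": 0.4}]
image_codes = [0, 0, 1, 1]  # one vector-quantized code per frame

annotations = annotate_frames(image_codes, start_p, trans_p, emit_p, word_dist)
```

The keyword-frame search of claim 7 could reuse the same decoded path: a frame would be selected when the given keyword is the highest-frequency word of its state.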
Physics · mapped topic