Information processing device, information processing method and program

US9280709B2 · US · B2

Patent metadata
Publication number: US-9280709-B2
Application number: US-201113814170-A
Country: US
Kind code: B2
Filing date: Aug 2, 2011
Priority date: Aug 11, 2010
Publication date: Mar 8, 2016
Grant date: Mar 8, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention relates to an information processing device, an information processing method, and a program capable of easily adding an annotation to content. A feature amount extracting unit 21 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content (for example, a text of a caption) as a text feature amount of the description text. A model learning unit 22 learns an annotation model, which is a multi-stream HMM, by using an annotation sequence for annotation, which is a multi-stream including the image feature amount of each frame and the text feature amount. The present invention may be applied when adding the annotation to the content such as a television broadcast program, for example.
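The word frequency information mentioned in the abstract can be illustrated with a small sketch. It follows the windowing idea described in claim 3 (words from captions shown inside a time window are treated as one document, and the window is shifted at regular intervals). The function name `window_word_distributions`, the `(time, text)` caption format, and the window and step values are assumptions made for illustration; the patent text shown here does not specify them.

```python
from collections import Counter


def window_word_distributions(captions, window_len, step, duration):
    """Return one normalized word-frequency dict per time window.

    captions: list of (time_in_seconds, caption_text) pairs (hypothetical format).
    window_len: window length in seconds; step: shift between windows in seconds.
    duration: total length of the content in seconds.
    """
    distributions = []
    start = 0.0
    while start < duration:
        end = start + window_len
        # All words of captions displayed inside [start, end) form one document.
        words = [
            word.lower()
            for time, text in captions
            if start <= time < end
            for word in text.split()
        ]
        counts = Counter(words)
        total = sum(counts.values())
        # Normalize counts so each window yields a multinomial distribution.
        distributions.append(
            {word: count / total for word, count in counts.items()} if total else {}
        )
        start += step
    return distributions


# Toy captions, invented for this example.
captions = [(1.0, "goal scored"), (4.0, "goal replay"), (12.0, "weather report")]
dists = window_word_distributions(captions, window_len=10.0, step=5.0, duration=15.0)
```

With overlapping windows, a caption can contribute to more than one document, which is why the second and third windows in this toy example both contain only the "weather report" caption.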

First claim

Opening claim text (preview).

The invention claimed is:

1. An information processing device, comprising: one or more processors configured to: extract an image feature amount of each frame of an image of learning content; extract word frequency information regarding frequency of appearance of each word in a description text describing a content of the image of the learning content as a text feature amount of the description text; learn an annotation model, which is a multi-stream HMM (hidden Markov model), by using an annotation sequence for annotation, which is a multi-stream including the image feature amount and the text feature amount and obtain an inter-state distance from one state to another state of the annotation model such that an error is minimized between i) the inter-state distance and ii) a Euclidean distance from the one state to the another state on a model map on which states of the annotation model are arranged.

2. The information processing device according to claim 1, wherein the learning content includes a text of a caption, and the description text is the text of the caption included in the learning content.

3. The information processing device according to claim 2, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals, and extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount.

4. The information processing device according to claim 2, wherein the one or more processors are configured to add an annotation to target content by using the annotation model.

5. The information processing device according to claim 4, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; and select a word with a highest frequency in the multinomial distribution observed in a state corresponding to a target frame out of states of the maximum likelihood state sequence as the annotation to be added to the target frame.

6. The information processing device according to claim 2, wherein the one or more processors are configured to search a keyword frame from target content from which the keyword frame, which is a frame with a predetermined keyword, is to be searched by using the annotation model.

7. The information processing device according to claim 6, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; and select, when a frequency of the predetermined keyword is highest in the multinomial distribution observed in a state corresponding to a target frame of the target content out of states of the maximum likelihood state sequence, the target frame as the keyword frame.

8. The information processing device according to claim 2, wherein the one or more processors are configured to display an annotation to be added to a frame of target content to which the annotation is to be added by using the annotation model.

9. The information processing device according to claim 8, wherein the one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract multinomial distribution, which represents a frequency of appearance of each word in the document, as the text feature amount; extract the image feature amount of each frame of the image of the target content; compose the annotation sequence by using the image feature amount; obtain a state corresponding to each frame of the target content by obtaining a maximum likelihood state sequence in which the annotation sequence is observed in the annotation model; obtain the annotation to be added to the frame corresponding to the state based on the multinomial distribution; and display the annotation to be added to the each frame of the target content corresponding to each state of the annotation model.

10. The information processing device according to claim 9, wherein the one or more processors are configured to: obtain the inter-state distance from the one state to the another state of the annotation model based on state transition probability from the one state to the another state; obtain a state coordinate, which is a coordinate of a position of a state on the model map; display the model map, on which the corresponding state is arranged at the state coordinate; and display a representative image, which represents the frame corresponding to each state of the annotation model, and the annotation to be added to the frame corresponding to each state of the annotation model on the model map.

11. The information processing device according to claim 2, wherein the one or more processors are configured to: perform dimension reduction to reduce a dimension of the image feature amount and the text feature amount; and learn the annotation model by using the multi-stream, including the image feature amount and the text feature amount after the dimension reduction, as the annotation sequence.

12. The information processing device according to claim 11, wherein the one or more processors are configured to: obtain basis space data of a basis space for an image which has a dimension lower than a dimension of the image feature amount for mapping the image feature amount; perform the dimension reduction of the image feature amount based on the basis space data of the basis space; obtain basis space data of a basis space for text of which dimension is lower than a dimension of the text feature amount for mapping the text feature amount; and perform the dimension reduction of the text feature amount based on the basis space data of the basis space for text.

13. The information processing device according to claim 12, wherein the one or more processors are configured to: obtain a code book used for vector quantization as the basis space data of the basis space for image by using the image feature amount; and obtain a code representing a centroid vector as the image feature amount after the dimension reduction by performing the vector quantization of the image feature amount by using the code book.

14. The information processing device according to claim 12, wherein one or more processors are configured to: extract words included in the text of the caption displayed in a window as one document while shifting the window of a predetermined time length at regular intervals; extract a freq
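The annotation and keyword-search steps in claims 5 and 7 can be sketched as follows: find the maximum likelihood state sequence for the target frames, then read off the highest-frequency word of each frame's state. This is a simplified illustration, not the patented implementation. It assumes a single discrete image stream (for example, vector quantization codes as in claim 13) and uses the standard Viterbi algorithm; the function names and all probabilities and word distributions below are invented for the example.

```python
import math


def _log(p):
    # Log probability that tolerates zero entries.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely state sequence for discrete observation codes."""
    n_states = len(start_p)
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for code in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][code]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def top_word(distribution):
    """Word with the highest frequency in a state's word distribution."""
    return max(distribution, key=distribution.get)


# Hypothetical two-state model: per-state image-code and word distributions.
start_p = [0.5, 0.5]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.9, 0.1], [0.1, 0.9]]  # emit_p[state][image_code]
state_words = [{"goal": 0.6, "replay": 0.4}, {"weather": 0.7, "report": 0.3}]

frame_codes = [0, 0, 1, 1]  # one quantized image code per target frame
path = viterbi(frame_codes, start_p, trans_p, emit_p)

# Claim 5 style: annotate each frame with the top word of its state.
annotations = [top_word(state_words[s]) for s in path]
# Claim 7 style: keyword frames are frames whose state ranks the keyword highest.
keyword_frames = [t for t, s in enumerate(path) if top_word(state_words[s]) == "weather"]
```

Working in log probabilities avoids numerical underflow on long frame sequences, which matters when the target content has many frames.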

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9280709B2 cover?
The present invention relates to an information processing device, an information processing method, and a program capable of easily adding an annotation to content. A feature amount extracting unit 21 extracts an image feature amount of each frame of an image of learning content and extracts word frequency information regarding frequency of appearance of each word in a description text…
Who is the assignee on this patent?
Suzuki Hirotaka, Ito Masato, Sony Corp
What technology area does this patent fall under?
Primary CPC classification G06K9/00718. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 8, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).