Semantic natural language vector space

US9792534B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9792534-B2
Application number	US-201614995042-A
Country	US
Kind code	B2
Filing date	Jan 13, 2016
Priority date	Jan 13, 2016
Publication date	Oct 17, 2017
Grant date	Oct 17, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for image captioning with word vector representations are described. In implementations, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space. These word vector representations reflect distance values in the context of the semantic word vector space. In this approach, words are mapped into a vector space and the results of caption analysis are expressed as points in the vector space that capture semantics between words. In the vector space, similar concepts will have small distance values. The word vectors are not tied to particular words or a single dictionary. A post-processing step is employed to map the points to words and convert the word vector representations to captions. Accordingly, conversion is delayed to a later stage in the process.
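
The post-processing step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the toy three-dimensional embeddings, the `decode` helper, the Euclidean nearest-neighbour rule, and the two word lists are all assumptions made for illustration. The sketch shows only the general idea that the same output points can be converted to different captions depending on which dictionary is selected afterwards.

```python
import math

# Hypothetical toy embeddings; a real system would load trained word vectors.
EMBEDDINGS = {
    "dog": (0.9, 0.1, 0.0),
    "puppy": (0.85, 0.2, 0.05),
    "runs": (0.1, 0.9, 0.1),
    "sprints": (0.15, 0.95, 0.2),
    "beach": (0.0, 0.2, 0.9),
    "shore": (0.05, 0.25, 0.85),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def decode(points, dictionary):
    """Map each output point to the nearest word allowed by the dictionary."""
    caption = []
    for point in points:
        word = min(dictionary, key=lambda w: euclidean(point, EMBEDDINGS[w]))
        caption.append(word)
    return " ".join(caption)


# Points a caption generator might emit (made up for illustration).
points = [(0.88, 0.14, 0.02), (0.12, 0.92, 0.14), (0.02, 0.22, 0.88)]

# Two hypothetical dictionaries chosen after the points were produced.
plain = ["dog", "runs", "beach"]
alternate = ["puppy", "sprints", "shore"]

print(decode(points, plain))      # dog runs beach
print(decode(points, alternate))  # puppy sprints shore
```

Because the points are produced before any dictionary is chosen, swapping `plain` for `alternate` changes the wording without re-running the caption generator.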

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital media environment to facilitate management of image collections using one or more computing devices, a method to automatically generate image captions using word vector representations comprising: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to a caption generator to initiate caption generation; and outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes, the word vector usable to generate a corresponding caption.

2. The method as described in claim 1, further comprising converting the word vector into a caption for the target image as a post-processing operation.

3. The method as described in claim 2, wherein converting the word vector into a caption for the target image comprises selecting a dictionary and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

4. The method as described in claim 1, wherein the caption generator is configured to generate word vectors as intermediate results of caption analysis.

5. The method of claim 1, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

6. The method of claim 1, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

7. The method of claim 6, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.

8. The method of claim 6, wherein word vector conversion is delayed to a post-processing operation performed after operations of the RNN occur to output the word vector.

9. The method of claim 6, wherein the word vector conversion occurs in the context of a dictionary selected outside of the caption analysis performed via the RNN.

10. The method of claim 1, wherein the word vector is usable to generate a corresponding caption with multiple different dictionaries selected after the word vector is generated.

11. In a digital media environment to facilitate access to collections of images using one or more computing devices, a system comprising; one or more processing devices; one or more computer-readable media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using word vector representations including: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to the caption generator to initiate caption generation; outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes; and subsequently using the word vector in post-processing operations to generate a corresponding caption by: selecting a dictionary; and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

12. A system as recited in claim 11, wherein outputting the word vector in the semantic word vector space enables changing of the selected dictionary for different contexts.

13. A system as recited in claim 11, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

14. A system as recited in claim 11, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

15. A system as recited in claim 14, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.

16. One or more non-transitory computer-readable storage media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using word vector representations including: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to the caption generator to initiate caption generation; outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes; and subsequently using the word vector in post-processing operations to generate a corresponding caption by: selecting a dictionary; and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

17. One or more non-transitory computer-readable storage media as recited in claim 16, wherein outputting the word vector in the semantic word vector space enables changing of the selected dictionary for different contexts.

18. One or more non-transitory computer-readable storage media as recited in claim 16, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

19. One or more non-transitory computer-readable storage media as recited in claim 16, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

20. One or more non-transitory computer-readable storage media as recited in claim 19, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.
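
Claims 7, 15, and 20 describe an objective function that considers distances in the semantic word vector space instead of probability distributions for word sequences. The claims do not state a specific formula, so the Python sketch below is only one plausible reading: it contrasts a conventional softmax cross-entropy loss with a mean squared Euclidean distance between predicted and target word vectors. Both function names, the choice of squared Euclidean distance, and the sample vectors are assumptions for illustration.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Conventional objective: negative log-probability of the target word."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target_index]


def vector_distance_loss(predicted, targets):
    """Assumed distance-based objective: mean squared Euclidean distance
    between each predicted point and the target word's vector."""
    total = 0.0
    for p, t in zip(predicted, targets):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(targets)


# Hypothetical two-word caption with 2-D target word vectors.
target_vectors = [(1.0, 0.0), (0.0, 1.0)]
close_guess = [(0.9, 0.1), (0.1, 0.9)]
far_guess = [(0.0, 1.0), (1.0, 0.0)]

# The distance loss is small when predictions land near the target vectors.
print(vector_distance_loss(close_guess, target_vectors))
print(vector_distance_loss(far_guess, target_vectors))

# The probability-based loss needs a fixed vocabulary to index into.
print(softmax_cross_entropy([2.0, 0.5, 0.1], target_index=0))
```

Under this reading, the distance-based loss never refers to a vocabulary index, which is consistent with the claims' point that a dictionary can be selected after the word vector is generated.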

Assignees

Adobe Systems Inc

Inventors

Classifications

G06V20/70 · Primary
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06V10/82
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06F18/2411
based on the proximity to a decision surface, e.g. support vector machines · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 59276189

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9792534B2 cover?: Techniques for image captioning with word vector representations are described. In implementations, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space. These word vector representations reflect distance values in the context of the semantic word vector space. In this approach, words are mapped into a vector space…
Who is the assignee on this patent?: Adobe Systems Inc
What technology area does this patent fall under?: Primary CPC classification G06V20/70. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 17, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Image Captioning with Weak Supervision

Latent embeddings for word images and their semantics

Analyzer for behavioral analysis and parameterization of neural stimulation

Information processing device, information processing method and program
