Semantic natural language vector space

US9792534B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9792534-B2
Application number  US-201614995042-A
Country             US
Kind code           B2
Filing date         Jan 13, 2016
Priority date       Jan 13, 2016
Publication date    Oct 17, 2017
Grant date          Oct 17, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for image captioning with word vector representations are described. In implementations, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space. These word vector representations reflect distance values in the context of the semantic word vector space. In this approach, words are mapped into a vector space and the results of caption analysis are expressed as points in the vector space that capture semantics between words. In the vector space, similar concepts will have small distance values. The word vectors are not tied to particular words or a single dictionary. A post-processing step is employed to map the points to words and convert the word vector representations to captions. Accordingly, conversion is delayed to a later stage in the process.
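The post-processing step described in the abstract (mapping points in the vector space to words from a chosen dictionary) can be illustrated with a minimal Python sketch. This is an illustration only, not the patented implementation: the function names, the toy 3-dimensional embeddings, the two example dictionaries, and the choice of cosine distance are all assumptions, since the abstract does not name a specific distance measure.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_word(point, dictionary):
    """Return the dictionary word whose vector is closest to the point."""
    return min(dictionary, key=lambda word: cosine_distance(point, dictionary[word]))

def points_to_caption(points, dictionary):
    """Convert a sequence of points into a caption using one dictionary."""
    return " ".join(nearest_word(p, dictionary) for p in points)

# Hypothetical toy embeddings; real word vector spaces have hundreds of dimensions.
GENERAL = {
    "dog": [0.9, 0.1, 0.0],
    "runs": [0.0, 0.9, 0.1],
    "grass": [0.1, 0.0, 0.9],
}
SPECIALIZED = {
    "retriever": [0.95, 0.05, 0.0],
    "sprints": [0.0, 0.95, 0.05],
    "lawn": [0.05, 0.0, 0.95],
}

# Stand-in for the caption generator's output: points, not words.
points = [[0.92, 0.08, 0.0], [0.0, 0.9, 0.1], [0.1, 0.05, 0.85]]

general_caption = points_to_caption(points, GENERAL)
specialized_caption = points_to_caption(points, SPECIALIZED)
print(general_caption)
print(specialized_caption)
```

Because the points are produced before any dictionary is consulted, the same output can be converted with either dictionary, which is the sense in which conversion is delayed to a later stage.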

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital media environment to facilitate management of image collections using one or more computing devices, a method to automatically generate image captions using word vector representations comprising: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to a caption generator to initiate caption generation; and outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes, the word vector usable to generate a corresponding caption.

2. The method as described in claim 1, further comprising converting the word vector into a caption for the target image as a post-processing operation.

3. The method as described in claim 2, wherein converting the word vector into a caption for the target image comprises selecting a dictionary and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

4. The method as described in claim 1, wherein the caption generator is configured to generate word vectors as intermediate results of caption analysis.

5. The method of claim 1, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

6. The method of claim 1, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

7. The method of claim 6, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.

8. The method of claim 6, wherein word vector conversion is delayed to a post-processing operation performed after operations of the RNN occur to output the word vector.

9. The method of claim 6, wherein the word vector conversion occurs in the context of a dictionary selected outside of the caption analysis performed via the RNN.

10. The method of claim 1, wherein the word vector is usable to generate a corresponding caption with multiple different dictionaries selected after the word vector is generated.

11. In a digital media environment to facilitate access to collections of images using one or more computing devices, a system comprising; one or more processing devices; one or more computer-readable media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using word vector representations including: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to the caption generator to initiate caption generation; outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes; and subsequently using the word vector in post-processing operations to generate a corresponding caption by: selecting a dictionary; and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

12. A system as recited in claim 11, wherein outputting the word vector in the semantic word vector space enables changing of the selected dictionary for different contexts.

13. A system as recited in claim 11, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

14. A system as recited in claim 11, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

15. A system as recited in claim 14, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.

16. One or more non-transitory computer-readable storage media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using word vector representations including: obtaining a target image for caption analysis; applying feature extraction to the target image to generate attributes corresponding to the image; supplying the attributes to the caption generator to initiate caption generation; outputting by the caption generator a word vector in a semantic word vector space indicative of semantic relationships between words in sentences formed as a combination of the attributes; and subsequently using the word vector in post-processing operations to generate a corresponding caption by: selecting a dictionary; and mapping the word vector to words in the semantic word vector space based on the selected dictionary.

17. One or more non-transitory computer-readable storage media as recited in claim 16, wherein outputting the word vector in the semantic word vector space enables changing of the selected dictionary for different contexts.

18. One or more non-transitory computer-readable storage media as recited in claim 16, wherein the feature extraction is implemented using a pre-trained convolution neural network (CNN) to encode the image with keywords indicative of the attributes.

19. One or more non-transitory computer-readable storage media as recited in claim 16, wherein supplying the attributes to a caption generator to initiate caption generation comprises providing the attributes to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

20. One or more non-transitory computer-readable storage media as recited in claim 19, wherein an objective function implemented by the RNN is adapted to consider distances in the semantic word vector space instead of probability distributions for word sequences.
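Claims 7, 15, and 20 refer to an objective function that considers distances in the semantic word vector space rather than probability distributions over word sequences. The Python sketch below shows one way such an objective could look. It is a hypothetical illustration: the claims do not specify the distance measure, so the mean squared Euclidean distance used here, along with the function names and the toy embeddings, is an assumption.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distance_loss(predicted, targets):
    """Mean squared distance between predicted points and target word vectors.

    A conventional captioning objective would instead score a probability
    distribution over a fixed vocabulary at each step.
    """
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    if not predicted:
        raise ValueError("at least one time step is required")
    total = sum(squared_distance(p, t) for p, t in zip(predicted, targets))
    return total / len(predicted)

# Hypothetical toy embeddings for the words of a reference caption.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "runs": [0.0, 0.9, 0.1],
}
targets = [EMBEDDINGS["dog"], EMBEDDINGS["runs"]]

# Two stand-in model outputs: one near the targets, one far from them.
near_prediction = [[0.85, 0.15, 0.0], [0.05, 0.85, 0.1]]
far_prediction = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

near_loss = distance_loss(near_prediction, targets)
far_loss = distance_loss(far_prediction, targets)
print(near_loss < far_loss)
```

Under this kind of objective, a prediction that lands close to a semantically similar word is penalized only slightly, whereas a vocabulary-probability objective treats every wrong word as equally wrong.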

Assignees

  • Adobe Systems Inc

Inventors

Classifications

  • G06V20/70 · Primary

    Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • based on the proximity to a decision surface, e.g. support vector machines · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9792534B2 cover?
Techniques for image captioning with word vector representations are described. In implementations, instead of outputting results of caption analysis directly, the framework is adapted to output points in a semantic word vector space. These word vector representations reflect distance values in the context of the semantic word vector space. In this approach, words are mapped into a vector space…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/70. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 17, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).