Semantic Natural Language Vector Space
US-2017200066-A1 · Jul 13, 2017 · US
US9811765B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9811765-B2 |
| Application number | US-201614995032-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 13, 2016 |
| Priority date | Jan 13, 2016 |
| Publication date | Nov 7, 2017 |
| Grant date | Nov 7, 2017 |
Abstract
Techniques for image captioning with weak supervision are described herein. In implementations, weak supervision data regarding a target image is obtained and utilized to provide detail information that supplements global image concepts derived for image captioning. Weak supervision data refers to noisy data that is not closely curated and may include errors. Given a target image, weak supervision data for visually similar images may be collected from sources of weakly annotated images, such as online social networks. Generally, images posted online include “weak” annotations in the form of tags, titles, labels, and short descriptions added by users. Weak supervision data for the target image is generated by extracting keywords for visually similar images discovered in the different sources. The keywords included in the weak supervision data are then employed to modulate weights applied for probabilistic classifications during image captioning analysis.
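The keyword-gathering step in the abstract, together with the scoring and ranking of claim 10, can be illustrated with a minimal Python sketch. It assumes each image is already represented by a feature vector (for example, a CNN embedding). It uses cosine similarity to find the visually similar images, scores each tag by the summed similarity of the neighbors that carry it, and then boosts keyword probabilities in a word distribution by a fixed multiplicative factor. The similarity measure, scoring rule, boost factor, function names, and toy data are all hypothetical; the patent text here does not specify them.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weak_supervision_keywords(target_feat, annotated, k_neighbors=3, top_n=5):
    """Return top-ranked keywords drawn from the k most visually similar images.

    `annotated` is a list of (feature_vector, tags) pairs standing in for a
    weakly annotated source such as user-tagged photos (hypothetical data).
    """
    # Rank candidate images by visual similarity to the target image.
    neighbors = sorted(
        annotated, key=lambda item: cosine(target_feat, item[0]), reverse=True
    )[:k_neighbors]
    # Score each tag by the summed similarity of the neighbors that carry it.
    scores = Counter()
    for feat, tags in neighbors:
        sim = cosine(target_feat, feat)
        for tag in dict.fromkeys(t.lower() for t in tags):  # dedupe, keep order
            scores[tag] += sim
    return [tag for tag, _ in scores.most_common(top_n)]


def modulate(word_probs, keywords, boost=2.0):
    """Scale up keyword probabilities by a fixed factor, then renormalize."""
    weighted = {w: p * (boost if w in keywords else 1.0) for w, p in word_probs.items()}
    total = sum(weighted.values())
    return {w: p / total for w, p in weighted.items()}


# Hypothetical weakly annotated images: (feature vector, user tags).
annotated = [
    ([0.9, 0.1, 0.0], ["dog", "beach", "Frisbee"]),
    ([0.8, 0.2, 0.1], ["dog", "sand"]),
    ([0.0, 0.1, 0.9], ["city", "night"]),
]
keywords = weak_supervision_keywords([1.0, 0.0, 0.0], annotated, k_neighbors=2, top_n=3)
print(keywords)
print(modulate({"dog": 0.2, "cat": 0.4, "ball": 0.4}, keywords))
```

Summing similarity instead of counting raw tag occurrences is one simple way to let a tag shared by several close neighbors outrank a noisy one-off tag; a production system might use learned relevance criteria instead.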
Claims (preview)
What is claimed is:
1. In a digital media environment to facilitate management of image collections using at least one computing device, a method to automatically generate image captions using weak supervision data comprising: obtaining, by the at least one computing device, a target image for caption analysis; applying, by the at least one computing device, feature extraction to the target image to generate global concepts corresponding to the image; comparing, by the at least one computing device, the target image to images from a source of weakly annotated images to identify visually similar images; building, by the at least one computing device, a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images; and supplying, by the at least one computing device, the collection of keywords indicative of image details as the weak supervision data for caption generation along with the global concepts.
2. The method as described in claim 1, further comprising generating a caption for the target image using the collection of keywords to modulate word weights applied for sentence construction.
3. The method as described in claim 1, wherein the collection of keywords expands a set of candidate captions available for the caption analysis to include specific objects, attributes, and terms derived from the weak supervision data in addition to the global concepts derived from the feature extraction.
4. The method as described in claim 1, wherein the collection of keywords is supplied to a language processing model operable to probabilistically generate a descriptive caption for the image by computing probability distributions that account for the weak supervision data.
5. The method of claim 1, wherein applying feature extraction to the target image comprises using a pre-trained convolution neural network (CNN) to encode the image with global descriptive terms indicative of the global concepts.
6. The method of claim 1, wherein supplying the collection of keywords comprises providing keywords to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.
7. The method of claim 6, wherein the RNN iteratively predicts a sequence of words to combine as the caption for the target image based upon probability distributions computed in accordance with weight factors in multiple iterations.
8. The method of claim 7, wherein the collection of keywords is injected in the RNN for each of the multiple iterations to modulate the weight factors used to predict the sequence.
9. The method of claim 1, wherein caption generation includes multiple iterations to determine a sequence of words to combine as the caption for the target image and supplying the collection of keywords comprises providing the same keywords for each of the multiple iterations.
10. The method of claim 1, wherein building the collection of keywords comprises scoring and ranking keywords associated with the visually similar images based on relevance criteria and generating a filtered list of top ranking keywords.
11. The method as described in claim 1, wherein keywords in the collection of keywords are assigned keyword weights effective to change word probabilities in probabilistic categorization implemented for caption generation to favor keywords indicative of the image details.
12. The method as described in claim 1, wherein the source of weakly annotated images comprises an online repository for images accessible over a network.
13. In a digital media environment to facilitate access to collections of images using one or more computing devices, a system comprising: one or more processing devices; one or more computer-readable media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using weak supervision data including: processing a target image for caption analysis via a convolution neural network (CNN), the CNN configured to extract global concepts corresponding to the target image; comparing the target image to images from at least one source of weakly annotated images to identify visually similar images; building a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images as weak supervision data used to inform caption generation; supplying the collection of keywords indicative of image details to a recurrent neural network (RNN) along with the global concepts, the RNN configured to implement language modeling and sentence construction techniques for generating a caption for the target image; and generating the caption for the target image via the RNN using the collection of keywords to modulate word weights applied by the RNN for sentence construction.
14. A system as recited in claim 13, wherein the at least one source of weakly annotated images includes a social networking site having a database of images associated by users with weak annotations indicative of low-level image details.
15. A system as recited in claim 13, wherein the at least one source of weakly annotated images includes a collection of training images used to train the caption generator.
16. A system as recited in claim 13, wherein: the RNN iteratively predicts a sequence of words to combine as the caption for the target image based upon probability distributions computed in accordance with weight factors in multiple iterations; and the same collection of keywords derived from the weak supervision data is injected in the RNN for each of the multiple iterations to modulate the weight factors used to predict the sequence.
17. In a digital media environment to facilitate management of image collections using at least one computing device, a method to automatically generate image captions implemented via an image service comprising: comparing, by the at least one computing device, a target image for caption analysis to images from at least one source of weakly annotated images to identify visually similar images; building, by the at least one computing device, a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images as weak supervision data used to inform caption generation; supplying, by the at least one computing device, the collection of keywords indicative of the image details to a caption generation model configured to iteratively combine words derived from the concepts and attributes to construct a caption in multiple iterations; and constructing, by the at least one computing device, the caption according to a semantic attention model configured to modulate weights assigned to the keywords for each of the multiple iterations based on relevance to a word predicted in a preceding iteration.
18. The method as described in claim 17, wherein the semantic attention model causes different keywords to be considered at each of the multiple iterations.
19. The method as described in claim 18, wherein the caption generation model comprises a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating the caption for the target image.
20. The method as described in claim 19, wherein the semantic attention model includes an input attention model applied to input for each node
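Claims 16 through 18 describe supplying the same keyword collection at every decoding iteration while a semantic attention model re-weights those keywords according to their relevance to the word predicted in the preceding iteration. The Python sketch below assumes dot-product relevance followed by a softmax and uses toy two-dimensional word vectors; the function names and embeddings are hypothetical, and it omits the RNN itself, so it only illustrates how weights over a fixed keyword set can shift from one step to the next.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def keyword_attention(prev_vec, keyword_vecs):
    """Weight each keyword by dot-product relevance to the previous word."""
    scores = [sum(p * k for p, k in zip(prev_vec, kv)) for kv in keyword_vecs]
    return softmax(scores)


def attention_per_step(embed, keywords, predicted_words):
    """Recompute keyword weights at each iteration from the preceding word.

    The same keyword list is supplied at every step; only its weights change.
    `embed` maps words to vectors (toy stand-in for learned embeddings).
    """
    keyword_vecs = [embed[k] for k in keywords]
    return [
        dict(zip(keywords, keyword_attention(embed[prev], keyword_vecs)))
        for prev in predicted_words
    ]


# Hypothetical 2-D embeddings: axis 0 loosely "animal", axis 1 loosely "place/action".
embed = {
    "dog": [1.0, 0.0],
    "runs": [0.0, 1.0],
    "puppy": [0.9, 0.1],
    "beach": [0.1, 0.9],
}
steps = attention_per_step(embed, ["puppy", "beach"], ["dog", "runs"])
for step, weights in enumerate(steps):
    print(step, {k: round(v, 3) for k, v in weights.items()})
```

With these toy vectors, the keyword most relevant to the previous word receives the larger weight, so a different keyword dominates at each step, which is the behavior claim 18 describes.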
CPC classification titles:
- Labelling scene content, e.g. deriving syntactic or semantic representations
- using neural networks
- using classification, e.g. of video objects
- based on the proximity to a decision surface, e.g. support vector machines
- Combinations of networks