What technology area does this patent fall under?

Primary CPC classification G06V20/70. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Image captioning with weak supervision

US9811765B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9811765-B2
Application number	US-201614995032-A
Country	US
Kind code	B2
Filing date	Jan 13, 2016
Priority date	Jan 13, 2016
Publication date	Nov 7, 2017
Grant date	Nov 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for image captioning with weak supervision are described herein. In implementations, weak supervision data regarding a target image is obtained and utilized to provide detail information that supplements global image concepts derived for image captioning. Weak supervision data refers to noisy data that is not closely curated and may include errors. Given a target image, weak supervision data for visually similar images may be collected from sources of weakly annotated images, such as online social networks. Generally, images posted online include “weak” annotations in the form of tags, titles, labels, and short descriptions added by users. Weak supervision data for the target image is generated by extracting keywords for visually similar images discovered in the different sources. The keywords included in the weak supervision data are then employed to modulate weights applied for probabilistic classifications during image captioning analysis.
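The keyword-building step described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the feature vectors, the cosine-similarity ranking, the similarity-weighted scoring rule, and the names `build_keywords` and `SAMPLE_IMAGES` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_keywords(target_vec, annotated_images, top_images=2, top_keywords=3):
    """Collect keywords from the images most similar to the target.

    Each tag is scored by the summed similarity of the images carrying it
    (an assumed relevance criterion), then the top-ranked tags are kept.
    """
    # Rank weakly annotated images by visual similarity to the target.
    scored = [(cosine(target_vec, img["features"]), img) for img in annotated_images]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Accumulate a score per keyword from the most similar images only.
    scores = defaultdict(float)
    for sim, img in scored[:top_images]:
        for tag in img["tags"]:
            scores[tag.lower()] += sim

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_keywords]


# Hypothetical weakly annotated images with toy 2-D feature vectors.
SAMPLE_IMAGES = [
    {"features": [1.0, 0.0], "tags": ["dog", "beach"]},
    {"features": [0.9, 0.1], "tags": ["Dog", "frisbee"]},
    {"features": [0.0, 1.0], "tags": ["car"]},
]

if __name__ == "__main__":
    for keyword, score in build_keywords([1.0, 0.0], SAMPLE_IMAGES):
        print(f"{keyword}: {score:.3f}")
```

In this toy data the tag shared by both similar images ranks first, and the tag from the dissimilar image is filtered out. A real system would presumably use learned image features rather than hand-written vectors.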

First claim

Opening claim text (preview).

What is claimed is:

1. In a digital media environment to facilitate management of image collections using at least one computing device, a method to automatically generate image captions using weak supervision data comprising: obtaining, by the at least one computing device, a target image for caption analysis; applying, by the at least one computing device, feature extraction to the target image to generate global concepts corresponding to the image; comparing, by the at least one computing device, the target image to images from a source of weakly annotated images to identify visually similar images; building, by the at least one computing device, a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images; and supplying, by the at least one computing device, the collection of keywords indicative of image details as the weak supervision data for caption generation along with the global concepts.

2. The method as described in claim 1, further comprising generating a caption for the target image using the collection of keywords to modulate word weights applied for sentence construction.

3. The method as described in claim 1, wherein the collection of keywords expands a set of candidate captions available for the caption analysis to include specific objects, attributes, and terms derived from the weak supervision data in addition to the global concepts derived from the feature extraction.

4. The method as described in claim 1, wherein the collection of keywords is supplied to a language processing model operable to probabilistically generate a descriptive caption for the image by computing probability distributions that account for the weak supervision data.

5. The method of claim 1, wherein applying feature extraction to the target image comprises using a pre-trained convolution neural network (CNN) to encode the image with global descriptive terms indicative of the global concepts.

6. The method of claim 1, wherein supplying the collection of keywords comprises providing keywords to a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating a caption for the target image.

7. The method of claim 6, wherein the RNN iteratively predicts a sequence of words to combine as the caption for the target image based upon probability distributions computed in accordance with weight factors in multiple iterations.

8. The method of claim 7, wherein the collection of keywords is injected in the RNN for each of the multiple iterations to modulate the weight factors used to predict the sequence.

9. The method of claim 1, wherein caption generation includes multiple iterations to determine a sequence of words to combine as the caption for the target image and supplying the collection of keywords comprises providing the same keywords for each of the multiple iterations.

10. The method of claim 1, wherein building the collection of keywords comprises scoring and ranking keywords associated with the visually similar images based on relevance criteria and generating a filtered list of top ranking keywords.

11. The method as described in claim 1, wherein keywords in the collection of keywords are assigned keyword weights effective to change word probabilities in probabilistic categorization implemented for caption generation to favor keywords indicative of the image details.

12. The method as described in claim 1, wherein the source of weakly annotated images comprises an online repository for images accessible over a network.

13. In a digital media environment to facilitate access to collections of images using one or more computing devices, a system comprising; one or more processing devices; one or more computer-readable media storing instructions executable via the one or more processing devices to implement a caption generator configured to perform operations to automatically generate image captions using weak supervision data including: processing a target image for caption analysis via a convolution neural network (CNN), the CNN configured to extract global concepts corresponding to the target image; comparing the target image to images from at least one source of weakly annotated images to identify visually similar images; building a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images as weak supervision data used to inform caption generation; supplying the collection of keywords indicative of image details to a recurrent neural network (RNN) along with the global concepts, the RNN configured to implement language modeling and sentence construction techniques for generating a caption for the target image; and generating the caption for the target image via the RNN using the collection of keywords to modulate word weights applied by the RNN for sentence construction.

14. A system as recited in claim 13, wherein the at least one source of weakly annotated images includes a social networking site having a database of images associated by users with weak annotations indicative of low-level image details.

15. A system as recited in claim 13, wherein the at least one source of weakly annotated images includes a collection of training images used to train the caption generator.

16. A system as recited in claim 13, wherein: the RNN iteratively predicts a sequence of words to combine as the caption for the target image based upon probability distributions computed in accordance with weight factors in multiple iterations; and the same collection of keywords derived from the weak supervision data is injected in the RNN for each of the multiple iterations to modulate the weight factors used to predict the sequence.

17. In a digital media environment to facilitate management of image collections using at least one computing device, a method to automatically generate image captions implemented via an image service comprising: comparing, by the at least one computing device, a target image for caption analysis to images from at least one source of weakly annotated images to identify visually similar images; building, by the at least one computing device, a collection of keywords for the target image indicative of image details by extracting the keywords from the visually similar images as weak supervision data used to inform caption generation; supplying, by the at least one computing device, the collection of keywords indicative of the image details to a caption generation model configured to iteratively combine words derived from the concepts and attributes to construct a caption in multiple iterations; and constructing, by the at least one computing device, the caption according to a semantic attention model configured to modulate weights assigned to the keywords for each of the multiple iterations based on relevance to a word predicted in a preceding iteration.

18. The method as described in claim 17, wherein the semantic attention model causes different keywords to be considered at each of the multiple iterations.

19. The method as described in claim 18, wherein the caption generation model comprises a recurrent neural network (RNN) designed to implement language modeling and sentence construction techniques for generating the caption for the target image.

20. The method as described in claim 19, wherein the semantic attention model includes an input attention model applied to input for each node
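The idea in claims 8 and 11, where the same keywords are supplied at every iteration and keyword weights shift word probabilities, can be sketched as follows. This is an illustrative assumption, not the claimed method: the additive bias on logits, the `strength` parameter, and the precomputed `step_logits` (standing in for the per-step output of an RNN) are all hypothetical.

```python
import math


def softmax(logits):
    """Turn a {word: logit} mapping into a probability distribution."""
    peak = max(logits.values())
    exps = {word: math.exp(value - peak) for word, value in logits.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def modulate(logits, keyword_weights, strength=1.0):
    """Bias logits toward keywords, then renormalize.

    Adding a weighted bonus to a word's logit is an assumed, simple way to
    make keyword weights change the resulting word probabilities.
    """
    boosted = {
        word: value + strength * keyword_weights.get(word, 0.0)
        for word, value in logits.items()
    }
    return softmax(boosted)


def greedy_caption(step_logits, keyword_weights, strength=1.0):
    """Pick the most probable word at each step, reusing the same keywords."""
    caption = []
    for logits in step_logits:
        probs = modulate(logits, keyword_weights, strength)
        caption.append(max(probs, key=probs.get))
    return caption


# Hypothetical per-step logits; a real model would compute these from
# its hidden state rather than read them from a fixed list.
STEP_LOGITS = [
    {"a": 2.0, "the": 1.0},
    {"dog": 1.0, "animal": 1.5},
    {"runs": 1.0, "sits": 0.5},
]

if __name__ == "__main__":
    print(greedy_caption(STEP_LOGITS, {}))
    print(greedy_caption(STEP_LOGITS, {"dog": 1.0}))
```

With no keywords the toy decoder picks the generic word at the second step; with the keyword weight applied, the more specific word wins. The semantic attention model of claims 17 to 20 would instead vary the keyword weights from step to step, which this sketch does not attempt.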

Assignees

Adobe Systems Inc

Inventors

Classifications

G06V20/70 · Primary
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
G06V10/82
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06F18/2411
based on the proximity to a decision surface, e.g. support vector machines · CPC title
G06N3/045
Combinations of networks · CPC title

Patent family

Related publications grouped by family.

View patent family 59276195

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9811765B2 cover? Techniques for image captioning with weak supervision are described herein. In implementations, weak supervision data regarding a target image is obtained and utilized to provide detail information that supplements global image concepts derived for image captioning. Weak supervision data refers to noisy data that is not closely curated and may include errors. Given a target image, weak supervis…
Who is the assignee on this patent? Adobe Systems Inc

Related patents

Semantic Natural Language Vector Space

Latent embeddings for word images and their semantics

Analyzer for behavioral analysis and parameterization of neural stimulation

Information processing device, information processing method and program
