Topic association and tagging for dense images

US10496699B2 · US · B2

Patent metadata
Publication number: US-10496699-B2
Application number: US-201715463757-A
Country: US
Kind code: B2
Filing date: Mar 20, 2017
Priority date: Mar 20, 2017
Publication date: Dec 3, 2019
Grant date: Dec 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A framework is provided for associating dense images with topics. The framework is trained utilizing images, each having multiple regions, multiple visual characteristics and multiple keyword tags associated therewith. For each region of each image, visual features are computed from the visual characteristics utilizing a convolutional neural network, and an image feature vector is generated from the visual features. The keyword tags are utilized to generate a weighted word vector for each image by calculating a weighted average of word vector representations representing keyword tags associated with the image. The image feature vector and the weighted word vector are aligned in a common embedding space and a heat map is computed for the image. Once trained, the framework can be utilized to automatically tag images and rank the relevance of images with respect to queried keywords based upon associated heat maps.
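
The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the per-region image embeddings have already been produced (the convolutional network and the training with a cosine similarity loss are omitted), and all function names and the toy vectors are hypothetical, chosen only for illustration. It builds a weighted word vector from tag vectors and computes one cosine score per region, which together form a heat map.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def weighted_word_vector(tag_vectors, weights):
    """Weighted average of the tag word vectors, then normalized to unit length."""
    total = sum(weights)
    dim = len(tag_vectors[0])
    avg = [
        sum(w * vec[i] for w, vec in zip(weights, tag_vectors)) / total
        for i in range(dim)
    ]
    return normalize(avg)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))

def heat_map(region_embeddings, word_vector):
    """One cosine score per region, laid out like the region grid."""
    return [[cosine(e, word_vector) for e in row] for row in region_embeddings]

# Toy data (assumed): a 2x2 grid of region embeddings in a 3-dimensional space.
regions = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
# Toy word vectors (assumed) for two keyword tags, weighted uniformly.
tags = {"dog": [1.0, 0.0, 0.0], "grass": [0.0, 1.0, 0.0]}
topic = weighted_word_vector(list(tags.values()), weights=[1.0, 1.0])
hm = heat_map(regions, topic)
```

In this toy example the bottom-right region points in the same direction as the weighted word vector and so scores highest, while the bottom-left region is orthogonal to it and scores zero. Ranking images for a queried keyword could then compare, for example, the peak heat map value per image, although the abstract does not specify which statistic is used.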

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to: receive a plurality of images, each image of the plurality of images being associated with a plurality of tags and each image of the plurality of images being comprised of a plurality of regions, each region of each image comprising less than an entirety of the image it comprises; for each region of each image of the plurality of images, generate an image feature vector from one or more visual features; for each image of the plurality of images, generate a weighted word vector from the associated plurality of tags; for each image, compute a heat map corresponding thereto by aligning the image feature vector for each region of a given image and the weighted word feature vector into a common embedding space utilizing cosine similarity loss, wherein a plurality of regions of the heat map corresponds to the plurality of regions of the given image and wherein at least one region of the plurality of regions of the heat map corresponds to each of the plurality of tags; and provide an image of the plurality of images to be presented based on the computed heat map for the image.

2. The computing system of claim 1, wherein for each image of the plurality of images, the one or more processors are further caused to compute the one or more visual features.

3. The computing system of claim 2, wherein the one or more visual features are computed utilizing a convolutional neural network.

4. The computing system of claim 1, wherein for each image of the plurality of images, the one or more processor are further caused to: generate a word vector representation for each of the associated plurality of tags; calculate a weighted average of the generated word vector representations to generate the weighted word vector; and normalize the weighted word vector in the common embedding space.

5. The computing system of claim 4, wherein the word vector representation for each of the associated plurality of tags is generated utilizing Pointwise Mutual Information.

6. The computing system of claim 4, wherein the weighted average is calculated, at least in part, uniformly for each of the plurality of tags.

7. The computing system of claim 4, wherein the weighted average is calculated, at least in part, utilizing inverse document frequency.

8. The computing system of claim 4, wherein each of the associated plurality of tags is a user-provided tag, and wherein the weighted average is calculated, at least in part, utilizing a tag order in which a first tag of the plurality of associated tags is assigned a greater weight than a second tag of the plurality of associated tags when the first tag is provided by the user before the second tag.

9. The computing system of claim 1, wherein the image feature vectors for each of the plurality of regions collectively comprise a vector map, and wherein the embedded image feature vectors for each of the plurality of regions collectively comprise an image embedding map.

10. The computing system of claim 9, wherein the one or more processors further are caused to: pool a local feature vector from the vector map; concatenate the local feature vector with an embedded image feature vector for one of the plurality of regions to form a topic-guided feature vector; generate a second image embedding vector from the topic-guided feature vector; and compute a second heat map corresponding to the image by aligning the second image embedding vector with the weighted word vector into the common embedding space utilizing cosine similarity loss.

11. A computer-implemented method for tagging images, the method comprising: receiving an image associated with a plurality of user-provided tags, the image being comprised of a plurality of regions, each region comprising less than the entire image; generating an embedded image feature vector for each of the plurality of regions; generating an image-specific weighted word vector from the plurality of user-provided tags; computing a first heat map corresponding to the image by aligning the embedded image feature vector for each region of the image and the image-specific weighted word vector into a common embedding space using cosine similarity loss, wherein a plurality of regions of the first heat map corresponds to the plurality of regions of the image, and wherein at least one region of the first heat map corresponds to each of the plurality of user-provided tags; and providing the image to be presented based on the computed first heat map.

12. The method of claim 11, wherein generating the embedded image feature vector for each of the plurality of regions comprises: computing one or more visual features associated each of the plurality of regions; generating an image feature vector associated with each of the plurality of regions from the associated one or more visual features; and generating the embedded image feature vector for each of the plurality of regions from the associated image feature vector utilizing a convolutional neural network.

13. The method of claim 12, wherein the image feature vectors for each of the plurality of regions collectively comprise a vector map and wherein the image embedding vectors for each of the plurality of regions collectively comprise an image embedding map.

14. The method of claim 13, further comprising: pooling a local feature vector from a particular region of the first heat map utilizing the vector map, the particular region of the first heat map being associated with one of the plurality of user-provided tags; concatenating the local feature vector with an image embedding vector for one of the plurality of regions to form a topic-guided feature vector; generating a second image embedding vector from the topic-guided feature vector; and computing a second heat map corresponding to the image by aligning the second image embedding vector with a soft topic feature vector into the common embedding space utilizing cosine similarity loss.

15. The method of claim 11, wherein generating the image-specific weighted word vector from the plurality of user-provided tags comprises: generating a word vector representation for each of the associated plurality of user-provided tags; and calculating a weighted average of the generated word vector representations to generate the image-specific weighted word vector.

16. The method of claim 15, wherein the word vector representation for each of the associated plurality of user-provided tags is generated utilizing Pointwise Mutual Information.

17. The method of claim 15, wherein the weighted average is calculated, at least in part, utilizing inverse document frequency.

18. The method of claim 15, wherein the weighted average is calculated, at least in part, utilizing a tag order in which a first tag of the plurality of user-provided tags is assigned a greater weight than a second tag of the plurality of user-provided tags when the first tag is provided by a user before the second tag.

19. A computing system comprising: means for generating an image embedding vector for each of the plurality of regions of an image utilizing a convolutional neural network; means for generating a soft topic feature vector for the image by calculating a weighted average of a plurality of word vector representations, each of the plurality of word vector representations be…
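
Claims 6 to 8 and 17 to 18 name three ways to weight tags in the weighted average: uniformly, by inverse document frequency, and by tag order. The sketch below shows one possible form of each. The claims do not give formulas, so the `log(N / df)` form of inverse document frequency, the geometric decay for tag order, and all function and parameter names are assumptions made for illustration.

```python
import math

def uniform_weights(tags):
    """Every tag contributes equally."""
    return [1.0 for _ in tags]

def idf_weights(tags, doc_freq, num_images):
    """Inverse document frequency: rarer tags get larger weights.

    Assumed form: log(num_images / doc_freq[tag]). A tag that appears on
    every image therefore gets weight 0.
    """
    return [math.log(num_images / doc_freq[t]) for t in tags]

def tag_order_weights(tags, decay=0.5):
    """Earlier user-provided tags get larger weights.

    The geometric decay is an assumption; the claims only require that a
    tag provided earlier outweighs a tag provided later.
    """
    return [decay ** i for i in range(len(tags))]

# Toy example (assumed counts): three tags in the order a user entered them.
tags = ["beach", "sunset", "person"]
doc_freq = {"beach": 100, "sunset": 10, "person": 1000}

w_uniform = uniform_weights(tags)
w_idf = idf_weights(tags, doc_freq, num_images=1000)
w_order = tag_order_weights(tags)
```

Because the claims say each scheme is used "at least in part", an implementation could also combine them, for instance by multiplying the inverse document frequency weight by the tag order weight for each tag. That combination is likewise an assumption, not something the claim text specifies.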

Assignees

Adobe Inc

Inventors

Classifications

  • Labelling scene content, e.g. deriving syntactic or semantic representations

  • using neural networks

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Combinations of networks

  • nonlinear criteria, e.g. embedding a manifold in a Euclidean space

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10496699B2 cover?
A framework is provided for associating dense images with topics. The framework is trained utilizing images, each having multiple regions, multiple visual characteristics and multiple keyword tags associated therewith. For each region of each image, visual features are computed from the visual characteristics utilizing a convolutional neural network, and an image feature vector is generated fro…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/583. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 3, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).