Modeling semantic concepts in an embedding space as distributions

US11238362B2 · US · B2

Patent metadata
Publication number: US-11238362-B2
Application number: US-201614996959-A
Country: US
Kind code: B2
Filing date: Jan 15, 2016
Priority date: Jan 15, 2016
Publication date: Feb 1, 2022
Grant date: Feb 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Modeling semantic concepts in an embedding space as distributions is described. In the embedding space, both images and text labels are represented. The text labels describe semantic concepts that are exhibited in image content. In the embedding space, the semantic concepts described by the text labels are modeled as distributions. By using distributions, each semantic concept is modeled as a continuous cluster which can overlap other clusters that model other semantic concepts. For example, a distribution for the semantic concept “apple” can overlap distributions for the semantic concepts “fruit” and “tree” since “apple” can refer to both a fruit and a tree. In contrast to using distributions, conventionally configured visual-semantic embedding spaces represent a semantic concept as a single point. Thus, unlike these conventionally configured embedding spaces, the embedding spaces described herein are generated to model semantic concepts as distributions, such as Gaussian distributions, Gaussian mixtures, and so on.
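The point of modeling concepts as distributions rather than single points is that two concepts can share probability mass, which a pair of points cannot. A minimal sketch, assuming each concept is an axis-aligned (diagonal) Gaussian with hand-picked 2-D parameters; the Bhattacharyya coefficient used here as the overlap measure is an illustrative choice, not something the patent specifies, and the names `DiagGaussian` and `bhattacharyya_coefficient` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGaussian:
    """A concept modeled as an axis-aligned Gaussian: mean and per-dimension variance."""
    mean: list[float]
    var: list[float]


def bhattacharyya_coefficient(p: DiagGaussian, q: DiagGaussian) -> float:
    """Overlap in (0, 1] between two diagonal Gaussians; 1.0 means identical.

    Illustrative overlap measure chosen for this sketch (assumption, not from the patent).
    """
    dist = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        v = 0.5 * (vp + vq)                              # averaged variance per dimension
        dist += 0.125 * (mp - mq) ** 2 / v               # mean-separation term
        dist += 0.5 * math.log(v / math.sqrt(vp * vq))   # variance-mismatch term
    return math.exp(-dist)


# Hypothetical 2-D concept distributions, chosen by hand for illustration only.
concepts = {
    "apple": DiagGaussian([1.0, 1.0], [1.0, 1.0]),
    "fruit": DiagGaussian([0.0, 1.5], [1.5, 1.0]),
    "tree": DiagGaussian([2.0, 0.0], [1.0, 1.5]),
    "car": DiagGaussian([8.0, -6.0], [0.5, 0.5]),
}

for other in ("fruit", "tree", "car"):
    bc = bhattacharyya_coefficient(concepts["apple"], concepts[other])
    print(f"apple vs {other}: overlap {bc:.3f}")
```

With these toy parameters "apple" overlaps "fruit" (about 0.87) and "tree" (about 0.79), while its overlap with "car" is effectively zero. A Gaussian mixture per concept would replace each `DiagGaussian` with a weighted set of components, at the cost of losing this closed-form overlap.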

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by a computing device to annotate images with determined text labels to describe content of the images, the method comprising: generating an embedding space representing both images and text labels of a text vocabulary, including: computing distributions representing semantic concepts in the embedding space rather than representing the semantic concepts as single points, the semantic concepts for which the distributions are computed being described by respective text labels of the text vocabulary and capable of being depicted in image content; determining semantic relationships between meanings of the text labels of the text vocabulary; positioning the distributions in the embedding space based on the semantic relationships determined for the respective text labels; and mapping representative images to the distributions of the embedding space, wherein the image content depicted by the representative images exemplifies corresponding semantic concepts of the distributions; determining a set of semantically meaningful image regions of a query image, the set of semantically meaningful image regions of the query image being mappable to the text labels in the embedding space; processing the set of semantically meaningful image regions of the query image to discard semantically meaningful image regions of the query image that fail to meet at least one predefined criterion and to obtain a subset of the semantically meaningful image regions of the query image that meet the at least one predefined criterion; determining, using the embedding space, at least one of the text labels of the embedding space describing at least one depicted semantic concept in criteria-meeting image regions of the query image; and annotating the query image by associating the determined text labels with the query image.

2. A method as described in claim 1, wherein the distributions are computed as Gaussian distributions representing the semantic concepts.

3. A method as described in claim 1, wherein the distributions are computed as Gaussian mixtures representing the semantic concepts.

4. A method as described in claim 1, wherein generating the embedding space further includes: processing a plurality of training images, each training image having multiple text labels, said processing including generating sets of image regions that correspond to respective labels of the multiple text labels; and setting the sets of image regions as the representative images for the mapping to the distributions of the embedding space.

5. A method as described in claim 4, wherein processing the plurality of training images includes, for each training image: determining candidate image regions for a respective set of image regions of the training image; and reducing a number of the determined candidate image regions using at least one post-processing technique.

6. A method as described in claim 5, wherein the candidate image regions are determined using geodesic object proposal.

7. A method as described in claim 5, wherein the at least one post-processing technique involves enforcing a size criterion by discarding candidate image regions having less than a threshold size.

8. A method as described in claim 5, wherein the at least one post-processing technique involves enforcing an aspect ratio criterion by discarding candidate image regions having aspect ratios outside predefined allowable aspect ratios.

9. A method as described in claim 5, wherein the at least one post-processing technique includes assigning a single candidate image region to each respective label of the multiple text labels of the training image based on a single-label embedding model.

10. A method as described in claim 1, wherein determining the at least one text label includes computing distances in the embedding space between embeddings of the semantically meaningful image regions of the query image and the distributions.

11. A method as described in claim 10, wherein the distances are computed using vectors that represent respective semantically meaningful image regions of the query image, the vectors extracted from the semantically meaningful image regions of the query image with a Convolutional Neural Network (CNN).

12. A method as described in claim 10, further comprising selecting the at least one text label for association with the query image based on the distances.

13. A method as described in claim 1, further comprising presenting indications of the criteria-meeting image regions of the query image that correspond to the at least one text label.

14. A method as described in claim 1, wherein the query image is annotated in conjunction with indexing the query image for search.

15. A method as described in claim 1, further comprising presenting the criteria-meeting image regions of the query image, the presented criteria-meeting image regions of the query image changed visually to appear different from other portions of the query image.

16. A system to annotate images with determined text labels to describe content of the image, the system comprising: one or more processors; and computer-readable storage media having stored thereon instructions that are executable by the one or more processors to perform operations comprising: processing a training image having multiple text labels, said processing including generating a set of image regions that correspond to respective labels of the multiple text labels; embedding the set of image regions within an embedding space representing semantic concepts as distributions rather than representing the semantic concepts as single points, the semantic concepts represented being described by text labels of a text vocabulary and capable of being depicted in image content, the set of image regions embedded with distributions representing the semantic concepts depicted in the image content of the set of image regions, the distributions in the embedding space positioned based on semantic relationships determined for the text labels, and determination of the semantic relationships being based on meanings of the text labels of the text vocabulary; determining a set of semantically meaningful image regions of a query image, the set of semantically meaningful image regions of the query image being mappable to the text labels in the embedding space; processing the set of semantically meaningful image regions of the query image to discard the semantically meaningful image regions of the query image that fail to meet at least one predefined criterion and to obtain a subset of semantically meaningful image regions of the query image that meet the at least one predefined criterion; determining the text labels that describe depicted semantic concepts of the query image by mapping criteria-meeting image regions of the query image to the distributions of the embedding space; and annotating the query image with at least two of the determined text labels.

17. A system as described in claim 16, further comprising presenting the image regions of the query image that correspond to the at least two determined text labels.

18. A system as described in claim 16, wherein the query image is annotated in conjunction with indexing the query image for search.

19. One or more computer-readable storage media comprising instructions stored thereon that, responsive to execution by a computing device, perform operations comprising: generating an embedding space representing both images and text labels of a text vocabulary, including
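The query-side flow in claims 1, 7, 8, 10, 12 and 16 (discard regions that fail predefined criteria, measure distances from each surviving region's embedding to the concept distributions, then choose labels by distance) can be sketched as below. This is a minimal illustration under assumptions the claims do not state: concepts are diagonal Gaussians, the distance is squared Mahalanobis distance, the size, aspect-ratio, distance and top-k values are arbitrary, and region embeddings are given directly instead of being extracted with a CNN. All names (`Region`, `ConceptGaussian`, `meets_criteria`, `annotate`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    """Candidate region of a query image: bounding box in pixels plus a feature vector.

    The claims extract the vector with a CNN; here it is supplied directly (hypothetical).
    """
    x: int
    y: int
    width: int
    height: int
    embedding: list[float]


@dataclass
class ConceptGaussian:
    label: str
    mean: list[float]
    var: list[float]  # diagonal covariance (assumption)


def meets_criteria(r: Region, min_area: int = 32 * 32,
                   min_aspect: float = 0.25, max_aspect: float = 4.0) -> bool:
    """Size and aspect-ratio filters; threshold defaults are illustrative assumptions."""
    if r.width <= 0 or r.height <= 0:
        return False
    aspect = r.width / r.height
    return r.width * r.height >= min_area and min_aspect <= aspect <= max_aspect


def mahalanobis_sq(v: list[float], g: ConceptGaussian) -> float:
    """Squared Mahalanobis distance from an embedding to a diagonal Gaussian."""
    return sum((x - m) ** 2 / s for x, m, s in zip(v, g.mean, g.var))


def annotate(regions: list[Region], concepts: list[ConceptGaussian],
             max_dist: float = 9.0, top_k: int = 2) -> list[str]:
    """Return up to top_k distinct labels, ordered by their closest kept region."""
    best: dict[str, float] = {}
    for r in regions:
        if not meets_criteria(r):
            continue  # discard regions failing the predefined criteria
        for g in concepts:
            d = mahalanobis_sq(r.embedding, g)
            if d <= max_dist and d < best.get(g.label, math.inf):
                best[g.label] = d  # keep each label's closest region distance
    return sorted(best, key=best.get)[:top_k]


# Hypothetical toy inputs for illustration only.
concepts = [
    ConceptGaussian("apple", [1.0, 1.0], [0.5, 0.5]),
    ConceptGaussian("tree", [3.0, 0.0], [1.0, 1.0]),
    ConceptGaussian("car", [8.0, -6.0], [0.5, 0.5]),
]
regions = [
    Region(10, 10, 120, 100, [1.1, 0.9]),  # near "apple"
    Region(200, 0, 150, 300, [2.8, 0.3]),  # near "tree"
    Region(5, 5, 8, 8, [8.0, -6.0]),       # too small: discarded despite matching "car"
    Region(0, 0, 400, 20, [7.9, -6.1]),    # aspect ratio 20: discarded
]
print(annotate(regions, concepts))
```

With these inputs the call prints `['apple', 'tree']`: the small and the elongated regions are discarded before any distance is computed, so "car" is never proposed even though their embeddings sit on or near its mean.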

Assignees

Adobe Inc

Classifications

  • G06N20/00 (primary): Machine learning

  • Combinations of networks

  • Smoothing the distance, e.g. radial basis function networks [RBFN]

  • Using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

  • Convolutional networks [CNN, ConvNet]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11238362B2 cover?
Modeling semantic concepts in an embedding space as distributions is described. In the embedding space, both images and text labels are represented. The text labels describe semantic concepts that are exhibited in image content. In the embedding space, the semantic concepts described by the text labels are modeled as distributions. By using distributions, each semantic concept is modeled as a c…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 1, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).