Generating scene graphs from digital images using external knowledge and image reconstruction

US11373390B2 · US · B2

Patent metadata
Publication number: US-11373390-B2
Application number: US-201916448473-A
Country: US
Kind code: B2
Filing date: Jun 21, 2019
Priority date: Jun 21, 2019
Publication date: Jun 28, 2022
Grant date: Jun 28, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
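The abstract says relationships are extracted from an external knowledgebase to refine proposal features, and claim 3 narrows this to a predetermined number of the relationships that occur most frequently for an identified object proposal. The minimal Python sketch below covers only that lookup step. It assumes the knowledgebase is a flat list of (subject, predicate, object) triples. `KNOWLEDGEBASE`, `top_k_relationships`, and `to_phrase` are hypothetical names, and the recurrent-network encoding and memory-based refinement from claims 3 and 4 are not shown.

```python
from collections import Counter

# Hypothetical toy knowledgebase of (subject, predicate, object) triples.
# The patent does not specify this format; it is an assumption for illustration.
KNOWLEDGEBASE = [
    ("person", "rides", "horse"),
    ("person", "holds", "umbrella"),
    ("person", "rides", "bike"),
    ("person", "wears", "hat"),
    ("person", "rides", "horse"),
    ("horse", "eats", "grass"),
    ("person", "wears", "hat"),
    ("person", "rides", "horse"),
]


def top_k_relationships(label, knowledgebase, k):
    """Return the k most frequent triples that mention `label` as subject or object."""
    counts = Counter(t for t in knowledgebase if label in (t[0], t[2]))
    # most_common sorts by count; ties keep first-seen order.
    return [triple for triple, _ in counts.most_common(k)]


def to_phrase(triple):
    """Join a triple into a word sequence, e.g. as input to a word-embedding encoder."""
    return " ".join(triple)


if __name__ == "__main__":
    for triple in top_k_relationships("person", KNOWLEDGEBASE, k=2):
        print(to_phrase(triple))
```

Run as a script, it prints `person rides horse` followed by `person wears hat`. Counting exact triples is the simplest reading of "occur most frequently." A real system might instead normalize synonyms or weight edges.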

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computing device to: determine, using object recognition processes, a plurality of object proposals for objects in a digital image; determine, for the plurality of object proposals, a set of subgraph proposals indicating candidate object relationships involving pairs of different object proposals of the plurality of object proposals within the digital image, each subgraph proposal of the set of subgraph proposals comprising a candidate object relationship involving two object proposals from the plurality of object proposals in the digital image; refine features of the plurality of object proposals and features of the set of subgraph proposals using extracted relationships corresponding to the plurality of object proposals and the set of subgraph proposals by accessing an external knowledgebase comprising a plurality of semantic relationships involving objects to obtain the extracted relationships for the plurality of object proposals and the set of subgraph proposals; and generate a semantic scene graph for the digital image by predicting object labels and predicate labels based on the refined features of the plurality of object proposals and the refined features of the set of subgraph proposals.

2. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions executed by the at least one processor, cause the computing device to: determine feature vectors representing the features of the plurality of object proposals and feature maps representing the features of the set of subgraph proposals; and perform an initial refinement of the feature vectors of the plurality of object proposals relative to the set of subgraph proposals and an initial refinement of the feature maps of the set of subgraph proposals relative to the plurality of object proposals using a multi-class neural network layer.

3. The non-transitory computer readable storage medium as recited in claim 2, wherein the instructions that cause the computing device to refine the features of the plurality of object proposals and the features of the set of subgraph proposals cause the computing device to: determine a predetermined number of relationships that occur most frequently in the external knowledgebase for an identified object proposal of the plurality of object proposals; and encode, using a recurrent neural network, word embeddings comprising the predetermined number of relationships with the identified object proposal.

4. The non-transitory computer readable storage medium as recited in claim 3, wherein the instructions that cause the computing device to refine the features of the plurality of object proposals and the features of the set of subgraph proposals cause the computing device to jointly refine the feature vectors and the feature maps utilizing episodic memory states in a dynamic memory network.

5. The non-transitory computer readable storage medium as recited in claim 1, wherein each object proposal of the plurality of object proposals is associated with a subset of subgraph proposals of the set of subgraph proposals.

6. The non-transitory computer readable storage medium as recited in claim 1, wherein the instructions that cause the computing device to determine the set of subgraph proposals cause the computing device to: determine a first score for a first object proposal and a second score for a second object proposal in an identified pair of object proposals; and determine a subgraph proposal for the identified pair of object proposals by determining a union box with a confidence score as a product of the first score and the second score.

7. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to: generate a synthesized image based on the object labels used to generate the semantic scene graph; determine a difference between the digital image and the synthesized image; and modify, using the determined difference, one or more parameters of a scene graph generation model that generates the semantic scene graph.

8. The non-transitory computer readable storage medium as recited in claim 7, wherein the instructions that cause the computing device to generate the synthesized image cause the computing device to generate, using a cascaded refinement network, the synthesized image comprising objects from a scene layout based on the object labels and locations associated with the object labels.

9. The non-transitory computer readable storage medium as recited in claim 7, wherein the instructions that cause the computing device to determine the difference between the digital image and the synthesized image cause the computing device to determine, using a generative adversarial network, a loss associated with the synthesized image relative to the digital image.

10. The non-transitory computer readable storage medium as recited in claim 9, wherein the instructions that cause the computing device to modify the one or more parameters of the scene graph generation model that generates the semantic scene graph cause the computing device to modify, based on the determined loss and using backpropagation, one or more parameters of an object detection model used to determine the plurality of object proposals for objects in the digital image.

11. In a digital medium environment, a method of generating accurate scene graph representations of digital images, the method comprising: determine, using object recognition processes, a plurality of object proposals for objects in a digital image and a set of subgraph proposals indicating candidate object relationships involving each pair of different object proposals of the plurality of object proposals within the digital image, each subgraph proposal of the set of subgraph proposals comprising a candidate object relationship involving two object proposals from the plurality of object proposals in the digital image; refining features of the plurality of object proposals and features of the set of subgraph proposals by accessing an external knowledgebase comprising a plurality of semantic relationships involving objects to extract relationships for the plurality of object proposals and the set of subgraph proposals; and generating, using a scene graph generation model, a semantic scene graph for the digital image by predicting object labels and predicate labels based on the refined features of the plurality of object proposals and the refined features of the set of subgraph proposals.

12. The method as recited in claim 11, further comprising: generating a synthesized image based on the semantic scene graph; determining a loss between the digital image and the synthesized image; and modifying, using the determined loss, one or more parameters of a scene graph generation model that generates the semantic scene graph.

13. In a digital medium environment, a system for generating accurate scene graph representations of digital images, the system comprising: at least one processor; and a non-transitory computer memory comprising instructions that, when executed by the at least one processor, cause the system to: determine, using object recognition processes, a plurality of object proposals for objects in a digital image by estimating objects and bounding boxes for the estimated objects within the digital image; determine, for pairs of different object proposals of the plurality of object proposals, a set of subgrap
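Claim 6 forms each subgraph proposal as a union box around a pair of object proposals, with a confidence score equal to the product of the two object scores. The minimal Python sketch below assumes axis-aligned (x1, y1, x2, y2) boxes. It enumerates every unordered pair, following the "each pair" wording of claim 11. `Proposal`, `union_box`, and `subgraph_proposals` are hypothetical names. Any pruning or merging of proposals that the full system may perform is not shown.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Proposal:
    """Hypothetical object proposal: axis-aligned box (x1, y1, x2, y2) and detector score."""
    box: tuple
    score: float


def union_box(a, b):
    """Smallest axis-aligned box enclosing both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def subgraph_proposals(proposals):
    """Build one subgraph proposal per unordered pair of different object proposals.

    Each entry is (i, j, union box, confidence), where confidence is the
    product of the two object proposal scores.
    """
    return [
        (i, j, union_box(p.box, q.box), p.score * q.score)
        for (i, p), (j, q) in combinations(enumerate(proposals), 2)
    ]


if __name__ == "__main__":
    person = Proposal(box=(10, 20, 50, 120), score=0.5)
    horse = Proposal(box=(30, 60, 150, 140), score=0.25)
    print(subgraph_proposals([person, horse]))
```

For the two example proposals, this prints `[(0, 1, (10, 20, 150, 140), 0.125)]`. Multiplying the scores means a pair scores high only when both objects are confidently detected. With n proposals, the exhaustive enumeration produces n(n-1)/2 candidates.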

Assignees

Adobe Inc

Inventors

Classifications

  • G06N5/022 (primary)

    Knowledge engineering; Knowledge acquisition · CPC title

  • Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • Tree-organised classifiers · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11373390B2 cover?
Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06N5/022. Mapped technology areas include Physics.
When was this patent published?
Publication date: June 28, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).