Electronic device and method for reliability-based object recognition
US-2019139256-A1 · May 9, 2019 · US
US11373390B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11373390-B2 |
| Application number | US-201916448473-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2019 |
| Priority date | Jun 21, 2019 |
| Publication date | Jun 28, 2022 |
| Grant date | Jun 28, 2022 |
Official abstract text for this publication.
Methods, systems, and non-transitory computer readable storage media are disclosed for generating semantic scene graphs for digital images using an external knowledgebase for feature refinement. For example, the disclosed system can determine object proposals and subgraph proposals for a digital image to indicate candidate relationships between objects in the digital image. The disclosed system can then extract relationships from an external knowledgebase for refining features of the object proposals and the subgraph proposals. Additionally, the disclosed system can generate a semantic scene graph for the digital image based on the refined features of the object/subgraph proposals. Furthermore, the disclosed system can update/train a semantic scene graph generation network based on the generated semantic scene graph. The disclosed system can also reconstruct the image using object labels based on the refined features to further update/train the semantic scene graph generation network.
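The subgraph-proposal step can be made concrete with a short sketch. Claim 6 below represents a subgraph proposal for a pair of object proposals as the union of their boxes, with a confidence score equal to the product of the two object scores. Claim 5 associates each object proposal with a subset of subgraph proposals, shown here as a simple index. This is a minimal sketch under stated assumptions. The `Box`, `ObjectProposal`, and `SubgraphProposal` types, the `(x1, y1, x2, y2)` box convention, the use of unordered pairs, and all function names are hypothetical illustrations, not the patent's implementation. Feature extraction, knowledgebase refinement, and label prediction are not shown.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box as (x1, y1, x2, y2); this convention is an assumption."""
    x1: float
    y1: float
    x2: float
    y2: float


@dataclass(frozen=True)
class ObjectProposal:
    idx: int
    box: Box
    score: float  # detector confidence for this proposal


@dataclass(frozen=True)
class SubgraphProposal:
    pair: Tuple[int, int]  # indices of the two object proposals involved
    box: Box               # union box covering both objects
    score: float           # product of the two object scores


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return Box(min(a.x1, b.x1), min(a.y1, b.y1), max(a.x2, b.x2), max(a.y2, b.y2))


def subgraph_proposals(objects: List[ObjectProposal]) -> List[SubgraphProposal]:
    """One candidate relationship per unordered pair of different object proposals."""
    return [
        SubgraphProposal((a.idx, b.idx), union_box(a.box, b.box), a.score * b.score)
        for a, b in combinations(objects, 2)
    ]


def subgraphs_by_object(subgraphs: List[SubgraphProposal]) -> Dict[int, List[SubgraphProposal]]:
    """Map each object proposal index to the subgraph proposals that involve it."""
    index: Dict[int, List[SubgraphProposal]] = {}
    for sg in subgraphs:
        for i in sg.pair:
            index.setdefault(i, []).append(sg)
    return index


if __name__ == "__main__":
    # Hypothetical detector output for three objects.
    objects = [
        ObjectProposal(0, Box(10, 10, 50, 80), 0.9),
        ObjectProposal(1, Box(40, 60, 120, 100), 0.8),
        ObjectProposal(2, Box(0, 0, 200, 30), 0.5),
    ]
    for sg in subgraph_proposals(objects):
        print(sg.pair, sg.box, round(sg.score, 2))
```

With n object proposals this enumeration yields n(n-1)/2 candidates, so a real system would likely need some pruning. The text does not say how that is done, and the sketch does not attempt it.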
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer readable storage medium comprising instructions that, when executed by at least one processor, cause a computing device to: determine, using object recognition processes, a plurality of object proposals for objects in a digital image; determine, for the plurality of object proposals, a set of subgraph proposals indicating candidate object relationships involving pairs of different object proposals of the plurality of object proposals within the digital image, each subgraph proposal of the set of subgraph proposals comprising a candidate object relationship involving two object proposals from the plurality of object proposals in the digital image; refine features of the plurality of object proposals and features of the set of subgraph proposals using extracted relationships corresponding to the plurality of object proposals and the set of subgraph proposals by accessing an external knowledgebase comprising a plurality of semantic relationships involving objects to obtain the extracted relationships for the plurality of object proposals and the set of subgraph proposals; and generate a semantic scene graph for the digital image by predicting object labels and predicate labels based on the refined features of the plurality of object proposals and the refined features of the set of subgraph proposals.

2. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions executed by the at least one processor, cause the computing device to: determine feature vectors representing the features of the plurality of object proposals and feature maps representing the features of the set of subgraph proposals; and perform an initial refinement of the feature vectors of the plurality of object proposals relative to the set of subgraph proposals and an initial refinement of the feature maps of the set of subgraph proposals relative to the plurality of object proposals using a multi-class neural network layer.

3. The non-transitory computer readable storage medium as recited in claim 2, wherein the instructions that cause the computing device to refine the features of the plurality of object proposals and the features of the set of subgraph proposals cause the computing device to: determine a predetermined number of relationships that occur most frequently in the external knowledgebase for an identified object proposal of the plurality of object proposals; and encode, using a recurrent neural network, word embeddings comprising the predetermined number of relationships with the identified object proposal.

4. The non-transitory computer readable storage medium as recited in claim 3, wherein the instructions that cause the computing device to refine the features of the plurality of object proposals and the features of the set of subgraph proposals cause the computing device to jointly refine the feature vectors and the feature maps utilizing episodic memory states in a dynamic memory network.

5. The non-transitory computer readable storage medium as recited in claim 1, wherein each object proposal of the plurality of object proposals is associated with a subset of subgraph proposals of the set of subgraph proposals.

6. The non-transitory computer readable storage medium as recited in claim 1, wherein the instructions that cause the computing device to determine the set of subgraph proposals cause the computing device to: determine a first score for a first object proposal and a second score for a second object proposal in an identified pair of object proposals; and determine a subgraph proposal for the identified pair of object proposals by determining a union box with a confidence score as a product of the first score and the second score.

7. The non-transitory computer readable storage medium as recited in claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to: generate a synthesized image based on the object labels used to generate the semantic scene graph; determine a difference between the digital image and the synthesized image; and modify, using the determined difference, one or more parameters of a scene graph generation model that generates the semantic scene graph.

8. The non-transitory computer readable storage medium as recited in claim 7, wherein the instructions that cause the computing device to generate the synthesized image cause the computing device to generate, using a cascaded refinement network, the synthesized image comprising objects from a scene layout based on the object labels and locations associated with the object labels.

9. The non-transitory computer readable storage medium as recited in claim 7, wherein the instructions that cause the computing device to determine the difference between the digital image and the synthesized image cause the computing device to determine, using a generative adversarial network, a loss associated with the synthesized image relative to the digital image.

10. The non-transitory computer readable storage medium as recited in claim 9, wherein the instructions that cause the computing device to modify the one or more parameters of the scene graph generation model that generates the semantic scene graph cause the computing device to modify, based on the determined loss and using backpropagation, one or more parameters of an object detection model used to determine the plurality of object proposals for objects in the digital image.

11. In a digital medium environment, a method of generating accurate scene graph representations of digital images, the method comprising: determine, using object recognition processes, a plurality of object proposals for objects in a digital image and a set of subgraph proposals indicating candidate object relationships involving each pair of different object proposals of the plurality of object proposals within the digital image, each subgraph proposal of the set of subgraph proposals comprising a candidate object relationship involving two object proposals from the plurality of object proposals in the digital image; refining features of the plurality of object proposals and features of the set of subgraph proposals by accessing an external knowledgebase comprising a plurality of semantic relationships involving objects to extract relationships for the plurality of object proposals and the set of subgraph proposals; and generating, using a scene graph generation model, a semantic scene graph for the digital image by predicting object labels and predicate labels based on the refined features of the plurality of object proposals and the refined features of the set of subgraph proposals.

12. The method as recited in claim 11, further comprising: generating a synthesized image based on the semantic scene graph; determining a loss between the digital image and the synthesized image; and modifying, using the determined loss, one or more parameters of a scene graph generation model that generates the semantic scene graph.

13. In a digital medium environment, a system for generating accurate scene graph representations of digital images, the system comprising: at least one processor; and a non-transitory computer memory comprising instructions that, when executed by the at least one processor, cause the system to: determine, using object recognition processes, a plurality of object proposals for objects in a digital image by estimating objects and bounding boxes for the estimated objects within the digital image; determine, for pairs of different object proposals of the plurality of object proposals, a set of subgrap…
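Claim 3 describes selecting a predetermined number of relationships that occur most frequently in the external knowledgebase for an identified object proposal. The sketch below is minimal and rests on two assumptions that the text does not state: the knowledgebase is a list of (subject, predicate, object) string triples, and the identified proposal is looked up by its predicted class label. The helpers `top_k_relationships` and `as_phrase` are hypothetical. The recurrent-network encoding of the resulting word embeddings is not shown.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation of one knowledgebase fact: (subject, predicate, object).
Triple = Tuple[str, str, str]


def top_k_relationships(kb: List[Triple], label: str, k: int) -> List[Triple]:
    """Return the k most frequent triples in which `label` is the subject or the object."""
    counts = Counter(t for t in kb if label in (t[0], t[2]))
    # most_common orders by count, breaking ties by first occurrence in `kb`.
    return [triple for triple, _ in counts.most_common(k)]


def as_phrase(triple: Triple) -> str:
    """Join a triple into a short phrase, e.g. as input to a word-embedding lookup."""
    return " ".join(triple)


if __name__ == "__main__":
    # Toy knowledgebase; real knowledgebases are far larger.
    toy_kb = [
        ("person", "rides", "horse"),
        ("person", "wears", "hat"),
        ("horse", "eats", "grass"),
        ("person", "rides", "horse"),
        ("person", "holds", "umbrella"),
        ("person", "wears", "hat"),
        ("person", "rides", "horse"),
    ]
    # prints "person rides horse", then "person wears hat"
    for t in top_k_relationships(toy_kb, "person", k=2):
        print(as_phrase(t))
```

Counting with `collections.Counter` keeps the lookup to a single pass over the knowledgebase. Because `most_common` breaks ties by first occurrence, the order of equally frequent relationships depends on how the knowledgebase is stored.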
Knowledge engineering; Knowledge acquisition · CPC title
Detecting or recognising potential candidate objects based on visual cues, e.g. shapes · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Tree-organised classifiers · CPC title