Visual relationship detection method and system based on region-aware learning mechanisms

US11301725B2 · US · B2

Patent metadata
Publication number: US-11301725-B2
Application number: US-202017007245-A
Country: US
Kind code: B2
Filing date: Aug 31, 2020
Priority date: Feb 3, 2020
Publication date: Apr 12, 2022
Grant date: Apr 12, 2022

Abstract

The present invention discloses a visual relationship detection method based on a region-aware learning mechanism, comprising: acquiring a triplet graph structure and combining features after its aggregation with neighboring nodes, using the features as nodes in a second graph structure, and connecting in accordance with equiprobable edges to form the second graph structure; combining node features of the second graph structure with features of corresponding entity object nodes in the triplet, using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects, and using the merged region visual features as visual features to be used in the next message propagation by corresponding entity object nodes in the triplet; and after a certain number of times of message propagations, combining the output triplet node features and the node features of the second graph structure to infer predicates between object sets.
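The second-graph step in the abstract, which claim 6 spells out, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions and not the patented implementation. The projection matrices are random stand-ins for learned weights. Each triplet is assumed to contribute three aggregated vectors (subject, object, object set). One propagation round is assumed to add the equally weighted mean of a node's neighbors to the node and then apply a ReLU. The function names make_projections, second_graph_nodes, equiprobable_adjacency and propagate are hypothetical.

```python
import numpy as np


def make_projections(in_dims, out_dim, rng):
    # Hypothetical stand-ins for learned maps, one per triplet node type
    # (subject, object, object set), each projecting to out_dim.
    return [rng.standard_normal((d, out_dim)) / np.sqrt(d) for d in in_dims]


def second_graph_nodes(triplets, projections):
    # triplets: one (subject, object, object_set) tuple of aggregated
    # feature vectors per triplet graph.
    nodes = []
    for parts in triplets:
        # Map every part into the same out_dim feature space.
        mapped = [p @ W for p, W in zip(parts, projections)]
        # Join them along the feature dimension: one node per triplet.
        nodes.append(np.concatenate(mapped))
    return np.stack(nodes)  # shape (num_triplets, 3 * out_dim)


def equiprobable_adjacency(n):
    # Fully connected graph without self-loops; every neighbor of a node
    # receives the same edge weight 1 / (n - 1).
    if n < 2:
        raise ValueError("need at least two nodes")
    return (np.ones((n, n)) - np.eye(n)) / (n - 1)


def propagate(nodes, adj):
    # One message-passing round (assumed update rule): each node adds the
    # equally weighted mean of its neighbors, followed by a ReLU.
    return np.maximum(0.0, nodes + adj @ nodes)


rng = np.random.default_rng(0)
dims = (8, 8, 12)  # illustrative input sizes for subject, object, object set
projections = make_projections(dims, 4, rng)
triplets = [tuple(rng.standard_normal(d) for d in dims) for _ in range(5)]
nodes = second_graph_nodes(triplets, projections)
adj = equiprobable_adjacency(len(nodes))
updated = propagate(nodes, adj)
print(nodes.shape, updated.shape)  # (5, 12) (5, 12)
```

Because every edge carries the same weight, a node's incoming message is the plain average of all other triplet nodes. No learned or statistical edge weighting is involved, which is what the abstract calls equiprobable edges.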

Claims

What is claimed is:

1. A visual relationship detection method based on a region-aware learning mechanism, executed by a processor, comprising: acquiring a triplet graph structure and combining features after its aggregation with neighboring nodes, using the features as nodes in a second graph structure, and connecting in accordance with equiprobable edges to form the second graph structure; combining node features of the second graph structure with features of corresponding entity object nodes in the triplet, using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects, and using the merged region visual features as visual features to be used in the next message propagation by corresponding entity object nodes in the triplet; and after a certain number of times of message propagations, combining the output triplet node features and the node features of the second graph structure to infer predicates between object sets.

2. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “acquiring a triplet graph structure” specifically, executed by the processor, comprises: using region visual features of the entity objects as features of a set of nodes in the first graph structure, connecting the entity objects in accordance with probabilities of co-occurrence, and gathering feature information of neighboring nodes by a message propagation mechanism to enhance the visual representation of the current node; using, after each message propagation, the output node features as the visual attention mechanism and also as the visual features to be used in the next message propagation by the nodes in the first graph structure; and using the extracted features of each object set and region visual features of the corresponding two entity objects as a set of nodes, and connecting in accordance with the statistical probabilities of visual relationships to form a triplet graph structure.

3. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the first graph structure is specifically as follows: co-occurrence matrixes are used as edges of the first graph structure and region visual features are used as vertices of the first graph structure.

4. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the step of “using, after each message propagation, the output node features as the visual attention mechanism and also as the visual features to be used in the next message propagation by the nodes in the first graph structure” executed by the processor, specifically comprises: combining the enhanced node representation with each region visual feature, to compute an unnormalized relevance score; normalizing the unnormalized relevance score to acquire a weight distribution value of the visual attention mechanism; obtaining the weighted sum of M region features of each entity object by the acquired weight distribution value of the visual attention mechanism, to obtain the merged visual representation; and acquiring the merged visual representation, and performing message propagation by using the merged visual representation as the visual features to be used in the next message propagation by corresponding nodes in the first graph structure.

5. The visual relationship detection method based on a region-aware learning mechanism according to claim 2, wherein the triplet graph structure is specifically as follows: the statistical probabilities of visual relationships are used as edges of the triplet graph structure; and features of each object set and the region visual features of the corresponding two entity objects are used as vertices of the triplet graph structure.

6. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the second graph structure is specifically executed by the processor, as follows: Acquiring the output features of each triplet graph structure after its aggregation with neighboring nodes, mapping the acquired features to a feature space in a same dimension, and then connected them in the dimension of feature as the nodes in the second graph structure; and fully connecting the nodes in the second graph structure, and edges connecting each node and its neighboring nodes are equiprobable edges.

7. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “using the combined features as a visual attention mechanism and merging internal region visual features extracted by two entity objects”, executed by the processor, specifically comprises: computing an unnormalized relevance score by the combined features and the output each region visual feature; and normalizing the unnormalized relevance score to acquire a weight distribution value of the visual attention mechanism, and obtaining the weighted sum of region features of the corresponding entity object to obtain the merged visual representation.

8. The visual relationship detection method based on a region-aware learning mechanism according to claim 1, wherein the step of “combining the output triplet node features and the node features of the second graph structure”, executed by the processor, specifically comprises: outputting the nodes of each entity object in the triplet graph structure after T_k message propagations, processing with the average pooling strategy and then combining with the visual features of the entity object in the dimension of feature; and outputting the nodes of the object sets in the triplet graph structure after T_k message propagations, and connecting with the object set features of an initialized node and the output of each node in the second graph structure in the dimension of feature.
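Claims 4 and 7 describe the region-aware attention in three steps. First, each region feature is scored against a query (the enhanced node representation, or the combined features). Second, the scores are normalized into weights. Third, the weighted sum of the M region features is taken. The sketch below assumes an additive (tanh) scorer and a softmax normalizer, neither of which the claims specify. region_attention and its parameters W_q, W_r and w are hypothetical names standing in for learned weights.

```python
import numpy as np


def region_attention(query, regions, W_q, W_r, w):
    """Merge the M internal region features of one entity object.

    query:   (d,) enhanced node representation used as the attention query.
    regions: (M, d_r) region visual features extracted for the entity.
    W_q, W_r, w: hypothetical learned parameters of an additive scorer.
    """
    # Unnormalized relevance score of each region (additive form is assumed).
    scores = np.tanh(regions @ W_r + query @ W_q) @ w  # shape (M,)
    # Softmax normalization gives the attention weight distribution.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Weighted sum of the M region features: the merged visual representation.
    merged = weights @ regions  # shape (d_r,)
    return merged, weights


rng = np.random.default_rng(1)
d, d_r, k, M = 6, 10, 5, 7  # illustrative sizes
query = rng.standard_normal(d)
regions = rng.standard_normal((M, d_r))
merged, weights = region_attention(
    query, regions,
    rng.standard_normal((d, k)), rng.standard_normal((d_r, k)), rng.standard_normal(k),
)
print(merged.shape, round(float(weights.sum()), 6))  # (10,) 1.0
```

The claims feed the returned merged vector back as the entity node's visual feature for the next message-propagation round. The weights are also returned so that the attention over regions can be inspected.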

Assignees

Univ Tianjin

Classifications (CPC titles)

  • of extracted features

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

  • using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks

  • Validation; Performance evaluation

  • using classification, e.g. of video objects

Frequently asked questions

What does patent US11301725B2 cover?
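A visual relationship detection method built on a region-aware learning mechanism. Entity objects and object sets form a triplet graph. The features each triplet produces after aggregating with its neighboring nodes become the nodes of a second, fully connected graph with equiprobable edges. The second-graph node features are combined with the corresponding entity object features and used as visual attention over the internal region features of the two entity objects. The merged region features then serve as the visual features for the next round of message propagation. After a set number of rounds, the triplet node features and the second-graph node features are combined to infer the predicate between each object set.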
Who is the assignee on this patent?
Univ Tianjin
What technology area does this patent fall under?
Primary CPC classification G06V10/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 12, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.