Visual relationship detection method and system based on adaptive clustering learning

US2021192274A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021192274-A1
Application number  US-202017007213-A
Country             US
Kind code           A1
Filing date         Aug 31, 2020
Priority date       Dec 23, 2019
Publication date    Jun 24, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure discloses a visual relationship detection method based on adaptive clustering learning, including: detecting visual objects from an input image and recognizing the visual objects to obtain context representation; embedding the context representation of pair-wise visual objects into a low-dimensional joint subspace to obtain a visual relationship sharing representation; embedding the context representation into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representation; and then performing regularization by clustering-driven attention mechanism; fusing the visual relationship sharing representations and regularized visual relationship enhancing representations with a prior distribution over the category label of visual relationship predicate, to predict visual relationship predicates by synthetic relational reasoning. The method is capable of fine-grained recognizing visual relationships of different subclasses by mining latent relationships in-between, which improves the accuracy of visual relationship detection.
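
The "prior distribution over the category label of visual relationship predicate" mentioned above is, according to claim 2, derived from the empirical distribution of relationships in the training set. The following is a minimal sketch of one way such a prior could be estimated. The function name `build_relationship_prior`, the toy triples, and the uniform fallback for unseen subject/object pairs are illustrative assumptions and do not come from the patent.

```python
from collections import Counter, defaultdict


def build_relationship_prior(triples, predicates):
    """Estimate P(predicate | subject label, object label) by counting.

    `triples` is an iterable of (subject_label, predicate, object_label).
    Returns a function prior(subject_label, object_label) that yields a
    list of probabilities aligned with `predicates`.
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    def prior(subj, obj):
        pair_counts = counts.get((subj, obj))
        if not pair_counts:
            # Assumption (not in the patent): unseen pairs get a uniform prior.
            return [1.0 / len(predicates)] * len(predicates)
        total = sum(pair_counts.values())
        return [pair_counts[p] / total for p in predicates]

    return prior


# Hypothetical training annotations, for illustration only.
triples = [
    ("person", "riding", "horse"),
    ("person", "riding", "horse"),
    ("person", "next to", "horse"),
    ("cup", "on", "table"),
]
predicates = ["on", "riding", "next to"]

prior = build_relationship_prior(triples, predicates)
print(prior("person", "horse"))  # "riding" gets the largest share
```

At inference time the predicted subject and object labels would be passed to `prior(...)`, and the resulting vector would feed both the attention scores and the final fusion step described in the claims.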

First claim

Opening claim text (preview).

1. A visual relationship detection method based on adaptive clustering learning, comprising, executed by a processor, the following steps: detecting visual objects from an input image and recognizing the visual objects by contextual message passing mechanism to obtain context representations of the visual objects; embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

2. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the method further comprises: calculating empirical distribution of the visual relationships from training set samples of a visual relationship data set to obtain a visual relationship prior function.

3. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the method further comprises: constructing an initialized visual relationship detection model, and training the model by the training data of the visual relationship data set.

4. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the step of obtaining the visual relationship sharing representations is specifically: obtaining a first product of a joint subject mapping matrix and the context representations of the visual object of the subject, obtaining a second product of a joint object mapping matrix and the context representations of the visual object of the object; subtracting the second product from the first product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region; wherein, the joint subject mapping matrix and the joint object mapping matrix are mapping matrices that map the visual objects context representations to a joint subspace; and the visual relationship candidate region is the minimum rectangle box that can fully cover the corresponding visual object candidate regions of the subject and object; the convolutional features are extracted from the visual relationship candidate region by any convolutional neural network.

5. The visual relationship detection method based on adaptive clustering learning according to claim 4, wherein the step of obtaining a plurality of preliminary visual relationship enhancing representation is specifically: obtaining a third product of a k-th clustering subject mapping matrix and the context representation of the visual object of the subject, obtaining a fourth product of a k-th clustering object mapping matrix and the context representation of the visual object of the object; subtracting the fourth product from the third product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region to obtain a k-th preliminary visual relationship enhancing representation; wherein the k-th clustering subject mapping matrix and the k-th clustering object mapping matrix are mapping matrices that map the visual objects context representation to the k-th clustering subspace.

6. The visual relationship detection method based on adaptive clustering learning according to claim 5, wherein the step of “performing regularization to the preliminary visual relationship enhancing representations of different subspaces by clustering-driven attention mechanisms” is specifically: obtaining attentive scores of the clustering subspaces; obtaining a sixth product of the k-th preliminary visual relationship enhancing representations and the k-th regularized mapping matrix, and performing weighted sum operation to the sixth products of different clustering subspaces by using the attentive scores of the clustering subspace as the clustering weight; wherein, the k-th regularized mapping matrix is the k-th mapping matrix that transforms the preliminary visual relationship enhancing representation.

7. The visual relationship detection method based on adaptive clustering learning according to claim 6, wherein the step of “obtaining attentive scores of the clustering subspaces” is specifically: inputting a predicted category label of visual object of subject and a predicted category label of visual object of object into the visual relationship prior function to obtain a prior distribution over the category label of visual relationship predicate; obtaining a fifth product of the prior distribution over the category label of visual relationship predicate and the k-th attention mapping matrix, and substituting the fifth product into the soft max function for normalization; wherein, the k-th attention mapping matrix is the mapping matrix that transforms the prior distribution over the category label of visual relationship predicate.

8. The visual relationship detection method based on adaptive clustering learning according to claim 6, wherein the step of “fusing the visual relationship sharing representations and the regularized visual relationship enhancing representations with a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning” is specifically: inputting a predicted category label of visual object of subject and a predicted category label of visual object of object into the visual relationship prior function to obtain a prior distribution over the category label of visual relationship predicate; and obtaining a seventh product of the visual relationship sharing mapping matrix and the visual relationship sharing representations, obtaining an eighth product of the visual relationship enhancing mapping matrix and the regularized visual relationship enhancing representations; summing the seventh product, the eighth product and the prior distribution over the category label of visual relationship predicate, and then substituting the result into the soft max function.

9. A system for a visual relationship detection method based on adaptive clustering learning, the system comprising: a processor configured for: detecting visual objects from an input image and recognizing the visual objects by contextual message passing mechanism to obtain context representations of the visual objects; embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

10.
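
The computations recited in claims 4 through 8 can be sketched in a few lines of code. This is an illustrative reading under stated assumptions, not the patented implementation: "dot-multiplying" is interpreted as an element-wise product, the convolutional features are assumed to already have the subspace dimension, each attention mapping is assumed to be a vector so that its product with the prior is a scalar score, and the softmax in claim 7 is assumed to normalize across clusters. All names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def relation_embedding(w_subj, w_obj, x_subj, x_obj, conv_feat):
    """Claims 4 and 5: (W_s x_s - W_o x_o), element-wise times conv features."""
    diff = [a - b for a, b in zip(matvec(w_subj, x_subj), matvec(w_obj, x_obj))]
    return [d * f for d, f in zip(diff, conv_feat)]


def predict_predicate(params, x_subj, x_obj, conv_feat, prior):
    # Claim 4: sharing representation in the joint subspace.
    shared = relation_embedding(params["joint_subj"], params["joint_obj"],
                                x_subj, x_obj, conv_feat)
    # Claim 5: one preliminary enhancing representation per clustering subspace.
    prelim = [relation_embedding(ws, wo, x_subj, x_obj, conv_feat)
              for ws, wo in params["clusters"]]
    # Claim 7: attention score per cluster from the prior (assumed scalar scores).
    scores = softmax([sum(a * p for a, p in zip(att, prior))
                      for att in params["attention"]])
    # Claim 6: attention-weighted sum of the transformed preliminary representations.
    enhanced = [0.0] * len(shared)
    for alpha, w_reg, rep in zip(scores, params["regularize"], prelim):
        for i, v in enumerate(matvec(w_reg, rep)):
            enhanced[i] += alpha * v
    # Claim 8: sum both mapped representations and the prior, then softmax.
    logits = [s + e + p for s, e, p in zip(matvec(params["out_shared"], shared),
                                           matvec(params["out_enhanced"], enhanced),
                                           prior)]
    return softmax(logits)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: context dim D, subspace dim d, K clusters, P predicates.
rng = random.Random(0)
D, d, K, P = 4, 3, 2, 3
params = {
    "joint_subj": rand_matrix(d, D, rng),
    "joint_obj": rand_matrix(d, D, rng),
    "clusters": [(rand_matrix(d, D, rng), rand_matrix(d, D, rng)) for _ in range(K)],
    "attention": [[rng.uniform(-1, 1) for _ in range(P)] for _ in range(K)],
    "regularize": [rand_matrix(d, d, rng) for _ in range(K)],
    "out_shared": rand_matrix(P, d, rng),
    "out_enhanced": rand_matrix(P, d, rng),
}
x_subj = [rng.uniform(-1, 1) for _ in range(D)]
x_obj = [rng.uniform(-1, 1) for _ in range(D)]
conv_feat = [rng.uniform(-1, 1) for _ in range(d)]
prior = [0.0, 2 / 3, 1 / 3]  # e.g. an empirical prior for one subject/object pair

probs = predict_predicate(params, x_subj, x_obj, conv_feat, prior)
print(probs)  # probability per predicate category
```

In a trained model the mapping matrices would be learned from the visual relationship data set (claim 3) rather than drawn at random, and the context representations would come from the contextual message passing step of claim 1.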

Assignees

Inventors

Classifications

  • G06V10/426 · Primary

    Graphical representations · CPC title

  • using neural networks · CPC title

  • with adaptive number of clusters · CPC title

  • with fixed number of clusters, e.g. K-means clustering · CPC title

  • of input or preprocessed data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021192274A1 cover?
The present disclosure discloses a visual relationship detection method based on adaptive clustering learning, including: detecting visual objects from an input image and recognizing the visual objects to obtain context representation; embedding the context representation of pair-wise visual objects into a low-dimensional joint subspace to obtain a visual relationship sharing representation; em…
Who is the assignee on this patent?
Univ Tianjin
What technology area does this patent fall under?
Primary CPC classification G06V10/426. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 24, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).