US2021192274A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021192274-A1 |
| Application number | US-202017007213-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 31, 2020 |
| Priority date | Dec 23, 2019 |
| Publication date | Jun 24, 2021 |
| Grant date | — |
Official abstract text for this publication.
The present disclosure discloses a visual relationship detection method based on adaptive clustering learning, including: detecting visual objects from an input image and recognizing the visual objects to obtain context representations; embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; embedding the context representations into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; performing regularization by a clustering-driven attention mechanism; and fusing the visual relationship sharing representations and the regularized visual relationship enhancing representations with a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning. The method is capable of fine-grained recognition of visual relationships of different subclasses by mining the latent relationships between them, which improves the accuracy of visual relationship detection.
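The prior distribution over predicate labels mentioned in the abstract is described in claim 2 as an empirical distribution computed from training samples. The following is a minimal sketch of one way such a prior could be built, not the patent's implementation: the function name `build_relationship_prior`, the `(subject_label, predicate_index, object_label)` triple layout, and the additive smoothing term are all assumptions introduced here for illustration.

```python
from collections import Counter, defaultdict

import numpy as np


def build_relationship_prior(triples, num_predicates, smoothing=1e-3):
    """Return a function mapping (subject_label, object_label) to a
    probability vector over predicate indices.

    `triples` holds (subject_label, predicate_index, object_label) items
    taken from training annotations. The smoothing constant is an
    assumption of this sketch so that unseen label pairs still produce a
    valid distribution.
    """
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1

    def prior(subj, obj):
        # Start from the smoothing mass, then add the observed counts.
        freq = np.full(num_predicates, smoothing, dtype=float)
        for pred, n in counts.get((subj, obj), {}).items():
            freq[pred] += n
        return freq / freq.sum()

    return prior


# Toy training set: predicate 0 seen twice and predicate 1 once
# for the pair (person, horse).
train = [("person", 0, "horse"), ("person", 0, "horse"), ("person", 1, "horse")]
prior = build_relationship_prior(train, num_predicates=3)
p = prior("person", "horse")
```

In this toy example `p` puts most of its mass on predicate 0, while a pair never seen in training falls back to a uniform distribution because only the smoothing term contributes.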
Opening claim text (preview).
1. A visual relationship detection method based on adaptive clustering learning, comprising, executed by a processor, the following steps: detecting visual objects from an input image and recognizing the visual objects by contextual message passing mechanism to obtain context representations of the visual objects; embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

2. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the method further comprises: calculating empirical distribution of the visual relationships from training set samples of a visual relationship data set to obtain a visual relationship prior function.

3. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the method further comprises: constructing an initialized visual relationship detection model, and training the model by the training data of the visual relationship data set.

4. The visual relationship detection method based on adaptive clustering learning according to claim 1, wherein the step of obtaining the visual relationship sharing representations is specifically: obtaining a first product of a joint subject mapping matrix and the context representations of the visual object of the subject, obtaining a second product of a joint object mapping matrix and the context representations of the visual object of the object; subtracting the second product from the first product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region; wherein, the joint subject mapping matrix and the joint object mapping matrix are mapping matrices that map the visual objects context representations to a joint subspace; and the visual relationship candidate region is the minimum rectangle box that can fully cover the corresponding visual object candidate regions of the subject and object; the convolutional features are extracted from the visual relationship candidate region by any convolutional neural network.

5. The visual relationship detection method based on adaptive clustering learning according to claim 4, wherein the step of obtaining a plurality of preliminary visual relationship enhancing representation is specifically: obtaining a third product of a k-th clustering subject mapping matrix and the context representation of the visual object of the subject, obtaining a fourth product of a k-th clustering object mapping matrix and the context representation of the visual object of the object; subtracting the fourth product from the third product, and dot-multiplying the difference value and convolutional features of a visual relationship candidate region to obtain a k-th preliminary visual relationship enhancing representation; wherein the k-th clustering subject mapping matrix and the k-th clustering object mapping matrix are mapping matrices that map the visual objects context representation to the k-th clustering subspace.

6. The visual relationship detection method based on adaptive clustering learning according to claim 5, wherein the step of “performing regularization to the preliminary visual relationship enhancing representations of different subspaces by clustering-driven attention mechanisms” is specifically: obtaining attentive scores of the clustering subspaces; obtaining a sixth product of the k-th preliminary visual relationship enhancing representations and the k-th regularized mapping matrix, and performing weighted sum operation to the sixth products of different clustering subspaces by using the attentive scores of the clustering subspace as the clustering weight; wherein, the k-th regularized mapping matrix is the k-th mapping matrix that transforms the preliminary visual relationship enhancing representation.

7. The visual relationship detection method based on adaptive clustering learning according to claim 6, wherein the step of “obtaining attentive scores of the clustering subspaces” is specifically: inputting a predicted category label of visual object of subject and a predicted category label of visual object of object into the visual relationship prior function to obtain a prior distribution over the category label of visual relationship predicate; obtaining a fifth product of the prior distribution over the category label of visual relationship predicate and the k-th attention mapping matrix, and substituting the fifth product into the softmax function for normalization; wherein, the k-th attention mapping matrix is the mapping matrix that transforms the prior distribution over the category label of visual relationship predicate.

8. The visual relationship detection method based on adaptive clustering learning according to claim 6, wherein the step of “fusing the visual relationship sharing representations and the regularized visual relationship enhancing representations with a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning” is specifically: inputting a predicted category label of visual object of subject and a predicted category label of visual object of object into the visual relationship prior function to obtain a prior distribution over the category label of visual relationship predicate; and obtaining a seventh product of the visual relationship sharing mapping matrix and the visual relationship sharing representations, obtaining an eighth product of the visual relationship enhancing mapping matrix and the regularized visual relationship enhancing representations; summing the seventh product, the eighth product and the prior distribution over the category label of visual relationship predicate, and then substituting the result into the softmax function.

9. A system for a visual relationship detection method based on adaptive clustering learning, the system comprising: a processor configured for: detecting visual objects from an input image and recognizing the visual objects by contextual message passing mechanism to obtain context representations of the visual objects; embedding the context representations of pair-wise visual objects into a low-dimensional joint subspace to obtain visual relationship sharing representations; embedding the context representations of pair-wise visual objects into a plurality of low-dimensional clustering subspaces, respectively, to obtain a plurality of preliminary visual relationship enhancing representations; and then performing regularization to the preliminary visual relationship enhancing representations by clustering-driven attention mechanisms; and fusing the visual relationship sharing representations, the regularized visual relationship enhancing representations and a prior distribution over the category labels of visual relationship predicates, to predict visual relationship predicates by synthetic relational reasoning.

10.
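The computations recited in claims 4 through 8 can be sketched for a single subject-object pair as below. This is an illustrative reading of the claim language, not the applicant's implementation. Assumptions made here: "dot-multiplying" is read as an element-wise product, so the convolutional feature vector is taken to have the same length as the subspace dimension; each k-th attention mapping matrix is taken to be a single row, so that it yields one scalar score per clustering subspace and the softmax runs across the K subspaces; and all names (`predict_predicate`, the keys of `params`, the toy dimensions) are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def predict_predicate(x_subj, x_obj, f_union, prior, params):
    """One forward pass for a subject-object pair (sketch of claims 4-8)."""
    # Claim 4: first product minus second product, multiplied element-wise
    # with the convolutional features of the candidate region.
    shared = (params["W_subj"] @ x_subj - params["W_obj"] @ x_obj) * f_union

    # Claim 5: the same construction, once per clustering subspace k.
    prelim = [
        (ws_k @ x_subj - wo_k @ x_obj) * f_union
        for ws_k, wo_k in zip(params["W_subj_k"], params["W_obj_k"])
    ]

    # Claim 7: product of the prior with each attention mapping,
    # normalized with softmax across the K subspaces (assumed axis).
    scores = softmax(np.array([a_k @ prior for a_k in params["A_k"]]))

    # Claim 6: weighted sum of the transformed preliminary representations.
    enhanced = sum(
        s * (r_k @ e_k) for s, r_k, e_k in zip(scores, params["R_k"], prelim)
    )

    # Claim 8: sum the two mapped representations and the prior, then softmax.
    logits = params["W_share"] @ shared + params["W_enh"] @ enhanced + prior
    return softmax(logits), scores


# Toy dimensions (all hypothetical): context size D, subspace size d,
# K clustering subspaces, P predicate classes.
D, d, K, P = 8, 4, 3, 5
rng = np.random.default_rng(0)
params = {
    "W_subj": rng.normal(size=(d, D)),
    "W_obj": rng.normal(size=(d, D)),
    "W_subj_k": rng.normal(size=(K, d, D)),
    "W_obj_k": rng.normal(size=(K, d, D)),
    "A_k": rng.normal(size=(K, P)),
    "R_k": rng.normal(size=(K, d, d)),
    "W_share": rng.normal(size=(P, d)),
    "W_enh": rng.normal(size=(P, d)),
}
prior = np.full(P, 1.0 / P)  # placeholder for the prior function's output
probs, scores = predict_predicate(
    rng.normal(size=D), rng.normal(size=D), rng.normal(size=d), prior, params
)
```

Here `probs` is a length-P probability vector over predicate classes and `scores` holds the K attention weights. In a trained model the matrices would be learned (claim 3) rather than sampled at random, and the prior would come from the visual relationship prior function of claim 2.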
Graphical representations · CPC title
using neural networks · CPC title
with adaptive number of clusters · CPC title
with fixed number of clusters, e.g. K-means clustering · CPC title
of input or preprocessed data · CPC title