Augmented reality (AR) pen/hand tracking
US-2023041294-A1 · Feb 9, 2023 · US
US11748919B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748919-B2 |
| Application number | US-202218002500-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 1, 2022 |
| Priority date | Jul 9, 2021 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Official abstract text for this publication.
A method of image reconstruction for a cross-modal communication system is disclosed. The method reconstructs a damaged, lost, or delayed image signal during transmission by using complete haptic signals received by a receiving end in the cross-modal communication system, and further constructs a cross-modal interaction network with reference to an attention mechanism, thus solving the limitation of the conventional generation model that it can only be trained on paired samples. An image reconstruction device for a cross-modal communication system is also disclosed. By fully utilizing semantic correlation between different-modality data and realizing cross-modal generation from haptic signals to image signals for unpaired data, the present invention overcomes the difficulty in acquiring haptic-image signal pairs in the practical cross-modal communication system, and significantly improves the quality and class accuracy of generated image signals.
Opening claim text (preview).
What is claimed is:

1. A method of image reconstruction for a cross-modal communication system, comprising the following steps:

step 1. selecting haptic signals and image data pairs received by a receiving end of a cross-modal communication system to serve as a training set, wherein each haptic signal in the training set and each image data of the image data pairs has label information about the class it belongs to;

step 2. establishing a cross-modal image generation model based on haptic signals, the model comprising an image feature extraction module, an attention mechanism-based cross-modal semantic learning module, and an adversarial image generation module, wherein

the image feature extraction module comprises a convolutional neural network (CNN) and a first-class adversarial network, and the image feature extraction module is used for performing feature extraction for the image data in the training set to obtain an image feature;

the cross-modal semantic learning module comprises an encoder, an attention mechanism-based semantic fusion network, and a second-class adversarial network, wherein the encoder performs feature extraction for haptic signals in the training set to obtain the corresponding haptic features; then, the haptic features and the image features are together input to the attention mechanism-based semantic fusion network, and the network performs similarity calculation between haptic features having the same label as the image feature and a sigmoid function operation is further performed to obtain weight vectors of the haptic features corresponding to the current image feature, and then, weighted summation is performed for the haptic features based on the weight vectors to obtain a synthetic haptic feature most similar to the current image feature; and afterwards, the second-class adversarial network strengthens the synthetic haptic feature under the effect of adversarial learning to maintain class and distribution characteristics of the haptic signals;

and the adversarial image generation module comprises a generative adversarial network, and is used for outputting a generated image having the same label as the strengthened synthetic haptic feature after receiving the synthetic haptic feature;

step 3. training the cross-modal image generation model based on haptic signals, wherein an intra-modal loss of the image feature is calculated according to the image feature extraction module, an intra-modal loss of the synthetic haptic feature and an inter-modal loss between the synthetic haptic feature and the image feature are calculated according to the attention mechanism-based cross-modal semantic learning module, and an adversarial generation loss of the generated image is calculated according to the adversarial image generation module and by means of mean square error; these calculated losses are used for updating parameters in the cross-modal image generation model; and after the training converges, an optimal cross-modal image generation model and parameters at this time are saved; and

step 4. after completion of the training, inputting the haptic signal received by the receiving end of the cross-modal communication system to the trained cross-modal image generation model to output a target image.

2.
The image reconstruction method for a cross-modal communication system according to claim 1, wherein feature extraction for the image data in step 2 comprises the following steps:

(2-1) subjecting image data $V$ to processing by the CNN to obtain an image feature $v'^{(f)}$, wherein the CNN comprises a plurality of convolutional layers and a pooling layer is connected after each convolutional layer;

(2-2) constructing a first-class adversarial network for $v'^{(f)}$, the first-class adversarial network comprising a class label predictor $f_v(\cdot)$ with a network parameter $\theta_v$ and a class label discriminator $D_1$ with a network parameter $\alpha$, wherein $f_v(\cdot)$ consists of a plurality of fully connected layers and one softmax layer, and an input of $f_v(\cdot)$ is the image feature $v'^{(f)}$ and an output of $f_v(\cdot)$ is a predicted class label $v^{(c)} = f_v(v'^{(f)}; \theta_v)$; the class label discriminator $D_1$ consists of a plurality of fully connected layers that are successively connected and the dimension of the last layer is 1; and $D_1$ is used for discriminating $v^{(c)}$ and a true label $y_v$ corresponding to the image feature $v'^{(f)}$; and by means of adversarial training by $f_v(\cdot)$ and $D_1$, $v'^{(f)}$ is updated constantly, and an image feature $v^{(f)} = \{v_i^{(f)}, i = 1, 2, \ldots, N\}$ that has class characteristic is finally extracted, wherein $v_i^{(f)}$ is an image feature of the i-th image data and $N$ is a total image data amount.

3.
The image reconstruction method for a cross-modal communication system according to claim 2, wherein an adversarial loss of the first-class adversarial network is as follows:

$$L_{cat}^{V}(D_1) = -E_{y_v}\left[\log D_1(y_v; \alpha)\right] - E_{v^{(c)}}\left[\log\left(1 - D_1(v^{(c)}; \alpha)\right)\right]$$

$$L_{cat}^{V}(v^{(c)}) = -E_{v^{(c)}}\left[\log\left(1 - D_1(v^{(c)}; \alpha)\right)\right]$$

wherein $L_{cat}^{V}(D_1)$ is an adversarial loss function for the class label discriminator $D_1$; $E_{y_v}[*]$ and $E_{v^{(c)}}[*]$ refer to calculation of an expectation for $*$; $D_1(y_v; \alpha)$ indicates a discrimination result of the class label discriminator for a true label $y_v$; $D_1(v^{(c)}; \alpha)$ indicates a discrimination result of the class label discriminator for $v^{(c)}$ output by the class label predictor; and $L_{cat}^{V}(v^{(c)})$ is an adversarial loss function for the class label predictor $f_v(\cdot)$.

4. The image reconstruction method for a cross-modal communication system according to claim 2, wherein a learning process of the attention mechanism-based cross-modal semantic learning module in step 2 is specifically as follows:

(3-1) subjecting a haptic signal to processing by the encoder to obtain a haptic feature $h^{(f)} = \{h_j^{(f)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(f)}$ is a haptic feature of the j-th haptic signal, $N$ is a total data amount of haptic signals, and the encoder comprises a gated recurrent unit (GRU) and a plurality of fully connected layers;

(3-2) matching, by the attention mechanism-based semantic fusion network, the haptic feature and $v^{(f)}$ extracted in step (2-2), wherein with each $v_i^{(f)}$ as a query vector, a synthetic haptic feature $\tilde{h}_i^{(f)}$ belonging to the same class as $v_i^{(f)}$ is screened out, wherein $\tilde{h}_i^{(f)}$ and $v_i^{(f)}$ form a haptic-image feature pair, and then a synthetic haptic feature corresponding to $v^{(f)}$ is $\tilde{h}^{(f)} = \{\tilde{h}_i^{(f)}, i = 1, 2, \ldots, N\}$, which is specifically as follows:

3-2-1.
inputting $v_i^{(f)}$ and the haptic feature $h^{(f)}$ to the attention mechanism-based semantic fusion network to output a haptic hidden layer representation vector $h^{(r)} = \{h_j^{(r)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(r)}$ is a hidden layer representation vector of the j-th haptic feature $h_j^{(f)}$, the hidden layer is a single-layer perceptron structure, and an activation function is the Tanh() function; and a specific process is as follows:

$$h_j^{(r)} = \mathrm{Tanh}\left(w h_j^{(f)} + b\right)$$

wherein $w$ and $b$ are network parameters of the hidden layer in the attention mechanism-based semantic fusion network;

3-2-2. calculating the Pearson correlation coefficient regarding $h_j^{(r)}$ and $v_i^{(f)}$ as the similarity: Sim i , j = I i
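The fusion step described in claims 1 and 4 (Tanh hidden layer, Pearson similarity against the image feature, sigmoid weighting, weighted sum over same-class haptic features) can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the function names, array shapes, and the random toy inputs are all hypothetical, the hidden layer is assumed to project haptic features to the image-feature dimension so that the correlation is defined, and the weights are left unnormalized because the claim preview is cut off before the exact weighting formula.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two 1-D vectors."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    denom = np.linalg.norm(a_c) * np.linalg.norm(b_c)
    # Guard against constant vectors, for which the coefficient is undefined.
    return float(a_c @ b_c / denom) if denom > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def synthetic_haptic_feature(v_i, label_i, haptic_feats, haptic_labels, w, b):
    """Fuse same-class haptic features into one feature for image feature v_i.

    v_i:           (d_v,)      image feature used as the query
    haptic_feats:  (N, d_h)    encoder outputs h_j^(f)
    haptic_labels: (N,)        class label of each haptic feature
    w, b:          (d_v, d_h), (d_v,) hidden-layer parameters (assumed shapes)
    """
    # Keep only haptic features that share the image feature's label.
    candidates = haptic_feats[haptic_labels == label_i]
    # Step 3-2-1: single-layer perceptron with Tanh activation.
    hidden = np.tanh(candidates @ w.T + b)
    # Step 3-2-2: Pearson correlation with the query as the similarity.
    sims = np.array([pearson(h_r, v_i) for h_r in hidden])
    # Sigmoid turns similarities into weights (unnormalized: an assumption).
    weights = sigmoid(sims)
    # Weighted sum over the original haptic features.
    return weights @ candidates


# Toy usage with made-up data: 3 haptic features, two of them in class 0.
haptic_feats = np.eye(3)
haptic_labels = np.array([0, 0, 1])
w, b = np.eye(3), np.zeros(3)
v_i = np.array([1.0, 0.0, 0.0])
fused = synthetic_haptic_feature(v_i, 0, haptic_feats, haptic_labels, w, b)
```

In this toy case the first haptic feature correlates perfectly with the query and so receives the largest weight, while the class-1 feature contributes nothing because it is filtered out before the similarity is computed.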
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Two-dimensional [2D] image generation · CPC title
Combinations of networks · CPC title
Activation functions · CPC title
Convolutional networks [CNN, ConvNet] · CPC title