Who is the assignee on this patent?

Univ Nanjing Posts & Telecommunications

What technology area does this patent fall under?

Primary CPC classification G06T11/10. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 5, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method of image reconstruction for cross-modal communication system and device thereof

US11748919B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11748919-B2
Application number	US-202218002500-A
Country	US
Kind code	B2
Filing date	Jul 1, 2022
Priority date	Jul 9, 2021
Publication date	Sep 5, 2023
Grant date	Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of image reconstruction for a cross-modal communication system is disclosed. The method reconstructs a damaged, lost, or delayed image signal during transmission by using complete haptic signals received by a receiving end in the cross-modal communication system, and further constructs a cross-modal interaction network with reference to an attention mechanism, thus solving the limitation of the conventional generation model that it can only be trained on paired samples. An image reconstruction device for a cross-modal communication system is also disclosed. By fully utilizing semantic correlation between different-modality data and realizing cross-modal generation from haptic signals to image signals for unpaired data, the present invention overcomes the difficulty in acquiring haptic-image signal pairs in the practical cross-modal communication system, and significantly improves the quality and class accuracy of generated image signals.

First claim

Opening claim text (preview).

What is claimed is:

1. A method of image reconstruction for a cross-modal communication system, comprising the following steps:

step 1. selecting haptic signals and image data pairs received by a receiving end of a cross-modal communication system to serve as a training set, wherein each haptic signal in the training set and each image data of the image data pairs has label information about the class it belongs to;

step 2. establishing a cross-modal image generation model based on haptic signals, the model comprising an image feature extraction module, an attention mechanism-based cross-modal semantic learning module, and an adversarial image generation module, wherein the image feature extraction module comprises a convolutional neural network (CNN) and a first-class adversarial network, and the image feature extraction module is used for performing feature extraction for the image data in the training set to obtain an image feature;

the cross-modal semantic learning module comprises an encoder, an attention mechanism-based semantic fusion network, and a second-class adversarial network, wherein the encoder performs feature extraction for haptic signals in the training set to obtain the corresponding haptic features; then, the haptic features and the image features are together input to the attention mechanism-based semantic fusion network, and the network performs similarity calculation between haptic features having the same label as the image feature and a sigmoid function operation is further performed to obtain weight vectors of the haptic features corresponding to the current image feature, and then, weighted summation is performed for the haptic features based on the weight vectors to obtain a synthetic haptic feature most similar to the current image feature; and afterwards, the second-class adversarial network strengthens the synthetic haptic feature under the effect of adversarial learning to maintain class and distribution characteristics of the haptic signals;

and the adversarial image generation module comprises a generative adversarial network, and is used for outputting a generated image having the same label as the strengthened synthetic haptic feature after receiving the synthetic haptic feature;

step 3. training the cross-modal image generation model based on haptic signals, wherein an intra-modal loss of the image feature is calculated according to the image feature extraction module, an intra-modal loss of the synthetic haptic feature and an inter-modal loss between the synthetic haptic feature and the image feature are calculated according to the attention mechanism-based cross-modal semantic learning module, and an adversarial generation loss of the generated image is calculated according to the adversarial image generation module and by means of mean square error; these calculated losses are used for updating parameters in the cross-modal image generation model; and after the training converges, an optimal cross-modal image generation model and parameters at this time are saved; and

step 4. after completion of the training, inputting the haptic signal received by the receiving end of the cross-modal communication system to the trained cross-modal image generation model to output a target image.

2. The image reconstruction method for a cross-modal communication system according to claim 1, wherein feature extraction for the image data in step 2 comprises the following steps:

(2-1) subjecting image data $V$ to processing by the CNN to obtain an image feature $v'^{(f)}$, wherein the CNN comprises a plurality of convolutional layers and a pooling layer is connected after each convolutional layer;

(2-2) constructing a first-class adversarial network for $v'^{(f)}$, the first-class adversarial network comprising a class label predictor $f_v(\cdot)$ with a network parameter $\theta_v$ and a class label discriminator $D_1$ with a network parameter $\alpha$, wherein $f_v(\cdot)$ consists of a plurality of fully connected layers and one softmax layer, and an input of $f_v(\cdot)$ is the image feature $v'^{(f)}$ and an output of $f_v(\cdot)$ is a predicted class label $v^{(c)} = f_v(v'^{(f)}; \theta_v)$; the class label discriminator $D_1$ consists of a plurality of fully connected layers that are successively connected and the dimension of the last layer is 1; and $D_1$ is used for discriminating $v^{(c)}$ and a true label $y_v$ corresponding to the image feature $v'^{(f)}$; and by means of adversarial training by $f_v(\cdot)$ and $D_1$, $v'^{(f)}$ is updated constantly, and an image feature $v^{(f)} = \{v_i^{(f)}, i = 1, 2, \ldots, N\}$ that has class characteristic is finally extracted, wherein $v_i^{(f)}$ is an image feature of the i-th image data and $N$ is a total image data amount.

3. The image reconstruction method for a cross-modal communication system according to claim 2, wherein an adversarial loss of the first-class adversarial network is as follows:

$$L_{cat}^{V}(D_1) = -E_{y_v}[\log D_1(y_v; \alpha)] - E_{v^{(c)}}[\log(1 - D_1(v^{(c)}; \alpha))]$$

$$L_{cat}^{V}(v^{(c)}) = -E_{v^{(c)}}[\log(1 - D_1(v^{(c)}; \alpha))]$$

wherein $L_{cat}^{V}(D_1)$ is an adversarial loss function for the class label discriminator $D_1$; $E_{y_v}[*]$ and $E_{v^{(c)}}[*]$ refer to calculation of an expectation for $*$; $D_1(y_v; \alpha)$ indicates a discrimination result of the class label discriminator for a true label $y_v$; $D_1(v^{(c)}; \alpha)$ indicates a discrimination result of the class label discriminator for $v^{(c)}$ output by the class label predictor; and $L_{cat}^{V}(v^{(c)})$ is an adversarial loss function for the class label predictor $f_v(\cdot)$.

4. The image reconstruction method for a cross-modal communication system according to claim 2, wherein a learning process of the attention mechanism-based cross-modal semantic learning module in step 2 is specifically as follows:

(3-1) subjecting a haptic signal to processing by the encoder to obtain a haptic feature $h^{(f)} = \{h_j^{(f)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(f)}$ is a haptic feature of the j-th haptic signal, $N$ is a total data amount of haptic signals, and the encoder comprises a gated recurrent unit (GRU) and a plurality of fully connected layers;

(3-2) matching, by the attention mechanism-based semantic fusion network, the haptic feature and $v^{(f)}$ extracted in step (2-2), wherein with each $v_i^{(f)}$ as a query vector, a synthetic haptic feature $\tilde{h}_i^{(f)}$ belonging to the same class as $v_i^{(f)}$ is screened out, wherein $\tilde{h}_i^{(f)}$ and $v_i^{(f)}$ form a haptic-image feature pair, and then a synthetic haptic feature corresponding to $v^{(f)}$ is $\tilde{h}^{(f)} = \{\tilde{h}_i^{(f)}, i = 1, 2, \ldots, N\}$, which is specifically as follows:

3-2-1. inputting $v_i^{(f)}$ and the haptic feature $h^{(f)}$ to the attention mechanism-based semantic fusion network to output a haptic hidden layer representation vector $h^{(r)} = \{h_j^{(r)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(r)}$ is a hidden layer representation vector of the j-th haptic feature $h_j^{(f)}$, the hidden layer is a single-layer perceptron structure, and an activation function is the Tanh( ) function; and a specific process is as follows:

$$h_j^{(r)} = \mathrm{Tanh}(w h_j^{(f)} + b)$$

wherein $w$ and $b$ are network parameters of the hidden layer in the attention mechanism-based semantic fusion network;

3-2-2. calculating the Pearson correlation coefficient regarding $h_j^{(r)}$ and $v_i^{(f)}$ as the similarity: Sim i , j = I i
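The attention-based fusion described in claim 1 (step 2) and in steps 3-2-1 and 3-2-2 of claim 4 can be illustrated with a minimal sketch in plain Python. It is not the patentee's implementation. The function names, the toy dimensions, and the choice to leave the sigmoid weights unnormalized are assumptions; the claim preview on this page is truncated before the weighting formula is given.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    # A constant vector has no defined correlation; treat it as 0 (assumption).
    return cov / denom if denom > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden(h, w, b):
    """Single-layer perceptron with Tanh: h_r = tanh(w h + b)."""
    return [math.tanh(sum(w_rk * h_k for w_rk, h_k in zip(row, h)) + b_r)
            for row, b_r in zip(w, b)]


def synthetic_haptic_feature(v_i, label_i, haptic_feats, haptic_labels, w, b):
    """Build one synthetic haptic feature for the image feature v_i.

    Only haptic features sharing v_i's label are used. Each one is weighted
    by sigmoid(Pearson(hidden(h), v_i)) and the weighted features are summed.
    Weights are NOT normalized here (assumption, see lead-in).
    """
    candidates = [h for h, y in zip(haptic_feats, haptic_labels) if y == label_i]
    if not candidates:
        raise ValueError("no haptic feature shares the image feature's label")
    weights = [sigmoid(pearson(hidden(h, w, b), v_i)) for h in candidates]
    dim = len(candidates[0])
    return [sum(wt * h[k] for wt, h in zip(weights, candidates))
            for k in range(dim)]
```

The hidden layer must map each haptic feature to the same dimension as the image feature, since the Pearson coefficient is only defined for vectors of equal length. In a trained model, `w` and `b` would be learned parameters rather than fixed inputs.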
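The two adversarial losses stated in claim 3 can be evaluated numerically with a short sketch. The formulas are transcribed as they appear in the claim text on this page, with expectations replaced by batch means. The function names and the small `eps` guard against `log(0)` are assumptions added for illustration.

```python
import math

EPS = 1e-12  # numerical guard against log(0); not part of the claim


def discriminator_loss(d_on_true, d_on_pred):
    """Loss for the class label discriminator D1, as written in claim 3:
    -E[log D1(y_v)] - E[log(1 - D1(v_c))].

    d_on_true: D1 outputs in (0, 1) for true labels y_v
    d_on_pred: D1 outputs in (0, 1) for predicted labels v_c
    """
    real_term = -sum(math.log(p + EPS) for p in d_on_true) / len(d_on_true)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_on_pred) / len(d_on_pred)
    return real_term + fake_term


def predictor_loss(d_on_pred):
    """Loss for the class label predictor f_v, as written in claim 3:
    -E[log(1 - D1(v_c))]."""
    return -sum(math.log(1.0 - p + EPS) for p in d_on_pred) / len(d_on_pred)
```

In a training loop these values would be computed from the discriminator's outputs on each batch. The parameter updates themselves are outside the scope of this sketch.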

Assignees

Univ Nanjing Posts & Telecommunications

Inventors

Classifications

G06T11/10 (Primary)
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77)
G06T11/00 (Primary)
Two-dimensional [2D] image generation
G06N3/045 (Primary)
Combinations of networks
G06N3/048
Activation functions
G06N3/0464
Convolutional networks [CNN, ConvNet]

Patent family

Related publications grouped by family.

View patent family 78379439

Related patents

Augmented reality (AR) pen/hand tracking

Cross-Modality Curiosity for Sparse-Reward Tasks

Cross-modality processing method and apparatus, and computer storage medium

Magnetic Resonance Image Reconstruction System and Method

Automatic Haptic Generation Based on Visual Odometry

Friction modulation for three dimensional relief in a haptic device
