Method of image reconstruction for cross-modal communication system and device thereof

US11748919B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11748919-B2
Application number: US-202218002500-A
Country: US
Kind code: B2
Filing date: Jul 1, 2022
Priority date: Jul 9, 2021
Publication date: Sep 5, 2023
Grant date: Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of image reconstruction for a cross-modal communication system is disclosed. The method reconstructs a damaged, lost, or delayed image signal during transmission by using complete haptic signals received by a receiving end in the cross-modal communication system, and further constructs a cross-modal interaction network with reference to an attention mechanism, thus solving the limitation of the conventional generation model that it can only be trained on paired samples. An image reconstruction device for a cross-modal communication system is also disclosed. By fully utilizing semantic correlation between different-modality data and realizing cross-modal generation from haptic signals to image signals for unpaired data, the present invention overcomes the difficulty in acquiring haptic-image signal pairs in the practical cross-modal communication system, and significantly improves the quality and class accuracy of generated image signals.

First claim

Opening claim text (preview).

What is claimed is:

1. A method of image reconstruction for a cross-modal communication system, comprising the following steps:

step 1. selecting haptic signals and image data pairs received by a receiving end of a cross-modal communication system to serve as a training set, wherein each haptic signal in the training set and each image data of the image data pairs has label information about the class it belongs to;

step 2. establishing a cross-modal image generation model based on haptic signals, the model comprising an image feature extraction module, an attention mechanism-based cross-modal semantic learning module, and an adversarial image generation module, wherein the image feature extraction module comprises a convolutional neural network (CNN) and a first-class adversarial network, and the image feature extraction module is used for performing feature extraction for the image data in the training set to obtain an image feature; the cross-modal semantic learning module comprises an encoder, an attention mechanism-based semantic fusion network, and a second-class adversarial network, wherein the encoder performs feature extraction for haptic signals in the training set to obtain the corresponding haptic features; then, the haptic features and the image features are together input to the attention mechanism-based semantic fusion network, and the network performs similarity calculation between haptic features having the same label as the image feature and a sigmoid function operation is further performed to obtain weight vectors of the haptic features corresponding to the current image feature, and then, weighted summation is performed for the haptic features based on the weight vectors to obtain a synthetic haptic feature most similar to the current image feature; and afterwards, the second-class adversarial network strengthens the synthetic haptic feature under the effect of adversarial learning to maintain class and distribution characteristics of the haptic signals; and the adversarial image generation module comprises a generative adversarial network, and is used for outputting a generated image having the same label as the strengthened synthetic haptic feature after receiving the synthetic haptic feature;

step 3. training the cross-modal image generation model based on haptic signals, wherein an intra-modal loss of the image feature is calculated according to the image feature extraction module, an intra-modal loss of the synthetic haptic feature and an inter-modal loss between the synthetic haptic feature and the image feature are calculated according to the attention mechanism-based cross-modal semantic learning module, and an adversarial generation loss of the generated image is calculated according to the adversarial image generation module and by means of mean square error; these calculated losses are used for updating parameters in the cross-modal image generation model; and after the training converges, an optimal cross-modal image generation model and parameters at this time are saved; and

step 4. after completion of the training, inputting the haptic signal received by the receiving end of the cross-modal communication system to the trained cross-modal image generation model to output a target image.

2. The image reconstruction method for a cross-modal communication system according to claim 1, wherein feature extraction for the image data in step 2 comprises the following steps:

(2-1) subjecting image data V to processing by the CNN to obtain an image feature $v'^{(f)}$, wherein the CNN comprises a plurality of convolutional layers and a pooling layer is connected after each convolutional layer;

(2-2) constructing a first-class adversarial network for $v'^{(f)}$, the first-class adversarial network comprising a class label predictor $f_v(\cdot)$ with a network parameter $\theta_v$ and a class label discriminator $D_1$ with a network parameter $\alpha$, wherein $f_v(\cdot)$ consists of a plurality of fully connected layers and one softmax layer, and an input of $f_v(\cdot)$ is the image feature $v'^{(f)}$ and an output of $f_v(\cdot)$ is a predicted class label $v^{(c)} = f_v(v'^{(f)}; \theta_v)$; the class label discriminator $D_1$ consists of a plurality of fully connected layers that are successively connected and the dimension of the last layer is 1; and $D_1$ is used for discriminating $v^{(c)}$ and a true label $y_v$ corresponding to the image feature $v'^{(f)}$; and by means of adversarial training by $f_v(\cdot)$ and $D_1$, $v'^{(f)}$ is updated constantly, and an image feature $v^{(f)} = \{v_i^{(f)}, i = 1, 2, \ldots, N\}$ that has class characteristic is finally extracted, wherein $v_i^{(f)}$ is an image feature of the i-th image data and N is a total image data amount.

3. The image reconstruction method for a cross-modal communication system according to claim 2, wherein an adversarial loss of the first-class adversarial network is as follows:

$$L_{cat}^{V}(D_1) = -E_{y_v}[\log D_1(y_v; \alpha)] - E_{v^{(c)}}[\log(1 - D_1(v^{(c)}; \alpha))]$$

$$L_{cat}^{V}(v^{(c)}) = -E_{v^{(c)}}[\log(1 - D_1(v^{(c)}; \alpha))]$$

wherein $L_{cat}^{V}(D_1)$ is an adversarial loss function for the class label discriminator $D_1$; $E_{y_v}[*]$ and $E_{v^{(c)}}[*]$ refer to calculation of an expectation for *; $D_1(y_v; \alpha)$ indicates a discrimination result of the class label discriminator for a true label $y_v$; $D_1(v^{(c)}; \alpha)$ indicates a discrimination result of the class label discriminator for $v^{(c)}$ output by the class label predictor; and $L_{cat}^{V}(v^{(c)})$ is an adversarial loss function for the class label predictor $f_v(\cdot)$.

4. The image reconstruction method for a cross-modal communication system according to claim 2, wherein a learning process of the attention mechanism-based cross-modal semantic learning module in step 2 is specifically as follows:

(3-1) subjecting a haptic signal to processing by the encoder to obtain a haptic feature $h^{(f)} = \{h_j^{(f)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(f)}$ is a haptic feature of the j-th haptic signal, N is a total data amount of haptic signals, and the encoder comprises a gated recurrent unit (GRU) and a plurality of fully connected layers;

(3-2) matching, by the attention mechanism-based semantic fusion network, the haptic feature and $v^{(f)}$ extracted in step (2-2), wherein with each $v_i^{(f)}$ as a query vector, a synthetic haptic feature $\tilde{h}_i^{(f)}$ belonging to the same class as $v_i^{(f)}$ is screened out, wherein $\tilde{h}_i^{(f)}$ and $v_i^{(f)}$ form a haptic-image feature pair, and then a synthetic haptic feature corresponding to $v^{(f)}$ is $\tilde{h}^{(f)} = \{\tilde{h}_i^{(f)}, i = 1, 2, \ldots, N\}$, which is specifically as follows:

3-2-1. inputting $v_i^{(f)}$ and the haptic feature $h^{(f)}$ to the attention mechanism-based semantic fusion network to output a haptic hidden layer representation vector $h^{(r)} = \{h_j^{(r)}, j = 1, 2, \ldots, N\}$, wherein $h_j^{(r)}$ is a hidden layer representation vector of the j-th haptic feature $h_j^{(f)}$, the hidden layer is a single-layer perceptron structure, and an activation function is the Tanh() function; and a specific process is as follows:

$$h_j^{(r)} = \mathrm{Tanh}(w h_j^{(f)} + b)$$

wherein w and b are network parameters of the hidden layer in the attention mechanism-based semantic fusion network;

3-2-2. calculating the Pearson correlation coefficient regarding $h_j^{(r)}$ and $v_i^{(f)}$ as the similarity: Sim i , j = I i
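
The semantic fusion described in claim 1 and in steps 3-2-1 and 3-2-2 of claim 4 can be illustrated with a small sketch. It is not the patented implementation: the claim preview is cut off before the full similarity and weighting formulas, so the details below are assumptions. In particular, the function names are hypothetical, `w` is taken to be a matrix that maps a haptic feature to the image-feature dimension, the sigmoid is applied directly to the Pearson coefficient, and the weighted sum is taken over the raw haptic features without any normalization.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / (sx * sy)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden(h_f, w, b):
    """Single-layer perceptron from step 3-2-1: h_r = Tanh(w h_f + b).

    w is a list of rows and b a bias vector (shapes are assumptions).
    """
    return [
        math.tanh(sum(wk * hk for wk, hk in zip(row, h_f)) + bk)
        for row, bk in zip(w, b)
    ]


def synthetic_haptic_feature(v_i, label_i, haptic_feats, haptic_labels, w, b):
    """Fuse the haptic features that share the label of image feature v_i.

    Each same-label haptic feature gets the weight
    sigmoid(Pearson(h_r, v_i)); the result is the weighted sum of the
    haptic features (unnormalized, which is an assumption).
    """
    fused = [0.0] * len(haptic_feats[0])
    for h_f, label_j in zip(haptic_feats, haptic_labels):
        if label_j != label_i:
            continue  # only same-class haptic features contribute
        weight = sigmoid(pearson(hidden(h_f, w, b), v_i))
        fused = [acc + weight * x for acc, x in zip(fused, h_f)]
    return fused
```

With an identity `w` and zero `b`, a haptic feature whose hidden representation rises and falls with the image feature receives a weight above 0.5, while an anti-correlated one receives a weight below 0.5, so the fused vector leans toward the better-matching haptic feature.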

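The two adversarial losses in claim 3 can also be evaluated numerically. The sketch below is illustrative only: the function names are hypothetical, the discriminator outputs are assumed to be probabilities in (0, 1), each expectation is approximated by a batch mean, and a small clamp is added to avoid log(0). The predictor loss follows the form written in the claim text.

```python
import math

EPS = 1e-12  # assumption: clamp to keep log() finite


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def discriminator_loss(d_true, d_pred):
    """L(D1) = -E[log D1(y_v)] - E[log(1 - D1(v_c))].

    d_true: D1 outputs for true labels y_v.
    d_pred: D1 outputs for predicted labels v_c.
    """
    real_term = -sum(math.log(_clamp(p)) for p in d_true) / len(d_true)
    fake_term = -sum(math.log(1.0 - _clamp(p)) for p in d_pred) / len(d_pred)
    return real_term + fake_term


def predictor_loss(d_pred):
    """L(v_c) = -E[log(1 - D1(v_c))], as written in claim 3."""
    return -sum(math.log(1.0 - _clamp(p)) for p in d_pred) / len(d_pred)
```

A discriminator that cannot tell the two inputs apart (all outputs 0.5) gives a discriminator loss of 2 ln 2, and the loss falls as its outputs move toward 1 for true labels and toward 0 for predicted labels.
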
Assignees

Univ Nanjing Posts & Telecommunications

Inventors

Classifications

  • G06T11/10 · Primary

    Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title

  • G06T11/00 · Primary

    Two-dimensional [2D] image generation · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • Activation functions · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11748919B2 cover?
A method of image reconstruction for a cross-modal communication system. The method reconstructs a damaged, lost, or delayed image signal during transmission by using complete haptic signals received by a receiving end in the cross-modal communication system.
Who is the assignee on this patent?
Univ Nanjing Posts & Telecommunications
What technology area does this patent fall under?
Primary CPC classification G06T11/10. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 5, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).