Method and system for generating multimodal digital images

US9971958B2 · US · B2

Patent metadata
Publication number: US-9971958-B2
Application number: US-201615189075-A
Country: US
Kind code: B2
Filing date: Jun 22, 2016
Priority date: Jun 1, 2016
Publication date: May 15, 2018
Grant date: May 15, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method generates a multimodal digital image by processing a vector with a first neural network to produce a first modality of the digital image and processing the vector with a second neural network to produce a second modality of the digital image. A structure and a number of layers of the first neural network are identical to a structure and a number of layers of the second neural network. Also, at least one layer in the first neural network has parameters identical to parameters of a corresponding layer in the second neural network, and at least one layer in the first neural network has parameters different from parameters of a corresponding layer in the second neural network.
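The layer-sharing arrangement in the abstract can be illustrated with a minimal sketch. It assumes small dense layers written in plain Python; the layer sizes, the `tanh` activation, and every name (`make_layer`, `shared`, `head_a`, `head_b`) are hypothetical choices made for illustration, not details taken from the patent. The two networks have the same structure and layer count, reuse the same objects for some layers, and keep separate parameters for the last layer.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer as a dict with a weight matrix and a bias vector."""
    return {
        "w": [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def apply_layer(layer, x):
    """Apply an affine map followed by tanh to the input vector x."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(layer["w"], layer["b"])
    ]


def forward(layers, x):
    """Run x through the layers in order and return the final activation."""
    for layer in layers:
        x = apply_layer(layer, x)
    return x


rng = random.Random(0)
sizes = [4, 8, 8, 6]  # hypothetical sizes: 4-value vector -> 8 -> 8 -> 6-value output

# Layers with identical parameters: created once and reused by both networks.
shared = [make_layer(sizes[0], sizes[1], rng), make_layer(sizes[1], sizes[2], rng)]

# Layers with different parameters: one per modality.
head_a = make_layer(sizes[2], sizes[3], rng)
head_b = make_layer(sizes[2], sizes[3], rng)

net_a = shared + [head_a]  # "first neural network" in this sketch
net_b = shared + [head_b]  # "second neural network" in this sketch

# The same input vector is processed by both networks.
z = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]
image_a = forward(net_a, z)
image_b = forward(net_b, z)
multimodal = (image_a, image_b)
```

Reusing the same layer objects is one simple way to keep parameters identical by construction; a training framework would more typically tie the parameters or apply the same gradient update to both copies.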

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for generating a multimodal digital image, wherein the method uses a processor coupled with stored instructions implementing the method, wherein the instructions, when executed by the processor carry out steps of the method comprising: acquiring an image of a scene indicative of features of the scene; processing the image with a first neural network to produce a first image having a first modality; image; processing the image with a second neural network to produce a second image having a second modality, such that the first image and the second image form the multimodal digital image, the wherein a structure and a number of layers of the first neural network are identical to a structure and a number of layers of the second neural network, wherein at least one layer in the first neural network has parameters identical to parameters of a corresponding layer in the second neural network, and wherein at least one layer in the first neural network has parameters different from parameters of a corresponding layer in the second neural network, wherein the layers of the first and the second neural networks having identical parameters produce high-level features of the first and the second images of the multimodal digital image, and wherein the layers of the first and the second neural networks having different parameters produce low-level features of the first and the second images of the multimodal digital image, wherein the first neural network and the second neural network are trained jointly while enforcing identical parameters for several bottom layers of the first neural network and the second neural network, wherein at least one or both of the first neural network and the second neural network are trained using generative adversarial nets (GAN) including a generative subnetwork for producing a sample of the digital image of a specific modality and a discriminative subnetwork for testing if the sample of the digital image produced by the generative subnetwork has the specific modality; and outputting the multimodal digital image.

2. The method of claim 1, further comprising: randomly generating elements of the image using a probabilistic distribution.

3. The method of claim 1 wherein the low-level features are derived from the high level features.

4. The method of claim 1, wherein the digital image includes one or combination of an image, a video, a text, and a sound.

5. The method of claim 1, wherein a first generative subnetwork and a first discriminative subnetwork of the first neural network and a second generative subnetwork and a second discriminative subnetwork of the second neural network are jointly trained to minimize a minimax objective function.

6. The method of claim 1, further comprising: rendering the first image of the first modality and the second image of the second modality on a display device or transmitting the first image of the first modality and the second image of the second modality over a communication channel.

7. The method of claim 1, wherein the first modality of the first image is a color image, and wherein the second modality of the second image is a depth image.

8. The method of claim 1, wherein the first modality of the first image is a color image, and wherein the second modality of the second image is a thermal image.

9. The method of claim 1, wherein the first modality of the first image is an image having a first style, and wherein the second modality of the second image is an image having a second style.

10. The method of claim 1, wherein the first neural network and the second neural network are selected from a set of the neural networks jointly trained to produce a set of modalities of the digital image, comprising: processing the image with a set of neural networks to produce the multimodal digital image.

11. The method of claim 10, wherein the set of the neural networks forms a Coupled Generative Adversarial Nets (CoGAN).

12. A system for generating a multimodal digital image, comprising: an input interface to acquire an image of a scene indicative of features of the scene; at least one non-transitory computer readable memory storing a first neural network trained to produce a first modality of the multimodal digital image and a second neural network trained to produce a second modality of the multimodal digital image, wherein a structure and a number of layers of the first neural network are identical to a structure and a number of layers of the second neural network, wherein at least one layer in the first neural network has parameters identical to parameters of a corresponding layer in the second neural network, and wherein at least one layer in the first neural network has parameters different from parameters of a corresponding layer in the second neural network, wherein the layers of the first and the second neural networks having identical parameters produce high-level features of the first and the second images of the multimodal digital image, and wherein the layers of the first and the second neural networks having different parameters produce low-level features of the first and the second images of the multimodal digital image, wherein the first neural network and the second neural network are trained jointly while enforcing identical parameters for several bottom layers of the first neural network and the second neural network, wherein at least one or both of the first neural network and the second neural network are trained using generative adversarial nets (GAN) including a generative subnetwork for producing a sample of the digital image of a specific modality and a discriminative subnetwork for testing if the sample of the digital image produced by the generative subnetwork has the specific modality; a processor to generate the multimodal digital image by processing the image with the first neural network to produce a first modality of a first image and processing the image with the second neural network to produce a second modality of a second image, such that the first image and the second image form the multimodal digital image; and an output interface to output the multimodal digital image.

13. The system of claim 12, further comprising: a display device for displaying the multimodal digital image, such that the output interface outputs the multimodal digital image to the display device.

14. The system of claim 12, wherein the high-level features are attributed to entire digital image and the low-level features are attributed to a portion of the digital image.

15. The system of claim 12, wherein the first modality of the first image is a color image, and wherein the second modality of the second image is a depth image or a thermal image.

16. The system of claim 12, wherein the first modality of the first image is an image having a first style, and wherein the second modality of the second image is an image having a second style.

17. A non-transitory computer-readable medium with instructions stored thereon, that when executed by a processor, perform the steps comprising: acquiring an image of a scene indicative of features of the scene; processing the image with a first neural network to produce a first image having a first modality; image; processing the image with a second neural network to produce a second image having a second modality, such that the first image and the second image form the multimodal digital image, the wherein a structure and a number of layers of the first neural network are identical to a structure and a number of layers of the
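
Claim 5 refers to a minimax objective function but this page does not give its form. As a rough sketch only, assuming the standard GAN log-loss applied to two generator/discriminator pairs, it could be written as below. The symbols are notation chosen here rather than taken from the claim text: g_i is the generative subnetwork and f_i the discriminative subnetwork for modality i, x_i is a real sample of modality i, z is the shared input, and S is the set of layer indices whose parameters are held identical.

```latex
\min_{g_1, g_2} \; \max_{f_1, f_2} \;
\sum_{i=1}^{2} \Big(
  \mathbb{E}_{x_i \sim p_{X_i}} \big[ \log f_i(x_i) \big]
  + \mathbb{E}_{z \sim p_Z} \big[ \log \big( 1 - f_i(g_i(z)) \big) \big]
\Big)
\quad \text{subject to} \quad
\theta_{g_1}^{(k)} = \theta_{g_2}^{(k)} \;\; \text{for all } k \in S
```

In this form the discriminators are trained to separate real samples from generated ones, the generators are trained to defeat them, and the equality constraint is what couples the two networks.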

Assignees

Inventors

Classifications

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06V10/454 · Primary

    Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Creating or editing images; Combining images with text · CPC title

  • Classification techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9971958B2 cover?
A computer-implemented method generates a multimodal digital image by processing a vector with a first neural network to produce a first modality of the digital image and processing the vector with a second neural network to produce a second modality of the digital image. A structure and a number of layers of the first neural network are identical to a structure and a number of layers of the se…
Who is the assignee on this patent?
Mitsubishi Electric Res Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/454. Mapped technology areas include Physics.
When was this patent published?
Publication date May 15, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).