US2023177820A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023177820-A1 |
| Application number | US-202218061266-A |
| Country | US |
| Kind code | A1 |
| Filing date | Dec 2, 2022 |
| Priority date | Dec 3, 2021 |
| Publication date | Jun 8, 2023 |
| Grant date | — |
Official abstract text for this publication.
Disclosed herein are a computing apparatus and method for performing reinforcement learning using a multimodal artificial intelligence agent. The method for performing reinforcement learning using a multimodal artificial intelligence agent includes: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images. The plurality of guidance types is classified into three or more types according to their guidance level. Performing the reinforcement learning is performing reinforcement learning by applying a moderate-level guidance type to the sections of predetermined critical periods and also applying any one of the plurality of guidance types to the other sections.
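The sectioning and guidance assignment described in the abstract can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes that "sections" are contiguous runs of frame indices, that there are exactly three guidance levels, and that non-critical sections receive a randomly chosen level. The names `Guidance`, `split_into_sections`, and `assign_guidance` are hypothetical and do not appear in the publication.

```python
import random
from enum import IntEnum


class Guidance(IntEnum):
    """Hypothetical three-level guidance scale (level names are assumptions)."""
    NONE = 0
    MODERATE = 1
    STRONG = 2


def split_into_sections(num_frames, num_sections):
    """Divide frame indices 0..num_frames-1 into contiguous (start, end) sections."""
    base, extra = divmod(num_frames, num_sections)
    sections, start = [], 0
    for i in range(num_sections):
        end = start + base + (1 if i < extra else 0)
        sections.append((start, end))
        start = end
    return sections


def assign_guidance(sections, critical, rng):
    """Use moderate guidance in critical sections and any guidance type elsewhere."""
    schedule = []
    for idx in range(len(sections)):
        if idx in critical:
            schedule.append(Guidance.MODERATE)
        else:
            # Assumption: "any one of the guidance types" is modeled as a random pick.
            schedule.append(rng.choice(list(Guidance)))
    return schedule


sections = split_into_sections(num_frames=1000, num_sections=4)
schedule = assign_guidance(sections, critical={1, 2}, rng=random.Random(0))
for (start, end), level in zip(sections, schedule):
    print(f"frames {start}-{end - 1}: {level.name}")
```

A training loop would then look up the guidance level for the section containing the current frame before the agent interacts with the environment. How each level changes the agent's observations or rewards is not stated in the abstract and is left out here.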
Opening claim text (preview).
What is claimed is:

1. A method of performing reinforcement learning using a multimodal artificial intelligence agent, the method comprising: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images; wherein the plurality of guidance types is classified into three or more types according to their guidance level; and wherein performing the reinforcement learning is performing reinforcement learning by applying a moderate-level guidance type to sections of predetermined critical periods and also applying any one of the plurality of guidance types to remaining sections.

2. The method of claim 1, wherein the training target images are images acquired by capturing one or more objects in the virtual environment, and include images for binocular vision and three-dimensional (3D) spatialized audio.

3. The method of claim 2, wherein performing the reinforcement learning comprises: integrating an output, obtained by processing the images for binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and an output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

4. The method of claim 1, wherein the multimodal artificial intelligence agent is equipped with binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.

5. A non-transitory computer-readable storage medium having stored thereon a program that, when executed by a processor, causes the processor to execute the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

6. A computer program that is executed by an apparatus for providing game replays and stored in a non-transitory computer-readable storage medium in order to perform the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

7. A computing apparatus for performing reinforcement learning using a multimodal artificial intelligence agent, the computing apparatus comprising: an input/output interface configured to receive data and output results of operational processing of the data; storage configured to store a program and data for performing reinforcement learning using a multimodal artificial intelligence agent; and a controller including at least one processor, and configured to perform the reinforcement learning by executing the program; wherein the controller divides frames, included in images acquired by capturing a virtual environment, into a plurality of sections and also performs the reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images by executing the program; wherein the plurality of guidance types is classified into three or more stages according to their guidance level; and wherein the controller performs the reinforcement learning by applying a moderate-level guidance type to sections of predetermined critical periods and also applying any one of the plurality of guidance types to remaining sections.

8. The computing apparatus of claim 7, wherein the training target images are images acquired by capturing one or more objects in the virtual environment, and include images for binocular vision and three-dimensional (3D) spatialized audio.

9. The computing apparatus of claim 8, wherein the controller performs the reinforcement learning by: integrating an output, obtained by processing the images for binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and an output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

10. The computing apparatus of claim 7, wherein the controller equips the multimodal artificial intelligence agent with binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.
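
The data flow recited in claims 3 and 9 (a convolutional vision branch, an audio branch, fusion into a feature map, masking of a projected query, and a final perceptron) can be traced with a toy forward pass. This is a sketch under stated assumptions, not the claimed implementation: the claims do not specify layer sizes, the fusion operator, or the masking operator, so the element-wise product with a sigmoid, the 8-wide feature vector, the 6x6 inputs, and the random weights below are all illustrative choices. Every function and variable name is hypothetical.

```python
import math
import random

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Linear projection y = W x (bias omitted for brevity)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(x, w1, w2):
    """Two-layer perceptron with a ReLU hidden layer."""
    hidden = [max(0.0, h) for h in linear(x, w1)]
    return linear(hidden, w2)


def conv_features(image, kernel):
    """Valid 2D convolution followed by ReLU, flattened to a vector."""
    k = len(kernel)
    out = []
    for r in range(len(image) - k + 1):
        for c in range(len(image[0]) - k + 1):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(k) for j in range(k))
            out.append(max(0.0, s))
    return out


D = 8  # shared feature width (assumption)

# Toy inputs: two 6x6 grayscale views, 16 audio samples per ear, a query vector.
left = [[rng.random() for _ in range(6)] for _ in range(6)]
right = [[rng.random() for _ in range(6)] for _ in range(6)]
audio_left = [rng.uniform(-1, 1) for _ in range(16)]
audio_right = [rng.uniform(-1, 1) for _ in range(16)]
query = [rng.uniform(-1, 1) for _ in range(12)]

# Vision branch: convolution over both views, then the first perceptron.
kernel = rand_matrix(3, 3)
vision_in = conv_features(left, kernel) + conv_features(right, kernel)
vision_out = mlp(vision_in, rand_matrix(16, len(vision_in)), rand_matrix(D, 16))

# Audio branch: both ears concatenated into one vector, then the second perceptron.
audio_in = audio_left + audio_right
audio_out = mlp(audio_in, rand_matrix(16, len(audio_in)), rand_matrix(D, 16))

# Assumed fusion: element-wise product squashed to (0, 1) serves as the feature map.
feature_map = [1.0 / (1.0 + math.exp(-(v * a)))
               for v, a in zip(vision_out, audio_out)]

# Query branch: linear projection, masking by the feature map, third perceptron.
projected = linear(query, rand_matrix(D, len(query)))
masked = [p * m for p, m in zip(projected, feature_map)]
scores = mlp(masked, rand_matrix(16, D), rand_matrix(4, 16))  # 4 hypothetical outputs
```

The multiplicative mask is one common way to let a fused feature vector gate a query; attention-style masking would be another reading of the claim language, and the text quoted here does not settle which one is intended.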
Sound input; Sound output (speech processing G10L) · CPC title
Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title
using classification, e.g. of video objects · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title
using neural networks · CPC title