Computing apparatus and method for performing reinforcement learning using multimodal artificial intelligence agent

US2023177820A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2023177820-A1
Application number	US-202218061266-A
Country	US
Kind code	A1
Filing date	Dec 2, 2022
Priority date	Dec 3, 2021
Publication date	Jun 8, 2023
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are a computing apparatus and method for performing reinforcement learning using a multimodal artificial intelligence agent. The method for performing reinforcement learning using a multimodal artificial intelligence agent includes: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images. The plurality of guidance types is classified into three or more types according to their guidance level. Performing the reinforcement learning is performing reinforcement learning by applying a moderate-level guidance type to the sections of predetermined critical periods and also applying any one of the plurality of guidance types to the other sections.
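
The sectioning step in the abstract can be illustrated with a short sketch. It is not the patented implementation. Several details are assumptions made for illustration: the three level names (LOW, MODERATE, HIGH), equal-length contiguous sections, critical periods given as section indices, and a random choice of guidance type for the non-critical sections. The abstract only requires three or more guidance levels, a moderate level for critical periods, and "any one" type elsewhere.

```python
import random
from enum import IntEnum
from typing import List, Set


class Guidance(IntEnum):
    """Three guidance levels; the names are hypothetical."""
    LOW = 0
    MODERATE = 1
    HIGH = 2


def split_into_sections(num_frames: int, num_sections: int) -> List[range]:
    """Divide frame indices into contiguous, near-equal sections (assumed layout)."""
    if num_sections <= 0 or num_frames < num_sections:
        raise ValueError("need at least one frame per section")
    base, extra = divmod(num_frames, num_sections)
    sections, start = [], 0
    for i in range(num_sections):
        size = base + (1 if i < extra else 0)  # spread the remainder over early sections
        sections.append(range(start, start + size))
        start += size
    return sections


def assign_guidance(num_sections: int, critical: Set[int],
                    rng: random.Random) -> List[Guidance]:
    """Moderate guidance for critical sections, any level for the rest."""
    plan = []
    for i in range(num_sections):
        if i in critical:
            plan.append(Guidance.MODERATE)
        else:
            # Assumption: "any one of the guidance types" is drawn at random.
            plan.append(rng.choice(list(Guidance)))
    return plan


sections = split_into_sections(num_frames=10, num_sections=4)
plan = assign_guidance(len(sections), critical={1}, rng=random.Random(0))
for section, guidance in zip(sections, plan):
    print(list(section), guidance.name)
```

In a training loop, each section's guidance level would decide how much help the agent receives while it interacts with the environment over those frames. The abstract does not say how that help is delivered.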

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing reinforcement learning using a multimodal artificial intelligence agent, the method comprising: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images; wherein the plurality of guidance types is classified into three or more types according to their guidance level; and wherein performing the reinforcement learning is performing reinforcement learning by applying a moderate-level guidance type to sections of predetermined critical periods and also applying any one of the plurality of guidance types to remaining sections.

2. The method of claim 1, wherein the training target images are images acquired by capturing one or more objects in the virtual environment, and include images for binocular vision and three-dimensional (3D) spatialized audio.

3. The method of claim 2, wherein performing the reinforcement learning comprises: integrating an output, obtained by processing the images for binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and an output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

4. The method of claim 1, wherein the multimodal artificial intelligence agent is equipped with binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.

5. A non-transitory computer-readable storage medium having stored thereon a program that, when executed by a processor, causes the processor to execute the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

6. A computer program that is executed by an apparatus for providing game replays and stored in a non-transitory computer-readable storage medium in order to perform the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

7. A computing apparatus for performing reinforcement learning using a multimodal artificial intelligence agent, the computing apparatus comprising: an input/output interface configured to receive data and output results of operational processing of the data; storage configured to store a program and data for performing reinforcement learning using a multimodal artificial intelligence agent; and a controller including at least one processor, and configured to perform the reinforcement learning by executing the program; wherein the controller divides frames, included in images acquired by capturing a virtual environment, into a plurality of sections and also performs the reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images by executing the program; wherein the plurality of guidance types is classified into three or more stages according to their guidance level; and wherein the controller performs the reinforcement learning by applying a moderate-level guidance type to sections of predetermined critical periods and also applying any one of the plurality of guidance types to remaining sections.

8. The computing apparatus of claim 7, wherein the training target images are images acquired by capturing one or more objects in the virtual environment, and include images for binocular vision and three-dimensional (3D) spatialized audio.

9. The computing apparatus of claim 8, wherein the controller performs the reinforcement learning by: integrating an output, obtained by processing the images for binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and an output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

10. The computing apparatus of claim 7, wherein the controller equips the multimodal artificial intelligence agent with binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.
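
The data flow recited in claims 3 and 9 can be sketched in plain Python. It is a rough illustration under stated assumptions, not the patented network. The claims do not say how the two outputs are "integrated" or how the masking is computed, so the sketch assumes a sigmoid of the elementwise product for the interactive feature map and an elementwise multiplication for the mask. The convolutional stage is omitted (the vision input stands in for already-extracted CNN features), weights are random, and all function names and dimensions are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, x: Vector) -> Vector:
    """Bias-free linear projection (matrix-vector product)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Two-layer perceptron with a ReLU between the layers."""
    hidden = [max(0.0, h) for h in linear(w1, x)]
    return linear(w2, hidden)


def init_params(in_dim: int, dim: int, out_dim: int,
                rng: random.Random) -> Dict[str, Matrix]:
    """Random placeholder weights; a real agent would learn these."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {
        "vis1": mat(dim, in_dim), "vis2": mat(dim, dim),    # first MLP (vision)
        "aud1": mat(dim, in_dim), "aud2": mat(dim, dim),    # second MLP (audio)
        "query": mat(dim, in_dim),                          # query projection
        "head1": mat(dim, dim), "head2": mat(out_dim, dim), # third MLP
    }


def forward(p: Dict[str, Matrix], vision_feat: Vector, audio_left: Vector,
            audio_right: Vector, query: Vector) -> Tuple[Vector, Vector]:
    v = mlp(p["vis1"], p["vis2"], vision_feat)
    # Audio "received through both ears": concatenate left and right channels.
    a = mlp(p["aud1"], p["aud2"], audio_left + audio_right)
    # Assumed integration: sigmoid of the elementwise product, values in (0, 1).
    feature_map = [1.0 / (1.0 + math.exp(-(vi * ai))) for vi, ai in zip(v, a)]
    projected = linear(p["query"], query)
    # Assumed masking: scale each projected query component by the map.
    masked = [q * m for q, m in zip(projected, feature_map)]
    return mlp(p["head1"], p["head2"], masked), feature_map


params = init_params(in_dim=4, dim=4, out_dim=3, rng=random.Random(0))
output, feature_map = forward(
    params,
    vision_feat=[0.2, -0.1, 0.5, 0.3],
    audio_left=[0.1, 0.4],
    audio_right=[-0.2, 0.3],
    query=[1.0, 0.0, 0.0, 1.0],
)
print(len(output), len(feature_map))
```

The sigmoid gate is one plausible reading because it keeps the mask bounded. Other fusion schemes, such as concatenation or attention, would fit the claim language equally well.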

Assignees

Seoul Nat Univ R&Db Foundation

Inventors

Classifications

G06F3/16
Sound input; Sound output (speech processing G10L) · CPC title
G06V10/771
Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06T17/00
Three-dimensional [3D] modelling for computer graphics · CPC title
G06V10/82 (Primary)
using neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 86607839

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023177820A1 cover?: Disclosed herein are a computing apparatus and method for performing reinforcement learning using a multimodal artificial intelligence agent. The method for performing reinforcement learning using a multimodal artificial intelligence agent includes: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learnin…
Who is the assignee on this patent?: Seoul Nat Univ R&Db Foundation
What technology area does this patent fall under?: Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 8, 2023 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).