Computing apparatus and method for performing reinforcement learning using multimodal artificial intelligence agent

US12505659B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12505659-B2
Application number: US-202218061266-A
Country: US
Kind code: B2
Filing date: Dec 2, 2022
Priority date: Dec 3, 2021
Publication date: Dec 23, 2025
Grant date: Dec 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are a computing apparatus and method for performing reinforcement learning using a multimodal artificial intelligence agent. The method for performing reinforcement learning using a multimodal artificial intelligence agent includes: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learning by applying any one of a plurality of guidance types to each of the plurality of sections and then allowing a multimodal artificial intelligence agent to interact with the virtual environment through the images. The plurality of guidance types is classified into three or more types according to their guidance level. Performing the reinforcement learning is performing reinforcement learning by applying a moderate-level guidance type to the sections of predetermined critical periods and also applying any one of the plurality of guidance types to the other sections.
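The sectioning and guidance assignment described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `divide_into_sections` and `assign_guidance`, the use of contiguous near-equal sections, the choice of which sections count as critical periods, and the random choice of guidance for the remaining sections are all assumptions made for the example. Only the three guidance names and the rule "moderate guidance on critical periods" come from the text.

```python
import random
from enum import Enum


class Guidance(Enum):
    # The three guidance stages named in the claims.
    WEAK_MENTOR = "weak-mentor guidance"
    MODERATE_MENTOR = "moderate-mentor guidance"
    MENTOR_DEMONSTRATION = "mentor demonstration"


def divide_into_sections(num_frames, num_sections):
    """Split frame indices into contiguous, near-equal sections (assumed layout)."""
    if num_sections <= 0 or num_frames < num_sections:
        raise ValueError("need at least one frame per section")
    base, extra = divmod(num_frames, num_sections)
    sections, start = [], 0
    for i in range(num_sections):
        # The first `extra` sections absorb one leftover frame each.
        end = start + base + (1 if i < extra else 0)
        sections.append(range(start, end))
        start = end
    return sections


def assign_guidance(num_sections, critical_sections, rng=None):
    """Moderate-mentor guidance on critical sections, any level elsewhere.

    Picking the other sections' level at random is an assumption; the
    text only says that "any one" of the guidance types is applied.
    """
    rng = rng or random.Random(0)
    levels = list(Guidance)
    return [
        Guidance.MODERATE_MENTOR if i in critical_sections else rng.choice(levels)
        for i in range(num_sections)
    ]


if __name__ == "__main__":
    sections = divide_into_sections(num_frames=10, num_sections=4)
    plan = assign_guidance(len(sections), critical_sections={1, 2})
    for frames, level in zip(sections, plan):
        print(f"frames {frames.start}-{frames.stop - 1}: {level.value}")
```

In a training loop, each section's guidance level would then determine how much mentor input the agent receives while it interacts with the environment during those frames.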

First claim

Opening claim text (preview).

What is claimed is:

1. A method of performing reinforcement learning using a multimodal artificial intelligence agent, the method comprising: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing the reinforcement learning by applying any one of a plurality of guidance levels to each of the plurality of sections and then allowing the multimodal artificial intelligence agent to interact with the virtual environment through the images, wherein the plurality of guidance levels are classified into three or more stages which comprise a weak-mentor guidance, a moderate-mentor guidance, and a mentor demonstration, wherein the performing of the reinforcement learning comprises performing the reinforcement learning by applying the moderate-mentor guidance to sections of predetermined critical periods and also applying any one of the plurality of guidance levels to remaining sections, and wherein the images are acquired by capturing one or more objects in the virtual environment, and the images include one or more images for binocular vision and three-dimensional (3D) spatialized audio.

2. The method of claim 1, wherein the performing of the reinforcement learning further comprises: integrating a first output, obtained by processing the images for the binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and a second output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

3. The method of claim 1, wherein the multimodal artificial intelligence agent is equipped with the binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.

4. A non-transitory computer-readable storage medium having stored thereon a program that, when executed by a processor, causes the processor to execute the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

5. A computer program that is executed by an apparatus for providing game replays and stored in a non-transitory computer-readable storage medium in order to perform the method of performing reinforcement learning using a multimodal artificial intelligence agent set forth in claim 1.

6. A computing apparatus for performing reinforcement learning using a multimodal artificial intelligence agent, the computing apparatus comprising: an input/output interface configured to receive data and output results of operational processing of the data; storage configured to store a program and data for the performing of the reinforcement learning using the multimodal artificial intelligence agent; and a controller including at least one processor, and configured to perform the reinforcement learning by executing the program; wherein the controller divides frames, included in images acquired by capturing a virtual environment, into a plurality of sections and also performs the reinforcement learning by applying any one of a plurality of guidance levels to each of the plurality of sections and then allowing the multimodal artificial intelligence agent to interact with the virtual environment through the images by executing the program, wherein the plurality of guidance levels are classified into three or more stages which comprise a weak-mentor guidance, a moderate-mentor guidance, and a mentor demonstration, wherein the controller performs the reinforcement learning by applying the moderate-mentor guidance to sections of predetermined critical periods and also applying any one of the plurality of guidance levels to remaining sections, and wherein the images are acquired by capturing one or more objects in the virtual environment, and the images include one or more images for binocular vision and three-dimensional (3D) spatialized audio.

7. The computing apparatus of claim 6, wherein the controller performs the reinforcement learning by: integrating a first output, obtained by processing the images for the binocular vision using a convolutional neural network and then passing the processed images through a first multilayer perceptron, and a second output, obtained by vectorizing the 3D spatialized audio on an assumption that the 3D spatialized audio is received through both ears and then passing the vectorized 3D spatialized audio through a second multilayer perceptron, into an interactive feature map; and performing masking based on the interactive feature map on results of linear projection of an object finding query and then passing the results, on which the masking has been performed, through a third multilayer perceptron.

8. The computing apparatus of claim 6, wherein the controller equips the multimodal artificial intelligence agent with the binocular vision, 3D spatialized audio, mesh-based tactile, joint-level physics, objective interaction, and realistic collider characteristics.
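The data flow recited in claims 2 and 7 can be illustrated with a small NumPy sketch. The claims name the stages (a convolutional network and first multilayer perceptron for binocular images, vectorized two-ear audio and a second multilayer perceptron, an interactive feature map, masking of a linearly projected object-finding query, and a third multilayer perceptron) but this page does not give the details. Everything concrete below is therefore an assumption for illustration: the layer sizes, the single-layer stand-in CNN, concatenating the two ear signals as the "vectorization", a sigmoid of the elementwise product as the integration step, elementwise multiplication as the masking, and all names such as `forward` and `cnn_features`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
D = 16  # shared embedding width (hypothetical)


def init_mlp(sizes):
    """Random weights for a multilayer perceptron with the given layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp(x, layers):
    """Apply the layers with ReLU between them and a linear final layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x


def cnn_features(image, kernels):
    """Stand-in CNN: one convolution layer, ReLU, global average pooling."""
    windows = sliding_window_view(image, kernels.shape[1:])  # (H', W', kh, kw)
    maps = np.einsum("ijkl,nkl->nij", windows, kernels)      # (n, H', W')
    return np.maximum(maps, 0.0).mean(axis=(1, 2))           # (n,)


KERNELS = rng.standard_normal((4, 3, 3))       # 4 filters of size 3x3
MLP1 = init_mlp([8, 32, D])                    # vision: 4 features x 2 eyes
MLP2 = init_mlp([64, 32, D])                   # audio: 32 samples x 2 ears
W_QUERY = rng.standard_normal((10, D)) * 0.1   # linear projection of the query
MLP3 = init_mlp([D, 32, 5])                    # 5 output scores (hypothetical)


def forward(left_eye, right_eye, left_ear, right_ear, query):
    # First output: binocular images -> CNN -> first MLP.
    vision = np.concatenate([cnn_features(left_eye, KERNELS),
                             cnn_features(right_eye, KERNELS)])
    first_output = mlp(vision, MLP1)
    # Second output: two-ear audio vector -> second MLP.
    audio = np.concatenate([left_ear, right_ear])
    second_output = mlp(audio, MLP2)
    # Integration into an "interactive feature map" (assumed: sigmoid of product).
    feature_map = 1.0 / (1.0 + np.exp(-(first_output * second_output)))
    # Mask the projected object-finding query, then apply the third MLP.
    masked = (query @ W_QUERY) * feature_map
    return mlp(masked, MLP3)


if __name__ == "__main__":
    scores = forward(rng.random((16, 16)), rng.random((16, 16)),
                     rng.random(32), rng.random(32), rng.random(10))
    print(scores.shape)
```

The sigmoid keeps the feature map in the range 0 to 1 so that it acts as a soft mask on the query; a real system might instead use attention or concatenation, which the claim language would also seem to permit.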

Assignees

Inventors

Classifications

  • using classification, e.g. of video objects · CPC title

  • Feature selection, e.g. selecting representative features from a multi-dimensional feature space · CPC title

  • Sound input; Sound output (speech processing G10L) · CPC title

  • Three-dimensional [3D] modelling for computer graphics · CPC title

  • using two two-dimensional [2D] image sensors having a relative position equal to or related to the interocular distance (H04N13/243 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12505659B2 cover?
Disclosed herein are a computing apparatus and method for performing reinforcement learning using a multimodal artificial intelligence agent. The method for performing reinforcement learning using a multimodal artificial intelligence agent includes: dividing frames, included in images acquired by capturing a virtual environment, into a plurality of sections; and performing reinforcement learnin…
Who is the assignee on this patent?
Seoul Nat Univ R&Db Foundation
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 23, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).