Target-oriented reinforcement learning method and apparatus for performing the same

US12223695B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12223695-B2
Application number  US-202017427957-A
Country             US
Kind code           B2
Filing date         Dec 8, 2020
Priority date       Oct 12, 2020
Publication date    Feb 11, 2025
Grant date          Feb 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection. Read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A target-oriented reinforcement learning method according to an embodiment includes: collecting data related to the target of reinforcement learning as target data in the process of performing the reinforcement learning; learning the collected target data as auxiliary learning for the reinforcement learning; and incorporating the results of the learning of the target data into the performance of the reinforcement learning.
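The collection step can be sketched as follows. This is a minimal illustration, not the patented implementation: it borrows the detail from the first claim that target data consists of a predetermined number of frames before the reward event plus a label. The class name `TargetDataCollector`, the methods `observe` and `on_target_reached`, the string label, and the choice to clear the window after each event are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class TargetDataCollector:
    """Hypothetical helper: keeps a rolling window of recent frames and
    stores them as labeled target data when the target is reached."""

    def __init__(self, frames_before_event: int) -> None:
        # Rolling window of the last `frames_before_event` frames.
        self.recent: Deque[Any] = deque(maxlen=frames_before_event)
        # Collected (frame, label) pairs for later auxiliary learning.
        self.target_data: List[Tuple[Any, str]] = []

    def observe(self, frame: Any) -> None:
        """Record one frame seen by the agent; the oldest frame drops out
        automatically once the window is full."""
        self.recent.append(frame)

    def on_target_reached(self, label: str) -> None:
        """Call when the agent achieves the target and receives a reward.
        Every frame in the window is stored with the target's label."""
        self.target_data.extend((frame, label) for frame in self.recent)
        # Assumption: start a fresh window so frames are not stored twice.
        self.recent.clear()


collector = TargetDataCollector(frames_before_event=3)
for step in range(5):
    collector.observe(f"frame{step}")
collector.on_target_reached("red_box")
print(collector.target_data)
```

With a window of three frames, only the three frames seen immediately before the event are kept; earlier frames have already dropped out of the window.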

First claim

Opening claim text (preview).

The invention claimed is:

1. A reinforcement learning method performed by a target-oriented reinforcement learning model, the reinforcement learning method comprising: collecting a data related to a target of the reinforcement learning as target data when an event in which an agent achieving the target and receiving a reward occurs, wherein the target data comprises a predetermined number of frame images before an occurrence of the event and a label indicating the target data corresponds to the target; learning the target data as auxiliary learning for the reinforcement learning; and incorporating results of the learning of the target data into performance of the reinforcement learning, wherein the target-oriented reinforcement learning model comprises: a feature extraction unit implemented by at least one processor and configured to extract features from state data and the target data; an action module implemented by the at least one processor and configured to output an action and a value according to a policy based on the feature extracted from the state data; and a classification module implemented by the at least one processor and configured to classify the target data based on the feature extracted from the target data, and wherein learning the collected target data comprises: extracting, by the feature extraction unit, a feature from batch data of the target data; extracting, by the classification module, a predicted value according to the feature extracted from the batch data of the target data; calculating, by the target-oriented reinforcement learning model, a loss for the auxiliary learning by using the predicted value and the label of the target data; and learning, by the target-oriented reinforcement learning model, the visual representation of the target data by using the loss for the auxiliary learning.

2. A computer-readable storage medium having stored thereon a program for performing the method set forth in claim 1.

3. A computer program performed by a computing device and stored in a medium in order to perform the method set forth in claim 1.

4. A computing device for performing target-oriented reinforcement learning, the computing device comprising: an input device configured to receive data; a memory configured to store a program for performing reinforcement learning and target data collected in a process of performing the reinforcement learning; and at least one processor configured to perform the reinforcement learning using the data received through the input device by executing the program; wherein a target-oriented reinforcement learning model implemented in such a manner that the at least one processor executes the program, collects data related to a target of the reinforcement learning as the target data in a process of performing the reinforcement learning when an event in which an agent achieving the target and receiving a reward occurs, wherein the target data comprises a predetermined number of frame images before an occurrence of the event and a label indicating the target data corresponds to the target, learns the target data as auxiliary learning for the reinforcement learning, and incorporates results of the learning of the target data into performance of the reinforcement learning, and wherein the target-oriented reinforcement learning model comprises: a feature extraction unit implemented by the at least one processor and configured to extract features from state data and the target data; an action module implemented by the at least one processor and configured to output an action and a value according to a policy based on the feature extracted from the state data; and a classification module implemented by at least one processor and configured to classify the target data based on the feature extracted from the target data, and wherein when learning the collected target data, the target-oriented reinforcement learning model is operated such that the feature extraction unit extracts a feature from batch data of the target data, the classification module extracts a predicted value according to the feature extracted from the batch data of the target data, the target-oriented reinforcement learning model calculates a loss for the auxiliary learning by using the predicted value and the label of the target data, and the target-oriented reinforcement learning model learns the visual representation of the target data by using the loss for the auxiliary learning.
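The auxiliary-learning steps recited in the claims (extract features from a batch of target data, produce a predicted value with the classification module, compute a loss against the labels) could look roughly like the sketch below. It is an illustration under stated assumptions, not the claimed implementation: the claims do not name a network architecture or loss function, so the single linear layer with ReLU, the softmax classifier, the cross-entropy loss, and all function names here are hypothetical stand-ins.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def extract_features(shared_weights: Matrix, frame: Sequence[float]) -> Vector:
    """Stand-in for the feature extraction unit (assumed: linear + ReLU).
    In the claims, the same unit also feeds the action module."""
    return [max(0.0, v) for v in linear(shared_weights, frame)]


def classify(head_weights: Matrix, features: Sequence[float]) -> Vector:
    """Stand-in for the classification module (assumed: linear + softmax)."""
    logits = linear(head_weights, features)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def auxiliary_loss(
    shared_weights: Matrix,
    head_weights: Matrix,
    batch: Sequence[Tuple[Sequence[float], int]],
) -> float:
    """Mean cross-entropy (an assumed choice of loss) between the predicted
    class probabilities and the integer labels of a target-data batch."""
    losses = []
    for frame, label in batch:
        probs = classify(head_weights, extract_features(shared_weights, frame))
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)


# Toy batch: two 2-value "frames" with class labels 0 and 1.
shared = [[1.0, 0.0], [0.0, 1.0]]
head = [[5.0, 0.0], [0.0, 5.0]]
batch = [([1.0, 0.0], 0), ([0.0, 1.0], 1)]
print(auxiliary_loss(shared, head, batch))
```

Because the action module reads features from the same extraction unit, updating the shared weights with this loss is one plausible way the auxiliary learning could feed back into the policy. The claims do not specify the optimizer or how the auxiliary loss is weighted against the reinforcement learning loss.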

Assignees

Inventors

Classifications

  • Target detection · CPC title

  • Machine learning · CPC title

  • G06V10/776 · Primary

    Validation; Performance evaluation · CPC title

  • G06V10/40 · Primary

    Extraction of image or video features · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12223695B2 cover?
A target-oriented reinforcement learning method according to an embodiment includes: collecting data related to the target of reinforcement learning as target data in the process of performing the reinforcement learning; learning the collected target data as auxiliary learning for the reinforcement learning; and incorporating the results of the learning of the target data into the performance o…
Who is the assignee on this patent?
Seoul National University R&DB Foundation
What technology area does this patent fall under?
Primary CPC classification G06V10/776. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 11, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).