What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 3, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Reinforcement learning-based techniques for training a natural media agent

US11775817B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11775817-B2
Application number	US-201916549072-A
Country	US
Kind code	B2
Filing date	Aug 23, 2019
Priority date	Aug 23, 2019
Publication date	Oct 3, 2023
Grant date	Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Some embodiments involve a reinforcement learning based framework for training a natural media agent to learn a rendering policy without human supervision or labeled datasets. The reinforcement learning based framework feeds the natural media agent a training dataset to implicitly learn the rendering policy by exploring a canvas and minimizing a loss function. Once trained, the natural media agent can be applied to any reference image to generate a series (or sequence) of continuous-valued primitive graphic actions, e.g., sequence of painting strokes, that when rendered by a synthetic rendering environment on a canvas, reproduce an identical or transformed version of the reference image subject to limitations of an action space and the learned rendering policy.
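The interaction loop the abstract describes (choose a continuous-valued stroke, render it on a canvas, score the result against the reference) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SIZE`, `render_stroke`, `random_policy`, and `run_episode` are hypothetical, the action is assumed to be a `(dx, dy, intensity)` triple, the reward is assumed to be the per-step decrease in L2 loss, and a random policy stands in for the learned one. No policy update is shown.

```python
import random

SIZE = 8  # hypothetical canvas width and height in pixels


def blank_canvas():
    return [[0.0] * SIZE for _ in range(SIZE)]


def l2_loss(canvas, reference):
    """Sum of squared per-pixel differences between canvas and reference."""
    return sum((c - r) ** 2
               for row_c, row_r in zip(canvas, reference)
               for c, r in zip(row_c, row_r))


def render_stroke(canvas, pos, action):
    """Draw one straight stroke from pos; action is (dx, dy, intensity)."""
    dx, dy, intensity = action
    x0, y0 = pos
    # Clamp the stroke end point so the instrument stays on the canvas.
    x1 = min(max(x0 + dx, 0.0), SIZE - 1.0)
    y1 = min(max(y0 + dy, 0.0), SIZE - 1.0)
    steps = 2 * SIZE
    for i in range(steps + 1):
        t = i / steps
        px = int(round(x0 + t * (x1 - x0)))
        py = int(round(y0 + t * (y1 - y0)))
        canvas[py][px] = max(canvas[py][px], intensity)
    return (x1, y1)  # new instrument position


def random_policy(rng):
    """Placeholder for the learned rendering policy (assumption)."""
    return (rng.uniform(-3, 3), rng.uniform(-3, 3), rng.uniform(0, 1))


def run_episode(reference, max_steps, rng):
    canvas = blank_canvas()
    pos = (SIZE / 2, SIZE / 2)
    prev_loss = l2_loss(canvas, reference)
    rewards = []
    for _ in range(max_steps):
        action = random_policy(rng)
        pos = render_stroke(canvas, pos, action)
        loss = l2_loss(canvas, reference)
        rewards.append(prev_loss - loss)  # assumed reward: loss decrease
        prev_loss = loss
    return canvas, rewards
```

With this reward definition the episode return telescopes to the initial loss minus the final loss, so maximizing return is equivalent to minimizing the final loss against the reference.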

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer readable media for training a natural media agent to implicitly learn a rendering policy in a multi-dimensional continuous action space from a set of training references, the one or more non-transitory computer readable media comprising instructions that, when executed by at least one processor of a reinforcement learning-based system, iteratively cause the system to: direct a media rendering engine to perform at least one primitive graphic action on a canvas in a synthetic rendering environment, wherein the natural media agent is configured to apply the rendering policy to select the at least one primitive graphic action at each iteration based on a working observation of a current state of the system; observe a visual state of the canvas and a position of a media rendering instrument within the synthetic rendering environment occurring as a result of performing the at least one primitive graphic action on the canvas; apply a loss function to compute a reward based on a goal configuration and the visual state of the canvas occurring as a result of performing the at least one primitive graphic action, wherein the goal configuration comprises a current training reference of the set of training references; and provide the reward to the natural media agent to learn the rendering policy by refining a policy function.

2. The one or more non-transitory computer readable media of claim 1, wherein the instructions, when executed by the least one processor, further iteratively cause the system to: observe, at each iteration, a current state of the synthetic rendering environment including a current visual state of the canvas and a current position of a media rendering instrument within the synthetic rendering environment; determine the current state of the system by combining the current state of the synthetic rendering environment with a current training reference image; generate the working observation based on the current state of the system; and provide the working observation to the policy function.

3. The one or more non-transitory computer readable media of claim 2, wherein to generate the working observation of the current state of the system, the instructions, when executed by the least one processor, further cause the system to: identify the current position of the media rendering instrument within the synthetic rendering environment; capture egocentric patches of the canvas and the current training reference; concatenate the egocentric patches of the current training reference and canvas to form a visual portion of the working observation of the current state of the system; and combine the visual portion of the working observation with the current position of the media rendering instrument within the synthetic rendering environment to generate the working observation of the current state of the system.

4. The one or more non-transitory computer readable media of claim 1, wherein the instructions, when executed by the least one processor, further iteratively cause the system to: sample training reference of the set of training references as the goal configuration.

5. The one or more non-transitory computer readable media of claim 1, wherein the policy function is implemented with a deep neural network.

6. The one or more non-transitory computer readable media of claim 1, wherein the loss function is defined as one of L2 loss, L1/2 loss, or perceptual loss.

7. The one or more non-transitory computer readable media of claim 1, wherein the loss function is designed to capture content and other abstract information of the goal configuration.

8. The one or more non-transitory computer readable media of claim 1, wherein the reinforcement learning-based system sets a limit on a number of steps for each episode, even if the natural media agent fails to achieve the goal configuration, wherein each episode is characterized by failure or success of the reinforcement learning-based system to achieve a corresponding goal configuration.

9. The one or more non-transitory computer readable media of claim 8, wherein the reinforcement learning-based system gradually increases the limit on the number of steps for each episode.

10. The one or more non-transitory computer readable media of claim 1, wherein the set of training references comprise patches sampled from multiple reference images.

11. The one or more non-transitory computer readable media of claim 10, wherein the reinforcement learning-based system is configured to sample the patches according to a predicted difficulty in achieving a corresponding goal configuration and the reinforcement learning-based system sets a limit on a number of steps for each episode based on the predicted difficultly.

12. The one or more non-transitory computer readable media of claim 1, wherein the reinforcement learning-based system is configured to utilize supervised learning to pre-train the policy function.

13. A computer-implemented method comprising: observing, by an observation module, a current visual state of a canvas and a current position of a media rendering instrument within a rendering environment; predicting at least one primitive graphic action by feeding a representation of the current position of the media rendering instrument, at least a portion of the current visual state of the canvas, and at least a portion of a current training reference of a set of training references to a neural network; observing, by the observation module, an updated visual state of the canvas and an updated position of the media rendering instrument within the rendering environment occurring in response to a media rendering engine performing the at least one graphic action on the canvas; comparing, by a reward generation module, the updated visual state of the canvas with a goal configuration to determine a reward; and refining the neural network based on the reward to iteratively learn a rendering policy.

14. The computer-implemented method of claim 13, further comprising: sampling, by the reward generation module, the current training reference of the set of training references or a current target reference as the goal configuration.

15. The computer-implemented method of claim 13, further comprising: generating, by the observation module, at least the portion of the current visual state of the canvas and at least the portion of the current training reference of the set of training references by capturing and concatenating egocentric patches of the canvas and the current training reference.

16. The computer-implemented method of claim 13, wherein comparing of the updated canvas with a goal configuration to compute a reward comprises applying a loss function defined as one of L2 loss, L1/2 loss, or perceptual loss.

17. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: directing a media rendering engine to perform at least one primitive graphic action on a canvas in a synthetic rendering environment, wherein a natural media agent is configured to apply a rendering policy to select the at least one primitive graphic action at each iteration based on a working observation of a current state of the system; observing a visual state of the canvas and a position of a media rendering instrument within the synthetic rendering environment occurring as a result of performing the at least one primitive graphic action on the canvas; applying a loss function to compute a
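Several dependent claims describe mechanics that are easier to follow in code: the working observation built from egocentric patches plus the instrument position (claims 3 and 15), the L2 and L1/2 loss options (claims 6 and 16), and a per-episode step limit that grows gradually (claims 8 and 9). The sketch below is one possible reading, not the patent's implementation. All function names, the patch size, zero padding at canvas edges, position normalization, the L1/2 formula (sum of square roots of absolute differences), and the linear step-limit schedule are assumptions made for illustration; the claims do not specify them.

```python
import math


def egocentric_patch(image, center, size):
    """Crop a size x size window centered on the instrument position.

    center is (x, y) in pixel coordinates; pixels outside the image
    are filled with 0.0 (an assumed padding choice).
    """
    cx, cy = center
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for y in range(cy - half, cy - half + size):
        row = []
        for x in range(cx - half, cx - half + size):
            inside = 0 <= y < height and 0 <= x < width
            row.append(image[y][x] if inside else 0.0)
        patch.append(row)
    return patch


def working_observation(canvas, reference, position, patch_size=3):
    """Stack canvas and reference patches, and pair them with the position."""
    visual = [egocentric_patch(canvas, position, patch_size),
              egocentric_patch(reference, position, patch_size)]
    height, width = len(canvas), len(canvas[0])
    # Normalize the position to [0, 1] so it is independent of canvas size.
    normalized = (position[0] / (width - 1), position[1] / (height - 1))
    return visual, normalized


def l2_loss(canvas, reference):
    """Sum of squared per-pixel differences."""
    return sum((c - r) ** 2
               for row_c, row_r in zip(canvas, reference)
               for c, r in zip(row_c, row_r))


def l_half_loss(canvas, reference):
    """Assumed L1/2 form: sum of square roots of absolute differences."""
    return sum(math.sqrt(abs(c - r))
               for row_c, row_r in zip(canvas, reference)
               for c, r in zip(row_c, row_r))


def step_limit(episode_index, start=5, growth_every=100, cap=50):
    """Hypothetical schedule: one extra step every growth_every episodes."""
    return min(cap, start + episode_index // growth_every)
```

A policy network would consume the two stacked patches as image channels and the normalized position as an extra input. Cropping around the instrument keeps the observation size fixed regardless of the canvas resolution.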

Assignees

Adobe Inc

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/092
Reinforcement learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/08 (Primary)
Learning methods · CPC title
G06N3/04
Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

View patent family 74646341

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11775817B2 cover?: Some embodiments involve a reinforcement learning based framework for training a natural media agent to learn a rendering policy without human supervision or labeled datasets. The reinforcement learning based framework feeds the natural media agent a training dataset to implicitly learn the rendering policy by exploring a canvas and minimizing a loss function. Once trained, the natural media ag…
Who is the assignee on this patent?: Adobe Inc