Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US11775817B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11775817-B2 |
| Application number | US-201916549072-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 23, 2019 |
| Priority date | Aug 23, 2019 |
| Publication date | Oct 3, 2023 |
| Grant date | Oct 3, 2023 |
Official abstract text for this publication.
Some embodiments involve a reinforcement learning based framework for training a natural media agent to learn a rendering policy without human supervision or labeled datasets. The reinforcement learning based framework feeds the natural media agent a training dataset to implicitly learn the rendering policy by exploring a canvas and minimizing a loss function. Once trained, the natural media agent can be applied to any reference image to generate a series (or sequence) of continuous-valued primitive graphic actions, e.g., sequence of painting strokes, that when rendered by a synthetic rendering environment on a canvas, reproduce an identical or transformed version of the reference image subject to limitations of an action space and the learned rendering policy.
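The reward signal described in the abstract can be illustrated with a small sketch. It assumes grayscale canvases stored as nested lists and a reward equal to the reduction in L2 loss produced by one action. The names `l2_loss` and `step_reward`, and the choice of a loss-difference reward, are illustrative assumptions and are not taken from the patent text.

```python
from typing import List

Canvas = List[List[float]]  # grayscale pixel intensities in [0, 1] (assumed layout)


def l2_loss(canvas: Canvas, reference: Canvas) -> float:
    """Sum of squared per-pixel differences between canvas and reference."""
    return sum(
        (c - r) ** 2
        for canvas_row, ref_row in zip(canvas, reference)
        for c, r in zip(canvas_row, ref_row)
    )


def step_reward(before: Canvas, after: Canvas, reference: Canvas) -> float:
    """Assumed reward shape: how much one action reduced the L2 loss."""
    return l2_loss(before, reference) - l2_loss(after, reference)


reference = [[0.0, 1.0], [1.0, 0.0]]   # goal configuration
blank = [[0.0, 0.0], [0.0, 0.0]]       # canvas before the action
one_stroke = [[0.0, 1.0], [0.0, 0.0]]  # canvas after a stroke fills one target pixel

print(step_reward(blank, one_stroke, reference))  # 1.0
```

A positive value means the action moved the canvas closer to the reference, and an action that leaves the canvas unchanged scores zero under this assumed definition.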
Opening claim text (preview).
What is claimed is:

1. One or more non-transitory computer readable media for training a natural media agent to implicitly learn a rendering policy in a multi-dimensional continuous action space from a set of training references, the one or more non-transitory computer readable media comprising instructions that, when executed by at least one processor of a reinforcement learning-based system, iteratively cause the system to: direct a media rendering engine to perform at least one primitive graphic action on a canvas in a synthetic rendering environment, wherein the natural media agent is configured to apply the rendering policy to select the at least one primitive graphic action at each iteration based on a working observation of a current state of the system; observe a visual state of the canvas and a position of a media rendering instrument within the synthetic rendering environment occurring as a result of performing the at least one primitive graphic action on the canvas; apply a loss function to compute a reward based on a goal configuration and the visual state of the canvas occurring as a result of performing the at least one primitive graphic action, wherein the goal configuration comprises a current training reference of the set of training references; and provide the reward to the natural media agent to learn the rendering policy by refining a policy function.

2. The one or more non-transitory computer readable media of claim 1, wherein the instructions, when executed by the least one processor, further iteratively cause the system to: observe, at each iteration, a current state of the synthetic rendering environment including a current visual state of the canvas and a current position of a media rendering instrument within the synthetic rendering environment; determine the current state of the system by combining the current state of the synthetic rendering environment with a current training reference image; generate the working observation based on the current state of the system; and provide the working observation to the policy function.

3. The one or more non-transitory computer readable media of claim 2, wherein to generate the working observation of the current state of the system, the instructions, when executed by the least one processor, further cause the system to: identify the current position of the media rendering instrument within the synthetic rendering environment; capture egocentric patches of the canvas and the current training reference; concatenate the egocentric patches of the current training reference and canvas to form a visual portion of the working observation of the current state of the system; and combine the visual portion of the working observation with the current position of the media rendering instrument within the synthetic rendering environment to generate the working observation of the current state of the system.

4. The one or more non-transitory computer readable media of claim 1, wherein the instructions, when executed by the least one processor, further iteratively cause the system to: sample training reference of the set of training references as the goal configuration.

5. The one or more non-transitory computer readable media of claim 1, wherein the policy function is implemented with a deep neural network.

6. The one or more non-transitory computer readable media of claim 1, wherein the loss function is defined as one of L2 loss, L1/2 loss, or perceptual loss.

7. The one or more non-transitory computer readable media of claim 1, wherein the loss function is designed to capture content and other abstract information of the goal configuration.

8. The one or more non-transitory computer readable media of claim 1, wherein the reinforcement learning-based system sets a limit on a number of steps for each episode, even if the natural media agent fails to achieve the goal configuration, wherein each episode is characterized by failure or success of the reinforcement learning-based system to achieve a corresponding goal configuration.

9. The one or more non-transitory computer readable media of claim 8, wherein the reinforcement learning-based system gradually increases the limit on the number of steps for each episode.

10. The one or more non-transitory computer readable media of claim 1, wherein the set of training references comprise patches sampled from multiple reference images.

11. The one or more non-transitory computer readable media of claim 10, wherein the reinforcement learning-based system is configured to sample the patches according to a predicted difficulty in achieving a corresponding goal configuration and the reinforcement learning-based system sets a limit on a number of steps for each episode based on the predicted difficultly.

12. The one or more non-transitory computer readable media of claim 1, wherein the reinforcement learning-based system is configured to utilize supervised learning to pre-train the policy function.

13. A computer-implemented method comprising: observing, by an observation module, a current visual state of a canvas and a current position of a media rendering instrument within a rendering environment; predicting at least one primitive graphic action by feeding a representation of the current position of the media rendering instrument, at least a portion of the current visual state of the canvas, and at least a portion of a current training reference of a set of training references to a neural network; observing, by the observation module, an updated visual state of the canvas and an updated position of the media rendering instrument within the rendering environment occurring in response to a media rendering engine performing the at least one graphic action on the canvas; comparing, by a reward generation module, the updated visual state of the canvas with a goal configuration to determine a reward; and refining the neural network based on the reward to iteratively learn a rendering policy.

14. The computer-implemented method of claim 13, further comprising: sampling, by the reward generation module, the current training reference of the set of training references or a current target reference as the goal configuration.

15. The computer-implemented method of claim 13, further comprising: generating, by the observation module, at least the portion of the current visual state of the canvas and at least the portion of the current training reference of the set of training references by capturing and concatenating egocentric patches of the canvas and the current training reference.

16. The computer-implemented method of claim 13, wherein comparing of the updated canvas with a goal configuration to compute a reward comprises applying a loss function defined as one of L2 loss, L1/2 loss, or perceptual loss.

17. A system comprising: a memory component; and a processing device coupled to the memory component, the processing device to perform operations comprising: directing a media rendering engine to perform at least one primitive graphic action on a canvas in a synthetic rendering environment, wherein a natural media agent is configured to apply a rendering policy to select the at least one primitive graphic action at each iteration based on a working observation of a current state of the system; observing a visual state of the canvas and a position of a media rendering instrument within the synthetic rendering environment occurring as a result of performing the at least one primitive graphic action on the canvas; applying a loss function to compute a
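
The working observation recited in claims 3 and 15 (egocentric patches of the canvas and the training reference, concatenated and combined with the instrument position) might be assembled roughly as follows. This is a minimal sketch under stated assumptions: images are nested lists, patches are square with zero padding beyond the image border, and "concatenate" is read as stacking the two patches as channels. The function names, the `radius` and `pad` parameters, and the dictionary layout are hypothetical and do not come from the claims.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]


def egocentric_patch(
    image: Image, center: Tuple[int, int], radius: int, pad: float = 0.0
) -> Image:
    """Crop a (2 * radius + 1)-square patch centered on `center`.

    Pixels that fall outside the image are filled with `pad` (an assumption).
    """
    row0, col0 = center
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row0 - radius, row0 + radius + 1):
        row = []
        for c in range(col0 - radius, col0 + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else pad)
        patch.append(row)
    return patch


def working_observation(
    canvas: Image, reference: Image, position: Tuple[int, int], radius: int = 1
) -> Dict[str, object]:
    """Stack reference and canvas patches as channels, then attach the position."""
    visual = [
        egocentric_patch(reference, position, radius),
        egocentric_patch(canvas, position, radius),
    ]
    return {"visual": visual, "position": position}


reference = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
canvas = [[0.0] * 3 for _ in range(3)]

obs = working_observation(canvas, reference, position=(0, 0))
print(obs["visual"][0])  # [[0.0, 0.0, 0.0], [0.0, 1.0, 2.0], [0.0, 4.0, 5.0]]
```

Centering the crop on the instrument keeps the observation size fixed regardless of canvas size, which is one plausible reason for using egocentric patches; the claims themselves do not state the motivation.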
Convolutional networks [CNN, ConvNet] · CPC title
Reinforcement learning · CPC title
Supervised learning · CPC title
Learning methods · CPC title
Architecture, e.g. interconnection topology · CPC title