What technology area does this patent fall under?

Primary CPC classification: G06N3/045. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 3, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Training an unsupervised memory-based prediction system to learn compressed representations of an environment

US12159221B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12159221-B2
Application number	US-201916766945-A
Country	US
Kind code	B2
Filing date	Mar 11, 2019
Priority date	Mar 9, 2018
Publication date	Dec 3, 2024
Grant date	Dec 3, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment. The method comprises: for each of a plurality of time steps: processing an observation for the time step and data read from the memory to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step; and generating a predicted return that will be received by the agent as a result of interactions with the environment after the observation for the time step is received.
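The following illustration is not part of the official abstract. The abstract refers to a predicted return, and the first claim on this page describes the return as a cumulative measure of rewards received after an observation, with training driven by the error between predicted and actual returns. A minimal sketch of that idea, assuming a discounted sum as the cumulative measure and a mean squared error as the error term (the names `actual_returns`, `return_prediction_loss`, and `discount` are hypothetical and do not come from the patent):

```python
def actual_returns(rewards, discount=1.0):
    """For each time step t, accumulate the rewards received from step t onward.

    Assumption: rewards[t] is the reward received after the observation at
    step t, and the cumulative measure is a discounted sum.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the return of the following step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + discount * running
        returns[t] = running
    return returns


def return_prediction_loss(predicted, actual):
    """Mean squared error between predicted and actual returns (assumed error form)."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


returns = actual_returns([1.0, 0.0, 2.0], discount=0.5)  # [1.5, 1.0, 2.0]
loss = return_prediction_loss([1.5, 1.0, 1.0], returns)
```

In a real system, the gradient of such a loss with respect to the model parameters would be computed by an automatic differentiation library; that step is omitted here.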

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a memory-based prediction neural network system using a predictive modeling process, wherein the memory-based prediction neural network system includes an encoder and a representation system and has a plurality of memory-based prediction parameters and is configured to:
receive, by the memory-based prediction neural network system, an observation characterizing a state of an environment interacted with by an agent; and
process, by the memory-based prediction neural network system, the observation and data read from a memory to update data stored in the memory and to generate a latent representation of the state of the environment;
the method comprising:
performing, by the memory-based prediction neural network system, for each of a plurality of time steps:
receiving a current observation for the time step characterizing a current state of the environment being interacted with by the agent at the time step;
obtaining the data read from the memory, comprising:
processing one or more latent representations generated at a previous time step in accordance with current values of the memory-based prediction parameters to generate one or more read keys; and
generating one or more readout vectors based on a first measure of similarity between: (i) the one or more read keys derived from the one or more latent representations generated at the previous time step, and (ii) data stored in the memory, wherein the one or more readout vectors define the data read from the memory;
processing the current observation for the time step and data read from the memory in accordance with the current values of the memory-based prediction parameters to: (i) update the data stored in the memory, and (ii) generate a latent representation of the current state of the environment as of the time step, comprising:
processing the current observation using the encoder and in accordance with the current values of the memory-based prediction parameters to generate an encoded representation of the current observation;
processing: (i) the encoded representation of the current observation, and (ii) the data read from the memory, using the representation system and in accordance with the current values of the memory-based prediction parameters, to generate the latent representation of the current state of the environment as of the time step; and
writing the latent representation of the current state of the environment as of the time step to the memory; and
generating, using the latent representation of the current state of the environment as of the time step and in accordance with the current values of the memory-based prediction parameters, data characterizing a predicted return, wherein the predicted return is a prediction for a cumulative measure of rewards that will be received by the agent as a result of interactions with the environment to perform a task after the current observation for the time step is received;
determining, for one or more of the time steps, an actual return for the time step that is a cumulative measure of rewards received by the agent as a result of interactions with the environment to perform the task after the current observation for the time step is received;
determining a gradient based at least in part on, for one or more of the time steps, an error between: (i) the data characterizing the predicted return for the time step, and (ii) the actual return for the time step; and
adjusting the current values of the memory-based prediction parameters using the gradient.

2. The method of claim 1, wherein the current values of the memory-based prediction parameters include current values of the representation system parameters and wherein adjusting the current values of the memory-based prediction parameters using the gradient comprises: adjusting the current values of the representation system parameters using the gradient.

3. The method of claim 1, wherein for each of the plurality of time steps, processing the one or more previous latent representations using the memory-based prediction neural network system and in accordance with the current values of the memory-based prediction parameters to generate the one or more read keys comprises: processing, using a recurrent neural network and in accordance with current values of recurrent neural network parameters, an input comprising the one or more previous latent representations to generate a recurrent neural network output; and generating the one or more read keys based on the recurrent neural network output.

4. The method of claim 3, wherein processing: (i) the encoded representation of the current observation, and (ii) the data read from the memory, using the representation system and in accordance with the current values of the memory-based prediction parameters, to generate the latent representation of the current state of the environment as of the time step comprises:
processing, using a prior neural network and in accordance with current values of a set of neural network parameters of the prior neural network, an input comprising: (i) the recurrent neural network output, and (ii) the data read from the memory, to generate parameters of a prior probability distribution over a latent representation space;
processing, using a posterior neural network and in accordance with current values of posterior neural network parameters, an input comprising: (i) the parameters of the prior probability distribution, (ii) the encoded representation of the current observation, (iii) the recurrent neural network output, and (iv) the data read from the memory, to generate parameters of a posterior probability distribution over the latent representation space; and
generating the latent representation of the current state of the environment as of the time step by sampling a latent representation from the posterior probability distribution.

5. The method of claim 4, wherein: the parameters of the prior probability distribution include prior mean parameters and prior standard deviation parameters; the parameters of the posterior probability distribution include posterior mean parameters and posterior standard deviation parameters; and sampling a latent representation from the posterior probability distribution comprises sampling a latent representation from a Normal distribution defined by the parameters of the posterior probability distribution.

6. The method of claim 4, further comprising: determining a divergence gradient based on, for one or more of the time steps, a second measure of similarity between: (i) the prior probability distribution over the latent representation space at the time step, and (ii) the posterior probability distribution over the latent representation space at the time step; and adjusting the current values of the memory-based prediction parameters using the divergence gradient.

7. The method of claim 1, wherein writing data to the memory using the latent representation of the current state of the environment as of the time step comprises writing the latent representation of the current state of the environment as of the time step to a specific location in the memory and updating, using the latent representation of the current state of the environment as of the time step, data written to the memory at previous time steps.

8. The method of claim 1, wherein writing data to the memory using the latent representation of the current state of the environment as of the time step comprises, in response to determining that the memory is full, overwriting specific data in the memory based on how frequently the specific data is read from the memory.

9. The method of claim 1, wherein the current values of the memory-based
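The following illustration is not part of the claim text. The claims above describe reading from memory through a similarity measure between read keys and stored data, sampling a latent representation from a Normal posterior, and comparing prior and posterior distributions. A minimal sketch of those three pieces, under assumptions the claims do not state: cosine similarity with softmax weighting stands in for the "first measure of similarity", a KL divergence between diagonal Normal distributions stands in for the "second measure of similarity", and the memory is a non-empty list of equal-length vectors. All function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small constant avoids division by zero


def read_memory(read_key, memory):
    """Readout vector: softmax-weighted average of memory rows (assumed read rule)."""
    scores = [cosine(read_key, row) for row in memory]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(memory[0])
    return [sum(w * row[i] for w, row in zip(weights, memory)) for i in range(dim)]


def sample_latent(mean, std, rng):
    """Draw one latent vector from a diagonal Normal posterior."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def kl_diag_normal(post_mean, post_std, prior_mean, prior_std):
    """KL(posterior || prior) for diagonal Normal distributions."""
    total = 0.0
    for mq, sq, mp, sp in zip(post_mean, post_std, prior_mean, prior_std):
        total += math.log(sp / sq) + (sq * sq + (mq - mp) ** 2) / (2.0 * sp * sp) - 0.5
    return total


memory = [[1.0, 0.0], [0.0, 1.0]]
readout = read_memory([1.0, 0.0], memory)  # weighted toward the first row
latent = sample_latent([0.0, 0.0], [1.0, 1.0], random.Random(0))
divergence = kl_diag_normal([1.0], [1.0], [0.0], [1.0])  # 0.5
```

In the claimed method the read keys and distribution parameters are produced by neural networks with trained parameters; here they are passed in as plain lists so that each operation can be inspected in isolation.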

Assignees

Deepmind Tech Ltd

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/045 · Primary
Combinations of networks · CPC title
G06N3/088 · Primary
Non-supervised learning, e.g. competitive learning · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0475
Generative networks · CPC title

Patent family

Related publications grouped by family.

View patent family 65729359

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12159221B2 cover? Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a memory-based prediction system configured to receive an input observation characterizing a state of an environment interacted with by an agent and to process the input observation and data read from a memory to update data stored in the memory and to generate a latent representation…
Who is the assignee on this patent? Deepmind Tech Ltd