What technology area does this patent fall under?

Primary CPC classification G06N3/084. Mapped technology areas include Physics.

When was this patent published?

Publication date May 7, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Noisy neural network layers with noise parameters

US11977983B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11977983-B2
Application number	US-202017020248-A
Country	US
Kind code	B2
Filing date	Sep 14, 2020
Priority date	May 20, 2017
Publication date	May 7, 2024
Grant date	May 7, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer parameter of each noisy layer, a noisy current value for the layer parameter is determined from a current value of the layer parameter, a current value of a corresponding noise parameter, and the noise value. A network input including the observation is processed using the neural network in accordance with the noisy current values to generate a network output for the network input. An action is selected from a set of possible actions to be performed by the agent in response to the observation using the network output.
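The action-selection flow in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes a single fully-connected noisy layer, Gaussian noise values, a noisy value of the form `mu + sigma * eps`, and greedy selection over the outputs. The abstract specifies none of these choices, and the names `noisy_linear` and `select_action` are hypothetical.

```python
import random


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Apply one noisy fully-connected layer to the input vector x.

    Assumption: each noisy value is mu + sigma * eps, where eps is a fresh
    Gaussian noise value. The abstract only says the noisy value is derived
    from the layer parameter, its noise parameter, and the noise value.
    """
    out = []
    for j in range(len(mu_b)):
        # Noisy bias for output unit j.
        acc = mu_b[j] + sigma_b[j] * rng.gauss(0.0, 1.0)
        for i in range(len(x)):
            # Noisy weight connecting input i to output unit j.
            w = mu_w[j][i] + sigma_w[j][i] * rng.gauss(0.0, 1.0)
            acc += w * x[i]
        out.append(acc)
    return out


def select_action(observation, params, rng):
    """Pick the action with the largest network output (assumed greedy rule)."""
    scores = noisy_linear(observation, *params, rng)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    mu_w = [[1.0, 0.0], [0.0, 1.0]]
    sigma_w = [[0.5, 0.5], [0.5, 0.5]]
    params = (mu_w, sigma_w, [0.0, 0.0], [0.5, 0.5])
    # The chosen action can vary from call to call because the noise is resampled.
    print([select_action([0.2, 0.9], params, rng) for _ in range(5)])
```

With all noise parameters set to zero the layer reduces to an ordinary linear layer, so in this sketch the noise parameters control how much the selected action varies between calls.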

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a neural network, wherein the neural network is configured to receive a network input and to process the network input to generate a network output, wherein the neural network comprises a plurality of layers each having respective layer parameters, wherein one or more of the layers are noisy layers, and wherein the method comprises: maintaining data specifying: (i) current values of each layer parameter of each of the plurality of layers and, (ii) for each layer parameter of each noisy layer, a current value of a corresponding noise parameter for the layer parameter in addition to the current value of the layer parameter; obtaining a training network input; determining, for each layer parameter of each noisy layer, a respective noise value based at least in part on a respective random value that is sampled in accordance with a probability distribution, comprising: for each noisy layer, sampling a predetermined number of random values from one or more predetermined probability distributions, wherein the predetermined number of random values is less than a number of respective layer parameters for the noisy layer; and for each layer parameter of each noisy layer, generating the noise value for the layer parameter by combining two or more of the sampled random values for the noisy layer; determining, for each layer parameter of each noisy layer, a noisy current value for the layer parameter from: (i) the current value of the layer parameter, (ii) the current value of the corresponding noise parameter, and (iii) the noise value; processing the training input using the neural network in accordance with the noisy current values to generate a network output for the training input; determining a gradient of an objective function that depends on the network output with respect to the current values of the layer parameters and the current values of the noise parameters; and determining an update to the current values of the layer parameters and the current values of the noise parameters from the gradient.

2. The method of claim 1, wherein determining, for each layer parameter of each noisy layer, a noisy current value for the layer parameter from: (i) the current value of the layer parameter, (ii) the current value of the corresponding noise parameter, and (iii) the noise value comprises: generating a noise modifier based on the current value of the corresponding noise parameter and the noise value; and applying the noise modifier to the current value of the layer parameter to determine the noisy current value for the layer parameter.

3. The method of claim 2, wherein applying the noise modifier to the current value of the layer parameter to determine the noisy current value for the layer parameter comprises: adding the noise modifier and the current value of the layer parameter to generate the noisy current value for the layer parameter.

4. The method of claim 1, further comprising selecting an action to be performed by a reinforcement learning agent interacting with an environment based on the network output.

5. The method of claim 4, wherein the network input comprises an observation characterizing a state of the environment and an action from a set of actions, and wherein the network output is an estimate of a return received if the reinforcement learning agent performs the action in response to the observation.

6. The method of claim 4, wherein the network input comprises an observation characterizing a state of the environment and the network output defines a likelihood distribution over actions in a set of possible actions to be performed by the agent in response to the observation.

7. The method of claim 1, wherein the plurality of layers also includes one or more layers that are not noisy layers, and wherein processing the training input using the neural network comprises processing the training input in accordance with the noisy current values and the current values of the layer parameters of the layers that are not noise layers to generate the network output for the training input.

8. The method of claim 1, wherein the one or more noisy layers include one or more fully-connected layers.

9. The method of claim 1, wherein the one or more noisy layers include one or more convolutional layers.

10. The method of claim 1, wherein the one or more noisy layers include one or more recurrent neural network layers.

11. The method of claim 1, wherein the objective function additionally depends on a target output for the training network input.

12. A system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for training a neural network, wherein the neural network is configured to receive a network input and to process the network input to generate a network output, wherein the neural network comprises a plurality of layers each having respective layer parameters, wherein one or more of the layers are noisy layers, and wherein the operations comprise: maintaining data specifying: (i) current values of each layer parameter of each of the plurality of layers and, (ii) for each layer parameter of each noisy layer, a current value of a corresponding noise parameter for the layer parameter in addition to the current value of the layer parameter; obtaining a training network input; determining, for each layer parameter of each noisy layer, a respective noise value based at least in part on a respective random value that is sampled in accordance with a probability distribution, comprising: for each noisy layer, sampling a predetermined number of random values from one or more predetermined probability distributions, wherein the predetermined number of random values is less than a number of respective layer parameters for the noisy layer; and for each layer parameter of each noisy layer, generating the noise value for the layer parameter by combining two or more of the sampled random values for the noisy layer; determining, for each layer parameter of each noisy layer, a noisy current value for the layer parameter from: (i) the current value of the layer parameter, (ii) the current value of the corresponding noise parameter, and (iii) the noise value; processing the training input using the neural network in accordance with the noisy current values to generate a network output for the training input; determining a gradient of an objective function that depends on the network output with respect to the current values of the layer parameters and the current values of the noise parameters; and determining an update to the current values of the layer parameters and the current values of the noise parameters from the gradient.

13. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for training a neural network, wherein the neural network is configured to receive a network input and to process the network input to generate a network output, wherein the neural network comprises a plurality of layers each having respective layer parameters, wherein one or more of the layers are noisy layers, and wherein the operations comprise: maintaining data specifying: (i) current values of each layer parameter of each of the plurality of layers and, (ii) for each layer parameter of each noisy layer, a current value of a corresponding noise parameter for t
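The training steps recited in claim 1 can be illustrated with a short Python sketch. It is a hedged example under several assumptions not fixed by the claim text: a single noisy fully-connected layer without bias, Gaussian samples, noise values formed as the product of one per-output and one per-input sample, a noisy value of the form `mu + sigma * eps` (the additive form follows claim 3; the product is an assumption), a squared-error objective, and plain gradient descent. The names `factorized_noise` and `train_step` are hypothetical.

```python
import random


def factorized_noise(n_out, n_in, rng):
    """Sample n_out + n_in random values and combine them in pairs.

    This yields n_out * n_in noise values from fewer samples whenever
    n_out + n_in < n_out * n_in (for example 3 + 4 < 12).
    Assumption: the combination is a plain product of two samples.
    """
    eps_out = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    eps_in = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    return [[eps_out[j] * eps_in[i] for i in range(n_in)] for j in range(n_out)]


def train_step(x, target, mu, sigma, rng, lr=0.1):
    """One gradient step on both the layer parameters (mu) and noise parameters (sigma).

    Updates mu and sigma in place and returns the loss before the update.
    """
    n_out, n_in = len(mu), len(x)
    eps = factorized_noise(n_out, n_in, rng)
    # Noisy current values: layer parameter plus noise modifier sigma * eps.
    w = [[mu[j][i] + sigma[j][i] * eps[j][i] for i in range(n_in)]
         for j in range(n_out)]
    # Forward pass with the noisy values.
    y = [sum(w[j][i] * x[i] for i in range(n_in)) for j in range(n_out)]
    err = [y[j] - target[j] for j in range(n_out)]
    loss = 0.5 * sum(e * e for e in err)
    for j in range(n_out):
        for i in range(n_in):
            g_mu = err[j] * x[i]          # d loss / d mu[j][i]
            g_sigma = g_mu * eps[j][i]    # d loss / d sigma[j][i], by the chain rule
            mu[j][i] -= lr * g_mu
            sigma[j][i] -= lr * g_sigma
    return loss


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [[0.0] * 4 for _ in range(3)]
    sigma = [[0.1] * 4 for _ in range(3)]
    for step in range(3):
        print(step, train_step([1.0, 0.5, -0.5, 0.0], [1.0, 0.0, -1.0], mu, sigma, rng))
```

Because the noise parameters receive their own gradient, the amount of noise is learned alongside the layer parameters rather than fixed in advance.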

Assignees

Deepmind Tech Ltd

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/092
Reinforcement learning · CPC title
G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title

Patent family

Related publications grouped by family.

View patent family 62196615

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11977983B2 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting an action to be performed by a reinforcement learning agent. The method includes obtaining an observation characterizing a current state of an environment. For each layer parameter of each noisy layer of a neural network, a respective noise value is determined. For each layer paramet…
Who is the assignee on this patent?: Deepmind Tech Ltd