Neural network for reinforcement learning

US9349092B2 · US · B2

Patent metadata
Publication number: US-9349092-B2
Application number: US-201414293928-A
Country: US
Kind code: B2
Filing date: Jun 2, 2014
Priority date: Dec 3, 2012
Publication date: May 24, 2016
Grant date: May 24, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural model for reinforcement-learning and for action-selection includes a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, and a population of reward neurons in each of the channels. Each channel of a population of reward neurons receives input from an environmental input, and is coupled only to output neurons in a channel that the reward neuron is part of. If the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced, otherwise the corresponding channel of a population of output neurons are punished and have their responses attenuated.
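The wiring described in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the function names `build_connectivity` and `channel_feedback`, the flat neuron indexing, and the equal population sizes are all hypothetical choices made for the example. It shows input populations coupled to output populations across all channels, reward populations coupled only to output neurons in their own channel, and a positive environmental input selecting reinforcement while any other input selects attenuation.

```python
def build_connectivity(num_channels, neurons_per_population):
    """Return (input_to_output, reward_to_output) as sets of (pre, post) index pairs.

    Assumption: every population has the same size, and neuron k of channel c
    has flat index c * neurons_per_population + k within its population type.
    """
    def population(channel):
        start = channel * neurons_per_population
        return range(start, start + neurons_per_population)

    input_to_output = set()
    reward_to_output = set()
    for src in range(num_channels):
        for dst in range(num_channels):
            for pre in population(src):
                for post in population(dst):
                    # Input neurons in every channel reach output neurons in every channel.
                    input_to_output.add((pre, post))
                    # Reward neurons reach only output neurons in their own channel.
                    if src == dst:
                        reward_to_output.add((pre, post))
    return input_to_output, reward_to_output


def channel_feedback(environmental_inputs):
    """Map each channel's environmental input to the effect on its output population."""
    return ["reinforce" if value > 0 else "attenuate" for value in environmental_inputs]


if __name__ == "__main__":
    i2o, r2o = build_connectivity(num_channels=3, neurons_per_population=2)
    print(len(i2o), len(r2o))
    print(channel_feedback([1.0, -0.5, 0.0]))
```

Representing the two projections as separate pair sets keeps the channel-restricted reward pathway visibly distinct from the all-to-all input pathway.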

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network for reinforcement-learning and for action-selection comprising: a plurality of channels; a population of input neurons in each of the channels; a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels by first synapses; and a population of reward neurons in each of the channels, wherein each population of reward neurons receives input from an environmental input, and wherein each channel of reward neurons is coupled only to output neurons in a channel that the reward neuron is part of by second synapses; wherein if the environmental input for a channel is positive, the corresponding channel of a population of output neurons are rewarded and have their responses reinforced; wherein if the environmental input for a channel is negative, the corresponding channel of a population of output neurons are punished and have their responses attenuated; and wherein the neural network comprises memristors.

2. The neural network of claim 1 wherein the first synapses and the second synapses have a spike-timing dependent plasticity wherein $g_{syn} = g_{max} \cdot g_{eff} \cdot (V - E_{syn})$ where $g_{max}$ is a maximum conductance of the first and second synapses, $g_{eff}$ is a current synaptic efficacy between 0 and a maximum value of $g_{effmax}$, $E_{syn}$ is a reversal potential for the first and second synapses, $V$ is a voltage, and $g_{syn}$ is a synapse conductance.

3. The neural network of claim 2 wherein $g_{eff} \rightarrow g_{eff} + g_{effmax} F(\Delta t)$ where $\Delta t = t_{pre} - t_{post}$, $F(\Delta t) = \begin{cases} A_{+} e^{\Delta t / \tau_{+}} \\ A_{-} e^{\Delta t / \tau_{-}} \end{cases}$, if $(g_{eff} < 0)$ then $g_{eff} \rightarrow 0$, if $(g_{eff} > g_{effmax})$ then $g_{eff} \rightarrow g_{effmax}$.

4. The neural network of claim 1 wherein each population of input neurons, each population of output neurons, and each population of reward neurons comprise a Leaky-Integrate and Fire (LIF) device wherein $C_m \frac{dV}{dt} = -g_{leak}(V$ …
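The efficacy update quoted in claims 2 and 3 can be sketched in a few lines. This is a hedged illustration, not the patent's implementation: the function names `synaptic_term`, `exp_branch`, and `update_efficacy` are hypothetical, and all numeric values are made up for the example. The claim preview does not show which condition selects each branch of F(Δt), so the sketch computes a single exponential branch and leaves the branch choice to the caller rather than guessing it.

```python
import math


def synaptic_term(g_max, g_eff, v, e_syn):
    """Evaluate g_max * g_eff * (V - E_syn), the expression quoted in claim 2."""
    return g_max * g_eff * (v - e_syn)


def exp_branch(amplitude, dt, tau):
    """One exponential branch of F(dt): amplitude * exp(dt / tau).

    Assumption: the caller decides whether (A+, tau+) or (A-, tau-) applies,
    because the branch conditions are not visible in the claim preview.
    """
    return amplitude * math.exp(dt / tau)


def update_efficacy(g_eff, f_value, g_eff_max):
    """Apply g_eff -> g_eff + g_eff_max * F(dt), then clip to [0, g_eff_max]."""
    g_eff = g_eff + g_eff_max * f_value
    if g_eff < 0:
        g_eff = 0.0
    if g_eff > g_eff_max:
        g_eff = g_eff_max
    return g_eff


if __name__ == "__main__":
    t_pre, t_post = 10.0, 15.0          # illustrative spike times
    dt = t_pre - t_post                 # dt = t_pre - t_post, as in claim 3
    f_value = exp_branch(amplitude=0.01, dt=dt, tau=20.0)  # made-up A and tau
    print(update_efficacy(g_eff=0.5, f_value=f_value, g_eff_max=1.0))
```

Keeping the clipping step inside `update_efficacy` means the efficacy stays between 0 and its maximum regardless of which branch produced the value of F.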

Assignees

Inventors

Classifications

  • G06N3/092 (Primary)

    Reinforcement learning · CPC title

  • Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs · CPC title

  • Machine learning · CPC title

  • Feedforward networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9349092B2 cover?
A neural model for reinforcement-learning and for action-selection includes a plurality of channels, a population of input neurons in each of the channels, a population of output neurons in each of the channels, each population of input neurons in each of the channels coupled to each population of output neurons in each of the channels, and a population of reward neurons in each of the channels…
Who is the assignee on this patent?
HRL Lab LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/092. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 24, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).