Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle

US10940863B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10940863-B2
Application number: US-201816177834-A
Country: US
Kind code: B2
Filing date: Nov 1, 2018
Priority date: Nov 1, 2018
Publication date: Mar 9, 2021
Grant date: Mar 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided that employ spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle. An actor-critic network architecture includes an actor network that processes image data received from an environment to learn the lane-change policies as a set of hierarchical actions, and a critic network that evaluates the lane-change policies to calculate loss and gradients to predict an action-value function (Q) that is used to drive learning and update parameters of the lane-change policies. The actor-critic network architecture implements a spatial attention module to select relevant regions in the image data that are of importance, and a temporal attention module to learn temporal attention weights to be applied to past frames of image data to indicate relative importance in deciding which lane-change policy to select.
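
The two attention steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the dot-product scoring rule, the function names, and the toy vectors are assumptions made for illustration, and the learned attention network and LSTM are replaced by fixed inputs. It only shows the general idea of turning scores into weights that sum to 1 and pooling vectors by those weights, first across image regions and then across past frames.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(vectors, weights):
    """Pool equal-length vectors into one vector using the given weights."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def spatial_context(region_vectors, hidden_state):
    """Weight image regions and pool them into one spatial context vector.

    Assumption: a region's score is its dot product with the previous
    hidden state. The patent describes a learned attention network instead.
    """
    scores = [dot(region, hidden_state) for region in region_vectors]
    weights = softmax(scores)
    return weighted_sum(region_vectors, weights), weights


def temporal_context(frame_vectors, query):
    """Weight per-frame vectors (oldest first) and pool them.

    Assumption: `query` is a hypothetical stand-in for learned parameters,
    and the per-frame vectors stand in for LSTM outputs.
    """
    scores = [dot(frame, query) for frame in frame_vectors]
    weights = softmax(scores)
    return weighted_sum(frame_vectors, weights), weights


# Toy usage: three image regions with two features each.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
context, region_weights = spatial_context(regions, hidden)

# Two past frames; the query favors the second feature.
frames = [[1.0, 0.0], [0.0, 1.0]]
combined, frame_weights = temporal_context(frames, [0.0, 3.0])
```

In this toy run the first and third regions receive equal, larger weights than the second, and the more recent frame dominates the combined context vector. In the patent, a fully connected layer then maps the combined context vector to the hierarchical actions.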

First claim

Opening claim text (preview).

What is claimed is:

1. A method for learning lane-change policies via an actor-critic network architecture, wherein each lane-change policy describes one or more actions selected to be taken by an autonomous vehicle, the method comprising:
  processing, via an actor network over time, image data received from an environment to learn the lane-change policies as a set of hierarchical actions, wherein the lane-change policies each comprise a high-level action and associated low-level actions, wherein the high-level actions comprise: a left lane-change, lane following, and a right lane-change, and wherein each of the associated low-level actions comprises a steering angle command parameter and an acceleration-brake rate parameter; and
  predicting action values via an action-value function at a critic network;
  evaluating, via the critic network, a lane-change policy;
  calculating, via the critic network, loss and gradients to drive learning and update the critic network;
  wherein processing via the actor network at each particular time step comprises:
  processing, at a convolutional neural network (CNN) of the actor network, the image data to generate a feature map that comprises a machine-readable representation of the driving environment that includes features of the environment acquired at the particular time step;
  processing, at a spatial attention module of the actor network, the feature map to select relevant regions in the image data that are of importance to focus on for computing actions when making lane-changes while driving;
  learning, at the spatial attention module, importance weights for each of the relevant regions of the image data;
  applying, at the spatial attention module, the learned importance weights to each of the relevant regions of the image data to add importance to the relevant regions of the image data;
  generating, at the spatial attention module, a spatial context vector; and
  processing, at a temporal attention module of the actor network, the spatial context vector to learn temporal attention weights that are applied to past frames of image data to indicate relative importance of the past frames;
  generating, at the temporal attention module, a combined context vector; and
  processing, via at least one fully connected layer, the combined context vector to generate the set of hierarchical actions.

2. The method according to claim 1, wherein processing, via the actor network over time, the image data received from the environment, comprises:
  processing the image data received from the environment to learn the lane-change policies as the set of hierarchical actions that are represented as a vector of a probability of action choices and a first set of parameters coupled to each discrete hierarchical action, and
  wherein predicting the action values via the action-value function at the critic network, comprises: predicting action values via the action-value function at the critic network using a second set of parameters, wherein the action-value function is represented as a neural network using the second set of parameters;
  wherein evaluating, via the critic network, the lane-change policy, comprises: evaluating, via the critic network based on transitions generated by the actor network, the lane-change policy, wherein the transitions comprise the image data, the hierarchical actions, rewards, and next image data generated by the actor network.

3. The method according to claim 2, wherein the calculating, via the critic network, the loss and the gradients to drive learning and update the critic network, comprises: calculating, via the critic network, loss and gradients to drive learning and update the second set of parameters of the critic network, wherein the calculating, via the critic network, comprises:
  processing, at the critic network during a back-propagation mode, an obtained mini-batch of transitions comprising the image data, the hierarchical actions, rewards, next image data generated by the actor network;
  computing, at the critic network, first gradients of the action-value function by differentiating a loss of the critic network with respect to the second set of parameters, wherein the first gradients are gradients of an error in predicting the action-value function with respect to the second set of parameters, wherein the first gradients are to be used for updating for the second set of parameters of the critic network;
  updating the second set of parameters at the critic network based on the first gradients;
  computing, at the critic network, second gradients of the action-value function with respect to the hierarchical actions generated by the actor network by differentiating a loss of the critic network with respect to the hierarchical actions taken by the actor network; and
  further comprising: back-propagating the second gradients to the actor network;
  processing the second gradients at the actor network along with third gradients generated by the actor network to update the first set of parameters, wherein the third gradients are generated by differentiating a loss of the actor network with respect to the hierarchical actions taken by the actor network.

4. The method according to claim 1, wherein the spatial attention module comprises: an attention network comprising at least one fully connected layer in which each neuron receives input from all activations of a previous layer; and an activation function coupled to the fully connected layer that converts values into action probabilities, and wherein a set of region vectors are extracted from the feature map by the CNN, wherein each region vector corresponds to a different feature layer of features extracted from a different image region of the image data by the CNN; and
  wherein learning, at the spatial attention module, importance weights for each of the relevant regions of the image data, comprises: applying, at the attention network, the set of region vectors along with a previous hidden state vector that was generated by an LSTM network during a past time step, to learn an importance weight for each region vector of the set of region vectors;
  wherein applying, at the spatial attention module, the learned importance weights to each of the relevant regions of the image data to add importance to the relevant regions of the image data, comprises: applying, at the attention network, the learned importance weights to each region vector of the set of region vectors to add importance to each region vector of the set of region vectors in proportion to importance of that region vector as learned by the attention network, and
  wherein generating, at the spatial attention module, the spatial context vector, comprises: generating, at the attention network, the spatial context vector that is a lower dimensional weighted version of the set of the region vectors that is represented by a weighted sum of all of the set of the region vectors.

5. The method according to claim 4, wherein the spatial attention module and the temporal attention module each comprise: a Long Short-Term Memory (LSTM) network of LSTM cells, wherein each LSTM cell processes input data sequentially and keeps a hidden state of that input data through time, and wherein the processing, at the temporal attention module of the actor network, the spatial context vector to learn temporal attention weights to be applied to past frames of image data to indicate relative importance in deciding which lane-change policy to select, comprises:
  processing, at the LSTM network at each time step, the spatial context vector for that time step and the previous hidden state vector that was generated by the LSTM network during the past time step to generate an LSTM output;
  learning, at the LSTM network, a temporal attention weight for each LSTM output at each time
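
Claims 1 and 2 describe the actor's output as a vector of action-choice probabilities plus parameters coupled to each discrete high-level action. The sketch below shows one way such an output could be decoded into a single hierarchical action. The class and function names, the greedy (most probable) selection rule, and the example numbers are assumptions made for illustration; the claims do not state how an action is picked from the probabilities, nor the units or ranges of the parameters.

```python
from dataclasses import dataclass

# The three high-level actions listed in claim 1.
HIGH_LEVEL_ACTIONS = ("left_lane_change", "lane_following", "right_lane_change")


@dataclass(frozen=True)
class HierarchicalAction:
    """One high-level action with its coupled low-level parameters."""
    high_level: str
    steering_angle: float    # steering angle command parameter (units assumed)
    accel_brake_rate: float  # acceleration-brake rate parameter (units assumed)


def select_action(probabilities, parameters):
    """Decode an actor output into one hierarchical action.

    probabilities: one probability per high-level action.
    parameters: one (steering_angle, accel_brake_rate) pair per action.
    Assumption: greedy selection of the most probable high-level action.
    """
    count = len(HIGH_LEVEL_ACTIONS)
    if len(probabilities) != count or len(parameters) != count:
        raise ValueError("expected one probability and one parameter pair per action")
    best = max(range(count), key=lambda i: probabilities[i])
    steering, accel = parameters[best]
    return HierarchicalAction(HIGH_LEVEL_ACTIONS[best], steering, accel)


# Toy usage: lane following is the most probable choice.
action = select_action(
    [0.1, 0.7, 0.2],
    [(-0.3, 0.0), (0.0, 0.1), (0.3, 0.0)],
)
```

Here `action.high_level` is `"lane_following"`, carrying the steering and acceleration-brake values paired with that choice. The parameters of the unselected actions are simply ignored in this sketch.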

Assignees

GM Global Technology Operations LLC; Carnegie Mellon University

Inventors

Classifications

CPC titles:

  • Combinations of networks
  • Activation functions
  • Recurrent networks, e.g. Hopfield networks
  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
  • Reinforcement learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10940863B2 cover?
Systems and methods are provided that employ spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle. An actor-critic network architecture includes an actor network that processes image data received from an environment to learn the lane-change policies as a set of hierarchical actions, and a critic network that e…
Who is the assignee on this patent?
GM Global Technology Operations LLC and Carnegie Mellon University
What technology area does this patent fall under?
Primary CPC classification B60W30/18163. Mapped technology areas include Operations & Transport.
When was this patent published?
Publication date Mar 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).