Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles

US11467591B2 · US · B2

Patent metadata
Publication number: US-11467591-B2
Application number: US-201916413332-A
Country: US
Kind code: B2
Filing date: May 15, 2019
Priority date: May 15, 2019
Publication date: Oct 11, 2022
Grant date: Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a system uses an actor-critic reinforcement learning model to generate a trajectory for an autonomous driving vehicle (ADV) in an open space. The system perceives an environment surrounding an ADV. The system applies a RL algorithm to an initial state of a planning trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV. The system determines a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state. The system generates a first trajectory from the trajectory states by maximizing the reward predictions to control the ADV autonomously according to the first trajectory.
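
The loop the abstract describes (propose controls from the current state, predict a reward for each, keep the control with the highest prediction, advance to the next trajectory state) can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the discrete control set, the kinematic bicycle model in `step`, and the distance-based stand-in for the critic are all hypothetical, and no neural networks are trained here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float        # metres
    y: float        # metres
    heading: float  # radians
    speed: float    # metres per second


# Hypothetical discrete controls: (acceleration in m/s^2, steering angle in rad).
CONTROLS = [(a, s) for a in (-1.0, 0.0, 1.0) for s in (-0.3, 0.0, 0.3)]


def step(state, control, dt=0.5, wheelbase=2.8):
    """Advance one time step with a kinematic bicycle model (an assumption)."""
    accel, steer = control
    speed = max(0.0, state.speed + accel * dt)
    heading = state.heading + speed / wheelbase * math.tan(steer) * dt
    return State(
        x=state.x + speed * math.cos(heading) * dt,
        y=state.y + speed * math.sin(heading) * dt,
        heading=heading,
        speed=speed,
    )


def actor_propose(state):
    """Stand-in for the actor network: offer every control in the fixed set."""
    return CONTROLS


def critic_predict_reward(state, control, target):
    """Stand-in for the critic network: negative distance to the target
    after applying the control (closer is better)."""
    nxt = step(state, control)
    return -math.hypot(target.x - nxt.x, target.y - nxt.y)


def plan_first_trajectory(initial, target, horizon=40, tolerance=0.5):
    """Greedy rollout: at each state keep the control with the highest
    predicted reward, then advance to the resulting trajectory state."""
    trajectory = [initial]
    state = initial
    for _ in range(horizon):
        if math.hypot(target.x - state.x, target.y - state.y) <= tolerance:
            break
        best = max(
            actor_propose(state),
            key=lambda c: critic_predict_reward(state, c, target),
        )
        state = step(state, best)
        trajectory.append(state)
    return trajectory
```

In a trained system both stand-ins would be replaced by learned models. A one-step greedy choice like this one does not plan braking ahead of the target, so its output should not be assumed to end exactly at the destination.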

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for operating an autonomous driving vehicle, the method comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV; determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state; generating a first trajectory from the trajectory states by maximizing the reward predictions; applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory; determining that the judgment score is below a predetermined threshold; and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.

2. The method of claim 1, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles in the perceived environment.

3. The method of claim 1, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.

4. The method of claim 1, wherein the open space optimization model includes a vehicle dynamic model for the ADV.

5. The method of claim 1, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.

6. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations, the operations comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV; determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state; generating a first trajectory from the trajectory states by maximizing the reward predictions; applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory; determining that the judgment score is below a predetermined threshold; and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.

7. The non-transitory machine-readable medium of claim 6, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment.

8. The non-transitory machine-readable medium of claim 6, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.

9. The non-transitory machine-readable medium of claim 6, wherein the open space optimization model includes a vehicle dynamic model for the ADV.

10. The non-transitory machine-readable medium of claim 6, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.

11. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations including perceiving an environment surrounding an autonomous driving vehicle (ADV), applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV, determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state, generating a first trajectory from the trajectory states by maximizing the reward predictions, applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory, determining that the judgment score is below a predetermined threshold, and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.

12. The system of claim 11, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment.

13. The system of claim 11, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.

14. The system of claim 11, wherein the open space optimization model includes a vehicle dynamic model for the ADV.

15. The system of claim 11, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
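
The judgment step in the independent claims (score the first trajectory, compare the score with a threshold, and generate a second trajectory when the score falls short) might look roughly like the sketch below. The claims name three checks: ending at the target, smoothness, and obstacle avoidance. Everything else is an assumption made for illustration: the equal weighting, the tolerance, clearance and turn limits, the threshold value, and the `fallback_planner` callable standing in for the open space optimization model.

```python
import math


def judgment_score(trajectory, target, obstacles,
                   tolerance=0.5, clearance=1.0, max_turn=0.5):
    """Score a trajectory given as a list of (x, y) points.

    Returns the fraction of three checks that pass. Equal weighting and
    all numeric limits are hypothetical choices for this sketch.
    """
    end_x, end_y = trajectory[-1]
    reaches_target = (
        math.hypot(target[0] - end_x, target[1] - end_y) <= tolerance
    )

    # Smoothness: heading change between consecutive segments stays small.
    headings = [
        math.atan2(y2 - y1, x2 - x1)
        for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:])
    ]
    turns = [
        abs(math.atan2(math.sin(b - a), math.cos(b - a)))  # wrapped to [0, pi]
        for a, b in zip(headings, headings[1:])
    ]
    is_smooth = all(t <= max_turn for t in turns)

    # Obstacle avoidance: every point keeps a minimum distance from every
    # obstacle centre (obstacles are modelled as points here).
    avoids_obstacles = all(
        math.hypot(px - ox, py - oy) >= clearance
        for px, py in trajectory
        for ox, oy in obstacles
    )

    return (reaches_target + is_smooth + avoids_obstacles) / 3.0


def select_trajectory(first, target, obstacles, fallback_planner,
                      threshold=1.0):
    """Keep the first trajectory unless its score is below the threshold,
    in which case ask the fallback planner for a second trajectory."""
    if judgment_score(first, target, obstacles) < threshold:
        return fallback_planner(), "second"
    return first, "first"
```

For example, a straight path from (0, 0) to (5, 0) with an obstacle well off to the side passes all three checks and is kept, while the same path with an obstacle at (2, 0.2) fails the clearance check and triggers the fallback.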

Assignees

Baidu USA LLC

Inventors

Classifications

  • Combinations of networks · CPC title

  • Activation functions · CPC title

  • Reinforcement learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Non-supervised learning, e.g. competitive learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11467591B2 cover?
In one embodiment, a system uses an actor-critic reinforcement learning model to generate a trajectory for an autonomous driving vehicle (ADV) in an open space. The system perceives an environment surrounding an ADV. The system applies a RL algorithm to an initial state of a planning trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a p…
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/006. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).