Path and speed optimization fallback mechanism for autonomous vehicles
US-2019235516-A1 · Aug 1, 2019 · US
US11467591B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11467591-B2 |
| Application number | US-201916413332-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2019 |
| Priority date | May 15, 2019 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Official abstract text for this publication.
In one embodiment, a system uses an actor-critic reinforcement learning model to generate a trajectory for an autonomous driving vehicle (ADV) in an open space. The system perceives an environment surrounding an ADV. The system applies a RL algorithm to an initial state of a planning trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV. The system determines a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state. The system generates a first trajectory from the trajectory states by maximizing the reward predictions to control the ADV autonomously according to the first trajectory.
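The selection step in the abstract (advance the state under each candidate control, predict a reward for each, keep the best) can be sketched as a greedy rollout. This is a minimal illustration under stated assumptions, not the patented implementation: the state layout, the fixed list of candidate controls, the kinematic `step` function, the wheelbase and time step, and the distance-based `toy_critic` are all hypothetical stand-ins for the actor and critic networks the abstract refers to.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float, float]  # (x, y, heading): hypothetical state layout
Control = Tuple[float, float]       # (speed, steering angle): hypothetical control layout

WHEELBASE = 2.8  # metres; assumed value for illustration only
DT = 0.5         # seconds per step; assumed value for illustration only


def step(state: State, control: Control) -> State:
    """Advance one step with a simple kinematic bicycle model (an assumption)."""
    x, y, heading = state
    speed, steer = control
    return (
        x + speed * math.cos(heading) * DT,
        y + speed * math.sin(heading) * DT,
        heading + speed / WHEELBASE * math.tan(steer) * DT,
    )


def toy_critic(state: State, target: State) -> float:
    """Stand-in for a critic network: predicted reward rises as the target nears."""
    return -math.hypot(state[0] - target[0], state[1] - target[1])


def greedy_rollout(
    initial: State,
    target: State,
    controls: Sequence[Control],
    critic: Callable[[State, State], float],
    horizon: int = 20,
    goal_tol: float = 0.5,
) -> List[State]:
    """Build a trajectory by picking, at each state, the control whose next
    state has the highest predicted reward."""
    trajectory = [initial]
    state = initial
    for _ in range(horizon):
        # Stop once the current state is within goal_tol of the target position.
        if math.hypot(state[0] - target[0], state[1] - target[1]) < goal_tol:
            break
        best = max(controls, key=lambda c: critic(step(state, c), target))
        state = step(state, best)
        trajectory.append(state)
    return trajectory


candidate_controls = [(1.0, -0.3), (1.0, 0.0), (1.0, 0.3), (2.0, 0.0)]
first_trajectory = greedy_rollout(
    (0.0, 0.0, 0.0), (10.0, 0.0, 0.0), candidate_controls, toy_critic
)
```

In a learned system the candidate controls would come from the actor network and the reward predictions from the critic network; the enumeration above only makes the "maximize the reward predictions" step concrete.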
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for operating an autonomous driving vehicle, the method comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV; determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state; generating a first trajectory from the trajectory states by maximizing the reward predictions; applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory; determining that the judgment score is below a predetermined threshold; and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.
2. The method of claim 1, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles in the perceived environment.
3. The method of claim 1, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.
4. The method of claim 1, wherein the open space optimization model includes a vehicle dynamic model for the ADV.
5. The method of claim 1, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
6. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations, the operations comprising: perceiving an environment surrounding an autonomous driving vehicle (ADV); applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV; determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state; generating a first trajectory from the trajectory states by maximizing the reward predictions; applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory; determining that the judgment score is below a predetermined threshold; and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.
7. The non-transitory machine-readable medium of claim 6, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment.
8. The non-transitory machine-readable medium of claim 6, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.
9. The non-transitory machine-readable medium of claim 6, wherein the open space optimization model includes a vehicle dynamic model for the ADV.
10. The non-transitory machine-readable medium of claim 6, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
11. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations including perceiving an environment surrounding an autonomous driving vehicle (ADV), applying a reinforcement learning (RL) algorithm to an initial state of an initially planned trajectory based on the perceived environment to determine a plurality of controls for the ADV to advance to a plurality of trajectory states based on map and vehicle control information for the ADV, determining a reward prediction by the RL algorithm for each of the plurality of controls in view of a target destination state, generating a first trajectory from the trajectory states by maximizing the reward predictions, applying a judgment logic to the first trajectory to determine a judgment score for the first trajectory, determining that the judgment score is below a predetermined threshold, and generating a second trajectory based on an open space optimization model to control the ADV autonomously according to the second trajectory.
12. The system of claim 11, wherein the judgment score includes scores for whether the first trajectory ends at the target destination state, whether the first trajectory is smooth, and whether the first trajectory avoids one or more obstacles for the perceived environment.
13. The system of claim 11, wherein the open space optimization model is to generate a trajectory for the ADV without following a reference line or traffic lines.
14. The system of claim 11, wherein the open space optimization model includes a vehicle dynamic model for the ADV.
15. The system of claim 11, wherein the RL algorithm is performed by an actor neural network and a critic neural network, and wherein the actor neural network and the critic neural network are deep neural networks.
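The fallback recited in the independent claims (score the first trajectory, compare against a threshold, and generate a second trajectory from the open space optimization model if the score is too low) can be sketched as follows. This is a hedged illustration only. The three sub-scores mirror the criteria listed in claims 2, 7, and 12, but the binary scoring, the tolerances, the averaging, the threshold value, and the `straight_line_fallback` placeholder (standing in for the open space optimization model, which the claims do not detail) are assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position; hypothetical simplified state


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def judgment_score(
    trajectory: Sequence[Point],
    target: Point,
    obstacles: Sequence[Point],
    goal_tol: float = 0.5,   # assumed tolerance for "ends at the target"
    max_turn: float = 0.6,   # assumed max heading change (radians) for "smooth"
    clearance: float = 1.0,  # assumed minimum distance to any obstacle
) -> float:
    """Average of three binary checks: reaches target, is smooth, avoids obstacles."""
    reaches = 1.0 if _dist(trajectory[-1], target) <= goal_tol else 0.0

    headings = [
        math.atan2(b[1] - a[1], b[0] - a[0])
        for a, b in zip(trajectory, trajectory[1:])
    ]
    # Wrap each heading difference into [-pi, pi] before taking its magnitude.
    turns = [
        abs(math.atan2(math.sin(h2 - h1), math.cos(h2 - h1)))
        for h1, h2 in zip(headings, headings[1:])
    ]
    smooth = 1.0 if all(t <= max_turn for t in turns) else 0.0

    clear = 1.0 if all(
        _dist(p, o) >= clearance for p in trajectory for o in obstacles
    ) else 0.0

    return (reaches + smooth + clear) / 3.0


def straight_line_fallback(start: Point, target: Point) -> List[Point]:
    """Placeholder for the open space optimization model (hypothetical)."""
    return [start, target]


def select_trajectory(
    first_trajectory: Sequence[Point],
    target: Point,
    obstacles: Sequence[Point],
    fallback: Callable[[Point, Point], List[Point]],
    threshold: float = 1.0,  # assumed: every check must pass
) -> Tuple[List[Point], str]:
    """Keep the RL trajectory unless its judgment score is below the threshold."""
    score = judgment_score(first_trajectory, target, obstacles)
    if score < threshold:
        return fallback(first_trajectory[0], target), "fallback"
    return list(first_trajectory), "rl"


rl_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
chosen_clear, source_clear = select_trajectory(
    rl_path, (3.0, 0.0), [], straight_line_fallback
)
chosen_blocked, source_blocked = select_trajectory(
    rl_path, (3.0, 0.0), [(1.0, 0.2)], straight_line_fallback
)
```

With no obstacles the RL path passes every check and is kept; with an obstacle 0.2 m from the path the clearance check fails, the score drops below the threshold, and the placeholder fallback is used instead.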
Combinations of networks · CPC title
Activation functions · CPC title
Reinforcement learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Non-supervised learning, e.g. competitive learning · CPC title