Learning of operator for planning problem
US-2022309383-A1 · Sep 29, 2022 · US
US11887009B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11887009-B2 |
| Application number | US-202118039271-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 29, 2021 |
| Priority date | Jun 1, 2021 |
| Publication date | Jan 30, 2024 |
| Grant date | Jan 30, 2024 |
Abstract
The present application discloses an automatic driving control method. The method optimizes parameters using a dual-strategy network made up of a noisy strategy network and a noiseless strategy network, and feeds identical vehicle traffic environment state information into both. With the noiseless strategy network serving as the comparison benchmark, a motion-space perturbation threshold is set so that the noise parameters can be adjusted adaptively. Motion noise is added indirectly, by adaptively injecting noise into the parameter space of the strategy network. According to the disclosure, this effectively improves how well the deep reinforcement learning algorithm explores the environment and the motion space, improves the exploration performance and stability of deep-reinforcement-learning-based automatic driving, and ensures that the influence of the environment state and of driving strategies is fully considered in vehicle decision-making and motion selection, thereby improving the stability and safety of an autonomous vehicle.
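The abstract's central mechanism can be illustrated with a small sketch. One traffic-state vector is fed both to a noiseless strategy network and to a copy whose parameters (not outputs) carry noise, and the noisy output is then measured against the noiseless benchmark. The sketch assumes a hypothetical linear `policy`, Gaussian `perturb` noise and a root-mean-square `strategy_difference`. The patent text shown here does not specify the network architecture, the noise distribution or the distance metric, so treat these as illustrative stand-ins only.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def policy(weights: Matrix, state: List[float]) -> List[float]:
    """Hypothetical linear strategy network: each output is a weighted sum of the state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def perturb(weights: Matrix, sigma: float, rng: random.Random) -> Matrix:
    """Parameter-space noise: add zero-mean Gaussian noise (std = sigma) to every weight.
    The noisy network's output changes only indirectly, through its parameters."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

def strategy_difference(noisy_action: List[float], clean_action: List[float]) -> float:
    """Root-mean-square distance between two actions (an assumed metric)."""
    n = len(clean_action)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(noisy_action, clean_action)) / n)

# The same traffic-state vector goes to both networks; the noiseless output is the benchmark.
rng = random.Random(0)
clean_weights = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]  # hypothetical: 2 outputs x 3 features
noisy_weights = perturb(clean_weights, sigma=0.05, rng=rng)
state = [1.0, 0.5, -1.0]  # hypothetical normalized state features
d = strategy_difference(policy(noisy_weights, state), policy(clean_weights, state))
print(f"strategy difference: {d:.4f}")
```

Perturbing weights rather than actions means that, for one sampled perturbation, the same state always yields the same action. This is the sense in which the abstract says motion noise is added indirectly.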
Claims (excerpt)
The invention claimed is:

1. A method for automatic driving control, comprising: initializing a system parameter of a deep-reinforcement-learning automatic driving decision system, wherein the deep-reinforcement-learning automatic driving decision system comprises a noiseless strategic network and a noisy strategic network; obtaining vehicle traffic environmental state information; inputting the vehicle traffic environmental state information into the noiseless strategic network and the noisy strategic network to perform automatic driving strategy generation, to obtain a noiseless strategy and a noisy strategy; adjusting a noise parameter injected into the noisy strategic network within a disturbance threshold according to the noisy strategy and the noiseless strategy, wherein adjusting the noise parameter injected into the noisy strategic network within the disturbance threshold according to the noisy strategy and the noiseless strategy comprises: calculating strategy difference between the noisy strategy and the noiseless strategy; determining whether the strategy difference exceeds a disturbance threshold; taking a quotient of the strategy difference and a modulation factor as the noise parameter when the strategy difference exceeds the disturbance threshold; and taking a product of the strategy difference and the modulation factor as the noise parameter when the strategy difference does not exceed the disturbance threshold; wherein the modulation factor is greater than 1; performing parameter optimization on a system parameter of the noisy strategic network according to the noise parameter to generate an optimized noisy strategic network; and performing automatic driving control according to a driving strategy generated by the optimized noisy strategy network; wherein performing parameter optimization on the system parameter of the noisy strategic network according to the noise parameter comprises: performing parameter optimization on a system parameter of the noiseless strategic network according to the noisy strategy, and taking a system parameter of the optimized noiseless strategic network as an original parameter; and taking a sum of the original parameter and the noise parameter as an optimized system parameter of the noisy strategic network; wherein the system parameter of the deep-reinforcement-learning automatic driving decision system comprises an initial strategy parameter with no noise, an initial strategy parameter with implicit noise, an initial network parameter and initial strategy parameter noise.

2. The method for automatic driving control according to claim 1, wherein before performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, the method further comprises: determining execution times of the parameter optimization; determining whether the execution times reach a threshold number of training times; performing the step of performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, when the execution times reach the threshold number of training times; and performing the step of obtaining the vehicle traffic environmental state information when the execution times do not reach the threshold number of training times.

3. The method for automatic driving control according to claim 2, wherein the method further comprises: performing the step of initializing the system parameter of the deep-reinforcement-learning automatic driving decision system when a notice of driving accident is received.

4. The method for automatic driving control according to claim 1, wherein the strategic network is a network constructed based on a deep-reinforcement-learning strategy parameter space.

5. The method for automatic driving control according to claim 4, wherein the deep-reinforcement-learning automatic driving decision system further comprises an evaluation network; and performing parameter optimization on the system parameter of the noisy strategic network according to the noise parameter comprises: update a parameter of the evaluation network, a parameter of the noiseless strategic network and a parameter of the strategic network with implicit noise.

6. A device for automatic driving control, comprising: a memory configured for storing a computer program; and a processor configured for implementing steps of the method for automatic driving control according to claim 1 when the computer program is executed.

7. The device for automatic driving control according to claim 6, wherein before performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, the method further comprises: determining execution times of the parameter optimization; determining whether the execution times reach a threshold number of training times; performing the step of performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, when the execution times reach the threshold number of training times; and performing the step of obtaining the vehicle traffic environmental state information when the execution times do not reach the threshold number of training times.

8. The device for automatic driving control according to claim 7, wherein the method further comprises: performing the step of initializing the system parameter of the deep-reinforcement-learning automatic driving decision system when a notice of driving accident is received.

9. The device for automatic driving control according to claim 6, wherein the noiseless strategic network refers to a strategic network with no noise, and the noisy strategic network refers to a strategic network with implicit noise, and the strategic network is a network constructed based on a deep-reinforcement-learning strategy parameter space.

10. The device for automatic driving control according to claim 9, wherein the deep-reinforcement-learning automatic driving decision system further comprises an evaluation network; and performing parameter optimization on the system parameter of the noisy strategic network according to the noise parameter comprises: update a parameter of the evaluation network, a parameter of the noiseless strategic network and a parameter of the strategic network with implicit noise.

11. A non-transitory readable storage medium, having a computer program stored thereon and the computer program, when executed by a processor, implementing steps of the method for automatic driving control according to claim 1.

12. The non-transitory readable storage medium according to claim 11, wherein before performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, the method further comprises: determining execution times of the parameter optimization; determining whether the execution times reach a threshold number of training times; performing the step of performing the automatic driving control according to the driving strategy generated by the optimized noisy strategy network, when the execution times reach the threshold number of training times; and performing the step of obtaining the vehicle traffic environmental state information when the execution times do not reach the threshold number of training times.

13. The non-transitory readable storage medium according to claim 12, wherein the method further comprises: performing the step of initializing the system parameter of the deep-reinforcement-learning automatic driving decision system when a notice of driving accident is received.
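Claims 1 to 3 together describe a training loop. The following minimal Python sketch shows that control flow under explicit assumptions. The parameters are a flat list of floats. The noise parameter is used as the standard deviation of Gaussian noise added to each optimized noiseless parameter, whereas claim 1 only says the two are summed. `init_params`, `observe`, `optimize`, `difference` and `accident_reported` are hypothetical callables standing in for the unspecified network, environment and learning update. It is an illustration of the claimed steps, not the applicant's implementation.

```python
import random
from typing import Callable, List

Params = List[float]
State = List[float]

def adapt_noise(difference: float, threshold: float, modulation: float) -> float:
    """Claim 1 rule, read literally: quotient when the difference exceeds the
    disturbance threshold, product otherwise; the modulation factor must exceed 1."""
    if modulation <= 1.0:
        raise ValueError("modulation factor must be greater than 1")
    if difference > threshold:
        return difference / modulation
    return difference * modulation

def add_noise(original: Params, sigma: float, rng: random.Random) -> Params:
    """Noisy parameters = original (optimized noiseless) parameters + noise.
    Assumption: the noise parameter is used as a Gaussian standard deviation."""
    return [p + rng.gauss(0.0, sigma) for p in original]

def train(
    init_params: Callable[[], Params],                     # hypothetical initializer
    observe: Callable[[], State],                          # hypothetical traffic-state source
    optimize: Callable[[Params, Params, State], Params],   # hypothetical RL update
    difference: Callable[[Params, Params, State], float],  # hypothetical strategy-difference metric
    accident_reported: Callable[[], bool],                 # hypothetical accident notice
    threshold: float,
    modulation: float,
    training_times: int,
    sigma0: float,
    rng: random.Random,
) -> Params:
    clean = init_params()
    sigma = sigma0
    noisy = add_noise(clean, sigma, rng)
    executions = 0
    while executions < training_times:        # claim 2: keep training until the threshold count
        if accident_reported():                # claim 3: re-initialize on an accident notice
            clean, sigma, executions = init_params(), sigma0, 0  # count reset is an assumption
            noisy = add_noise(clean, sigma, rng)
            continue
        state = observe()                                    # obtain traffic state
        sigma = adapt_noise(difference(noisy, clean, state), threshold, modulation)
        clean = optimize(clean, noisy, state)                # update noiseless net using the noisy strategy
        noisy = add_noise(clean, sigma, rng)                 # original parameter + noise parameter
        executions += 1
    return noisy  # parameters of the optimized noisy network, used for driving control
```

Read literally, claim 1 divides or multiplies the strategy difference itself by the modulation factor rather than rescaling a running noise scale, so each new noise parameter tracks the most recent difference. Resetting the execution count after an accident notice is an assumption here, since claim 3 only says the initialization step is performed again.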
CPC classification titles:
- Reinforcement learning
- specially adapted for safety
- Input parameters relating to objects
- in which a parameter or coefficient is automatically adjusted to optimise the performance
- using neural networks only