Reinforcement and Model Learning for Vehicle Operation
US-2020346666-A1 · Nov 5, 2020 · US
US11348455B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11348455-B2 |
| Application number | US-201816327337-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 31, 2018 |
| Priority date | Dec 8, 2017 |
| Publication date | May 31, 2022 |
| Grant date | May 31, 2022 |
Official abstract text for this publication.
An intersection traffic control method, apparatus and system are provided. The method includes that: a vehicle signal of a first vehicle at an intersection and a vehicle signal of a second vehicle located in a set zone in proximity to the intersection are acquired; the vehicle signal of the first vehicle and the vehicle signal of the second vehicle are input into an instruction learning model trained in advance based on a reinforcement learning principle, and a score of a preset traffic indicator of the first vehicle after executing a respective candidate action instruction is calculated; a reward of the first vehicle when executing the respective candidate action instruction is acquired according to the score of the preset traffic indicator, a candidate action instruction corresponding to a maximum reward is determined as an output result of the instruction learning model, and a next action instruction is determined according to the output result; and navigation of the first vehicle through the intersection is controlled according to the next action instruction.
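The selection step described in the abstract (score each candidate action instruction, turn the scores into a reward, output the candidate with the maximum reward) can be sketched as below. This is a minimal illustration, not the patented implementation: `select_next_action`, `score_fn`, `reward_fn`, the action names, and the toy numbers are all hypothetical, and the trained instruction learning model is stood in for by plain callables.

```python
from typing import Callable, Dict, Sequence


def select_next_action(
    candidates: Sequence[str],
    score_fn: Callable[[str], Dict[str, float]],
    reward_fn: Callable[[Dict[str, float]], float],
) -> str:
    """Return the candidate action instruction with the largest reward.

    score_fn  -- hypothetical stand-in for the model's indicator scoring
    reward_fn -- hypothetical mapping from indicator scores to a reward
    """
    if not candidates:
        raise ValueError("at least one candidate action is required")
    # Score every candidate, convert the scores to a reward, keep the best.
    rewards = {action: reward_fn(score_fn(action)) for action in candidates}
    return max(rewards, key=rewards.get)


# Toy indicator scores per candidate action (illustrative values only).
TOY_SCORES = {
    "accelerate": {"efficiency": 0.9, "danger": 0.6},
    "hold": {"efficiency": 0.5, "danger": 0.1},
    "brake": {"efficiency": 0.2, "danger": 0.0},
}


def toy_reward(scores: Dict[str, float]) -> float:
    # Assumed reward shape for the example: reward efficiency, penalize danger.
    return scores["efficiency"] - scores["danger"]


best = select_next_action(list(TOY_SCORES), TOY_SCORES.get, toy_reward)
print(best)  # prints "hold"
```

Passing the scoring and reward steps in as callables keeps the argmax logic independent of how the scores are produced, which is the only part the abstract specifies at this level.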
Opening claim text (preview).
What is claimed is:

1. An intersection traffic control method, comprising:

acquiring a vehicle signal of a first vehicle at an intersection and a vehicle signal of a second vehicle located in a set zone in proximity to the intersection;

inputting the vehicle signal of the first vehicle and the vehicle signal of the second vehicle into an instruction learning model trained in advance based on a reinforcement learning principle, acquiring an output result of the instruction learning model, and determining a next action instruction of the first vehicle according to the output result; and

controlling navigation of the first vehicle through the intersection according to the next action instruction,

wherein the instruction learning model calculates, according to the input vehicle signal of the first vehicle and the input vehicle signal of the second vehicle, a score of a preset traffic indicator of the first vehicle after executing a respective candidate action instruction, acquires, according to the score of the preset traffic indicator, a reward of the first vehicle when executing the respective candidate action instruction, and determines a candidate action instruction corresponding to a maximum reward as the output result;

the preset traffic indicator comprises a first traffic indicator acquired based on a speed, a second traffic indicator acquired based on a danger zone, and a third traffic indicator acquired based on an acceleration and a steering angle;

a score of the first traffic indicator is acquired by means of the following manner: determining the score of the first traffic indicator under the respective candidate action instruction according to an average speed of the first vehicle from entering the intersection to executing the respective candidate action instruction, the first traffic indicator being used to represent efficiency of the first vehicle passing through the intersection;

a score of the second traffic indicator is calculated by means of the following manner: determining the score of the second traffic indicator under the respective candidate action instruction according to an area of the danger zone between the first vehicle, when executing the respective candidate action instruction, and the second vehicle, the second traffic indicator being used to represent safety of the first vehicle passing through the intersection, and the danger zone being an overlapping zone of an elliptical zone where the first vehicle is located and an elliptical zone where the second vehicle is located;

a score of the third traffic indicator is calculated by means of the following manner: determining the score of the third traffic indicator under the respective candidate action instruction according to an acceleration and a steering wheel angle of the first vehicle and a time span taken by the first vehicle to pass through the intersection when the first vehicle executes the respective candidate action instruction, the third traffic indicator being used to represent stationarity of the first vehicle passing through the intersection;

the score of the second traffic indicator is acquired by the following formula:

$$f(D) = 0.25 \times \left[\theta_1 \times (r_1)^2 + \theta_2 \times (r_2)^2 - (h_1 + h_2) \times d_{12}\right],$$

where $d_{12}$ is a distance between a geometric center of the first vehicle and a geometric center of the second vehicle, $r_1$ and $r_2$ are a dynamic radius of the first vehicle in polar coordinates and a dynamic radius of the second vehicle in polar coordinates, respectively, $\theta_1$ and $\theta_2$ are an angle formed between the geometric center of the first vehicle and overlapping intersections of the overlapping zone in the polar coordinates and an angle formed between the geometric center of the second vehicle and the overlapping intersections of the overlapping zone in the polar coordinates, respectively, and $h_1$ and $h_2$ are vertical distances from the overlapping intersections of the overlapping zone to $d_{12}$, respectively; and/or,

the score of the third traffic indicator is acquired by the following formula:

$$f(\alpha, \theta) = C_1 \times \frac{1}{n} \sum_{i=1}^{n} \left(\frac{d\alpha}{dt}\right)^2 + C_2 \times \frac{1}{n} \sum_{i=1}^{n} \left(\frac{d\theta}{dt}\right)^2,$$

where $C_1$ and $C_2$ are preset weight factors, $n$ is the time span taken by the first vehicle to pass through the intersection, $\alpha$ is the acceleration of the first vehicle when executing the respective candidate action instruction, and $\theta$ is the steering wheel angle of the first vehicle when executing the respective candidate action instruction.

2. The intersection traffic control method as claimed in claim 1, wherein the reward is calculated by means of the following manner: performing weighted summation on the score of the first traff
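
The two closed-form scores recited in claim 1 can be evaluated numerically as in the sketch below. It is an illustration under stated assumptions rather than the patented implementation: the function and parameter names are hypothetical, the derivatives dα/dt and dθ/dt are approximated by finite differences over uniformly sampled values (the claim does not say how they are obtained), and n is taken to be the number of resulting rate samples, whereas the claim defines n as the time span.

```python
from typing import Sequence


def danger_zone_score(
    theta1: float, theta2: float,
    r1: float, r2: float,
    h1: float, h2: float,
    d12: float,
) -> float:
    """Evaluate f(D) = 0.25 * [theta1*r1^2 + theta2*r2^2 - (h1 + h2)*d12]
    exactly as the expression is written in claim 1."""
    return 0.25 * (theta1 * r1 ** 2 + theta2 * r2 ** 2 - (h1 + h2) * d12)


def _mean_squared_rate(samples: Sequence[float], dt: float) -> float:
    """Mean of squared finite-difference rates of change.

    Assumption: uniform sampling interval dt; this discretization is
    not taken from the claim.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a rate")
    if dt <= 0:
        raise ValueError("dt must be positive")
    rates = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    return sum(rate * rate for rate in rates) / len(rates)


def stationarity_score(
    accel: Sequence[float],
    steering: Sequence[float],
    dt: float,
    c1: float,
    c2: float,
) -> float:
    """Evaluate f(alpha, theta) = C1*mean((d alpha/dt)^2) + C2*mean((d theta/dt)^2),
    with c1 and c2 playing the role of the preset weight factors."""
    return c1 * _mean_squared_rate(accel, dt) + c2 * _mean_squared_rate(steering, dt)
```

For example, `danger_zone_score(1.0, 1.0, 2.0, 2.0, 0.5, 0.5, 3.0)` evaluates 0.25 × (4 + 4 − 3) = 1.25, and a steering trace that never changes contributes nothing to `stationarity_score`.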
from the vehicle, e.g. floating car data [FCD] · CPC title
Bus networks · CPC title
Centralised systems, e.g. external to vehicles · CPC title
Machine learning · CPC title
where the received information generates an automatic action on the vehicle control · CPC title