Position determination device
US-12154350-B2 · Nov 26, 2024 · US
US2025078519A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025078519-A1 |
| Application number | US-202418816542-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 27, 2024 |
| Priority date | Aug 29, 2023 |
| Publication date | Mar 6, 2025 |
| Grant date | — |
The invention relates to a method for determining a representation of one or more road objects of a road for a vehicle traveling on the road. The method includes, for each time step out of a plurality of consecutive time steps, encoding images output from the one or more cameras of a vehicle using machine-learning algorithms trained to output image features of road objects depicted in an image provided as input to the machine-learning algorithms. The method further includes transforming a plurality of image features included in the encoded images to a Bird's Eye View (BEV) representation of the plurality of image features. The method also includes decoding the BEV representation to extract a set of object embeddings using transformer-based machine-learning algorithms. Further, the method includes outputting a position and class of each road object of the one or more road objects by decoding the extracted set of object embeddings.
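The abstract does not say how image features are moved into the BEV representation. Claim 4 names an Inverse Perspective Mapping (IPM) algorithm based on each camera's pose. The following is a minimal sketch of classic IPM, not the applicant's implementation. It assumes a flat road, a single pinhole camera at the vehicle origin described only by its mounting height and downward pitch, and nearest-neighbour sampling. The names `PinholeCamera`, `ground_to_pixel`, and `ipm` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical camera model. Vehicle frame: x forward, y left, z up.

    The camera sits at x = y = 0, `height` metres above a flat road, and is
    pitched down by `pitch` radians; intrinsics are in pixels.
    """
    fx: float
    fy: float
    cx: float
    cy: float
    height: float
    pitch: float


def ground_to_pixel(cam, x, y):
    """Project the road point (x, y, 0) in the vehicle frame to a pixel (u, v)."""
    # Vehicle frame -> un-pitched camera frame (x right, y down, z forward).
    xc, yc, zc = -y, cam.height, x
    # Apply the downward pitch as a rotation about the camera x-axis.
    c, s = math.cos(cam.pitch), math.sin(cam.pitch)
    yc, zc = yc * c - zc * s, yc * s + zc * c
    if zc <= 1e-6:
        return None  # the point lies behind the camera
    return cam.fx * xc / zc + cam.cx, cam.fy * yc / zc + cam.cy


def ipm(feature_map, cam, xs, ys):
    """Resample an H x W x C feature map (nested lists) onto a BEV grid.

    Row i of the result corresponds to distance xs[i] ahead of the vehicle,
    column j to lateral offset ys[j]. Cells that project outside the image
    (or behind the camera) are filled with zeros.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    bev = []
    for x in xs:
        row = []
        for y in ys:
            feat = [0.0] * channels
            uv = ground_to_pixel(cam, x, y)
            if uv is not None:
                u, v = round(uv[0]), round(uv[1])  # nearest-neighbour sampling
                if 0 <= u < width and 0 <= v < height:
                    feat = list(feature_map[v][u])
            row.append(feat)
        bev.append(row)
    return bev
```

IPM is exact only for points that actually lie on the assumed ground plane. Features of raised objects such as barriers are therefore stretched along the viewing ray in the BEV grid.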
1. A computer-implemented method for determining a representation of one or more road objects of a road for a vehicle traveling on the road, the vehicle having one or more cameras, the method comprising: for each time step out of a plurality of consecutive time steps: encoding one or more images output from the one or more cameras using one or more machine-learning algorithms trained to output image features of one or more road objects depicted in an image provided as input to the one or more machine-learning algorithms; transforming a plurality of image features comprised in the one or more encoded images to a Bird's Eye View (BEV) representation of the plurality of image features; decoding the BEV representation in order to extract a set of object embeddings from the BEV representation using one or more transformer-based machine-learning algorithms trained to output the set of object embeddings based on an input comprising the BEV representation, a set of object queries, and a set of transformed prior object embeddings extracted at a preceding time step; outputting a position and class of each road object of the one or more road objects by decoding the extracted set of object embeddings.
2. The method according to claim 1, further comprising: for each time step out of the plurality of consecutive time steps: transforming the object embeddings extracted at a previous time step using one or more Multi-Layer Perceptron (MLP) algorithms and motion data of the vehicle.
3. The method according to claim 1, further comprising: for each time step out of the plurality of consecutive time steps, encoding a lidar output dataset from one or more lidars of the vehicle onto the BEV representation using a machine-learning algorithm trained to extract a plurality of features of one or more road objects indicated in a lidar output dataset provided as input to the machine-learning algorithm.
4. The method according to claim 1, wherein the plurality of image features are transformed to the BEV representation using an Inverse Perspective Mapping algorithm and based on a camera pose of each camera of the one or more cameras.
5. The method according to claim 1, wherein the transformer-based machine-learning model is configured to output one object embedding for each object query of the set of object queries, and wherein the object embeddings and object queries are vectors of the same size.
6. The method according to claim 1, further comprising: for each time step out of the plurality of consecutive time steps forming a geometric representation of the one or more road objects based on the output position and class of each road object.
7. The method according to claim 1, wherein the extracted set of object embeddings are decoded by using one or more Multi-Layer Perceptron (MLP) algorithms configured to output the position and class of each road object based on an input comprising object embeddings.
8. The method according to claim 1, further comprising: forming a loss function based on the output position and class of each road object and a ground-truth dataset.
9. The method according to claim 8, further comprising: updating the set of object queries based on the formed loss function.
10. A computer program product comprising instructions which, when executed by a computing device of a vehicle, causes the computing device to carry out the method according to claim 1.
11. A non-transitory computer-readable storage medium storing instructions which, when executed by a computing device of a vehicle, causes the computing device to carry out the method according to claim 1.
12. A system for determining a representation of one or more road objects of a road for a vehicle traveling on the road, the vehicle having one or more cameras, the system comprising one or more memory storage areas comprising program code, the one or more memory storage areas and the program code being configured to, with the one or more processors, cause the system to at least: for each time step out of a plurality of consecutive time steps: encode one or more images output from the one or more cameras using one or more machine-learning algorithms trained to output image features of one or more road objects depicted in an image provided as input to the one or more machine-learning algorithms; transform a plurality of image features comprised in the one or more encoded images to a Bird's Eye View (BEV) representation of the plurality of image features; decode the BEV representation in order to extract a set of object embeddings from the BEV representation using one or more transformer-based machine-learning algorithms trained to output the set of object embeddings based on an input comprising the BEV representation, a set of object queries, and a set of transformed prior object embeddings extracted at a preceding time step; output a position and class of each road object of the one or more road objects by decoding the extracted set of object embeddings.
13. A vehicle comprising a system according to claim 12.
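Claims 1, 2, 5 and 7 together describe a recurrent, query-based decoder. What follows is a toy, untrained, pure-Python rendering of that per-time-step loop under explicit assumptions, not the applicant's architecture:

- A single cross-attention layer with a residual connection stands in for the transformer decoder.
- Prior embeddings are transformed by an MLP whose input is the embedding concatenated with a hypothetical ego-motion vector `(dx, dy, dyaw)`.
- The object queries attend jointly over the flattened BEV cells and the transformed prior embeddings.

The claims do not say how the prior embeddings enter the transformer, so the last choice in particular is an assumption. All class and function names are hypothetical.

```python
import math
import random


def init_layer(rng, n_in, n_out):
    """Randomly initialised dense layer (weights, biases); untrained."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp(layers, x):
    """Apply dense layers with ReLU between them (none after the last)."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Single-head scaled dot-product attention plus a residual connection.

    Keys and values are both the raw memory vectors (no learned projections).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in memory]
        weights = softmax(scores)
        attended = [sum(w * m[j] for w, m in zip(weights, memory)) for j in range(d)]
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


class TemporalRoadObjectDecoder:
    """Hypothetical toy decoder loosely following claims 1, 2, 5 and 7."""

    def __init__(self, dim, num_queries, num_classes, seed=0):
        rng = random.Random(seed)
        # Object queries; one output embedding is produced per query (claim 5).
        self.object_queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                               for _ in range(num_queries)]
        # MLP over (prior embedding ++ ego motion dx, dy, dyaw), as in claim 2.
        self.motion_mlp = [init_layer(rng, dim + 3, dim), init_layer(rng, dim, dim)]
        # MLP heads decoding each embedding to a position and a class (claim 7).
        self.position_head = [init_layer(rng, dim, dim), init_layer(rng, dim, 2)]
        self.class_head = [init_layer(rng, dim, num_classes)]

    def step(self, bev_tokens, prior_embeddings, ego_motion):
        """Run one time step; returns (embeddings, positions, class indices)."""
        # Transform the previous step's embeddings using the vehicle motion.
        transformed_prior = [mlp(self.motion_mlp, list(e) + list(ego_motion))
                             for e in prior_embeddings]
        # Assumption: queries attend over BEV tokens and transformed priors together.
        embeddings = cross_attention(self.object_queries,
                                     list(bev_tokens) + transformed_prior)
        positions = [mlp(self.position_head, e) for e in embeddings]
        classes = []
        for e in embeddings:
            logits = mlp(self.class_head, e)
            classes.append(max(range(len(logits)), key=logits.__getitem__))
        return embeddings, positions, classes


if __name__ == "__main__":
    decoder = TemporalRoadObjectDecoder(dim=4, num_queries=3, num_classes=5)
    rng = random.Random(1)
    prior = []  # no prior embeddings exist at the first time step
    for t in range(3):
        # Six flattened BEV cells, each a 4-dimensional feature vector.
        bev_tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
        prior, positions, classes = decoder.step(bev_tokens, prior, (0.5, 0.0, 0.01))
        print(t, classes)
```

Each step's embeddings are fed back as the next step's prior, which gives the decoder temporal memory. Because claim 5 fixes the output at one embedding per object query, the recurrent state stays the same size from step to step.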
Detecting or categorising vehicles · CPC title
Probabilistic or stochastic CAD · CPC title
Barriers · CPC title
Road markings, e.g. lane marker or crosswalk · CPC title
Radar; Laser, e.g. lidar · CPC title