Estimating ground truth object keypoint labels for sensor readings
US-2022084228-A1 · Mar 17, 2022 · US
US11748449B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748449-B2 |
| Application number | US-202017105027-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 25, 2020 |
| Priority date | Nov 25, 2020 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Official abstract text for this publication.
The present disclosure provides a data processing method, an apparatus, an electronic device and a medium, which relate to the technical fields of autonomous driving, electronic maps, deep learning, image processing, and the like. The method includes: inputting, by a computing device, a reference image and a captured image into a feature extraction model to obtain a first descriptor map and a second descriptor map; obtaining a set of reference descriptors based on the first descriptor map; determining a plurality of sets of training descriptors; obtaining a predicted pose of the vehicle by inputting a plurality of training poses and a plurality of similarities into a pose prediction model; and training the feature extraction model and the pose prediction model. When applied to a vehicle localization system, the trained feature extraction model and pose prediction model according to some embodiments of the present disclosure can improve accuracy and robustness of vehicle localization, thereby boosting the performance of the vehicle localization system.
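The pose prediction and training steps summarized above can be illustrated with a minimal NumPy sketch. It assumes, following claim 6, that the pose prediction model turns the similarities into probabilities and takes the expected pose; the softmax normalization, the L1 deviation, and the probability-weighted spread used here as a stand-in for the "concentration of distribution" are assumptions for illustration and are not specified in the text. The names `predict_pose` and `training_metric` are hypothetical.

```python
import numpy as np

def predict_pose(candidate_poses, similarities):
    """Return (expected_pose, probabilities) for N candidate poses.

    candidate_poses: (N, 3) array of (x, y, yaw) training poses.
    similarities:    (N,) scores, larger meaning a better descriptor match.
    """
    poses = np.asarray(candidate_poses, dtype=float)
    sims = np.asarray(similarities, dtype=float)
    # Assumed normalization: softmax over similarities (shifted for stability).
    weights = np.exp(sims - sims.max())
    probs = weights / weights.sum()
    # Expected pose: probability-weighted average of the candidates.
    # Averaging yaw linearly is only reasonable for small offsets without wrap-around.
    expected = probs @ poses
    return expected, probs

def training_metric(expected, probs, candidate_poses, real_pose):
    """Assumed metric: L1 deviation plus a spread term around the real pose."""
    poses = np.asarray(candidate_poses, dtype=float)
    real = np.asarray(real_pose, dtype=float)
    deviation = np.abs(expected - real).sum()
    # Spread is small when probability mass concentrates near the real pose.
    spread = (probs[:, None] * np.abs(poses - real)).sum()
    return deviation + spread
```

In actual training the same computation would run inside an automatic differentiation framework so that the metric can update both models; the NumPy version only shows the arithmetic. A uniform distribution and a sharply peaked one can share the same expected pose, which is why a concentration term adds information beyond the deviation alone.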
Opening claim text (preview).
The invention claimed is:
1. A data processing method, comprising: inputting a reference image and a captured image into a feature extraction model, respectively, to obtain a first descriptor map and a second descriptor map, the captured image being obtained by capturing an external environment from a vehicle when the vehicle is in a real pose, the reference image being obtained by pre-capturing the external environment by a capturing device; obtaining, based on the first descriptor map, a set of reference descriptors corresponding to a set of keypoints in the reference image; determining a plurality of sets of training descriptors corresponding to a set of spatial coordinates when the vehicle is in a plurality of training poses, respectively, the plurality of sets of training descriptors belonging to the second descriptor map, the set of spatial coordinates being determined based on the set of keypoints, the plurality of training poses being obtained by offsetting a known pose based on the real pose; obtaining a predicted pose of the vehicle by inputting the plurality of training poses and a plurality of similarities into a pose prediction model, the plurality of similarities being between the plurality of sets of training descriptors and the set of reference descriptors; and training the feature extraction model and the pose prediction model based on a metric representing a difference between the predicted pose and the real pose, in order to apply the trained feature extraction model and the trained pose prediction model to vehicle localization, wherein one of the following: the pose prediction model provides, based on the plurality of similarities, probabilities that the plurality of training poses are real poses, respectively, and the metric comprises a concentration of distribution of the probabilities; or the pose prediction model generates a plurality of regularized similarities based on the plurality of similarities, and the metric is determined based on the plurality of regularized similarities.
2. The method of claim 1, wherein the metric comprises a deviation between the predicted pose and the real pose.
3. The method of claim 1, wherein obtaining the set of reference descriptors comprises: obtaining a set of reference images of the external environment, each of the set of reference images comprising a set of keypoints as well as a set of reference descriptors and a set of spatial coordinates associated with the set of keypoints, the set of spatial coordinates being determined by projecting a laser radar point cloud onto the reference image; selecting, from the set of reference images, the reference image corresponding to the captured image based on the known pose; and obtaining the set of reference descriptors and the set of spatial coordinates stored in association with the set of keypoints in the reference image.
4. The method of claim 1, wherein determining the plurality of sets of training descriptors comprises: determining a set of projection points of the set of spatial coordinates by projecting the set of spatial coordinates onto the captured image based on a first training pose of the plurality of training poses; determining, for a projection point of the set of projection points, a plurality of points adjacent to the projection point in the captured image; determining a plurality of descriptors of the plurality of points in the second descriptor map; and determining, based on the plurality of descriptors, a descriptor of the projection point to obtain a first training descriptor of a set of training descriptors corresponding to the first training pose among the plurality of sets of training descriptors.
5. The method of claim 1, further comprising: determining, for a first set of training descriptors among the plurality of sets of training descriptors, a plurality of differences between a plurality of training descriptors in the first set of training descriptors and corresponding reference descriptors in the set of reference descriptors; and determining, based on the plurality of differences, a similarity between the first set of training descriptors and the set of reference descriptors as a first similarity of the plurality of similarities.
6. The method of claim 1, wherein obtaining the predicted pose comprises: determining, based on the plurality of similarities, probabilities that the plurality of training poses are real poses, respectively, using the pose prediction model; and determining, based on the plurality of training poses and the probabilities, an expected pose of the vehicle as the predicted pose.
7. The method of claim 1, further comprising: determining the plurality of training poses by taking a horizontal coordinate, a longitudinal coordinate and a yaw angle of the known pose as a center and by offsetting from the center in three dimensions of a horizontal axis, a longitudinal axis and a yaw angle axis, with respective predetermined offset units and within respective predetermined maximum offset ranges.
8. The method of claim 1, further comprising: selecting, based on a farthest point sampling algorithm, the set of keypoints from a set of points in the reference image.
9. An electronic device, comprising: at least one processor; and at least one memory coupled to the at least one processor and storing instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the electronic device to: input a reference image and a captured image into a feature extraction model, respectively, to obtain a first descriptor map and a second descriptor map, the captured image being obtained by capturing an external environment from a vehicle when the vehicle is in a real pose, the reference image being obtained by pre-capturing the external environment by a capturing device; obtain, based on the first descriptor map, a set of reference descriptors corresponding to a set of keypoints in the reference image; determine a plurality of sets of training descriptors corresponding to a set of spatial coordinates when the vehicle is in a plurality of training poses, respectively, the plurality of sets of training descriptors belonging to the second descriptor map, the set of spatial coordinates being determined based on the set of keypoints, the plurality of training poses being obtained by offsetting a known pose based on the real pose; obtain a predicted pose of the vehicle by inputting the plurality of training poses and a plurality of similarities into a pose prediction model, the plurality of similarities being between the plurality of sets of training descriptors and the set of reference descriptors; and train the feature extraction model and the pose prediction model based on a metric representing a difference between the predicted pose and the real pose, in order to apply the trained feature extraction model and the trained pose prediction model to vehicle localization, wherein one of the following: the pose prediction model provides, based on the plurality of similarities, probabilities that the plurality of training poses are real poses, respectively, and the metric comprises a concentration of distribution of the probabilities; or the pose prediction model generates a plurality of regularized similarities based on the plurality of similarities, and the metric is determined based on the plurality of regularized similarities.
10. The electronic device of claim 9, wherein the metric comprises a deviation between the predicted pose and the real pose.
11. The electronic device of claim 9, wherein the instructions when executed by the at least one processor cause the electr
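Claims 4 and 5 describe deriving a descriptor for a projection point from adjacent points in the second descriptor map, and then deriving a set-level similarity from per-keypoint differences. The sketch below is one possible reading, not the patented implementation: it assumes bilinear interpolation over the four neighbouring pixels and a negative mean L2 distance as the similarity. The function names and the `(H, W, C)` layout of the descriptor map are hypothetical.

```python
import numpy as np

def sample_descriptor(descriptor_map, u, v):
    """Bilinearly interpolate a descriptor at sub-pixel location (u, v).

    descriptor_map: (H, W, C) array with H, W >= 2 (assumed layout).
    u, v: column and row coordinates of the projection point.
    """
    dmap = np.asarray(descriptor_map, dtype=float)
    h, w = dmap.shape[:2]
    # Clamp to the image so that four neighbouring pixels always exist.
    u = float(np.clip(u, 0, w - 1))
    v = float(np.clip(v, 0, h - 1))
    u0 = min(int(np.floor(u)), w - 2)
    v0 = min(int(np.floor(v)), h - 2)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * dmap[v0, u0]
            + du * (1 - dv) * dmap[v0, u0 + 1]
            + (1 - du) * dv * dmap[v0 + 1, u0]
            + du * dv * dmap[v0 + 1, u0 + 1])

def set_similarity(training_descriptors, reference_descriptors):
    """Assumed similarity: negative mean L2 difference over matched keypoints."""
    t = np.asarray(training_descriptors, dtype=float)
    r = np.asarray(reference_descriptors, dtype=float)
    diffs = np.linalg.norm(t - r, axis=1)  # one difference per keypoint
    return -diffs.mean()  # larger (closer to zero) means more similar
```

Interpolating rather than rounding to the nearest pixel keeps the sampled descriptor a smooth function of the projected coordinates, which matters if gradients are to flow back to the feature extraction model during training.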
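Claims 7 and 8 name two sampling steps: building training poses on a grid of offsets around the known pose, and selecting keypoints by farthest point sampling. A minimal sketch of both follows. The offset units, maximum ranges, starting index, and function names are hypothetical values chosen for illustration; the claims only state that the units and ranges are predetermined.

```python
import numpy as np

def candidate_poses(known_pose, steps, max_offsets):
    """Grid of (x, y, yaw) poses centered on known_pose.

    steps:       offset unit per axis (x, y, yaw).
    max_offsets: maximum absolute offset per axis.
    """
    axes = []
    for center, step, max_off in zip(known_pose, steps, max_offsets):
        # Number of whole steps that fit on each side of the center.
        n = int(np.floor(max_off / step + 1e-9))
        axes.append(center + step * np.arange(-n, n + 1))
    xs, ys, yaws = np.meshgrid(*axes, indexing="ij")
    return np.stack([xs.ravel(), ys.ravel(), yaws.ravel()], axis=1)

def farthest_point_sampling(points, k, start=0):
    """Return indices of k points, each chosen farthest from those already picked."""
    pts = np.asarray(points, dtype=float)
    k = min(k, len(pts))
    if k <= 0:
        return []
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    dist = np.linalg.norm(pts - pts[start], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(pts - pts[nxt], axis=1))
    return chosen
```

For example, `candidate_poses((10.0, 20.0, 0.0), (0.5, 0.5, 1.0), (1.0, 1.0, 1.0))` yields a 5 x 5 x 3 grid of 75 poses with the known pose in the middle. Farthest point sampling spreads the selected keypoints across the image instead of clustering them in textured regions.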
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
Matching criteria, e.g. proximity measures · CPC title
based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate · CPC title