Method and apparatus for determining pose of image capturing device, and storage medium
US-11270460-B2 · Mar 8, 2022 · US
US12347138B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12347138-B2 |
| Application number | US-202017799900-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2020 |
| Priority date | Feb 27, 2020 |
| Publication date | Jul 1, 2025 |
| Grant date | Jul 1, 2025 |
A visual positioning method and apparatus are provided. In some embodiments, the method includes: acquiring a video captured by an image sensor; determining visual positioning information respectively corresponding to a plurality of key image frames in the video; determining a capture pose transformation relationship between each of the plurality of key image frames according to inertial navigation information of the image sensor recorded when taking the video; performing, according to the visual positioning information corresponding to each of the plurality of key image frames, graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the plurality of key image frames as an edge constraint; and determining, according to a result of the graph optimization processing, a visual positioning result of the image sensor when taking the video.
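The abstract describes a graph optimization in which each key frame's visual positioning result acts as a measurement and the inertially derived pose transformation between key frames acts as an edge constraint. The sketch below illustrates that idea under simplifying assumptions that are not from the patent: poses are reduced to 2D positions (rotation is ignored), each "capture pose transformation relationship" is reduced to a relative displacement, and both terms are quadratic penalties with hypothetical weights `w_visual` and `w_edge`. The resulting linear least-squares problem is solved directly. A real system would more likely optimize full poses with a nonlinear solver.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Edge = Tuple[int, int, Vec2]  # (i, j, displacement from key frame i to key frame j)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def optimize_positions(visual: Sequence[Vec2], edges: Sequence[Edge],
                       w_visual: float = 1.0, w_edge: float = 10.0) -> List[Vec2]:
    """Hypothetical helper: minimize
        sum_i  w_visual * |x_i - z_i|^2                (visual positioning terms)
      + sum_ij w_edge   * |x_j - x_i - d_ij|^2         (inertial edge constraints)
    by building and solving the normal equations, separately for x and y."""
    n = len(visual)
    a = [[0.0] * n for _ in range(n)]
    bx = [w_visual * p[0] for p in visual]
    by = [w_visual * p[1] for p in visual]
    for i in range(n):
        a[i][i] += w_visual
    for i, j, (dx, dy) in edges:
        # Each edge couples its two key frames symmetrically.
        a[i][i] += w_edge
        a[j][j] += w_edge
        a[i][j] -= w_edge
        a[j][i] -= w_edge
        # The measured displacement pushes frame j forward and frame i back.
        bx[i] -= w_edge * dx
        by[i] -= w_edge * dy
        bx[j] += w_edge * dx
        by[j] += w_edge * dy
    return list(zip(_solve(a, bx), _solve(a, by)))
```

For example, with visual positions `(0, 0)`, `(1, 0)`, `(5, 0)` and two inertial edges each reporting a displacement of `(1, 0)`, the outlying third estimate is pulled toward its neighbours. Because the edge terms only constrain differences between frames, with a uniform `w_visual` the mean of the optimized positions stays equal to the mean of the visual estimates; raising `w_edge` relative to `w_visual` expresses more trust in the inertial data.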
The invention claimed is:

1. A vision positioning method, comprising: acquiring a video captured by an image sensor; determining visual positioning information respectively corresponding to a plurality of key image frames in the video; determining a capture pose transformation relationship between each of the plurality of key image frames according to inertial navigation information of the image sensor recorded when taking the video; performing, according to the visual positioning information corresponding to each of the plurality of key image frames, graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the plurality of key image frames as an edge constraint; and determining, according to a result of the graph optimization processing, a visual positioning result of the image sensor when taking the video.

2. The method according to claim 1, wherein determining the visual positioning information respectively corresponding to the plurality of key image frames in the video comprises: determining content information of each image frame in the video; selecting at least three key image frames that satisfy a preset condition from the video according to the content information of each image frame; and determining the visual positioning information corresponding to each of the at least three key image frames.

3. The method according to claim 2, wherein selecting the at least three key image frames that satisfy the preset condition from the video according to the content information of each image frame comprises: determining a selection indicator according to the content information of each image frame, the selection indicator comprising at least one of: a content repeatability between each pair of two image frames, a content richness of each image frame, or image quality of each image frame; and selecting the at least three key image frames from the video according to the selection indicator.

4. The method according to claim 3, wherein the selection indicator is the content repeatability between each pair of two image frames, and determining the selection indicator according to the content information of each image frame comprises: for each pair of two image frames, comparing the two image frames, and determining an image content overlapping region between the two image frames according to a result of the comparison; and determining the content repeatability of the two image frames according to the image content overlapping region.

5. The method according to claim 3, wherein the video comprises a first image frame, the selection indicator is the content richness of each image frame, and determining the selection indicator according to the content information of each image frame comprises: determining the content richness of the first image frame according to at least one of: a gradient, a texture, or a quantity of feature points of the first image frame.

6. The method according to claim 3, wherein the video comprises a second image frame, the selection indicator is the image quality of each image frame, and determining the selection indicator according to the content information of each image frame comprises: determining the image quality of the second image frame according to at least one of: a gradient, a brightness, or a sharpness of the second image frame.

7. The method according to claim 3, wherein the video comprises a third image frame, and selecting the at least three key image frames that satisfy the preset condition from the video according to the content information of each image frame comprises: selecting the third image frame as one of the at least three key image frames when a content repeatability between the third image frame and other image frames in the video is less than a preset content repeatability threshold, and/or a content richness of the third image frame is greater than a preset content richness threshold, and/or image quality of the third image frame is greater than a preset image quality threshold.

8. The method according to claim 2, wherein performing, according to the visual positioning information corresponding to each of the plurality of key image frames, the graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the key image frames as the edge constraint comprises: determining, in an electronic map, a local position region in which the image sensor is located according to the capture pose transformation relationship between each of the plurality of key image frames and the visual positioning information corresponding to each of the plurality of key image frames; determining updated visual positioning information of each of the plurality of key image frames relative to the local position region; determining at least one key image frame in the local position region according to the updated visual positioning information of each of the plurality of key image frames, and determining updated visual positioning information of the at least one key image frame in the local position region as to-be-determined visual positioning information; and performing graph optimization processing on the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local position region by using a capture pose transformation relationship between each of the at least one key image frame in the local position region as an edge constraint.

9. The method according to claim 8, wherein determining, in the electronic map, the local position region in which the image sensor is located according to the capture pose transformation relationship between each of the plurality of key image frames and the visual positioning information corresponding to each of the plurality of key image frames comprises: selecting a key image frame from the at least three key image frames as a reference image frame, and determining remaining key image frames as other key image frames; performing coordinate transformation on visual positioning information corresponding to the other key image frames according to capture pose transformation relationships between the other key image frames and the reference image frame, to obtain relative visual positioning information of each of the other key image frames; clustering the visual positioning information corresponding to the reference image frame and the relative visual positioning information of each of the other key image frames; selecting at least two designated key image frames from the at least three key image frames according to a clustering result; and determining, in the electronic map, the local position region in which the image sensor is located according to visual positioning information corresponding to the selected designated key image frames.

10. The method according to claim 8, wherein performing the graph optimization processing on the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local position region by using the capture pose transformation relationship between each of the at least one key image frame in the local position region as the edge constraint comprises: determining a positioning error according to the capture pose transformation relationship between each of the at least one key image frame in the local position region and the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local
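Claims 3 to 7 describe selecting key image frames by content repeatability, content richness, and image quality, each compared against a preset threshold. The sketch below is one hypothetical way to compute such indicators on small grayscale frames (nested lists of pixel values); none of the metric definitions or threshold values come from the patent. Richness is approximated by the fraction of pixels with a strong forward-difference gradient, quality by mean brightness falling in an assumed range, and repeatability by the fraction of near-identical pixels, which only serves as a crude overlap proxy for frames that are already aligned. Two further assumptions: a frame is compared only against the most recently selected key frame (the claim says "other image frames in the video"), and the three tests are combined with "and" although the claim permits "and/or".

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale frame, at least 2x2 pixels


def richness(img: Image, edge_thresh: float = 30.0) -> float:
    """Fraction of pixels whose forward-difference gradient exceeds edge_thresh."""
    h, w = len(img), len(img[0])
    strong = 0
    for r in range(h - 1):
        for c in range(w - 1):
            g = abs(img[r][c + 1] - img[r][c]) + abs(img[r + 1][c] - img[r][c])
            if g > edge_thresh:
                strong += 1
    return strong / ((h - 1) * (w - 1))


def brightness(img: Image) -> float:
    """Mean pixel value, used here as a simple image-quality indicator."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


def repeatability(a: Image, b: Image, tol: float = 10.0) -> float:
    """Fraction of pixel positions where two same-size frames nearly agree."""
    total = len(a) * len(a[0])
    same = sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if abs(pa - pb) <= tol)
    return same / total


def select_key_frames(frames: Sequence[Image], rep_max: float = 0.8,
                      rich_min: float = 0.2, bright_lo: float = 40.0,
                      bright_hi: float = 220.0) -> List[int]:
    """Return indices of frames that pass all three hypothetical threshold tests."""
    keys: List[int] = []
    for idx, img in enumerate(frames):
        if richness(img) < rich_min:
            continue  # too little texture
        if not bright_lo <= brightness(img) <= bright_hi:
            continue  # too dark or too bright
        if keys and repeatability(img, frames[keys[-1]]) >= rep_max:
            continue  # too similar to the last key frame
        keys.append(idx)
    return keys
```

Claim 2 requires at least three key image frames, so a caller would check that the returned list has length three or more before proceeding to positioning.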
CPC titles:
- Image quality inspection
- Video; Image sequence
- Inspection of images, e.g. flaw detection
- combined with non-inertial navigation instruments
- using feature-based methods