Who is the assignee on this patent?

Beijing Sankuai Online Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T7/74. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Visual positioning based on a plurality of image frames

US12347138B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12347138-B2
Application number	US-202017799900-A
Country	US
Kind code	B2
Filing date	Nov 16, 2020
Priority date	Feb 27, 2020
Publication date	Jul 1, 2025
Grant date	Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A visual positioning method and apparatus are provided. In some embodiments, the method includes: acquiring a video captured by an image sensor; determining visual positioning information respectively corresponding to a plurality of key image frames in the video; determining a capture pose transformation relationship between each of the plurality of key image frames according to inertial navigation information of the image sensor recorded when taking the video; performing, according to the visual positioning information corresponding to each of the plurality of key image frames, graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the plurality of key image frames as an edge constraint; and determining, according to a result of the graph optimization processing, a visual positioning result of the image sensor when taking the video.
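To make the graph optimization step in the abstract more concrete, here is a minimal sketch, not the patented implementation. It assumes 2D positions without orientation, one noisy visual positioning estimate per key frame, and relative displacements between key frames (standing in for the inertial-navigation pose transformations) used as edge constraints. The function name `optimize_positions`, the weights `w_visual` and `w_edge`, and the Gauss-Seidel solver are all illustrative choices that do not come from the patent text.

```python
from typing import List, Tuple

Vec = Tuple[float, float]
Edge = Tuple[int, int, Vec]  # (i, j, d) encodes the constraint x_j - x_i = d (approximately)


def optimize_positions(visual: List[Vec], edges: List[Edge],
                       w_visual: float = 1.0, w_edge: float = 10.0,
                       iterations: int = 200) -> List[Vec]:
    """Minimize  w_visual * sum |x_k - visual_k|^2
               + w_edge   * sum |(x_j - x_i) - d_ij|^2
    by Gauss-Seidel sweeps (each node is set to its weighted-average optimum).
    """
    x = [list(p) for p in visual]  # start from the visual estimates
    for _ in range(iterations):
        for k in range(len(x)):
            # Unary term: pull toward the visual positioning estimate.
            num = [w_visual * visual[k][0], w_visual * visual[k][1]]
            den = w_visual
            for i, j, d in edges:
                if i == k:
                    # Edge predicts x_k = x_j - d.
                    num[0] += w_edge * (x[j][0] - d[0])
                    num[1] += w_edge * (x[j][1] - d[1])
                    den += w_edge
                elif j == k:
                    # Edge predicts x_k = x_i + d.
                    num[0] += w_edge * (x[i][0] + d[0])
                    num[1] += w_edge * (x[i][1] + d[1])
                    den += w_edge
            x[k] = [num[0] / den, num[1] / den]
    return [(p[0], p[1]) for p in x]


# Three key frames moving 1 unit along x; visual estimates are noisy in y.
visual_estimates = [(0.0, 0.3), (1.0, -0.3), (2.0, 0.0)]
inertial_edges = [(0, 1, (1.0, 0.0)), (1, 2, (1.0, 0.0))]
optimized = optimize_positions(visual_estimates, inertial_edges)
```

With a larger `w_edge`, the optimized positions follow the inertial displacements more closely. A real system would optimize full 6-DoF poses with a nonlinear solver rather than this linear, position-only toy problem.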

First claim

Opening claim text (preview).

The invention claimed is:

1. A vision positioning method, comprising: acquiring a video captured by an image sensor; determining visual positioning information respectively corresponding to a plurality of key image frames in the video; determining a capture pose transformation relationship between each of the plurality of key image frames according to inertial navigation information of the image sensor recorded when taking the video; performing, according to the visual positioning information corresponding to each of the plurality of key image frames, graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the plurality of key image frames as an edge constraint; and determining, according to a result of the graph optimization processing, a visual positioning result of the image sensor when taking the video.

2. The method according to claim 1, wherein determining the visual positioning information respectively corresponding to the plurality of key image frames in the video comprises: determining content information of each image frame in the video; selecting at least three key image frames that satisfy a preset condition from the video according to the content information of each image frame; and determining the visual positioning information corresponding to each of the at least three key image frames.

3. The method according to claim 2, wherein selecting the at least three key image frames that satisfy the preset condition from the video according to the content information of each image frame comprises: determining a selection indicator according to the content information of each image frame, the selection indicator comprising at least one of: a content repeatability between each pair of two image frames, a content richness of each image frame, or image quality of each image frame; and selecting the at least three key image frames from the video according to the selection indicator.

4. The method according to claim 3, wherein the selection indicator is the content repeatability between each pair of two image frames, and determining the selection indicator according to the content information of each image frame comprises: for each pair of two image frames, comparing the two image frames, and determining an image content overlapping region between the two image frames according to a result of the comparison; and determining the content repeatability of the two image frames according to the image content overlapping region.

5. The method according to claim 3, wherein the video comprises a first image frame, the selection indicator is the content richness of each image frame, and determining the selection indicator according to the content information of each image frame comprises: determining the content richness of the first image frame according to at least one of: a gradient, a texture, or a quantity of feature points of the first image frame.

6. The method according to claim 3, wherein the video comprises a second image frame, the selection indicator is the image quality of each image frame, and determining the selection indicator according to the content information of each image frame comprises: determining the image quality of the second image frame according to at least one of: a gradient, a brightness, or a sharpness of the second image frame.

7. The method according to claim 3, wherein the video comprises a third image frame, and selecting the at least three key image frames that satisfy the preset condition from the video according to the content information of each image frame comprises: selecting the third image frame as one of the at least three key image frames when a content repeatability between the third image frame and other image frames in the video is less than a preset content repeatability threshold, and/or a content richness of the third image frame is greater than a preset content richness threshold, and/or image quality of the third image frame is greater than a preset image quality threshold.

8. The method according to claim 2, wherein performing, according to the visual positioning information corresponding to each of the plurality of key image frames, the graph optimization processing on the visual positioning information corresponding to each of the plurality of key image frames by using the capture pose transformation relationship between each of the key image frames as the edge constraint comprises: determining, in an electronic map, a local position region in which the image sensor is located according to the capture pose transformation relationship between each of the plurality of key image frames and the visual positioning information corresponding to each of the plurality of key image frames; determining updated visual positioning information of each of the plurality of key image frames relative to the local position region; determining at least one key image frame in the local position region according to the updated visual positioning information of each of the plurality of key image frames, and determining updated visual positioning information of the at least one key image frame in the local position region as to-be-determined visual positioning information; and performing graph optimization processing on the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local position region by using a capture pose transformation relationship between each of the at least one key image frame in the local position region as an edge constraint.

9. The method according to claim 8, wherein determining, in the electronic map, the local position region in which the image sensor is located according to the capture pose transformation relationship between each of the plurality of key image frames and the visual positioning information corresponding to each of the plurality of key image frames comprises: selecting a key image frame from the at least three key image frames as a reference image frame, and determining remaining key image frames as other key image frames; performing coordinate transformation on visual positioning information corresponding to the other key image frames according to capture pose transformation relationships between the other key image frames and the reference image frame, to obtain relative visual positioning information of each of the other key image frames; clustering the visual positioning information corresponding to the reference image frame and the relative visual positioning information of each of the other key image frames; selecting at least two designated key image frames from the at least three key image frames according to a clustering result; and determining, in the electronic map, the local position region in which the image sensor is located according to visual positioning information corresponding to the selected designated key image frames.

10. The method according to claim 8, wherein performing the graph optimization processing on the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local position region by using the capture pose transformation relationship between each of the at least one key image frame in the local position region as the edge constraint comprises: determining a positioning error according to the capture pose transformation relationship between each of the at least one key image frame in the local position region and the to-be-determined visual positioning information corresponding to each of the at least one key image frame in the local
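The key-frame selection described in claims 3 to 7 can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: frames are small grayscale grids with intensities in [0, 1], content richness is approximated by the mean absolute gradient, image quality by a brightness score, and content repeatability by the fraction of nearly identical pixels relative to the most recently selected key frame (a crude stand-in for the overlapping-region comparison in claim 4). All function names and threshold values are hypothetical, and the three conditions are combined with AND, which is one of the "and/or" options claim 7 allows.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]  # grayscale intensities in [0, 1], row-major


def content_richness(img: Image) -> float:
    """Mean absolute horizontal plus vertical intensity gradient."""
    h, w = len(img), len(img[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                total += abs(img[r][c + 1] - img[r][c])
            if r + 1 < h:
                total += abs(img[r + 1][c] - img[r][c])
    return total / (h * w)


def image_quality(img: Image) -> float:
    """Brightness score: 1.0 for a mid-gray mean, 0.0 for pure black or white."""
    h, w = len(img), len(img[0])
    mean = sum(sum(row) for row in img) / (h * w)
    return 1.0 - 2.0 * abs(mean - 0.5)


def content_repeatability(a: Image, b: Image, tol: float = 0.05) -> float:
    """Fraction of same-position pixels whose intensities differ by at most tol."""
    h, w = len(a), len(a[0])
    same = sum(1 for r in range(h) for c in range(w)
               if abs(a[r][c] - b[r][c]) <= tol)
    return same / (h * w)


def select_key_frames(frames: Sequence[Image],
                      max_repeat: float = 0.8,
                      min_richness: float = 0.1,
                      min_quality: float = 0.3) -> List[int]:
    """Return indices of frames that pass all three threshold checks."""
    selected: List[int] = []
    for idx, frame in enumerate(frames):
        if content_richness(frame) <= min_richness:
            continue  # too little texture to localize against
        if image_quality(frame) <= min_quality:
            continue  # too dark or too bright
        if selected and content_repeatability(frames[selected[-1]], frame) >= max_repeat:
            continue  # nearly the same content as the last key frame
        selected.append(idx)
    return selected


checker = [[0.0, 1.0], [1.0, 0.0]]
inverse = [[1.0, 0.0], [0.0, 1.0]]
flat = [[0.5, 0.5], [0.5, 0.5]]
key_indices = select_key_frames([checker, flat, checker, inverse, inverse])
```

Claim 2 requires at least three key frames, so a caller following the claims would need to relax the thresholds or capture more video when fewer than three indices are returned.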

Assignees

Beijing Sankuai Online Tech Co Ltd

Inventors

Classifications

G06T2207/30168
Image quality inspection · CPC title
G06T2207/10016
Video; Image sequence · CPC title
G06T7/0002
Inspection of images, e.g. flaw detection · CPC title
G01C21/165
combined with non-inertial navigation instruments · CPC title
G06T7/73
using feature-based methods · CPC title

Patent family

Related publications grouped by family.

View patent family 70865103

External sources
