Self-aware visual-textual co-grounded navigation agent
US-11029694-B2 · Jun 8, 2021 · US
US12276507B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12276507-B2 |
| Application number | US-202117645449-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 21, 2021 |
| Priority date | Jun 16, 2021 |
| Publication date | Apr 15, 2025 |
| Grant date | Apr 15, 2025 |
Official abstract text for this publication.
An indoor navigation method is provided, including: receiving an instruction for navigation, and collecting an environment image; extracting an instruction room feature and an instruction object feature carried in the instruction, and determining a visual room feature, a visual object feature, and a view angle feature based on the environment image; fusing the instruction object feature and the visual object feature with a first knowledge graph representing an indoor object association relationship to obtain an object feature, and determining a room feature based on the visual room feature and the instruction room feature; and determining a navigation decision based on the view angle feature, the room feature, and the object feature.
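The room-feature step can be illustrated with a small sketch. Claim 1 of this publication states that a room confidence level is determined for each optional view angle from the visual room category, the instruction room category, and a preset room correlation matrix, but the text shown here does not give the formula. The bilinear scoring below, the function names `room_confidence` and `room_feature`, and the example numbers are all assumptions made for illustration, not the patented computation.

```python
def room_confidence(visual_probs, instruction_probs, correlation):
    """Score one candidate view angle.

    Assumed form (not taken from the patent): sum over i, j of
    visual_probs[i] * correlation[i][j] * instruction_probs[j].
    """
    return sum(
        v * correlation[i][j] * t
        for i, v in enumerate(visual_probs)
        for j, t in enumerate(instruction_probs)
    )


def room_feature(views_visual_probs, instruction_probs, correlation):
    """Return one confidence value per optional view angle."""
    return [
        room_confidence(visual_probs, instruction_probs, correlation)
        for visual_probs in views_visual_probs
    ]


if __name__ == "__main__":
    # Hypothetical room categories: index 0 = kitchen, index 1 = bedroom.
    correlation = [
        [1.0, 0.25],  # kitchen vs. (kitchen, bedroom)
        [0.25, 1.0],  # bedroom vs. (kitchen, bedroom)
    ]
    instruction = [0.0, 1.0]  # the instruction mentions the bedroom
    views = [
        [1.0, 0.0],  # view 0 looks like a kitchen
        [0.0, 1.0],  # view 1 looks like a bedroom
    ]
    print(room_feature(views, instruction, correlation))
```

Under these assumptions, a view whose predicted room category matches the room named in the instruction receives the higher confidence, and the correlation matrix lets related rooms score above unrelated ones.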
Opening claim text (preview).
The invention claimed is:

1. An indoor navigation method, applied to a navigation equipment, wherein the indoor navigation method comprises: receiving an instruction for navigation, and collecting an environment image; extracting an instruction room feature and an instruction object feature carried in the instruction, and determining a visual room feature, a visual object feature, and a view angle feature based on the environment image, wherein the instruction room feature is configured to indicate room information obtained from the instruction for navigation, the instruction object feature is configured to indicate object information obtained from the instruction for navigation, the visual room feature is configured to indicate room information obtained from the environment image, the visual object feature is configured to indicate object information obtained from the environment image, and the view angle feature is configured to reflect information carried in an view angle of the environment image; fusing the instruction object feature and the visual object feature with a first knowledge graph representing an indoor object association relationship to obtain an object feature, and determining a room feature based on the visual room feature and the instruction room feature; and determining a navigation decision based on the view angle feature, the room feature, and the object feature; wherein fusing the instruction object feature and the visual object feature with the first knowledge graph representing the indoor object association relationship to obtain the object feature comprises: extracting an object entity carried in the environment image based on the visual object feature; constructing a second knowledge graph based on the object entity and the first knowledge graph representing the indoor object association relationship, wherein the second knowledge graph is configured to represent an association relationship between the object entity and a first object entity in the first knowledge graph that has an association relationship with the object entity; performing multi-step graph convolutional reasoning on the first knowledge graph and the second knowledge graph respectively so as to obtain first knowledge graph reasoning information and second knowledge graph reasoning information; fusing the first knowledge graph reasoning information with the second knowledge graph reasoning information, and updating the first knowledge graph by using the fused knowledge graph reasoning information; performing a first feature fusing and reinforcing operation on the instruction object feature based on the second knowledge graph to obtain an enhanced instruction object feature; and performing a second feature fusing and reinforcing operation on the updated first knowledge graph and the enhanced instruction object feature to obtain the object feature; and wherein determining the room feature based on the visual room feature and the instruction room feature comprises: determining a visual room category carried in each of optional view angles based on the visual room feature, and determining an instruction room category carried in each of the optional view angles based on the instruction room feature; determining a room confidence level of each of the optional view angles based on the visual room category, the instruction room category, and a preset room correlation matrix; and determining the room feature based on the room confidence level of each of the optional view angles.

2. The indoor navigation method according to claim 1, wherein determining the navigation decision based on the view angle feature, the room feature, and the object feature comprises: determining a total view angle feature of the environment image based on the view angle feature; splicing the total view angle feature, the object feature, the instruction room feature, and the instruction object feature to obtain a scenario memory token and obtain current navigation progress information based on the scenario memory token, and splicing the room feature and the view angle feature to obtain an optional view angle feature; and performing a third feature fusing and reinforcing operation on the optional view angle feature and the current navigation progress information to obtain a navigation decision for a next navigation progress.

3. The indoor navigation method according to claim 2, wherein determining the total view angle feature of the environment image based on the view angle feature comprises: determining a previous navigation progress information; and performing a fourth feature fusing and reinforcing operation on the view angle feature and the previous navigation progress information to obtain the total view angle feature of the environment image.

4. The indoor navigation method according to claim 1, wherein the indoor navigation method further comprises: determining a penalty coefficient based on a deviation degree between a view angle in the navigation decision and an optimal view angle, wherein the optimal view angle is an optional view angle that is closest to a navigation end point among all the optional view angles; and changing the view angle in the navigation decision based on the penalty coefficient.

5. The indoor navigation method according to claim 4, wherein determining the navigation decision comprises: determining the navigation decision based on a vision-language navigation model; obtaining the vision-language navigation model by training by determining a total loss function based on an imitation learning loss function, a room category prediction loss function, and a direction perception loss function, the imitation learning loss function is configured to represent a deviation degree between the optional view angle and the optimal view angle, the room category prediction loss function is configured to represent a deviation degree between a room category corresponding to the optional view angle and a room category in the navigation decision, and the direction perception loss function is configured to represent a deviation degree between the view angle in the navigation decision and the optimal view angle; and training the vision-language navigation model based on the total loss function.

6. The indoor navigation method according to claim 1, wherein the indoor navigation method further comprises: determining a value of logit of each of the optional view angles, and determining a backtracking distance between each of the optional view angles and a current position; and modifying the value of logit of each of the optional view angles based on the backtracking distance, and changing the view angle in the navigation decision based on the modified values of logit.

7. An indoor navigation equipment, comprising: one or more processors; and one or more memories configured to store instructions executable by the processor; wherein the processor is configured to: receive an instruction for navigation, and collect an environment image; extract an instruction room feature and an instruction object feature carried in the instruction, and determine a visual room feature, a visual object feature, and a view angle feature based on the environment image, wherein the instruction room feature is configured to indicate room information obtained from the instruction for navigation, the instruction object feature is configured to indicate object information obtained from the instruction for navigation, the visual room feature is configured to indicate room information obtained from the environment image, the visual object feature is configured to indicate object information obtained from the environment image, and the view angle feature is configured to reflect information carried in an view
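The backtracking step recited in claim 6 can be pictured with a short sketch. The claim says only that the logit of each optional view angle is modified based on its backtracking distance from the current position and that the chosen view angle is changed accordingly. The linear penalty, the `weight` parameter, and the function names `adjust_logits` and `choose_view` below are assumptions for illustration, not details from the patent.

```python
def adjust_logits(logits, backtrack_distances, weight=0.5):
    """Subtract a distance-proportional penalty from each view-angle logit.

    The linear form `logit - weight * distance` is an assumed example;
    the claim does not specify how the logits are modified.
    """
    if len(logits) != len(backtrack_distances):
        raise ValueError("need one backtracking distance per view angle")
    return [
        logit - weight * distance
        for logit, distance in zip(logits, backtrack_distances)
    ]


def choose_view(logits, backtrack_distances, weight=0.5):
    """Return the index of the view angle with the highest adjusted logit."""
    adjusted = adjust_logits(logits, backtrack_distances, weight)
    return max(range(len(adjusted)), key=adjusted.__getitem__)


if __name__ == "__main__":
    logits = [2.0, 1.5]      # raw scores for two optional view angles
    distances = [4.0, 0.0]   # view 0 would require a long backtrack
    print(adjust_logits(logits, distances))
    print(choose_view(logits, distances))
```

In this example the view angle with the highest raw logit is far from the current position, so after the assumed penalty the nearer view angle is selected instead.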
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Knowledge-based neural networks; Logical representations of neural networks · CPC title