Self-aware visual-textual co-grounded navigation agent

US11029694B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11029694-B2
Application number  US-201816176955-A
Country             US
Kind code           B2
Filing date         Oct 31, 2018
Priority date       Sep 27, 2018
Publication date    Jun 8, 2021
Grant date          Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An agent for navigating a mobile automated system is disclosed herein. The navigation agent receives a navigation instruction and visual information for one or more observed images. The navigation agent is provided or equipped with self-awareness, which provides or supports the following abilities: identifying which direction to go or proceed by determining the part of the instruction that corresponds to the observed images (visual grounding), and identifying which part of the instruction has been completed or is ongoing and which part is potentially needed for the next action selection (textual grounding). In some embodiments, the navigation agent applies regularization to ensure that the grounded instruction can correctly be used to estimate the progress made towards the navigation goal (progress monitoring).
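The textual grounding and progress-monitoring ideas in the abstract can be illustrated with a small sketch. This is not the patent's implementation: it assumes textual grounding is a soft attention over per-word instruction features, scored by a dot product with the agent's current state, and that the progress regularizer is a squared error between an estimated and a target progress value. The names `textual_grounding` and `progress_loss`, the feature vectors, and the dot-product scoring are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def textual_grounding(state, word_features):
    """Assumed form of textual grounding: soft attention over instruction words.

    Returns the attention weights (one per word) and the weighted sum of the
    word features, i.e. the "grounded instruction" vector.
    """
    weights = softmax([dot(state, w) for w in word_features])
    dim = len(word_features[0])
    grounded = [
        sum(wt * w[i] for wt, w in zip(weights, word_features))
        for i in range(dim)
    ]
    return weights, grounded


def progress_loss(estimated_progress, target_progress):
    """Assumed regularizer: squared error between estimated and target progress."""
    return (estimated_progress - target_progress) ** 2


# Hypothetical toy inputs: a 2-d agent state and three instruction words.
state = [1.0, 0.0]
word_features = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
weights, grounded = textual_grounding(state, word_features)
print("attention weights:", weights)
print("grounded instruction:", grounded)
print("progress loss:", progress_loss(0.5, 0.75))
```

In this toy setup the first word receives the largest weight because its features align best with the state; in a trained agent the weights would be expected to shift along the instruction as navigation proceeds.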

First claim

Opening claim text (preview).

What is claimed is:

1. A computing device comprising: a memory containing machine readable medium storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code to cause the one or more processors to: receive a navigation instruction for instructing a mobile automated system to navigate an environment in which the mobile automated system is located; receive visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generate an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generate a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generate an action for the mobile automated system to perform for navigating the environment.

2. The computing device of claim 1, wherein the machine executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

3. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.

4. The computing device of claim 1, wherein the machine executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.

5. The computing device of claim 1, wherein the machine executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.

6. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.

7. A method for navigating a mobile automated system, the method comprising: receiving, at one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.

8. The method of claim 7, comprising monitoring progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

9. The method of claim 7, wherein generating an action comprises: generating an encoder context based on the instruction grounding and the visual grounding; and generating the action for the mobile automated system using the encoder context.

10. The method of claim 7, comprising performing a natural language processing task on the navigation instruction.

11. The method of claim 7, wherein generating an action comprises identifying a navigable direction with the highest correlation to the instruction grounding.

12. The method of claim 7, wherein generating an action comprises: identifying a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generating a respective probability.

13. A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: receiving, at the one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.

14. The non-transitory machine-readable medium of claim 13, wherein the executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

15. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.

16. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.

17. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.

18. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.
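The action-selection steps recited in the dependent claims (a probability for each navigable direction, and choosing the direction with the highest correlation to the instruction grounding) can be sketched as follows. This is an illustrative reading only: it assumes "correlation" is a dot product between the grounded instruction vector and a per-direction visual feature vector, and that probabilities come from a softmax over those scores. The function names and the toy feature values are hypothetical and do not come from the patent.

```python
import math


def direction_probabilities(instruction_grounding, direction_features):
    """Assumed scoring: softmax over dot products with each direction's features."""
    scores = [
        sum(g * f for g, f in zip(instruction_grounding, feat))
        for feat in direction_features
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(instruction_grounding, direction_features):
    """Return the index of the most probable direction and all probabilities."""
    probs = direction_probabilities(instruction_grounding, direction_features)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical toy inputs: one grounded instruction vector and
# visual features for three navigable directions.
grounding = [1.0, 0.0]
directions = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.3]]
best, probs = select_action(grounding, directions)
print("chosen direction index:", best)
print("per-direction probabilities:", probs)
```

Because softmax is monotonic in the scores, the most probable direction is also the one with the highest assumed correlation, so the two dependent-claim behaviours coincide in this sketch.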

Assignees

Inventors

Classifications

  • the classifiers operating on different input data, e.g. multi-modal recognition · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • using neural networks · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11029694B2 cover?
An agent for navigating a mobile automated system is disclosed herein. The navigation agent receives a navigation instruction and visual information for one or more observed images. The navigation agent is provided or equipped with self-awareness, which provides or supports the following abilities: identifying which direction to go or proceed by determining the part of the instruction that corr…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G05D1/0221. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 8, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).