What technology area does this patent fall under?

Primary CPC classification G05D1/0221. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 8, 2021 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Self-aware visual-textual co-grounded navigation agent

US11029694B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11029694-B2
Application number	US-201816176955-A
Country	US
Kind code	B2
Filing date	Oct 31, 2018
Priority date	Sep 27, 2018
Publication date	Jun 8, 2021
Grant date	Jun 8, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An agent for navigating a mobile automated system is disclosed herein. The navigation agent receives a navigation instruction and visual information for one or more observed images. The navigation agent is provided or equipped with self-awareness, which provides or supports the following abilities: identifying which direction to go or proceed by determining the part of the instruction that corresponds to the observed images (visual grounding), and identifying which part of the instruction has been completed or ongoing and which part is potentially needed for the next action selection (textual grounding). In some embodiments, the navigation agent applies regularization to ensure that the grounded instruction can correctly be used to estimate the progress made towards the navigation goal (progress monitoring).
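The textual grounding and progress monitoring described in the abstract can be illustrated with a toy sketch. This page does not show how the patent computes either quantity, so everything below is an assumption made for illustration: the dot-product attention, the 2-dimensional token vectors, and the names ground_instruction and estimate_progress are hypothetical, and the progress proxy (the expected position of attention within the instruction) is a stand-in, not the patent's regularizer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ground_instruction(token_vectors, state):
    """Hypothetical textual grounding: attend over instruction tokens.

    Returns the attention weights (one per token) and the weighted sum
    of the token vectors, used here as the "grounded instruction".
    """
    weights = softmax([dot(vec, state) for vec in token_vectors])
    dim = len(token_vectors[0])
    grounded = [
        sum(w * vec[i] for w, vec in zip(weights, token_vectors))
        for i in range(dim)
    ]
    return weights, grounded


def estimate_progress(weights):
    """Hypothetical progress proxy in [0, 1].

    Assumes that attention moving toward the end of the instruction
    means more of the instruction has been completed.
    """
    if len(weights) == 1:
        return 1.0
    expected_index = sum(i * w for i, w in enumerate(weights))
    return expected_index / (len(weights) - 1)


# Toy data (assumed): five instruction tokens and an agent state vector.
tokens = ["walk", "past", "sofa", "then", "stop"]
token_vectors = [[1.0, 0.0], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]]
state = [0.0, 3.0]

weights, grounded = ground_instruction(token_vectors, state)
print(tokens[weights.index(max(weights))], round(estimate_progress(weights), 2))
```

With this toy state the attention concentrates on the last token, so the proxy reports that most of the instruction is behind the agent. A learned model would produce the state and token vectors from the instruction and the observed images.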

First claim

Opening claim text (preview).

What is claimed is:
1. A computing device comprising: a memory containing machine readable medium storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code to cause the one or more processors to: receive a navigation instruction for instructing a mobile automated system to navigate an environment in which the mobile automated system is located; receive visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generate an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generate a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generate an action for the mobile automated system to perform for navigating the environment.
2. The computing device of claim 1, wherein the machine executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.
3. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.
4. The computing device of claim 1, wherein the machine executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.
5. The computing device of claim 1, wherein the machine executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.
6. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.
7. A method for navigating a mobile automated system, the method comprising: receiving, at one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.
8. The method of claim 7, comprising monitoring progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.
9. The method of claim 7, wherein generating an action comprises: generating an encoder context based on the instruction grounding and the visual grounding; and generating the action for the mobile automated system using the encoder context.
10. The method of claim 7, comprising performing a natural language processing task on the navigation instruction.
11. The method of claim 7, wherein generating an action comprises identifying a navigable direction with the highest correlation to the instruction grounding.
12. The method of claim 7, wherein generating an action comprises: identifying a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generating a respective probability.
13. A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: receiving, at the one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.
14. The non-transitory machine-readable medium of claim 13, wherein the executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.
15. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.
16. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.
17. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.
18. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.
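The direction-selection steps in the dependent claims (a probability for each navigable direction, and a choice of the direction with the highest correlation to the instruction grounding) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not say how correlation is measured, so a dot product followed by a softmax is assumed here, and the name select_direction, the direction labels, and the 2-dimensional feature vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_direction(direction_features, grounded_instruction):
    """Hypothetical action selection over navigable directions.

    Scores each direction's visual feature against the grounded
    instruction (assumed correlation measure: dot product), turns the
    scores into one probability per direction, and returns the
    probabilities together with the index of the best direction.
    """
    scores = [dot(feat, grounded_instruction) for feat in direction_features]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, best


# Toy data (assumed): three navigable directions and a grounded instruction.
names = ["left", "forward", "right"]
direction_features = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
grounded_instruction = [0.1, 0.9]

probs, best = select_direction(direction_features, grounded_instruction)
print(names[best], [round(p, 3) for p in probs])
```

Returning the full probability list alongside the chosen index keeps both steps visible: a caller could sample from the probabilities during training and take the highest-scoring direction at inference time.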

Assignees

Salesforce Com Inc

Inventors

Classifications

G06V10/811
the classifiers operating on different input data, e.g. multi-modal recognition · CPC title
G06V20/10
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title
G06V10/82
using neural networks · CPC title
G06V10/454
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title

Patent family

Related publications grouped by family.

View patent family 69947468

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11029694B2 cover?: An agent for navigating a mobile automated system is disclosed herein. The navigation agent receives a navigation instruction and visual information for one or more observed images. The navigation agent is provided or equipped with self-awareness, which provides or supports the following abilities: identifying which direction to go or proceed by determining the part of the instruction that corr…
Who is the assignee on this patent?: Salesforce Com Inc