US11029694B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11029694-B2 |
| Application number | US-201816176955-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 31, 2018 |
| Priority date | Sep 27, 2018 |
| Publication date | Jun 8, 2021 |
| Grant date | Jun 8, 2021 |
Official abstract text for this publication.
An agent for navigating a mobile automated system is disclosed herein. The navigation agent receives a navigation instruction and visual information for one or more observed images. The navigation agent is provided or equipped with self-awareness, which provides or supports the following abilities: identifying which direction to go or proceed by determining the part of the instruction that corresponds to the observed images (visual grounding), and identifying which part of the instruction has been completed or is ongoing and which part is potentially needed for the next action selection (textual grounding). In some embodiments, the navigation agent applies regularization to ensure that the grounded instruction can correctly be used to estimate the progress made towards the navigation goal (progress monitoring).
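The textual grounding and progress monitoring described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes textual grounding is a softmax attention over instruction tokens, and that progress is estimated as the attention-weighted relative position within the instruction. The function names (`textual_grounding`, `estimated_progress`, `progress_penalty`) and the squared-error regularizer are hypothetical choices made for illustration only.

```python
import math
from typing import List, Sequence


def textual_grounding(token_scores: Sequence[float]) -> List[float]:
    """Turn raw per-token relevance scores into attention weights (softmax).

    Assumption: the scores come from some learned model; here they are inputs.
    """
    m = max(token_scores)
    exps = [math.exp(s - m) for s in token_scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def estimated_progress(weights: Sequence[float]) -> float:
    """Attention-weighted relative position in the instruction, in [0, 1].

    Attention near the start of the instruction gives a value near 0;
    attention near the end gives a value near 1.
    """
    n = len(weights)
    if n == 1:
        return 1.0
    return sum(w * i / (n - 1) for i, w in enumerate(weights))


def progress_penalty(weights: Sequence[float], true_progress: float) -> float:
    """Hypothetical regularizer: squared error between the progress implied
    by the grounding and a reference progress value in [0, 1]."""
    return (estimated_progress(weights) - true_progress) ** 2


# Usage: attention concentrated on the last tokens implies late-stage progress.
weights = textual_grounding([0.0, 0.0, 2.0, 4.0])
print(estimated_progress(weights))
print(progress_penalty(weights, true_progress=0.9))
```

Under these assumptions, the penalty shrinks as the attention over the instruction moves in step with the reference progress, which is the intuition behind using the grounded instruction to estimate progress.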
Opening claim text (preview).
What is claimed is:

1. A computing device comprising: a memory containing machine readable medium storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code to cause the one or more processors to: receive a navigation instruction for instructing a mobile automated system to navigate an environment in which the mobile automated system is located; receive visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generate an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generate a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generate an action for the mobile automated system to perform for navigating the environment.

2. The computing device of claim 1, wherein the machine executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

3. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.

4. The computing device of claim 1, wherein the machine executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.

5. The computing device of claim 1, wherein the machine executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.

6. The computing device of claim 1, wherein the machine executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.

7. A method for navigating a mobile automated system, the method comprising: receiving, at one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.

8. The method of claim 7, comprising monitoring progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

9. The method of claim 7, wherein generating an action comprises: generating an encoder context based on the instruction grounding and the visual grounding; and generating the action for the mobile automated system using the encoder context.

10. The method of claim 7, comprising performing a natural language processing task on the navigation instruction.

11. The method of claim 7, wherein generating an action comprises identifying a navigable direction with the highest correlation to the instruction grounding.

12. The method of claim 7, wherein generating an action comprises: identifying a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generating a respective probability.

13. A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: receiving, at the one or more processors, a navigation instruction for instructing the mobile automated system to navigate an environment in which the mobile automated system is located; receiving, at the one or more processors, visual information for the environment, the visual information comprising one or more images observed for the environment as the mobile automated system is navigated therethrough; generating, at the one or more processors, an instruction grounding based at least in part on the navigation instruction, the instruction grounding identifying which part of the navigation instruction has been completed by the mobile automated system and which part of the navigation instruction is outstanding; generating, at the one or more processors, a visual grounding based at least in part on the visual information, the visual grounding identifying a direction in which the mobile automated system should proceed; and using the instruction grounding and the visual grounding, generating, at the one or more processors, an action for the mobile automated system to perform for navigating the environment.

14. The non-transitory machine-readable medium of claim 13, wherein the executable code further causes the one or more processors to monitor progress of navigation of the automated system to ensure that the instruction grounding accurately reflects the navigation progress.

15. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: generate an encoder context based on the instruction grounding and the visual grounding; and generate the action for the mobile automated system using the encoder context.

16. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to perform a natural language processing task on the navigation instruction.

17. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to identify a navigable direction with the highest correlation to the instruction grounding.

18. The non-transitory machine-readable medium of claim 13, wherein the executable code causes the one or more processors to: identify a plurality of directions in which the mobile automated system can navigate; and for each identified navigable direction, generate a respective probability.
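The action-selection step recited in the dependent claims (generating a probability for each navigable direction and identifying the direction with the highest correlation to the instruction grounding) might look roughly like the sketch below. This is an illustrative reading, not the claimed implementation: it assumes the instruction grounding and each candidate direction are represented as equal-length feature vectors, that "correlation" is a plain dot product, and that probabilities come from a softmax. The names `direction_probabilities` and `select_action` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as a stand-in for 'correlation' (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def direction_probabilities(
    instruction_vec: Sequence[float],
    direction_feats: Sequence[Sequence[float]],
) -> List[float]:
    """Score each navigable direction against the grounded instruction and
    normalize the scores into a probability distribution with a softmax."""
    scores = [dot(instruction_vec, feat) for feat in direction_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    instruction_vec: Sequence[float],
    direction_feats: Sequence[Sequence[float]],
) -> Tuple[int, List[float]]:
    """Return the index of the most probable direction and all probabilities."""
    probs = direction_probabilities(instruction_vec, direction_feats)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Usage with made-up feature vectors for three navigable directions.
grounded_instruction = [1.0, 0.0]
candidates = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.3]]
best, probs = select_action(grounded_instruction, candidates)
print(best, probs)
```

In a trained agent the feature vectors would come from learned encoders; here they are hard-coded so the sketch stays self-contained.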
the classifiers operating on different input data, e.g. multi-modal recognition · CPC title
Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title
using neural networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
using classification, e.g. of video objects · CPC title