Agent navigation using visual inputs

US11010948B2 · US · B2

Patent metadata
Publication number: US-11010948-B2
Application number: US-201816485140-A
Country: US
Kind code: B2
Filing date: Feb 9, 2018
Priority date: Feb 9, 2017
Publication date: May 18, 2021
Grant date: May 18, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step, processing the image to generate a first initial characterization for the time step, obtaining a final characterization for a previous time step, processing the characterization for the previous time step to generate a second initial characterization for the time step, and combining the first initial characterization and the second initial characterization to generate a final characterization for the time step.
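The per-time-step update described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the patented implementation. Maps are small 2-D lists of floats in an egocentric top-down frame centered on the agent (as in claim 3). The agent's movement is assumed to be a pure translation of (dy, dx) grid cells with no rotation. The neural network that turns the first-person image into a top-down map is replaced by an already-computed observation map (`f_obs`, `c_obs`). The two initial characterizations are combined with the confidence-weighted rule recited in claim 5. The warp uses bilinear sampling, as claim 2 describes, and treats cells that fall outside the previous map as unknown (score and confidence 0). All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # a top-down map: rows x columns of per-cell values


def bilinear_sample(grid: Grid, y: float, x: float) -> float:
    """Bilinearly interpolate `grid` at fractional (row, col) = (y, x).

    Cells outside the grid are treated as 0 (no score, no confidence).
    """
    h, w = len(grid), len(grid[0])
    y0, x0 = math.floor(y), math.floor(x)
    wy, wx = y - y0, x - x0  # fractional offsets in [0, 1)

    def at(r: int, c: int) -> float:
        return grid[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    return ((1 - wy) * (1 - wx) * at(y0, x0)
            + (1 - wy) * wx * at(y0, x0 + 1)
            + wy * (1 - wx) * at(y0 + 1, x0)
            + wy * wx * at(y0 + 1, x0 + 1))


def warp(grid: Grid, dy: float, dx: float) -> Grid:
    """Re-center an egocentric map after the agent moves (dy, dx) cells.

    Translation only (an assumption): new cell (i, j) shows what old cell
    (i + dy, j + dx) showed, so map content shifts opposite to the motion.
    """
    h, w = len(grid), len(grid[0])
    return [[bilinear_sample(grid, i + dy, j + dx) for j in range(w)]
            for i in range(h)]


def combine(f_prev: Grid, c_prev: Grid, f_obs: Grid, c_obs: Grid) -> Tuple[Grid, Grid]:
    """Confidence-weighted update in the form of claim 5.

    f_prev, c_prev: second initial characterization (warped previous map).
    f_obs, c_obs:   first initial characterization (from the current image).
    """
    f_out: Grid = []
    c_out: Grid = []
    for fp_row, cp_row, fo_row, co_row in zip(f_prev, c_prev, f_obs, c_obs):
        f_row, c_row = [], []
        for fp, cp, fo, co in zip(fp_row, cp_row, fo_row, co_row):
            c = cp + co  # c_t = c_{t-1} + c'_t
            # f_t = (f_{t-1} c_{t-1} + f'_t c'_t) / c_t; assumed 0 where there is no evidence yet
            f_row.append((fp * cp + fo * co) / c if c > 0 else 0.0)
            c_row.append(c)
        f_out.append(f_row)
        c_out.append(c_row)
    return f_out, c_out


def map_step(f_prev: Grid, c_prev: Grid, f_obs: Grid, c_obs: Grid,
             dy: float, dx: float) -> Tuple[Grid, Grid]:
    """One time step: warp the previous final map, then fuse the new observation."""
    f_warp = warp(f_prev, dy, dx)  # second initial characterization (scores)
    c_warp = warp(c_prev, dy, dx)  # second initial characterization (confidences)
    return combine(f_warp, c_warp, f_obs, c_obs)
```

Bilinear sampling is differentiable with respect to the map values, which fits claim 1's requirement of a differentiable warping function; a trainable system would express the same operations in an autodiff framework rather than nested lists.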

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising one or more computers and one or more storage devices storing instructions that when executed cause the one or more computers to implement: a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map identifying locations in the environment having a particular characteristic, wherein the image of the environment is captured by an agent moving through the environment and is captured from a first-person ego-centric perspective of the agent, wherein the environment map is from a top-down view of the environment, and wherein the particular characteristic is a latent characteristic that is learned during joint training of the mapping subsystem with a planning subsystem that plans actions to be performed by the agent using characterizations generated by the mapping subsystem and wherein generating the characterization comprises, for each time step: obtaining the image of the environment at the time step that is captured from the first-person ego-centric perspective of the agent, processing the image using a neural network to generate a first initial characterization of the environment for the time step, wherein the neural network is configured to receive the image in the first-person ego-centric perspective of the agent and to transform the image in the first-person ego-centric perspective into a first initial characterization of the environment that includes a first environment map that is from the top-down view of the environment, obtaining a final characterization of the environment for a previous time step preceding the time step, wherein the final characterization of the environment for the previous time step includes a final environment map from the previous time step that is from the top-down view of the environment, obtaining a measure of movement of the agent between the previous time step and the time step, processing the characterization of the environment for the previous time step to generate a second initial characterization of the environment for the time step, comprising applying a differentiable warping function to the final characterization of the environment for the previous time step and the measure of movement to generate the second initial characterization, wherein the second initial characterization includes a second environment map that is from the top-down view of the environment; and combining the first initial characterization and the second initial characterization to generate a final characterization of the environment for the time step.

2. The system of claim 1, wherein the warping function is a function that performs interpolation using bilinear sampling.

3. The system of claim 1, wherein: the top-down view of the environment is an ego-centric top down perspective that is centered at a current position of the agent.

4. The system of claim 1, wherein combining the first initial characterization and the second initial characterization to generate the final characterization for the time step comprises: applying an update function to the first initial characterization and the second initial characterization to generate the final characterization.

5. The system of claim 4, wherein: each characterization includes: (i) a set of scores representing whether or not the plurality of locations in the environment have the particular characteristic, and (ii) a set of measures of confidence in the set of scores, the update function comprises performing operations of the following equations:

$$f_t = \frac{f_{t-1}\,c_{t-1} + f'_t\,c'_t}{c_{t-1} + c'_t}, \qquad c_t = c_{t-1} + c'_t,$$

wherein $f_t$ is the set of scores for the final characterization for the time step, $c_t$ is the set of measures of confidence in the set of scores for the final characterization for the time step, $f'_t$ is the set of scores for the first initial characterization, $c'_t$ is the set of measures of confidence in the set of scores for the first initial characterization, $f_{t-1}$ is the set of scores for the second initial characterization, and $c_{t-1}$ is the set of measures of confidence in the set of scores for the second initial characterization.

6. The system of claim 4, wherein the update function is performed by a recurrent neural network, and wherein the recurrent neural network is configured to, for each time step of the plurality of time steps, process the first initial characterization and the second initial characterization for the time step to generate the final characterization for the time step.

7. The system of claim 1, wherein the environment map for a time step comprises, for each of the plurality of locations in the environment: a score representing whether or not the location has the particular characteristic, and the characterization further comprises, for each of the locations, a measure of confidence in the score for the location.

8. The system of claim 1, further comprising: the planning subsystem, wherein the planning system is configured to, for each of the plurality of time steps: obtain the final characterization for the time step from the mapping subsystem, and process the final characterization to select a proposed action to be performed by an agent interacting with the environment at the time step.

9. The system of claim 8, wherein the agent is performing actions to accomplish a goal, and wherein processing the final characterization to select the proposed action for the time step comprises: generating a sequence of spatially scaled environment maps from the final characterization for the time step, wherein each spatially scaled environment map in the sequence is downsampled relative to any subsequent spatially scaled environment map in the …
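Claim 9, which is cut off in this preview, has the planner build a sequence of spatially scaled maps from the final characterization, each downsampled relative to every map after it. A minimal sketch, assuming 2x2 average pooling as the downsampling operator and maps whose sides stay even at every level; the pooling choice, the function names, and the `levels` parameter are hypothetical, and how the planner then uses the sequence is truncated in the preview and not sketched here.

```python
from typing import List

Grid = List[List[float]]  # a top-down map: rows x columns of per-cell scores


def avg_pool_2x2(grid: Grid) -> Grid:
    """Halve each spatial dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(grid), len(grid[0])
    if h % 2 or w % 2:
        raise ValueError("map sides must be even to pool by 2")
    return [[(grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def scaled_map_sequence(final_map: Grid, levels: int) -> List[Grid]:
    """Return `levels` maps ordered coarsest first.

    Each map is downsampled relative to every map after it; the last entry
    is the full-resolution final map itself.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    maps = [final_map]
    for _ in range(levels - 1):
        maps.append(avg_pool_2x2(maps[-1]))  # each pass halves the resolution
    return maps[::-1]  # reverse so the coarsest map comes first
```

Ordering the list coarsest first follows the claim's wording: each map is downsampled relative to any subsequent one, so the full-resolution map comes last.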

Assignees

Google LLC

Classifications

  • Hierarchical structures, e.g. layering · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • Vehicle exterior; Vicinity of vehicle · CPC title

  • Camera pose · CPC title

  • Video; Image sequence · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11010948B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for navigation using visual inputs. One of the systems includes a mapping subsystem configured to, at each time step of a plurality of time steps, generate a characterization of an environment from an image of the environment at the time step, wherein the characterization comprises an environment map…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G05B13/027. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 18, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).