Systems and Methods for Navigating Aerial Vehicles Using Deep Reinforcement Learning

US2021123741A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021123741-A1
Application number  US-201916667424-A
Country             US
Kind code           A1
Filing date         Oct 29, 2019
Priority date       Oct 29, 2019
Publication date    Apr 29, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology relates to navigating aerial vehicles using deep reinforcement learning techniques to generate flight policies. A computing system may include a simulator configured to produce simulations of a flight of the aerial vehicle in a region of an atmosphere, a replay buffer configured to store frames of the simulations, and a learning module having a deep reinforcement learning architecture configured to, by a reinforcement learning algorithm, process an input of a set of frames, and output a neural network encoding a learned flight policy. A meta-learning system may include stacks of learning systems, a coordinator configured to provide an instruction to the learning systems that includes a parameter and a start time, and an evaluation server configured to evaluate resulting rewards from learned flight policies generated by the learning systems.
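
The data flow in the abstract (simulator, then replay buffer, then learning module) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Frame` layout, the fixed buffer capacity, and the toy altitude-and-wind dynamics in `simulate_flight`. The learning module itself is left out; the sketch stops at the randomly sampled batch such a module would consume.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class Frame:
    """One simulated time step (hypothetical layout, not from the patent)."""
    features: Tuple[float, ...]  # feature vector for this step
    action: int                  # action taken at this step
    reward: float                # reward observed after the action


class ReplayBuffer:
    """Fixed-capacity frame store with uniform random sampling (assumed design)."""

    def __init__(self, capacity: int) -> None:
        self._frames: Deque[Frame] = deque(maxlen=capacity)

    def add(self, frame: Frame) -> None:
        self._frames.append(frame)  # the oldest frame is dropped when full

    def sample(self, batch_size: int, rng: random.Random) -> List[Frame]:
        k = min(batch_size, len(self._frames))
        return rng.sample(list(self._frames), k)

    def __len__(self) -> int:
        return len(self._frames)


def simulate_flight(steps: int, rng: random.Random) -> List[Frame]:
    """Toy stand-in for a simulator: altitude drifts under a random wind."""
    altitude = 0.0
    frames = []
    for _ in range(steps):
        wind = rng.uniform(-1.0, 1.0)
        action = rng.choice([-1, 0, 1])  # descend, hold, ascend
        altitude += action + wind
        reward = -abs(altitude)          # staying near altitude 0 scores best
        frames.append(Frame((altitude, wind), action, reward))
    return frames


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
for _ in range(5):  # five simulated flights of 50 time steps each
    for frame in simulate_flight(steps=50, rng=rng):
        buffer.add(frame)

batch = buffer.sample(batch_size=32, rng=rng)  # input a learner would process
```

Sampling frames at random, rather than in the order they were simulated, is the behavior the claims attribute to the replay buffer; the uniform distribution used here is one simple choice among several.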

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system for generating a learned flight policy for an aerial vehicle, the system comprising: one or more computers and one or more storage devices, the one or more storage devices storing instructions that when executed cause the one or more computers to implement: a simulator configured to produce a plurality of simulations of a flight of the aerial vehicle in a region of an atmosphere; a replay buffer configured to store a plurality of frames of the plurality of simulations; a learning module comprising a deep reinforcement learning architecture configured to, by a reinforcement learning algorithm, process an input comprising a set of frames, and output a neural network encoding a learned flight policy.

2. The system of claim 1, wherein a reward function is defined in the deep reinforcement learning architecture, the reward function being used by the learning module to score the learned flight policy.

3. The system of claim 1, wherein the reward function is defined in order to achieve an objective relating to navigation of the aerial vehicle.

4. The system of claim 1, wherein the reward function is defined in order to achieve an objective relating to an operation of the aerial vehicle.

5. The system of claim 1, further comprising a flight policy server configured to store the neural network encoding the learned flight policy.

6. The system of claim 1, further comprising an operation-ready policies server configured to store the neural network encoding the learned flight policy when the learned flight policy meets or exceeds an operation-ready or equivalent threshold score.

7. The system of claim 1, wherein each simulation of the plurality of simulations comprises a plurality of frames, and each frame of the plurality of frames represents a time step of a corresponding simulation.

8. The system of claim 7, wherein each frame comprises a feature vector representing a plurality of features of the corresponding simulation.

9. The system of claim 8, wherein a subset of the plurality of features is an operational feature of the corresponding simulation.

10. The system of claim 8, wherein a subset of the plurality of features is an environmental feature of the corresponding simulation.

11. The system of claim 1, wherein the aerial vehicle comprises a high altitude lighter than air vehicle.

12. The system of claim 1, wherein the aerial vehicle comprises a high altitude fixed-wing vehicle.

13. The system of claim 1, wherein the replay buffer further is configured to provide an arbitrary subset of the plurality of frames to be randomly sampled by the learning module.

14. A meta-learning system for training a plurality of neural networks encoding a plurality of learned flight policies, the system comprising: a plurality of learning systems, each learning system comprising: a simulation module comprising a plurality of simulators configured to run flight simulations, a replay buffer configured to store a plurality of frames of the plurality of simulations and to provide an arbitrary subset of the plurality of frames to be randomly sampled, and a learning module comprising a deep reinforcement learning architecture, the learning module configured to process an input comprising a set of frames, and output a neural network encoding a learned flight policy, wherein the neural network is one of the plurality of neural networks encoding the plurality of learned flight policies; a coordinator configured to provide an instruction to each of the plurality of learning systems comprising a parameter and a start time; and an evaluation server configured to evaluate a resulting reward from a learned flight policy generated by one of the plurality of learning systems.

15. The meta-learning system of claim 14, wherein the learning module is further configured to score the learned flight policy by a reward function corresponding to an objective of the learning system, one of the set of parameters being the objective.

16. The meta-learning system of claim 15, further comprising an operation-ready policies server configured to store the neural network encoding the learned flight policy when a reward score for the learned flight policy meets or exceeds an operation-ready or equivalent threshold.

17. A computer-implemented method for training a flight policy for an aerial vehicle, the method comprising: simulating, by a simulation module comprising one or more simulators, an aerial vehicle's flight through a region of the atmosphere according to a flight policy; generating a plurality of frames, each frame representing a time step of a simulation produced by the one or more simulators; storing the plurality of frames, each frame including a feature vector characterizing a plurality of features and representing a given situation at the time step of the simulation; requesting, by a learning module implementing a reinforcement learning algorithm, a set of frames from the replay buffer; processing, by the learning module, the set of frames using the reinforcement learning algorithm; generating, by the learning module, a neural network encoding a learned flight policy; and storing the learned flight policy in a policy server, the flight policy being encoded in a neural network.

18. The method of claim 17, further comprising evaluating the neural network encoding the learned flight policy according to a threshold.

19. The method of claim 17, wherein the set of frames is provided to the learning module in a random order.

20. The method of claim 17, wherein the learned flight policy is configured to output an action for the aerial vehicle in the given situation.

21. The method of claim 17, wherein the learned flight policy is configured to output a representation of an action for the aerial vehicle in the given situation.

22. The method of claim 17, wherein the learned flight policy is configured to output a command for the aerial vehicle in the given situation.

23. The method of claim 17, wherein at least one of the plurality of features is an operational feature of the simulation.

24. The method of claim 17, wherein at least one of the plurality of features is an environmental feature of the simulation.
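
The meta-learning arrangement in claims 14 to 16 (a coordinator that sends each learning system a parameter and a start time, an evaluation step that scores the resulting policies, and a store that keeps only policies meeting a threshold) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `Instruction`, `PolicyRecord`, the staggered start times, the threshold value, and especially `toy_reward`, which stands in for an entire training-and-evaluation run, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Instruction:
    """What the coordinator sends to one learning system (names are hypothetical)."""
    parameter: float   # a single tunable value, e.g. a learning rate
    start_time: float  # seconds after launch at which training should begin


@dataclass(frozen=True)
class PolicyRecord:
    """Evaluation result for the policy produced by one learning system."""
    system_id: int
    parameter: float
    reward: float


def coordinate(parameters: List[float], stagger_s: float) -> Dict[int, Instruction]:
    """Assign one parameter and a staggered start time to each learning system."""
    return {
        i: Instruction(parameter=p, start_time=i * stagger_s)
        for i, p in enumerate(parameters)
    }


def toy_reward(parameter: float) -> float:
    """Placeholder for training plus evaluation; best at parameter == 0.3."""
    return -(parameter - 0.3) ** 2


def evaluate(
    instructions: Dict[int, Instruction],
    reward_fn: Callable[[float], float],
) -> List[PolicyRecord]:
    """Score the policy each learning system would produce for its instruction."""
    return [
        PolicyRecord(i, ins.parameter, reward_fn(ins.parameter))
        for i, ins in sorted(instructions.items())
    ]


def operation_ready(records: List[PolicyRecord], threshold: float) -> List[PolicyRecord]:
    """Keep only policies whose reward meets or exceeds the threshold."""
    return [r for r in records if r.reward >= threshold]


instructions = coordinate([0.1, 0.3, 0.5], stagger_s=10.0)
records = evaluate(instructions, toy_reward)
ready = operation_ready(records, threshold=-0.01)  # assumed threshold value
```

With these made-up numbers only the learning system given parameter 0.3 clears the threshold, which mirrors the "meets or exceeds" test that gates the operation-ready policies server in claims 6 and 16.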

Assignees

Inventors

Classifications

  • Knowledge-based neural networks; Logical representations of neural networks · CPC title

  • autonomous, i.e. by navigating independently from ground or air stations, e.g. by using inertial navigation systems [INS] · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • Reinforcement learning · CPC title

  • generated by photovoltaics · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021123741A1 cover?
The technology relates to navigating aerial vehicles using deep reinforcement learning techniques to generate flight policies. A computing system may include a simulator configured to produce simulations of a flight of the aerial vehicle in a region of an atmosphere, a replay buffer configured to store frames of the simulations, and a learning module having a deep reinforcement learning archite…
Who is the assignee on this patent?
Loon LLC
What technology area does this patent fall under?
Primary CPC classification G01C21/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 29, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).