Autonomous behavior generation for aircraft

US11150670B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11150670-B2
Application number   US-201916423892-A
Country              US
Kind code            B2
Filing date          May 28, 2019
Priority date        May 28, 2019
Publication date     Oct 19, 2021
Grant date           Oct 19, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods for training a machine learning algorithm (MLA) to control a first aircraft in an environment that comprises the first aircraft and a second aircraft are described. Training of the MLA can include: the MLA determining a first-aircraft action for the first aircraft to take within the environment; sending the first-aircraft action from the MLA; after sending the first-aircraft action, receiving an observation of the environment and a reward signal at the MLA, the observation including information about the environment after the first aircraft has taken the first-aircraft action and the second aircraft has taken a second-aircraft action, the reward signal indicating a score of performance of the first-aircraft action based on dynamic and kinematic properties of the second aircraft; and updating the MLA based on the observation of the environment and the reward signal.
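
The loop described in the abstract (determine an action, send it, receive an observation and a reward signal, update) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `ToyPursuitEnv`, `TabularAgent`, the 1-D grid, the randomly drifting second aircraft, and the tabular Q-learning update are all hypothetical choices made for this example. The abstract does not name a specific learning algorithm.

```python
import random


class ToyPursuitEnv:
    """Hypothetical 1-D stand-in for the two-aircraft environment."""

    def __init__(self, size=11, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.first = 0               # first aircraft starts at the left edge
        self.second = self.size - 1  # second aircraft starts at the right edge
        return self._observe()

    def _observe(self):
        # Observation: signed offset from the first to the second aircraft.
        return self.second - self.first

    def _clamp(self, position):
        return min(max(position, 0), self.size - 1)

    def step(self, first_action):
        # The first aircraft takes its action (-1, 0, or +1 cells).
        self.first = self._clamp(self.first + first_action)
        # The second aircraft takes its own action (assumed: random drift).
        self.second = self._clamp(self.second + self.rng.choice((-1, 0, 1)))
        # Reward is scored from a kinematic property of the second aircraft:
        # its distance to the first aircraft (closer is better).
        reward = -abs(self.second - self.first)
        return self._observe(), reward


class TabularAgent:
    """Hypothetical learner: epsilon-greedy tabular Q-learning."""

    ACTIONS = (-1, 0, 1)

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = {}  # maps (observation, action) to an estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, obs):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q.get((obs, a), 0.0))

    def update(self, obs, action, reward, next_obs):
        best_next = max(self.q.get((next_obs, a), 0.0) for a in self.ACTIONS)
        old = self.q.get((obs, action), 0.0)
        target = reward + self.gamma * best_next
        self.q[(obs, action)] = old + self.alpha * (target - old)


def train(episodes=200, steps=30):
    env, agent = ToyPursuitEnv(), TabularAgent()
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(steps):
            action = agent.act(obs)                      # determine the action
            next_obs, reward = env.step(action)          # send it; receive observation and reward
            agent.update(obs, action, reward, next_obs)  # update the learner
            obs = next_obs
    return env, agent
```

Calling `train()` runs the full loop and returns the environment and the agent, whose `q` table holds the learned value estimates. In a realistic setting the toy environment would be replaced by a flight simulator or sensor-derived observations, as the claims contemplate.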

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: training a machine learning algorithm to control a first aircraft in an environment that comprises the first aircraft and a second aircraft by: determining a first-aircraft action for the first aircraft to take within the environment using the machine learning algorithm; sending the first-aircraft action from the machine learning algorithm; after sending the first-aircraft action, receiving an observation of the environment and a reward signal at the machine learning algorithm, wherein the observation of the environment comprises information about the environment after the first aircraft has taken the first-aircraft action and the second aircraft has taken a second-aircraft action, and wherein the reward signal indicates a score of performance of the first-aircraft action by the first aircraft based on one or more dynamic and kinematic properties of the second aircraft within the environment; and updating the machine learning algorithm based on the observation of the environment and the reward signal.

2. The method of claim 1, wherein receiving the observation of the environment comprises receiving the observation of the environment from a simulator simulating interactions between the first and second aircraft in the environment.

3. The method of claim 2, wherein receiving the observation of the environment from the simulator comprises receiving the observation of the environment from a simulator that: receives actions from both the first and second aircraft; determines a state of the environment based on the received actions; and determines the information about the environment after the first and second aircraft have taken subsequent actions based on the state of the environment.

4. The method of claim 1, wherein receiving the observation of the environment comprises receiving an observation of the environment that is based on data obtained from one or more sensors of a non-simulated aircraft.

5. The method of claim 1, wherein receiving the observation of the environment and the reward signal comprises receiving a reward signal that is based on one or more of: a location of the second aircraft within the environment, a velocity of the second aircraft, an acceleration of the second aircraft, a position of the second aircraft relative to the first aircraft, and a distance between the first and second aircraft.

6. The method of claim 1, wherein the machine learning algorithm is associated with one or more weights, wherein training the machine learning algorithm to control the first aircraft comprises training the machine learning algorithm in parallel using a plurality of worker threads, each worker thread configured to utilize the machine learning algorithm during training, and wherein updating the machine learning algorithm based on the observation of the environment and the reward signal comprises: storing one or more observations of the environment and one or more reward signals in a trajectory vector using a particular worker thread of the plurality of worker threads; sending the trajectory vector from the particular worker thread to a learner thread associated with the plurality of worker threads; updating the one or more weights of the machine learning algorithm based on the trajectory vector using the learner thread; and updating the machine learning algorithm to utilize the updated one or more weights using the learner thread.

7. The method of claim 6, wherein storing the one or more observations of the environment and the one or more reward signals in the trajectory vector comprises storing a plurality of observations of the environment and a plurality of reward signals obtained over a plurality of episodes of interactions between the first and second aircraft within the environment in the trajectory vector using the particular worker thread, wherein an episode of interactions between the first and second aircraft within the environment of the plurality of episodes of interactions is associated with a predetermined amount of time.

8. The method of claim 7, wherein storing the plurality of observations of the environment and the plurality of reward signals obtained over the plurality of episodes of interactions between the first and second aircraft within the environment in the trajectory vector using the particular worker thread comprises storing a plurality of observations of the environment and a plurality of reward signals obtained over an epoch of interactions between the first and second aircraft within the environment, wherein the epoch of interactions between the first and second aircraft within the environment comprises a predetermined number of episodes of interactions between the first and second aircraft within the environment.

9. The method of claim 1, wherein determining the first-aircraft action for the first aircraft within the environment using the machine learning algorithm comprises: transforming a coordinate-related input to the machine learning algorithm using a coordinate transformation that transforms coordinates into a proper subset of coordinates possible in the coordinate-related input resulting in a transformed coordinated-related input; and providing the transformed coordinated-related input to the machine learning algorithm.

10. The method of claim 1, wherein receiving the observation of the environment and the reward signal comprises receiving a reward signal that is based on a first reward for reducing distance between the first aircraft and the second aircraft, a second reward for the first aircraft reaching a desired location with respect to the second aircraft, or both the first reward and the second reward.

11. The method of claim 1, wherein training the machine learning algorithm to control the first aircraft comprises: training the machine learning algorithm to control the first aircraft using a plurality of scenarios related to interactions between the first and second aircraft within the environment, wherein the plurality of scenarios are arranged so that a first scenario precedes a second scenario in the plurality of scenarios, the first scenario involving a first range of options to control the first aircraft, the second scenario involving a second range of options to control the first aircraft, and wherein the second range of options includes more options than the first range of options.

12. The method of claim 1, further comprising: after training the machine learning algorithm, using the trained machine learning algorithm to control a non-simulated aircraft.

13. The method of claim 12, wherein using the trained machine learning algorithm to control the non-simulated aircraft comprises using the trained machine learning algorithm to control the non-simulated aircraft using one or more control systems of the non-simulated aircraft.

14. The method of claim 1, wherein training the machine learning algorithm further comprises: training the machine learning algorithm for a first training session; after training the machine learning algorithm for the first training session, saving the machine learning algorithm as a previous version of the machine learning algorithm; and after saving the previous version of the machine learning algorithm, continuing training of the machine learning algorithm for a second training session, wherein the machine learning algorithm determines actions for the first aircraft to take within the environment during the second training session, and wherein the previous version of the machine learning algorithm determines actions for the second aircraft to take within the environment during the s
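
Claims 5 and 10 describe a reward signal built from the distance between the two aircraft and from the first aircraft reaching a desired location relative to the second. One way such a reward might be computed is sketched below in Python. The function name `reward_signal`, the tuple-based positions, the offset, the tolerance, and the weights are all hypothetical values chosen for illustration; the claims do not specify numeric parameters or a particular formula.

```python
import math


def reward_signal(prev_first, prev_second, first, second,
                  desired_offset=(-50.0, 0.0, 0.0), tolerance=5.0,
                  closing_weight=1.0, goal_bonus=100.0):
    """Score one step from (x, y, z) positions before and after the step.

    All parameter defaults are assumptions for this sketch.
    """
    # First reward: how much the distance between the aircraft shrank.
    closing = math.dist(prev_first, prev_second) - math.dist(first, second)

    # Second reward: bonus when the first aircraft is within `tolerance`
    # of the desired location, defined as an offset from the second aircraft.
    target = tuple(s + o for s, o in zip(second, desired_offset))
    reached = math.dist(first, target) <= tolerance

    return closing_weight * closing + (goal_bonus if reached else 0.0)


# Closing by 10 units without reaching the desired location.
step_reward = reward_signal((0.0, 0.0, 0.0), (100.0, 0.0, 0.0),
                            (10.0, 0.0, 0.0), (100.0, 0.0, 0.0))

# Closing by 10 units and arriving at the desired location.
goal_reward = reward_signal((40.0, 0.0, 0.0), (100.0, 0.0, 0.0),
                            (50.0, 0.0, 0.0), (100.0, 0.0, 0.0))
```

Keeping the two terms separate makes it easy to use either reward alone or both together, which matches the "first reward, second reward, or both" wording of claim 10.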

Assignees

Boeing Co

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Transfer learning · CPC title

  • Reinforcement learning · CPC title

  • Refuelling during flight · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11150670B2 cover?
Apparatus and methods for training a machine learning algorithm (MLA) to control a first aircraft in an environment that comprises the first aircraft and a second aircraft are described. Training of the MLA can include: the MLA determining a first-aircraft action for the first aircraft to take within the environment; sending the first-aircraft action from the MLA; after sending the first-aircra…
Who is the assignee on this patent?
Boeing Co
What technology area does this patent fall under?
Primary CPC classification G05D1/101. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 19, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).