Autonomous behavior generation for aircraft

US11150670B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11150670-B2
Application number	US-201916423892-A
Country	US
Kind code	B2
Filing date	May 28, 2019
Priority date	May 28, 2019
Publication date	Oct 19, 2021
Grant date	Oct 19, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Apparatus and methods for training a machine learning algorithm (MLA) to control a first aircraft in an environment that comprises the first aircraft and a second aircraft are described. Training of the MLA can include: the MLA determining a first-aircraft action for the first aircraft to take within the environment; sending the first-aircraft action from the MLA; after sending the first-aircraft action, receiving an observation of the environment and a reward signal at the MLA, the observation including information about the environment after the first aircraft has taken the first-aircraft action and the second aircraft has taken a second-aircraft action, the reward signal indicating a score of performance of the first-aircraft action based on dynamic and kinematic properties of the second aircraft; and updating the MLA based on the observation of the environment and the reward signal.
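The training cycle in the abstract (determine an action, send it, receive an observation and a reward, update) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: `ToyEnvironment`, `ToyMLA`, `scripted_second_aircraft`, and `train` are hypothetical names, the world is one-dimensional, the reward is assumed to be the reduction in distance between the two aircraft, and the learner is a small optimistic value table rather than whatever model the patent contemplates.

```python
class ToyEnvironment:
    """Hypothetical 1-D environment holding both aircraft (illustration only)."""

    def __init__(self, first=0, second=20):
        self.first = first    # position of the first (controlled) aircraft
        self.second = second  # position of the second aircraft

    def observe(self):
        # Observation: signed offset from the first aircraft to the second.
        return self.second - self.first

    def step(self, first_action, second_action):
        # Both aircraft act; the reward scores the first aircraft's action
        # by how much the distance to the second aircraft shrank (assumed).
        before = abs(self.second - self.first)
        self.first += first_action
        self.second += second_action
        after = abs(self.second - self.first)
        return self.observe(), float(before - after)


class ToyMLA:
    """Hypothetical stand-in for the MLA: a small table of action values."""

    ACTIONS = (-1, 0, 1)

    def __init__(self, learning_rate=0.5, optimistic_start=2.0):
        self.values = {}
        self.learning_rate = learning_rate
        # Untried actions start above any real reward, so each gets tried once.
        self.optimistic_start = optimistic_start

    @staticmethod
    def _key(observation, action):
        side = (observation > 0) - (observation < 0)  # -1, 0, or +1
        return (side, action)

    def determine_action(self, observation):
        return max(self.ACTIONS, key=lambda a: self.values.get(
            self._key(observation, a), self.optimistic_start))

    def update(self, observation, action, reward):
        # Move the stored value for this (side, action) toward the reward.
        key = self._key(observation, action)
        old = self.values.get(key, self.optimistic_start)
        self.values[key] = old + self.learning_rate * (reward - old)


def scripted_second_aircraft(t):
    # Assumed behaviour: the second aircraft moves away on every fourth step.
    return 1 if t % 4 == 3 else 0


def train(env, mla, steps):
    rewards = []
    observation = env.observe()
    for t in range(steps):
        action = mla.determine_action(observation)      # determine the action
        next_observation, reward = env.step(             # send it, then receive
            action, scripted_second_aircraft(t))         # observation and reward
        mla.update(observation, action, reward)          # update the learner
        rewards.append(reward)
        observation = next_observation
    return rewards
```

Calling `train(ToyEnvironment(), ToyMLA(), steps=12)` returns the per-step rewards. Because each reward here is a change in distance, their sum equals the net distance closed over the run.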

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: training a machine learning algorithm to control a first aircraft in an environment that comprises the first aircraft and a second aircraft by: determining a first-aircraft action for the first aircraft to take within the environment using the machine learning algorithm; sending the first-aircraft action from the machine learning algorithm; after sending the first-aircraft action, receiving an observation of the environment and a reward signal at the machine learning algorithm, wherein the observation of the environment comprises information about the environment after the first aircraft has taken the first-aircraft action and the second aircraft has taken a second-aircraft action, and wherein the reward signal indicates a score of performance of the first-aircraft action by the first aircraft based on one or more dynamic and kinematic properties of the second aircraft within the environment; and updating the machine learning algorithm based on the observation of the environment and the reward signal.

2. The method of claim 1, wherein receiving the observation of the environment comprises receiving the observation of the environment from a simulator simulating interactions between the first and second aircraft in the environment.

3. The method of claim 2, wherein receiving the observation of the environment from the simulator comprises receiving the observation of the environment from a simulator that: receives actions from both the first and second aircraft; determines a state of the environment based on the received actions; and determines the information about the environment after the first and second aircraft have taken subsequent actions based on the state of the environment.

4. The method of claim 1, wherein receiving the observation of the environment comprises receiving an observation of the environment that is based on data obtained from one or more sensors of a non-simulated aircraft.

5. The method of claim 1, wherein receiving the observation of the environment and the reward signal comprises receiving a reward signal that is based on one or more of: a location of the second aircraft within the environment, a velocity of the second aircraft, an acceleration of the second aircraft, a position of the second aircraft relative to the first aircraft, and a distance between the first and second aircraft.

6. The method of claim 1, wherein the machine learning algorithm is associated with one or more weights, wherein training the machine learning algorithm to control the first aircraft comprises training the machine learning algorithm in parallel using a plurality of worker threads, each worker thread configured to utilize the machine learning algorithm during training, and wherein updating the machine learning algorithm based on the observation of the environment and the reward signal comprises: storing one or more observations of the environment and one or more reward signals in a trajectory vector using a particular worker thread of the plurality of worker threads; sending the trajectory vector from the particular worker thread to a learner thread associated with the plurality of worker threads; updating the one or more weights of the machine learning algorithm based on the trajectory vector using the learner thread; and updating the machine learning algorithm to utilize the updated one or more weights using the learner thread.

7. The method of claim 6, wherein storing the one or more observations of the environment and the one or more reward signals in the trajectory vector comprises storing a plurality of observations of the environment and a plurality of reward signals obtained over a plurality of episodes of interactions between the first and second aircraft within the environment in the trajectory vector using the particular worker thread, wherein an episode of interactions between the first and second aircraft within the environment of the plurality of episodes of interactions is associated with a predetermined amount of time.

8. The method of claim 7, wherein storing the plurality of observations of the environment and the plurality of reward signals obtained over the plurality of episodes of interactions between the first and second aircraft within the environment in the trajectory vector using the particular worker thread comprises storing a plurality of observations of the environment and a plurality of reward signals obtained over an epoch of interactions between the first and second aircraft within the environment, wherein the epoch of interactions between the first and second aircraft within the environment comprises a predetermined number of episodes of interactions between the first and second aircraft within the environment.

9. The method of claim 1, wherein determining the first-aircraft action for the first aircraft within the environment using the machine learning algorithm comprises: transforming a coordinate-related input to the machine learning algorithm using a coordinate transformation that transforms coordinates into a proper subset of coordinates possible in the coordinate-related input resulting in a transformed coordinated-related input; and providing the transformed coordinated-related input to the machine learning algorithm.

10. The method of claim 1, wherein receiving the observation of the environment and the reward signal comprises receiving a reward signal that is based on a first reward for reducing distance between the first aircraft and the second aircraft, a second reward for the first aircraft reaching a desired location with respect to the second aircraft, or both the first reward and the second reward.

11. The method of claim 1, wherein training the machine learning algorithm to control the first aircraft comprises: training the machine learning algorithm to control the first aircraft using a plurality of scenarios related to interactions between the first and second aircraft within the environment, wherein the plurality of scenarios are arranged so that a first scenario precedes a second scenario in the plurality of scenarios, the first scenario involving a first range of options to control the first aircraft, the second scenario involving a second range of options to control the first aircraft, and wherein the second range of options includes more options than the first range of options.

12. The method of claim 1, further comprising: after training the machine learning algorithm, using the trained machine learning algorithm to control a non-simulated aircraft.

13. The method of claim 12, wherein using the trained machine learning algorithm to control the non-simulated aircraft comprises using the trained machine learning algorithm to control the non-simulated aircraft using one or more control systems of the non-simulated aircraft.

14. The method of claim 1, wherein training the machine learning algorithm further comprises: training the machine learning algorithm for a first training session; after training the machine learning algorithm for the first training session, saving the machine learning algorithm as a previous version of the machine learning algorithm; and after saving the previous version of the machine learning algorithm, continuing training of the machine learning algorithm for a second training session, wherein the machine learning algorithm determines actions for the first aircraft to take within the environment during the second training session, and wherein the previous version of the machine learning algorithm determines actions for the second aircraft to take within the environment during the s
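
Claims 5 and 10 describe a reward signal built from the distance between the two aircraft and from the first aircraft reaching a desired location relative to the second. The sketch below is one possible reading of that combination, offered as an illustration only: the function name `reward_signal`, the 2-D positions, the `desired_offset`, the `tolerance`, and the `arrival_bonus` value are all assumptions that do not come from the patent text.

```python
import math


def reward_signal(prev_first, first, second,
                  desired_offset=(0.0, -10.0),
                  tolerance=1.0,
                  arrival_bonus=100.0):
    """Hypothetical two-part reward for the first aircraft.

    prev_first, first: (x, y) of the first aircraft before and after its action.
    second: (x, y) of the second aircraft after its own action.
    """
    # First reward: how much the distance to the second aircraft was reduced.
    closing = math.dist(prev_first, second) - math.dist(first, second)

    # Second reward: a bonus when the first aircraft is within `tolerance`
    # of the desired location, expressed as an offset from the second aircraft.
    target = (second[0] + desired_offset[0], second[1] + desired_offset[1])
    arrived = math.dist(first, target) <= tolerance

    return closing + (arrival_bonus if arrived else 0.0)
```

With these assumed defaults, a step that closes 3 units without reaching the target yields a reward of 3.0, while the same closing distance that ends on the target adds the bonus on top.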

Assignees

Boeing Co

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks
G06N3/045
Combinations of networks
G06N3/096
Transfer learning
G06N3/092
Reinforcement learning
B64D39/00
Refuelling during flight

Patent family

Related publications grouped by family.

View patent family 73549926

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11150670B2 cover?: Apparatus and methods for training a machine learning algorithm (MLA) to control a first aircraft in an environment that comprises the first aircraft and a second aircraft are described. Training of the MLA can include: the MLA determining a first-aircraft action for the first aircraft to take within the environment; sending the first-aircraft action from the MLA; after sending the first-aircra…
Who is the assignee on this patent?: Boeing Co
What technology area does this patent fall under?: Primary CPC classification G05D1/101. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 19, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Learning from operator data for practical autonomy

Maintaining position relative to an air anomaly

Autonomous Behavior Generation for Aircraft

Controlling an autonomous vehicle using model predictive control

Methods and apparatus for reinforcement learning
