What technology area does this patent fall under?

Primary CPC classification G06N20/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 11, 2024 (kind code A9). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method for training aircraft control agent

US2024232611A9 · US · A9

Patent metadata
Field	Value
Publication number	US-2024232611-A9
Application number	US-202218049479-A
Country	US
Kind code	A9
Filing date	Oct 25, 2022
Priority date	Oct 25, 2022
Publication date	Jul 11, 2024
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example includes a method for training an agent to control an aircraft. The method includes: selecting, by the agent, first actions for the aircraft to perform within a first environment respectively during first time intervals based on first states of the first environment during the first time intervals, updating the agent based on first rewards that correspond respectively to the first states, selecting, by the agent, second actions for the aircraft to perform within a second environment respectively during second time intervals based on second states of the second environment during the second time intervals, and updating the agent based on second rewards that correspond respectively to the second states. At least one first rule of the first environment is different from at least one rule of the second environment.
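As a rough illustration of the two-stage scheme the abstract describes, the sketch below trains one agent in a first toy environment and then continues training the same agent in a second environment whose rules differ. Everything here is an assumption made for illustration and is not taken from the patent: the one-dimensional altitude world, the `Rules`, `ToyEnv`, and `Agent` names, the reward values, and the simple softmax policy-gradient update, which stands in for whatever learning algorithm an actual implementation uses.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional

ACTIONS = (-1, 0, 1)  # descend, hold, climb (hypothetical action set)


@dataclass
class Rules:
    floor: int = 0                         # destroyed at or below this altitude
    hazard_altitude: Optional[int] = None  # extra destruction rule (stand-in)


class ToyEnv:
    """One-dimensional altitude world; only the rules differ between stages."""

    def __init__(self, rules, horizon=20, start=3, ceiling=6):
        self.rules, self.horizon = rules, horizon
        self.start, self.ceiling = start, ceiling

    def reset(self):
        self.alt, self.t = self.start, 0
        return self.alt

    def step(self, action):
        self.alt = min(self.ceiling, self.alt + action)
        self.t += 1
        destroyed = (self.alt <= self.rules.floor
                     or self.alt == self.rules.hazard_altitude)
        done = destroyed or self.t >= self.horizon
        return self.alt, (-1.0 if destroyed else 0.1), done


class Agent:
    """Tabular softmax policy; updates shift action probabilities per state."""

    def __init__(self, n_states, lr=0.1, seed=0):
        self.prefs = [[0.0] * len(ACTIONS) for _ in range(n_states)]
        self.rng = random.Random(seed)
        self.lr = lr

    def probs(self, state):
        top = max(self.prefs[state])
        exps = [math.exp(p - top) for p in self.prefs[state]]
        total = sum(exps)
        return [e / total for e in exps]

    def select(self, state):
        return self.rng.choices(range(len(ACTIONS)), weights=self.probs(state))[0]

    def update(self, trajectory):
        ret = 0.0  # undiscounted return from each step to the episode end
        for state, action, reward in reversed(trajectory):
            ret += reward
            p = self.probs(state)
            for i in range(len(ACTIONS)):
                grad = (1.0 if i == action else 0.0) - p[i]
                self.prefs[state][i] += self.lr * ret * grad


def train(agent, env, episodes):
    for _ in range(episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:
            action = agent.select(state)
            next_state, reward, done = env.step(ACTIONS[action])
            trajectory.append((state, action, reward))
            state = next_state
        agent.update(trajectory)


agent = Agent(n_states=7)
train(agent, ToyEnv(Rules(floor=0)), episodes=200)                     # first environment
train(agent, ToyEnv(Rules(floor=0, hazard_altitude=5)), episodes=200)  # second environment
```

The point of the sketch is the last three lines: the same `agent` object is updated in both stages, and the second `Rules` instance adds a way to be destroyed that the first does not have.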

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training an agent to control an aircraft, the method comprising: selecting, by the agent, first actions for the aircraft to perform within a first environment respectively during first time intervals based on first states of the first environment during the first time intervals; updating the agent based on first rewards that correspond respectively to the first states, wherein the first rewards are based on first rules of the first environment; selecting, by the agent, second actions for the aircraft to perform within a second environment respectively during second time intervals based on second states of the second environment during the second time intervals; and updating the agent based on second rewards that correspond respectively to the second states, wherein the second rewards are based on second rules of the second environment, and wherein at least one first rule of the first rules is different from at least one rule of the second rules.

2. The method of claim 1, wherein: updating the agent based on the first rewards comprises updating the agent to change first probabilities that the agent selects the first actions in response to observing the first states, and updating the agent based on the second rewards comprises updating the agent to change second probabilities that the agent selects the second actions in response to observing the second states.

3. The method of claim 1, wherein the first rules of the first environment do not allow the aircraft to be destroyed except for when an altitude of the aircraft becomes less than or equal to a threshold altitude, and wherein updating the agent based on the second rewards comprises updating the agent based on the second rewards after updating the agent based on the first rewards.

4. The method of claim 3, further comprising determining the first rewards based on whether the altitude of the aircraft is less than or equal to the threshold altitude during each of the first time intervals or whether a training session within the first environment has expired.

5. The method of claim 4, wherein determining the first rewards comprises determining the first rewards such that a portion of the first rewards is proportional to a number of time intervals remaining in a training session within the first environment when the altitude of the aircraft became less than or equal to the threshold altitude.

6. The method of claim 4, wherein the threshold altitude is a first threshold altitude, and wherein determining the first rewards comprises determining the first rewards additionally based on a degree to which, during each of the first time intervals, the altitude of the aircraft is less than a second threshold altitude that is greater than the first threshold altitude.

7. The method of claim 4, wherein the aircraft is a first aircraft, the method further comprising determining the first rewards based on a degree to which, during each of the first time intervals, a position, an orientation, a velocity, or the altitude of the first aircraft improved with respect to the first aircraft following a second aircraft within the first environment.

8. The method of claim 3, wherein the aircraft is a first aircraft, and wherein the second rules of the second environment allow the first aircraft to be destroyed when the altitude of the first aircraft becomes less than or equal to the threshold altitude or when a projectile deployed by a second aircraft intercepts the first aircraft.

9. The method of claim 8, further comprising determining the second rewards based on whether the altitude of the first aircraft is less than or equal to the threshold altitude during each of the second time intervals or whether a training session within the second environment has expired.

10. The method of claim 9, wherein determining the second rewards comprises determining the second rewards such that a portion of the second rewards is proportional to a number of time intervals remaining in a training session within the second environment when the altitude of the first aircraft became less than or equal to the threshold altitude.

11. The method of claim 9, wherein the threshold altitude is a first threshold altitude, and wherein determining the second rewards comprises determining the second rewards additionally based on a degree to which, during each of the second time intervals, the altitude of the first aircraft is less than a second threshold altitude that is greater than the first threshold altitude.

12. The method of claim 8, further comprising determining the second rewards based on a degree to which, during each of the second time intervals, a position, an orientation, a velocity, or the altitude of the first aircraft improved with respect to the first aircraft following the second aircraft within the second environment.

13. The method of claim 8, further comprising determining the second rewards based on whether the first aircraft is destroyed by a projectile deployed by the second aircraft during each of the second time intervals.

14. The method of claim 13, wherein determining the second rewards comprises determining the second rewards such that a portion of the second rewards is proportional to a number of time intervals remaining in a training session within the second environment when the first aircraft is destroyed by the projectile deployed by the second aircraft.

15. The method of claim 8, wherein the second rules of the second environment include initial conditions placing the first aircraft and the second aircraft at random positions and headings.

16. The method of claim 8, wherein the second rules of the second environment include initial conditions placing the first aircraft such that a first angle formed by a first heading of the first aircraft and the second aircraft is smaller than a second angle formed by a second heading of the second aircraft and the first aircraft.

17. The method of claim 8, wherein the second rules of the second environment include initial conditions placing the first aircraft such that a first angle formed by a first heading of the first aircraft and the second aircraft is equal to a second angle formed by a second heading of the second aircraft and the first aircraft.

18. The method of claim 1, further comprising using the agent to control a non-simulated aircraft.

19. A non-transitory computer readable medium storing instructions that, when executed by a computing device, cause the computing device to perform functions for training an agent to control an aircraft, the functions comprising: selecting, by the agent, first actions for the aircraft to perform within a first environment respectively during first time intervals based on first states of the first environment during the first time intervals; updating the agent based on first rewards that correspond respectively to the first states, wherein the first rewards are based on first rules of the first environment; selecting, by the agent, second actions for the aircraft to perform within a second environment respectively during second time intervals based on second states of the second environment during the second time intervals; and updating the agent based on second rewards that correspond respectively to the second states, wherein the second rewards are based on second rules of the second environment, and wherein at least one first rule of the first rules is different from at least one rule of the second rules.
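The reward terms recited in claims 4 to 6 and 13 to 14 can be pictured with a small per-interval reward function. This is only a sketch under stated assumptions: the claims say a portion of the reward is "proportional to" the number of time intervals remaining and "based on a degree" of altitude shortfall, but they give no signs, weights, or units, so the negative sign, the `loss_weight` and `low_altitude_weight` values, the altitude numbers, and the function name `step_reward` are all hypothetical.

```python
def step_reward(altitude, t, horizon, hit_by_projectile=False,
                floor=0.0, soft_floor=500.0,
                loss_weight=1.0, low_altitude_weight=0.001):
    """Return (reward, done) for one time interval.

    floor       -- first threshold altitude: at or below it the aircraft is lost
    soft_floor  -- second, higher threshold used only for shaping (assumed value)
    hit_by_projectile -- always False under the first environment's rules
    """
    remaining = horizon - t  # time intervals left in the training session

    # Loss of the aircraft: penalty proportional to the intervals remaining,
    # so an early loss costs more than a late one (sign and weight assumed).
    if altitude <= floor or hit_by_projectile:
        return -loss_weight * remaining, True

    # Training session expired without a loss.
    if remaining <= 0:
        return 0.0, True

    # Shaping term: penalize the degree to which altitude is under soft_floor.
    shortfall = max(0.0, soft_floor - altitude)
    return -low_altitude_weight * shortfall, False
```

For example, `step_reward(0.0, t=10, horizon=100)` ends the session with a penalty scaled by the 90 intervals still remaining, while the same call at `t=95` is penalized far less. Passing `hit_by_projectile=True` applies the same proportional penalty for the second environment's additional destruction rule.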

Assignees

Boeing Co

Inventors

Classifications

G05D1/689
Pointing payloads towards fixed or moving targets (positioning towed, pushed or suspended implements G05D1/672) · CPC title
G05D1/619
Minimising the exposure of a vehicle to threats, e.g. avoiding interceptors · CPC title
G05D2107/34
Battlefields · CPC title
G05D2105/35
for combat · CPC title
G05D2101/15
using machine learning, e.g. neural networks · CPC title

Patent family

Related publications grouped by family.

View patent family 88017681

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024232611A9 cover?: An example includes a method for training an agent to control an aircraft. The method includes: selecting, by the agent, first actions for the aircraft to perform within a first environment respectively during first time intervals based on first states of the first environment during the first time intervals, updating the agent based on first rewards that correspond respectively to the first st…
Who is the assignee on this patent?: Boeing Co