Swarm fair deep reinforcement learning

US11416743B2 · US · B2

Patent metadata
Publication number: US-11416743-B2
Application number: US-201916395187-A
Country: US
Kind code: B2
Filing date: Apr 25, 2019
Priority date: Apr 25, 2019
Publication date: Aug 16, 2022
Grant date: Aug 16, 2022

Abstract

Fair deep reinforcement learning is provided. A microstate of an environment and reaction of items in a plurality of microstates within the environment are observed after an agent performs an action in the environment. Semi-supervised training is utilized to determine bias weights corresponding to the action for the microstate of the environment and the reaction of the items in the plurality of microstates within the environment. The bias weights from the semi-supervised training are merged with non-bias weights using an artificial neural network. Over time, it is determined where bias is occurring in the semi-supervised training based on merging the bias weights with the non-bias weights in the artificial neural network. A deep reinforcement learning model that decreases reliance on the bias weights is generated based on determined bias to increase fairness.
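The abstract, and claim 1 in more detail, describe a network that runs a biased path and a non-biased path side by side and then merges them, with a limit function capping the bias weights. Neither text specifies the limit function or the merge rule, so the following is only a minimal sketch under stated assumptions: the limit function clips each bias weight into `[-limit, limit]`, and merging is a convex combination governed by a hypothetical mixing coefficient `alpha`, so that lowering `alpha` decreases reliance on the bias weights. All function and parameter names are illustrative, and the two paths are evaluated one after the other here rather than truly in parallel.

```python
from typing import List, Tuple

Vector = List[float]


def limit_fn(weights: Vector, limit: float) -> Vector:
    """Hypothetical limit function: clip every bias weight into [-limit, limit]."""
    return [max(-limit, min(limit, w)) for w in weights]


def dot(weights: Vector, inputs: Vector) -> float:
    """Weighted sum computed by a single linear node."""
    return sum(w * x for w, x in zip(weights, inputs))


def merge_weights(bias_weights: Vector, non_bias_weights: Vector,
                  limit: float = 0.5, alpha: float = 0.2) -> Vector:
    """Merge a biased node into its non-biased counterpart.

    alpha is an assumed mixing coefficient in [0, 1]: the share of the
    (limited) bias weights kept in the merged node. Lowering alpha
    reduces reliance on the bias weights.
    """
    if len(bias_weights) != len(non_bias_weights):
        raise ValueError("both paths must have the same fan-in")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    limited = limit_fn(bias_weights, limit)
    return [alpha * b + (1.0 - alpha) * n
            for b, n in zip(limited, non_bias_weights)]


def forward(inputs: Vector, bias_weights: Vector, non_bias_weights: Vector,
            limit: float = 0.5, alpha: float = 0.2) -> Tuple[float, float, float]:
    """Evaluate the biased path, the non-biased path, and the merged node."""
    biased_out = dot(bias_weights, inputs)
    non_biased_out = dot(non_bias_weights, inputs)
    merged = merge_weights(bias_weights, non_bias_weights, limit, alpha)
    return biased_out, non_biased_out, dot(merged, inputs)


if __name__ == "__main__":
    print(merge_weights([2.0, -0.1], [0.4, 0.6]))  # approximately [0.42, 0.46]
    print(forward([1.0, 1.0], [2.0, -0.1], [0.4, 0.6]))
```

With the defaults, a bias weight of 2.0 is first clipped to 0.5, and the merged weight is then 20% of that limited value plus 80% of the matching non-bias weight; setting `alpha=0.0` discards the biased path entirely.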

First claim

What is claimed is:

1. A computer-implemented method for providing fair deep reinforcement learning, the computer-implemented method comprising: receiving, by a computer, via a network, multimedia data capturing a robot performing an action to accomplish a task in a physical environment from a first set of sensors located in the physical environment; performing, by the computer, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights; identifying, by the computer, equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action; performing, by the computer, post processing of the weighted degree of bias to decrease the bias of the robot by merging the biased nodes having the bias weights in the biased path with the non-biased nodes having the non-bias weights in the non-biased path of the artificial neural network and limiting the bias weights using the limit function to form merged nodes having decreased bias; relabeling, by the computer, training data of a semi-supervised learning model that was used to previously train the robot to perform the action to accomplish the task by the robot in the physical environment based on the post processing of the weighted degree of bias; retraining, by the computer, the robot to increase performance of the action to accomplish the task by the robot in the physical environment using the relabeled training data; recalculating, by the computer, a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and updating, by the computer, a Q-table with the recalculated reward corresponding to the action.

2. The computer-implemented method of claim 1 further comprising: receiving, by the computer, the semi-supervised learning model corresponding to a set of two or more physical environments, wherein the physical environment is one physical environment in the set of two or more physical environments; training, by the computer, the robot to perform the action to accomplish the task in the physical environment of the set of two or more physical environments based on the semi-supervised learning model; and mapping, by the computer, the action to be performed by the robot in the physical environment to the reward using the Q-table.

3. The computer-implemented method of claim 2 further comprising: determining, by the computer, change in state of the physical environment based on the analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment.

4. The computer-implemented method of claim 1 further comprising: receiving, by the computer, the semi-supervised learning model corresponding to a set of physical environments; and training, by the computer, a swarm of robots to perform the action to accomplish the task in one or more other physical environments of the set of physical environments based on the semi-supervised learning model.

5. The computer-implemented method of claim 4 further comprising: receiving, by the computer, via the network, multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments from a second set of sensors; analyzing, by the computer, the multimedia data capturing the swarm of robots performing the action to accomplish the task in the one or more other physical environments using the artificial neural network; and determining, by the computer, change in state of the one or more other physical environments based on analysis of the swarm of robots performing the action to accomplish the task in the one or more other physical environments.

6. The computer-implemented method of claim 4 further comprising: identifying, by the computer, the equal opportunity and disparate impact on protected attributes during performance of the action by the swarm of robots to weight degree of bias based on determined change in state of the one or more other physical environments in response to the performance of the action.

7. The computer-implemented method of claim 1, wherein the artificial neural network is a convolutional neural network.

8. A computer system for providing fair deep reinforcement learning, the computer system comprising: a bus system; a storage device connected to the bus system, wherein the storage device stores program instructions; and a processor connected to the bus system, wherein the processor executes the program instructions to: receive, via a network, multimedia data capturing a robot performing an action to accomplish a task in a physical environment from a first set of sensors located in the physical environment; perform, using an artificial neural network, an analysis of the multimedia data capturing the robot performing the action to accomplish the task in the physical environment to determine bias of the robot corresponding to a set of items located in the physical environment while the robot performs the action, the artificial neural network including a biased path of biased nodes having bias weights, a non-biased path of non-biased nodes having non-bias weights, and a limit function, wherein the artificial neural network executes in parallel the biased path of the biased nodes having the bias weights and the non-biased path of the non-biased nodes having the non-bias weights; identify equal opportunity and disparate impact on protected attributes from the multimedia data during performance of the action by the robot to weight degree of bias based on a determined change in state of the physical environment in response to the robot performing the action; perform post processing of a weighted degree of bias to decrease the bias of the robot by merging the biased nodes having the bias weights in the biased path with the non-biased nodes having the non-bias weights in the non-biased path of the artificial neural network and limiting the bias weights using the limit function to form merged nodes having decreased bias; relabel training data of a semi-supervised learning model that was used to previously train the robot to perform the action to accomplish the task by the robot in the physical environment based on the post processing of the weighted degree of bias; retrain the robot to increase performance of the action to accomplish the task by the robot in the physical environment using the relabeled training data; recalculate a reward corresponding to the action based on the equal opportunity and disparate impact on the protected attributes during performance of the action by the robot; and update a Q-table with the recalculated reward corresponding to the action.

9. The computer system of claim 8, wherein the processor further executes the program instructions to: receive the semi-supervised learning model corresponding to a set of two or more physical environments, wherein the physical environme…
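Claim 1 ends by recalculating the reward from equal opportunity and disparate impact on protected attributes and then updating a Q-table. The claims give no formulas, so the sketch below is an illustration under stated assumptions: it uses common textbook definitions (equal opportunity as the gap in true-positive rates between groups, disparate impact as the ratio of favorable-outcome rates), an assumed linear penalty for reward shaping, and the standard tabular Q-learning update. The record schema, `fair_reward`, `penalty`, `lr`, `gamma`, and the state and action names are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Hypothetical record schema: (is_protected, deserved_favorable, received_favorable)
Record = Tuple[bool, bool, bool]
QTable = DefaultDict[str, DefaultDict[str, float]]


def _rate(flags: List[bool]) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def disparate_impact(records: List[Record]) -> float:
    """P(favorable | protected) / P(favorable | unprotected); 1.0 means parity."""
    protected = _rate([got for prot, _, got in records if prot])
    unprotected = _rate([got for prot, _, got in records if not prot])
    if unprotected == 0.0:
        raise ValueError("unprotected group has no favorable outcomes")
    return protected / unprotected


def equal_opportunity_diff(records: List[Record]) -> float:
    """TPR(protected) - TPR(unprotected) over deserving cases; 0.0 means parity."""
    protected = _rate([got for prot, deserved, got in records if prot and deserved])
    unprotected = _rate([got for prot, deserved, got in records if not prot and deserved])
    return protected - unprotected


def fair_reward(task_reward: float, records: List[Record], penalty: float = 1.0) -> float:
    """Assumed reward shaping: subtract the distance from both parity targets."""
    unfairness = abs(equal_opportunity_diff(records)) + abs(1.0 - disparate_impact(records))
    return task_reward - penalty * unfairness


def update_q(q: QTable, state: str, action: str, reward: float, next_state: str,
             lr: float = 0.1, gamma: float = 0.9) -> None:
    """Standard tabular Q-learning update using the recalculated reward."""
    best_next = max(q[next_state].values(), default=0.0)
    q[state][action] += lr * (reward + gamma * best_next - q[state][action])


if __name__ == "__main__":
    records = [
        (True, True, True), (True, True, False), (True, False, False), (True, False, True),
        (False, True, True), (False, True, True), (False, False, False), (False, False, True),
    ]
    q: QTable = defaultdict(lambda: defaultdict(float))
    r = fair_reward(1.0, records)
    update_q(q, "s0", "serve_item", r, "s1")
    print(disparate_impact(records), equal_opportunity_diff(records), r, q["s0"]["serve_item"])
```

In this toy data the protected group receives favorable outcomes half as often among deserving cases, so a task reward of 1.0 is cut to about 0.17 before it reaches the Q-table. One possible reading of claim 2's mapping of an action to its reward "using the Q-table" is a lookup such as `q[state][action]`, though the patent does not state this.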

Assignees

IBM

Classifications

  • Combinations of networks · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Reinforcement learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • using electronic means · CPC title

Frequently asked questions

Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Aug 16, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).