System and method for machine learning fairness testing

US2022114399A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022114399-A1
Application number: US-202117497755-A
Country: US
Kind code: A1
Filing date: Oct 8, 2021
Priority date: Oct 8, 2020
Publication date: Apr 14, 2022
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for diagnosing and testing fairness of machine learning models based on detecting individual violations of group definitions of fairness, via adversarial attacks that aim to perturb model inputs to generate individual violations. The systems and methods employ auxiliary machine learning models using a local surrogate for identifying group membership and assess fairness by measuring the transferability of attacks from this model. The systems and methods generate fairness indicator values indicative of discrimination risk due to the target predictions generated by the machine learning model, by comparing gradients of the machine learning model to gradients of an auxiliary machine learning model.
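The gradient comparison described in the abstract can be illustrated with a small sketch. This is not the patent's implementation: the two logistic models, their weights, the input point, and the helper names (`logistic_model`, `numerical_gradient`, `fairness_indicator`) are all hypothetical, and the gradients are approximated by central differences rather than backpropagation. The indicator here is the magnitude of the scalar projection of the target model's gradient onto the auxiliary model's gradient, which is one possible way to compare the two vectors.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_model(weights, bias):
    """Return a toy model x -> sigmoid(w . x + b). Weights are hypothetical."""
    def predict(x):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return predict

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

def fairness_indicator(g_model, g_aux):
    """Magnitude of the scalar projection of g_model onto g_aux:
    |g_model . g_aux| / ||g_aux||_2. Returns 0.0 for a zero auxiliary gradient."""
    dot = sum(a * b for a, b in zip(g_model, g_aux))
    norm = math.sqrt(sum(b * b for b in g_aux))
    if norm == 0.0:
        return 0.0
    return abs(dot) / norm

# Hypothetical target model (e.g. a credit decision) over three input features.
target_model = logistic_model([0.8, -0.5, 0.1], 0.0)
# Hypothetical auxiliary model that predicts a protected attribute from feature 0 only.
aux_model = logistic_model([1.0, 0.0, 0.0], 0.0)

x = [0.2, 1.0, -0.3]
g_f = numerical_gradient(target_model, x)   # first vector
g_a = numerical_gradient(aux_model, x)      # second vector
indicator = fairness_indicator(g_f, g_a)
print(indicator)
```

In this toy setup the auxiliary gradient points along feature 0 only, so the indicator reduces to the absolute sensitivity of the target model to that feature. A larger value suggests that perturbing the input in the direction that most changes the predicted protected attribute also changes the target prediction.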

First claim

Opening claim text (preview).

What is claimed is:

1. A system for estimating fairness of a machine learning model, the system comprising: one or more processors operating in conjunction with computer memory, the one or more processors configured to: receive data representative of a value of an input variable of the machine learning model, the machine learning model configured to generate target predictions based on the input variable; generate a first vector indicative of a gradient of the machine learning model evaluated at the value of the input variable; generate a second vector using an auxiliary machine learning model configured to generate predictions indicative of one or more protected attributes based on the input variable, the second vector indicative of a gradient of the auxiliary machine learning model evaluated at the value of the input variable; compare the first vector to the second vector to generate a fairness indicator value; and generate output data representative of the fairness indicator value; wherein the fairness indicator value is indicative of discrimination risk in the target predictions generated by the machine learning model.

2. The system of claim 1, wherein comparing the first vector to the second vector includes using a projection of the first vector on to the second vector.

3. The system of claim 1, wherein comparing the first vector to the second vector includes using a norm of a projection value, the projection value obtained by projecting the first vector on to the second vector and dividing by a L2 norm of the second vector.

4. The system of claim 1, wherein the machine learning model is a first supervised learning model, and the auxiliary machine learning model is a second supervised learning model trained at least partially based on known values of the one or more protected attributes.

5. The system of claim 1, wherein the second vector is representative of an output of a sign function of the gradient of the auxiliary machine learning model evaluated at the value of the input variable.

6. The system of claim 1, wherein the second vector is indicative of a modified gradient when the gradient of the auxiliary machine learning model is associated with out-of-distribution predictions of the auxiliary machine learning model.

7. The system of claim 1, wherein a plurality of fairness indicator values are generated, each corresponding to different protected attributes of the one or more attributes, and the output data is indicative of whether an aggregated measure of the plurality of fairness indicators exceeds a predefined fairness threshold.

8. The system of claim 7, wherein each of the plurality of fairness indicator values is indicative of a covariance between the gradient of the machine learning model and the gradient of the auxiliary machine learning model.

9. The system of claim 7, wherein the aggregated measure is an L-p norm.

10. The system of claim 1, wherein the input variable includes an observable attribute correlated with at least one of the one or more protected attributes via an unobserved latent variable.

11. A method for estimating fairness of a machine learning model, the method comprising: receiving data representative of a value of an input variable of the machine learning model, the machine learning model configured to generate target predictions based on the input variable; generating a first vector indicative of a gradient of the machine learning model evaluated at the value of the input variable; generating a second vector using an auxiliary machine learning model configured to generate predictions indicative of one or more protected attributes based on the input variable, the second vector indicative of a gradient of the auxiliary machine learning model evaluated at the value of the input variable; comparing the first vector to the second vector to generate a fairness indicator value; and generating output data representative of the fairness indicator value, wherein the fairness indicator value is indicative of discrimination risk in the target predictions generated by the machine learning model.

12. The method of claim 11, wherein comparing the first vector to the second vector includes using a projection of the first vector on to the second vector.

13. The method of claim 11, wherein comparing the first vector to the second vector includes using a norm of a projection value, the projection value obtained by projecting the first vector on to the second vector and dividing by a L2 norm of the second vector.

14. The method of claim 11, wherein the machine learning model is a first supervised learning model, and the auxiliary machine learning model is a second supervised learning model trained at least partially based on known values of the one or more protected attributes.

15. The method of claim 11, wherein the second vector is representative of an output of a sign function of the gradient of the auxiliary machine learning model evaluated at the value of the input variable.

16. The method of claim 11, wherein the second vector is indicative of a modified gradient when the gradient of the auxiliary machine learning model is associated with out-of-distribution predictions of the auxiliary machine learning model.

17. The method of claim 11, wherein a plurality of fairness indicator values are generated, each corresponding to different protected attributes of the one or more attributes, and the output data is indicative of whether an aggregated measure of the plurality of fairness indicators exceeds a predefined fairness threshold.

18. The method of claim 17, wherein each of the plurality of fairness indicator values is indicative of a covariance between the gradient of the machine learning model and the gradient of the auxiliary machine learning model.

19. The method of claim 17, wherein the aggregated measure is an L-p norm.

20. A non-transitory computer readable medium storing machine interpretable instructions, which when executed by a processor, cause the processor to perform a method for estimating fairness of a machine learning model, the method comprising: receiving data representative of a value of an input variable of the machine learning model, the machine learning model configured to generate target predictions based on the input variable; generating a first vector indicative of a gradient of the machine learning model evaluated at the value of the input variable; generating a second vector using an auxiliary machine learning model configured to generate predictions indicative of one or more protected attributes based on the input variable, the second vector indicative of a gradient of the auxiliary machine learning model evaluated at the value of the input variable; comparing the first vector to the second vector to generate a fairness indicator value; and generating output data representative of the fairness indicator value, wherein the fairness indicator value is indicative of discrimination risk due to the target predictions generated by the machine learning model.
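The dependent claims on the sign function (claims 5 and 15) and on aggregating several indicators with an L-p norm against a threshold (claims 7, 9, 17 and 19) can be sketched together as follows. This is an illustrative reading only, not the claimed implementation: the gradient values, the attribute names `attribute_a` and `attribute_b`, the threshold of 0.45, and all function names are assumptions made for the example, and the per-attribute indicator uses the scalar-projection form as one possible comparison.

```python
import math

def sign_vector(grad):
    """Elementwise sign of a gradient: -1, 0, or +1 per component."""
    return [(g > 0) - (g < 0) for g in grad]

def projection_indicator(g_model, direction):
    """|g_model . direction| / ||direction||_2, or 0.0 for a zero direction."""
    dot = sum(a * b for a, b in zip(g_model, direction))
    norm = math.sqrt(sum(d * d for d in direction))
    return abs(dot) / norm if norm > 0.0 else 0.0

def lp_norm(values, p=2.0):
    """L-p norm of a list of indicator values (p >= 1)."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def exceeds_threshold(indicators, threshold, p=2.0):
    """True when the aggregated L-p measure is above the fairness threshold."""
    return lp_norm(indicators, p) > threshold

# Hypothetical gradient of the target model at one input point.
g_model = [0.3, -0.4]

# Hypothetical auxiliary-model gradients, one per protected attribute.
aux_gradients = {
    "attribute_a": [0.5, 0.0],
    "attribute_b": [0.0, -2.0],
}

# One indicator per protected attribute, using the sign of each auxiliary gradient.
indicators = {
    name: projection_indicator(g_model, sign_vector(g_aux))
    for name, g_aux in aux_gradients.items()
}

aggregate = lp_norm(list(indicators.values()), p=2.0)
flagged = exceeds_threshold(list(indicators.values()), threshold=0.45, p=2.0)
print(indicators, aggregate, flagged)
```

Taking the sign of the auxiliary gradient discards its magnitude, so each indicator depends only on the direction in which the protected-attribute prediction changes. The choice of `p` controls how strongly a single large indicator dominates the aggregate; larger `p` moves the measure toward the maximum indicator.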

Assignees

Royal Bank Of Canada

Inventors

Classifications

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • Probabilistic or stochastic networks · CPC title

  • Combinations of networks · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • G06F18/217 (Primary)

    Validation; Performance evaluation; Active pattern learning techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022114399A1 cover?
Systems and methods for diagnosing and testing fairness of machine learning models based on detecting individual violations of group definitions of fairness, via adversarial attacks that aim to perturb model inputs to generate individual violations. The systems and methods employ auxiliary machine learning models using a local surrogate for identifying group membership and assess fairness by me…
Who is the assignee on this patent?
Royal Bank Of Canada
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 14, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).