Test suite for different kinds of biases in data

US11610079B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11610079-B2
Application number: US-202016777912-A
Country: US
Kind code: B2
Filing date: Jan 31, 2020
Priority date: Jan 31, 2020
Publication date: Mar 21, 2023
Grant date: Mar 21, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There is provided a computer implemented method for detecting and reducing or removing bias for generating a machine learning model, comprising: prior to generating the machine learning model: receiving a training dataset, comprising target inputs, each comprising parameters and labelled with a corresponding target output, wherein at least one of the parameters of at least one of the target inputs comprises a sensitive parameter indicative of the corresponding target input assigned to a sensitive group that is potentially biased against other target inputs that are excluded from the sensitive group, analyzing the training dataset to identify target inputs affected by label bias when a statistically significant difference is detected between target inputs assigned to the sensitive group and target inputs excluded from the sensitive group, correcting labels of the target inputs affected by label bias, and generating the machine learning model using the corrected labels.
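The detection step in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific statistical test, so the two-proportion z-test below is an assumption, as are the function names `label_bias_p_value` and `has_label_bias`, the binary 0/1 labels, and the significance level `alpha`. It only shows one way a "statistically significant difference" between the sensitive group and the remaining inputs might be checked before any model is trained.

```python
import math

def label_bias_p_value(rows):
    """rows: iterable of (in_sensitive_group, label) pairs with 0/1 labels.

    Returns the two-sided p-value of a two-proportion z-test (an assumed
    choice of test), or None when either population is empty.
    """
    rows = list(rows)
    group = [label for in_group, label in rows if in_group]
    others = [label for in_group, label in rows if not in_group]
    if not group or not others:
        return None
    rate_group = sum(group) / len(group)
    rate_others = sum(others) / len(others)
    # Pooled positive rate under the null hypothesis of no difference.
    pooled = (sum(group) + sum(others)) / (len(group) + len(others))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(group) + 1 / len(others)))
    if se == 0:
        return 1.0  # every label is identical, so there is no difference
    z = (rate_group - rate_others) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

def has_label_bias(rows, alpha=0.05):
    """Flag label bias when the difference in label rates is significant."""
    p = label_bias_p_value(rows)
    return p is not None and p < alpha
```

For example, a dataset in which 30 of 100 sensitive-group inputs and 60 of 100 other inputs carry the positive label would be flagged, while one with equal positive rates in both populations would not.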

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for detecting and reducing or removing bias for generating a machine learning model, comprising: (i) prior to generating the machine learning model: receiving a training dataset, comprising a plurality of target inputs, each comprising a plurality of parameters and labelled with a corresponding target output, wherein at least one of the plurality of parameters of at least one of the plurality of target inputs comprises a corresponding sensitive parameter indicative of a corresponding target input assigned to a sensitive group that is potentially biased against other target inputs that are excluded from the sensitive group; analyzing the training dataset to identify target inputs affected by label bias when a statistically significant difference is detected between target inputs assigned to the sensitive group and the other target inputs excluded from the sensitive group; correcting labels of the target inputs affected by label bias; and (ii) generating the machine learning model using the corrected labels.

2. The method of claim 1, further comprising: computing by a score computing machine learning model, for each respective target input, a probability of the respective target input being assigned to the sensitive group according to a respective value of the corresponding sensitive parameter.

3. The method of claim 2, further comprising: clustering the plurality of target inputs into a plurality of clusters according to corresponding computed probabilities, wherein for each respective cluster; wherein the target inputs are assigned to the respective cluster and associated with probabilities within a certain probability value range, wherein each cluster includes the target inputs assigned to the sensitive group and the other target inputs excluded from the sensitive group, for each respective cluster: determining whether the statistically significant difference exists between the target inputs assigned to the sensitive group and the other target inputs excluded from the sensitive group; and identifying label bias for the respective cluster when the statistically significant difference is detected, wherein the target inputs affected by label bias comprise the target inputs of the respective cluster, including the target inputs assigned to the sensitive group and the other target inputs excluded from the sensitive group.

4. The method of claim 3, wherein correcting labels of the target inputs affected by label bias comprises correcting labels for the target inputs assigned to the sensitive groups and labels of the other target inputs excluded from the sensitive group.

5. The method of claim 4, wherein the correcting labels comprises assigning the same label to all of the target inputs assigned to the sensitive groups and to all of the other target inputs excluded from the sensitive group.

6. The method of claim 2, wherein the probability of the respective target input being assigned to the sensitive group is computed by the score computing machine learning model performing a causal inference process, wherein a treatment of the causal inference process is the sensitive parameter.

7. The method of claim 6, wherein the causal inference process comprises a propensity score matching (PSM) process, and the probability denotes the propensity score.

8. The method of claim 2, further comprising: computing accuracy of the score computing machine learning model for computing the probability of the respective target input being assigned to the sensitive group; and identifying sampling bias between target inputs of the sensitive group and the other target inputs excluded from the sensitive group when the accuracy of the score computing machine learning model is above a threshold.

9. The method of claim 8, wherein (ii) generating further comprises generating one respective machine learning model for the target inputs of the sensitive group and generating another respective machine learning model for the other target inputs excluded from the sensitive group.

10. The method of claim 1, wherein each of the plurality of target inputs comprises the sensitive parameter, wherein the target inputs assigned to the sensitive group include a value of the sensitive parameter meeting a requirement, and the other target inputs excluded from the sensitive group include another value of the sensitive parameter that does not meet the requirement.

11. The method of claim 1, wherein the sensitive parameter is selected from a group consisting of: gender, race, and age.

12. A computer implemented method for detecting and reducing or removing bias for generating a machine learning model, comprising: (i) prior to generating the machine learning model: receiving a training dataset, comprising a plurality of target inputs, each comprising a plurality of parameters and labelled with a corresponding target output, wherein at least one of the plurality of parameters of at least one of the plurality of target inputs comprises a sensitive parameter indicative of a corresponding target input assigned to a sensitive group that is potentially biased against other target inputs that are excluded from the sensitive group; analyzing the training dataset to detect sampling bias between target inputs of the sensitive group and the other target inputs excluded from the sensitive group; and (ii) generating one respective machine learning model for the target inputs of the sensitive group and generating another respective machine learning model for the other target inputs excluded from the sensitive group.

13. The method of claim 12, further comprising computing accuracy of a score computing machine learning model that computes a probability of a certain target input being assigned to the sensitive group; and detecting the sampling bias when the accuracy of the score computing machine learning model is above a threshold.

14. A system for detecting and reducing or removing bias for generating a machine learning model, comprising: at least one hardware processor executing a code for: (i) prior to generating the machine learning model: receiving a training dataset, comprising a plurality of target inputs, each comprising a plurality of parameters and labelled with a corresponding target output, wherein at least one of the plurality of parameters of at least one of the plurality of target inputs comprises a sensitive parameter indicative of a corresponding target input assigned to a sensitive group that is potentially biased against target inputs that are excluded from the sensitive group; analyzing the training dataset to identify target inputs affected by label bias when a statistically significant difference is detected between target inputs assigned to the sensitive group and the target inputs excluded from the sensitive group; correcting labels of the target inputs affected by label bias; and (ii) generating the machine learning model using the corrected labels.

15. The system of claim 14, further comprising, code for: computing accuracy of a score computing machine learning model for computing a probability of a respective target input being assigned to the sensitive group; and identifying sampling bias between target inputs of the sensitive group and target inputs of the sensitive group when the accuracy of the score computing machine learning model is above a threshold.

16. The system of claim 15, wherein generating the machine learning model further comprises generating one respective machine learning model for the target inputs of the sensiti
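The clustering and correction steps of claims 3 to 5, and the accuracy check of claim 8, can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the propensity scores are taken as already computed by some score model, clusters are equal-width probability ranges, the significance test is a two-proportion z-test, and the shared label assigned to a biased cluster is its majority label. The claims specify none of these choices, and all function and parameter names here are hypothetical.

```python
import math
from collections import defaultdict

def z_test_p_value(pos_a, n_a, pos_b, n_b):
    """Two-sided p-value of a two-proportion z-test (an assumed choice of test)."""
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (pos_a / n_a - pos_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def correct_label_bias(rows, n_bins=5, alpha=0.05):
    """rows: list of (propensity, in_sensitive_group, label) with 0/1 labels.

    Returns a new list of labels, corrected cluster by cluster.
    """
    # Cluster inputs into equal-width propensity ranges (assumed clustering).
    clusters = defaultdict(list)
    for idx, (score, _, _) in enumerate(rows):
        bin_id = min(int(score * n_bins), n_bins - 1)
        clusters[bin_id].append(idx)

    labels = [label for _, _, label in rows]
    for members in clusters.values():
        group = [i for i in members if rows[i][1]]
        others = [i for i in members if not rows[i][1]]
        if not group or not others:
            continue  # the comparison needs both populations in the cluster
        p = z_test_p_value(
            sum(rows[i][2] for i in group), len(group),
            sum(rows[i][2] for i in others), len(others),
        )
        if p < alpha:
            # Assumption: the cluster's majority label becomes the shared label.
            majority = int(2 * sum(rows[i][2] for i in members) >= len(members))
            for i in members:
                labels[i] = majority
    return labels

def score_model_accuracy(rows, cutoff=0.5):
    """Fraction of inputs whose group membership the propensity score predicts."""
    hits = sum((score >= cutoff) == in_group for score, in_group, _ in rows)
    return hits / len(rows)
```

Under claim 8, an accuracy from `score_model_accuracy` above a chosen threshold would indicate sampling bias, since it means the two populations are easy to tell apart from their other parameters. The threshold value itself is not given in the claims.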

Assignees

Salesforce Com Inc

Inventors

Classifications

  • Validation; Performance evaluation; Active pattern learning techniques · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Clustering techniques · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11610079B2 cover?
There is provided a computer implemented method for detecting and reducing or removing bias for generating a machine learning model, comprising: prior to generating the machine learning model: receiving a training dataset, comprising target inputs, each comprising parameters and labelled with a corresponding target output, wherein at least one of the parameters of at least one of the target inputs co…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 21, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).