Computing system and method for creating a data science model having reduced bias

US2022414766A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2022414766-A1
Application number: US-202217900753-A
Country: US
Kind code: A1
Filing date: Aug 31, 2022
Priority date: Jun 3, 2020
Publication date: Dec 29, 2022
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing platform may be configured to (i) train an initial model object for a data science model using a machine learning process, (ii) determine that the initial model object exhibits a threshold level of bias, and (iii) thereafter produce an updated version of the initial model object having mitigated bias by (a) identifying a subset of the initial model object's set of input variables that are to be replaced by transformations, (b) producing a post-processed model object by replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that has one or more unknown parameters, (c) producing a parameterized family of the post-processed model object, and (d) selecting, from the parameterized family of the post-processed model object, one given version of the post-processed model object to use as the updated version of the initial model object for the data science model.
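Steps (b) through (d) of the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the patent's method: the toy linear `initial_model`, the `compress` transformation (one possible linear, symmetric form with a single unknown parameter `a`), the made-up records, the mean-score-gap bias measure, and the squared-error performance measure. The claims name a Bayesian optimization technique for producing the family; a plain grid search stands in for it here.

```python
def initial_model(x1, x2):
    """Hypothetical trained model: a fixed linear score over two inputs."""
    return 0.6 * x1 + 0.4 * x2

def compress(x, a, center=0.5):
    """Linear, symmetric transformation about `center` (assumed form).
    a < 1 compresses the variable toward the center, a > 1 expands it."""
    return center + a * (x - center)

def post_processed_model(x1, x2, a):
    """Initial model with x1 replaced by its transformation; `a` is the unknown parameter."""
    return initial_model(compress(x1, a), x2)

# Made-up records: (x1, x2, label, is_protected). Illustrative values only.
records = [
    (0.9, 0.5, 1, False), (0.8, 0.4, 1, False),
    (0.7, 0.6, 0, False), (0.6, 0.7, 1, False),
    (0.3, 0.6, 1, True), (0.2, 0.5, 0, True),
    (0.4, 0.7, 1, True), (0.1, 0.4, 0, True),
]

def mean(values):
    return sum(values) / len(values)

def evaluate(a):
    """Return (bias, error) for one value of the unknown parameter."""
    scores = [post_processed_model(x1, x2, a) for x1, x2, _, _ in records]
    protected = [s for s, r in zip(scores, records) if r[3]]
    non_protected = [s for s, r in zip(scores, records) if not r[3]]
    bias = abs(mean(non_protected) - mean(protected))  # assumed bias measure
    error = mean([(s - r[2]) ** 2 for s, r in zip(scores, records)])
    return bias, error

def efficient_frontier(candidates):
    """Keep only candidates that no other candidate beats on both bias and error."""
    frontier = []
    for a, (bias, error) in candidates:
        dominated = any(
            b2 <= bias and e2 <= error and (b2 < bias or e2 < error)
            for _, (b2, e2) in candidates
        )
        if not dominated:
            frontier.append((a, (bias, error)))
    return frontier

def select(family, max_bias):
    """Pick the lowest-error family member whose bias does not exceed max_bias."""
    allowed = [c for c in family if c[1][0] <= max_bias]
    return min(allowed, key=lambda c: c[1][1]) if allowed else None

# Grid search stands in for the Bayesian optimization named in the claims.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
candidates = [(a, evaluate(a)) for a in grid]
family = efficient_frontier(candidates)
chosen = select(family, max_bias=0.2)
```

With `a = 1.0` the transformation is the identity, so that grid point reproduces the initial model; smaller values of `a` pull `x1` toward the center and shrink the gap between the two groups' mean scores in this toy data.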

First claim

Opening claim text (preview).

We claim:

1. A computing platform comprising: at least one network interface for communicating over at least one data network; at least one processor; at least one non-transitory computer-readable medium; and program instructions stored on the at least one non-transitory computer-readable medium that are executable by the at least one processor such that the computing platform is configured to: train an initial model object for a data science model using a machine learning process, wherein the initial model object is configured to receive values for a set of input variables and generate an output value; based on an evaluation of the initial model object's bias, determine that the initial model object exhibits a threshold level of bias with respect to at least one given attribute; and after determining that the initial model object exhibits the threshold level of bias, produce an updated version of the initial model object having mitigated bias by: based on an evaluation of the initial model object's set of input variables, identifying a subset of the initial model object's set of input variables that are to be replaced by transformations; producing a post-processed model object by replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that has one or more unknown parameters; producing a parameterized family of the post-processed model object; and selecting, from the parameterized family of the post-processed model object, one given version of the post-processed model object to use as the updated version of the initial model object for the data science model.

2. The computing platform of claim 1, wherein the threshold level of bias with respect to the at least one given attribute comprises a threshold level of bias with respect to a pair of subpopulations defined based on the given attribute that comprises a protected subpopulation and a non-protected subpopulation.

3. The computing platform of claim 2, wherein the evaluation of the initial model object's bias involves: accessing a historical dataset comprising a first set of historical data records for individuals belonging to the protected subpopulation and a second set of historical data records for individuals belonging to the non-protected subpopulation; inputting the first set of historical data records into the initial model object and thereby generating a first set of model scores for the protected subpopulation; inputting the second set of historical data records into the initial model object and thereby generating a second set of model scores for the non-protected subpopulation; and based on the first and second sets of model scores, quantifying the bias exhibited by the initial model object for the protected and non-protected subpopulations.

4. The computing platform of claim 3, wherein quantifying the bias exhibited by the initial model object for the protected and non-protected subpopulations comprises: determining at least one of (i) a positive bias metric that quantifies a portion of the initial model object's bias that favors the non-protected subpopulation or (ii) a negative bias metric that quantifies a portion of the initial model object's bias that favors the protected subpopulation.

5. The computing platform of claim 2, wherein the evaluation of the initial model object's set of input variables involves: based on an evaluation of dependencies between the initial model object's set of input variables, dividing the initial model object's set of input variables into a set of variable groups that each comprises one or more input variables; and quantifying a respective bias contribution of each respective variable group in the defined set of variable groups using an explainability technique and a historical dataset comprising a first set of historical data records for individuals belonging to the protected subpopulation and a second set of historical data records for individuals belonging to the non-protected subpopulation.

6. The computing platform of claim 5, wherein quantifying the respective bias contribution of each respective variable group comprises: for each respective variable group, determining at least one of (i) a respective positive bias contribution metric that quantifies the respective variable group's contribution to either increasing a bias favoring the non-protected subpopulation or decreasing a bias favoring the protected subpopulation or (ii) a respective negative bias contribution metric that quantifies the respective variable group's contribution to either increasing a bias favoring the protected subpopulation or decreasing a bias favoring the non-protected subpopulation.

7. The computing platform of claim 1, wherein the respective transformation of each respective input variable in the identified subset comprises one of (i) a first type of transformation that compresses or expands the respective input variable in a linear and symmetric manner, (ii) a second type of transformation that compresses or expands the respective input variable in a linear and asymmetric manner, (iii) a third type of transformation that compresses or expands the respective input variable in a non-linear and symmetric manner, or (iv) a fourth type of transformation that compresses or expands the respective input variable in a non-linear and asymmetric manner.

8. The computing platform of claim 1, wherein producing the post-processed model object by replacing each respective input variable in the identified subset with the respective transformation of the respective input variable comprises: replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that is selected based on a determination of the respective input variable's contribution to the initial model object's bias.

9. The computing platform of claim 1, wherein producing the post-processed model object further comprises calibrating the post-processed model object to align a scale of the post-processed model object's output with a scale of the initial model object's output.

10. The computing platform of claim 1, wherein producing the parameterized family of the post-processed model object comprises: using a Bayesian optimization technique that functions to evaluate a bias and a performance of different versions of the post-processed model object that are produced by using different combinations of values for the unknown parameters included within the post-processed model object and thereby producing a parameterized family of the post-processed model object based on versions of the post-processed model object that form an efficient frontier for a tradeoff between the post-processed model object's bias and the post-processed model object's performance.

11. The computing platform of claim 10, wherein producing the parameterized family of the post-processed model object further comprises: after producing the parameterized family of the post-processed model object using the Bayesian optimization technique, expanding the parameterized family of the post-processed model object to include additional versions of the post-processed model object.

12. The computing platform of claim 11, wherein expanding the parameterized family of the post-processed model object to include additional versions of the post-processed model object comprises: constructing combined versions of the post-processed model object from respective pairs of versions of the post-processed model object that are in the parameterized family of the post-processed model object produced using the Bayesian optimiza
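
The bias evaluation in claims 3 and 4 (score each subpopulation, then split the measured bias into a part favoring the non-protected group and a part favoring the protected group) could be sketched as follows. The claim text shown here does not define the metrics, so the quantile-matching approach, the function names `quantiles` and `signed_bias`, and the convention that a higher score is the favorable outcome are all assumptions of this sketch.

```python
def quantiles(scores, n_points):
    """Empirical quantiles of `scores` taken at n_points evenly spaced midpoints."""
    ordered = sorted(scores)
    n = len(ordered)
    return [
        ordered[min(n - 1, int((i + 0.5) / n_points * n))]
        for i in range(n_points)
    ]

def signed_bias(protected_scores, non_protected_scores, n_points=100):
    """Return (positive, negative) bias under the assumed convention that
    higher scores are favorable.

    positive: average quantile gap where the non-protected group scores higher.
    negative: average quantile gap where the protected group scores higher.
    """
    qp = quantiles(protected_scores, n_points)
    qn = quantiles(non_protected_scores, n_points)
    gaps = [n - p for p, n in zip(qp, qn)]
    positive = sum(g for g in gaps if g > 0) / n_points
    negative = sum(-g for g in gaps if g < 0) / n_points
    return positive, negative

# Made-up model scores for the two subpopulations.
protected = [0.2, 0.4, 0.6, 0.8]
non_protected = [0.3, 0.5, 0.5, 0.9]
positive, negative = signed_bias(protected, non_protected, n_points=4)
```

Keeping the two parts separate, rather than reporting only their difference, avoids a situation where bias in one score range cancels opposite bias in another and the model appears unbiased overall.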

Assignees

Discover Financial Services

Inventors

Classifications

  • G06Q40/03 · Primary

    Credit; Loans; Processing thereof · CPC title

  • G06Q40/025 · Primary

    Physics · mapped topic

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022414766A1 cover?
A computing platform may be configured to (i) train an initial model object for a data science model using a machine learning process, (ii) determine that the initial model object exhibits a threshold level of bias, and (iii) thereafter produce an updated version of the initial model object having mitigated bias by (a) identifying a subset of the initial model object's set of input variables th…
Who is the assignee on this patent?
Discover Financial Services
What technology area does this patent fall under?
Primary CPC classification G06Q40/03. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 29, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).