What technology area does this patent fall under?

Primary CPC classification: G06Q40/03 (Credit; Loans; Processing thereof). Mapped technology area: Physics.

When was this patent published?

Publication date: Dec 29, 2022 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Computing system and method for creating a data science model having reduced bias

US2022414766A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022414766-A1
Application number	US-202217900753-A
Country	US
Kind code	A1
Filing date	Aug 31, 2022
Priority date	Jun 3, 2020
Publication date	Dec 29, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing platform may be configured to (i) train an initial model object for a data science model using a machine learning process, (ii) determine that the initial model object exhibits a threshold level of bias, and (iii) thereafter produce an updated version of the initial model object having mitigated bias by (a) identifying a subset of the initial model object's set of input variables that are to be replaced by transformations, (b) producing a post-processed model object by replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that has one or more unknown parameters, (c) producing a parameterized family of the post-processed model object, and (d) selecting, from the parameterized family of the post-processed model object, one given version of the post-processed model object to use as the updated version of the initial model object for the data science model.
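Steps (b) through (d) of the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy records, the linear `initial_model`, the one-parameter `compress` transformation, the mean-score-gap `bias` measure, the `distortion` proxy for performance, and the simple grid over the unknown parameter `a`.

```python
from statistics import mean

# Hypothetical toy data: (income, debt_ratio) records for two subpopulations.
PROTECTED = [(30.0, 0.40), (35.0, 0.35), (40.0, 0.30), (45.0, 0.30)]
NON_PROTECTED = [(50.0, 0.30), (60.0, 0.25), (70.0, 0.20), (80.0, 0.20)]


def initial_model(income, debt_ratio):
    """Stand-in for the trained initial model object (an assumed linear score)."""
    return 0.01 * income - 0.5 * debt_ratio


def compress(x, a, center):
    """Linear, symmetric transformation with one unknown parameter `a`.

    a = 1 leaves x unchanged; a < 1 compresses x toward `center`.
    """
    return center + a * (x - center)


def post_processed_model(a, center):
    """Replace the `income` input with its transformation; the model stays fixed."""
    def model(income, debt_ratio):
        return initial_model(compress(income, a, center), debt_ratio)
    return model


def bias(model):
    """Assumed bias measure: gap in mean score between the two subpopulations."""
    return abs(mean(model(*r) for r in NON_PROTECTED)
               - mean(model(*r) for r in PROTECTED))


def distortion(model):
    """Assumed performance proxy: mean squared deviation from the initial scores."""
    records = PROTECTED + NON_PROTECTED
    return mean((model(*r) - initial_model(*r)) ** 2 for r in records)


def select_version(max_distortion, grid_size=21):
    """Build a parameterized family over `a` and pick one version from it."""
    center = mean(r[0] for r in PROTECTED + NON_PROTECTED)
    # The family: one version of the post-processed model per value of `a` in [0, 1].
    family = []
    for i in range(grid_size):
        a = i / (grid_size - 1)
        family.append((a, post_processed_model(a, center)))
    # Keep versions that stay close enough to the initial model's scores.
    admissible = [(a, m) for a, m in family if distortion(m) <= max_distortion]
    # Among those, select the least-biased version.
    return min(admissible, key=lambda pair: bias(pair[1]))
```

Calling `select_version(max_distortion=0.01)` returns a pair `(a, model)`. The selection rule used here (lowest bias within a distortion budget) is only one plausible way to choose a version from the family; the abstract does not say how the choice is made.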

First claim

Opening claim text (preview).

We claim:

1. A computing platform comprising: at least one network interface for communicating over at least one data network; at least one processor; at least one non-transitory computer-readable medium; and program instructions stored on the at least one non-transitory computer-readable medium that are executable by the at least one processor such that the computing platform is configured to: train an initial model object for a data science model using a machine learning process, wherein the initial model object is configured to receive values for a set of input variables and generate an output value; based on an evaluation of the initial model object's bias, determine that the initial model object exhibits a threshold level of bias with respect to at least one given attribute; and after determining that the initial model object exhibits the threshold level of bias, produce an updated version of the initial model object having mitigated bias by: based on an evaluation of the initial model object's set of input variables, identifying a subset of the initial model object's set of input variables that are to be replaced by transformations; producing a post-processed model object by replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that has one or more unknown parameters; producing a parameterized family of the post-processed model object; and selecting, from the parameterized family of the post-processed model object, one given version of the post-processed model object to use as the updated version of the initial model object for the data science model.

2. The computing platform of claim 1, wherein the threshold level of bias with respect to the at least one given attribute comprises a threshold level of bias with respect to a pair of subpopulations defined based on the given attribute that comprises a protected subpopulation and a non-protected subpopulation.

3. The computing platform of claim 2, wherein the evaluation of the initial model object's bias involves: accessing a historical dataset comprising a first set of historical data records for individuals belonging the protected subpopulation and a second set of historical data records for individuals belonging the non-protected subpopulation; inputting the first set of historical data records into the initial model object and thereby generating a first set of model scores for the protected subpopulation; inputting the second set of historical data records into the initial model object and thereby generating a second set of model scores for the non-protected subpopulation; and based on the first and second sets of model scores, quantifying the bias exhibited by the initial model object for the protected and non-protected subpopulations.

4. The computing platform of claim 3, wherein quantifying the bias exhibited by the initial model object for the protected and non-protected subpopulations comprises: determining at least one of (i) a positive bias metric that quantifies a portion of the initial model object's bias that favors the non-protected subpopulation or (ii) a negative bias metric that quantifies a portion of the initial model object's bias that favors the protected subpopulation.

5. The computing platform of claim 2, wherein the evaluation of the initial model object's set of input variables involves: based on an evaluation of dependencies between the initial model object's set of input variables, dividing the initial model object's set of input variables into a set of variable groups that each comprises one or more input variables; and quantifying a respective bias contribution of each respective variable group in defined set of variable groups using an explanability technique and a historical dataset comprising a first set of historical data records for individuals belonging the protected subpopulation and a second set of historical data records for individuals belonging the non-protected subpopulation.

6. The computing platform of claim 5, wherein quantifying the respective bias contribution of each respective variable group comprises: for each respective variable group, determining at least one of (i) a respective positive bias contribution metric that quantifies the respective variable group's contribution to either increasing a bias favoring the non-protected subpopulation or decreasing a bias favoring the protected subpopulation or (ii) a respective negative bias contribution metric that quantifies the respective variable group's contribution to either increasing a bias favoring the protected subpopulation or decreasing a bias favoring the non-protected subpopulation.

7. The computing platform of claim 1, wherein the respective transformation of each respective input variable in the identified subset comprises one of (i) a first type of transformation that compresses or expands the respective input variable in a linear and symmetric manner, (ii) a second type of transformation that compresses or expands the respective input variable in a linear and asymmetric manner, (iii) a third type of transformation that compresses or expands the respective input variable in a non-linear and symmetric manner, or (iv) a fourth type of transformation that compresses or expands the respective input variable in a non-linear and asymmetric manner.

8. The computing platform of claim 1, wherein producing the post-processed model object by replacing each respective input variable in the identified subset with the respective transformation of the respective input variable comprises: replacing each respective input variable in the identified subset with a respective transformation of the respective input variable that is selected based on a determination of the respective input variable's contribution to the initial model object's bias.

9. The computing platform of claim 1, wherein producing the post-processed model object further comprises calibrating the post-processed model object to align a scale of post-processed model object's output with a scale of the initial model object's output.

10. The computing platform of claim 1, wherein producing the parameterized family of the post-processed model object comprises: using a Bayesian optimization technique that functions to evaluate a bias and a performance of different versions of the post-processed model object that are produced by using different combinations of values for the unknown parameters included within the post-processed model object and thereby producing a parameterized family of the post-processed model object based on versions of the post-processed model object that form an efficient frontier for a tradeoff between the post-processed model object's bias and the post-processed model object's performance.

11. The computing platform of claim 10, wherein producing the parameterized family of the post-processed model object further comprises: after producing the parameterized family of the post-processed model object using the Bayesian optimization technique, expanding the parameterized family of the post-processed model object to include additional versions of the post-processed model object.

12. The computing platform of claim 11, wherein expanding the parameterized family of the post-processed model object to include additional versions of the post-processed model object comprises: constructing combined versions of the post-processed model object from respective pairs of versions of the post-processed model object that are in the parameterized family of the post-processed model object produced using the Bayesian optimiza
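Two ideas in the claims above lend themselves to a short sketch: the positive and negative bias metrics of claim 4, and the efficient frontier of claim 10. The claim preview does not define either computation, so the code below rests on stated assumptions: scores lie in [0, 1], a higher score is the favorable outcome, the bias split is taken from the signed gap between the two groups' empirical score distributions, and a plain non-dominated filter over already-evaluated versions stands in for the Bayesian optimization technique. The function names and the `(name, bias, performance)` tuple layout are hypothetical.

```python
def ecdf(scores, t):
    """Fraction of scores at or below threshold t."""
    return sum(s <= t for s in scores) / len(scores)


def signed_bias(protected_scores, non_protected_scores, steps=1000):
    """Split the gap between two score distributions into two parts.

    Assumes scores lie in [0, 1] and that a higher score is favorable.
    Returns (positive, negative): the part of the gap that favors the
    non-protected group and the part that favors the protected group.
    """
    positive = 0.0
    negative = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps  # midpoint rule over the score range
        gap = ecdf(protected_scores, t) - ecdf(non_protected_scores, t)
        if gap > 0:
            # More protected records sit at or below t: favors non-protected.
            positive += gap / steps
        else:
            negative += -gap / steps
    return positive, negative


def efficient_frontier(versions):
    """Keep the versions that no other version dominates.

    `versions` is a list of (name, bias, performance) tuples. A version is
    dominated if another has bias no higher and performance no lower, with
    at least one of the two strictly better.
    """
    frontier = []
    for name, b, p in versions:
        dominated = any(
            b2 <= b and p2 >= p and (b2 < b or p2 > p)
            for _, b2, p2 in versions
        )
        if not dominated:
            frontier.append((name, b, p))
    # Sort by bias so the tradeoff reads from least to most biased.
    return sorted(frontier, key=lambda v: v[1])
```

For example, `signed_bias([0.2, 0.3, 0.4], [0.6, 0.7, 0.8])` attributes the whole gap to the positive part, because every protected score lies below every non-protected score. The frontier filter compares every pair of versions, which is adequate for a handful of candidates but is not a substitute for a guided search over the parameter space.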

Assignees

Discover Financial Services

Inventors

Classifications

G06Q40/03 · Primary
Credit; Loans; Processing thereof · CPC title
G06Q40/025 · Primary
Physics · mapped topic
G06N7/01
Probabilistic graphical models, e.g. probabilistic networks · CPC title
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
G06F30/27
using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model · CPC title

Patent family

Related publications grouped by family.

View patent family 84541139

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022414766A1 cover?: A computing platform may be configured to (i) train an initial model object for a data science model using a machine learning process, (ii) determine that the initial model object exhibits a threshold level of bias, and (iii) thereafter produce an updated version of the initial model object having mitigated bias by (a) identifying a subset of the initial model object's set of input variables th…
Who is the assignee on this patent?: Discover Financial Services

Related patents

Systems and methods to identify neural network brittleness based on sample data and seed generation

Automated input-data monitoring to dynamically adapt machine-learning techniques

Determining data representative of bias within a model

Systems and methods for determining vehicle trajectories directly from data indicative of human-driving behavior

Determining Data Representative of Bias Within a Model
