System and methods for generation of synthetic data cluster vectors and refinement of machine learning models
US-2021049456-A1 · Feb 18, 2021 · US
US12287848B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12287848-B2 |
| Application number | US-202117345730-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 11, 2021 |
| Priority date | Jun 11, 2021 |
| Publication date | Apr 29, 2025 |
| Grant date | Apr 29, 2025 |
Official abstract text for this publication.
The present invention provides techniques for learning Mahalanobis distance similarity metrics from data for individually fair machine learning models. In one aspect, a method for learning a fair Mahalanobis distance similarity metric includes: obtaining data with similarity annotations; selecting, based on the data obtained, a model for learning a Mahalanobis covariance matrix Σ; and learning the Mahalanobis covariance matrix Σ from the data using the model selected, wherein the Mahalanobis covariance matrix Σ fully defines the fair Mahalanobis distance similarity metric.
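As a rough illustration of how a matrix Σ "fully defines" the metric, the sketch below evaluates the quadratic form recited in claim 2 on already-embedded points. The function name `mahalanobis_sq`, the 2-D vectors, and the choice of Σ are hypothetical values picked for the example; they do not come from the patent.

```python
import numpy as np

def mahalanobis_sq(phi1, phi2, sigma):
    """Return (phi1 - phi2)^T Sigma (phi1 - phi2) for embedded points."""
    diff = np.asarray(phi1, dtype=float) - np.asarray(phi2, dtype=float)
    return float(diff @ sigma @ diff)

# Hypothetical Sigma that ignores the first coordinate (treated here as the
# direction along which two samples should still count as similar).
sigma = np.diag([0.0, 1.0])

a = np.array([1.0, 2.0])
b = np.array([5.0, 2.0])  # differs from a only in the ignored coordinate
c = np.array([1.0, 4.0])  # differs from a in the coordinate that counts

d_ab = mahalanobis_sq(a, b, sigma)
d_ac = mahalanobis_sq(a, c, sigma)
print(d_ab, d_ac)
```

Under this assumed Σ, `a` and `b` are at distance zero while `a` and `c` are not, which is the sense in which the choice of Σ determines which differences the metric treats as relevant.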
Opening claim text (preview).
What is claimed is:
1. A method for improving algorithmic fairness of machine learning models using learned fair Mahalanobis distance similarity metrics, the method comprising: obtaining training data comprising similarity annotations; determining one model out of a plurality of models to use in learning a Mahalanobis covariance matrix Σ based on the obtained training data; learning the Mahalanobis covariance matrix Σ from the obtained training data using the determined one model, wherein the Mahalanobis covariance matrix Σ represents a fair Mahalanobis distance similarity metric; and training one or more machine learning models using, at least in part, the fair Mahalanobis distance similarity metric, for one or more machine learning model tasks.
2. The method of claim 1, wherein the fair Mahalanobis distance similarity metric is of a form: $d_x(x_1, x_2) \triangleq \langle \varphi(x_1) - \varphi(x_2), \Sigma(\varphi(x_1) - \varphi(x_2)) \rangle$, wherein $\varphi(x): \mathcal{X} \to \mathbb{R}^d$ is an embedding map and $\Sigma \in \mathbb{S}_+^d$.
3. The method of claim 1, wherein the obtained training data comprises groups of comparable samples.
4. The method of claim 3, wherein the determined one model comprises a factor model.
5. The method of claim 4, wherein the factor model comprises: $\varphi_i = A_* u_i + B_* \upsilon_i + \epsilon_i$, wherein $\varphi_i \in \mathbb{R}^d$ is a learned representation of $x_i$, $u_i \in \mathbb{R}^K$ is a sensitive attribute of $x_i$ for a task at hand, $\upsilon_i \in \mathbb{R}^L$ is a relevant attribute of $x_i$ for the task at hand, and $\epsilon_i$ is an error term.
6. The method of claim 5, further comprising: choosing an orthogonal complement of $\mathrm{ran}(A_*)$ for the Mahalanobis covariance matrix Σ, wherein $\mathrm{ran}(A_*)$ is a column space of $A_*$; and solving for $\mathrm{ran}(A_*)$.
7. The method of claim 1, wherein the obtained training data comprises pairs of samples that are comparable, incomparable, or combinations thereof.
8. The method of claim 7, wherein the determined one model comprises a binary response model.
9. The method of claim 8, wherein the data comprises human user feedback in a form of triplets $\{(x_{i_1}, x_{i_2}, y_i)\}_{i=1}^{n}$, where $y_i \in \{0, 1\}$ indicates whether a human user considers $x_{i_1}$ and $x_{i_2}$ comparable, wherein $(x_{i_1}, x_{i_2}, y_i)$ satisfies the binary response model: $y_i \mid x_{i_1}, x_{i_2} \sim \mathrm{Ber}(2\sigma(-d_i))$, $d_i \triangleq \|\varphi_{i_1} - \varphi_{i_2}\|_{\Sigma_0}^2 = (\varphi_{i_1} - \varphi_{i_2})^T \Sigma_0 (\varphi_{i_1} - \varphi_{i_2}) = \langle (\varphi_{i_1} - \varphi_{i_2})(\varphi_{i}$
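The two model families named in the claims can be sketched numerically. The first function below builds Σ as the projector onto the orthogonal complement of the column space of a sensitive-attribute loading matrix, in the spirit of claims 5 and 6. The second evaluates the Bernoulli parameter 2σ(−d_i) from claim 9. This is a minimal sketch under stated assumptions: the loading matrix `A` is taken as already known (the preview does not say how ran(A) is solved for), σ is assumed to be the logistic sigmoid (the preview does not define it), and all function names and numeric values are hypothetical.

```python
import numpy as np

def sigma_from_sensitive_basis(A):
    """Projector onto the orthogonal complement of ran(A).

    Assumes A (shape d x K) has full column rank; it is supplied by the
    caller here, not estimated from data.
    """
    A = np.asarray(A, dtype=float)
    Q, _ = np.linalg.qr(A)            # columns of Q: orthonormal basis of ran(A)
    return np.eye(A.shape[0]) - Q @ Q.T

def comparable_probability(phi1, phi2, sigma0):
    """2 * sigmoid(-d), with d = (phi1 - phi2)^T Sigma0 (phi1 - phi2).

    Assumes sigma is the logistic sigmoid, so 2*sigmoid(-d) = 2 / (1 + exp(d)).
    """
    diff = np.asarray(phi1, dtype=float) - np.asarray(phi2, dtype=float)
    d = float(diff @ sigma0 @ diff)
    return 2.0 / (1.0 + np.exp(d))

# Hypothetical setup: d = 3, one sensitive direction along the first axis.
A = np.array([[1.0], [0.0], [0.0]])
Sigma = sigma_from_sensitive_basis(A)

# Two embeddings that differ only along the sensitive direction.
p_same = comparable_probability([1.0, 2.0, 3.0], [4.0, 2.0, 3.0], Sigma)
# Two embeddings that differ along a non-sensitive direction.
p_diff = comparable_probability([1.0, 2.0, 3.0], [1.0, 5.0, 3.0], Sigma)
print(Sigma, p_same, p_diff)
```

With this assumed `A`, a difference purely along the sensitive direction gives distance zero, so the modelled probability of the pair being judged comparable is 1; a difference along another direction lowers that probability.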
Supervised learning · CPC title
Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor · CPC title
Learning methods · CPC title