Customizable federated learning
US-2023409983-A1 · Dec 21, 2023 · US
US12561609B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12561609-B2 |
| Application number | US-202217860128-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 8, 2022 |
| Priority date | Apr 29, 2021 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Official abstract text for this publication.
Disclosed is a federated learning method for a k-means clustering algorithm. Horizontal federated learning includes the following steps: 1) initializing K clusters, and distributing a local sample to a cluster closest to the sample; 2) calculating a new cluster center of the cluster; and 3) if the cluster center changes, then returning to step 1). Vertical federated learning includes the following steps: 1) running the k-means clustering algorithm locally to obtain T local clusters and intersecting to obtain $T^L$ new clusters, or running an AP clustering algorithm to obtain $T_i$ clusters and intersecting to obtain $\prod_{i=1}^{L} T_i$ new clusters; 2) taking the $T^L$ (or $\prod_{i=1}^{L} T_i$) new cluster centers as input samples, and initializing the K clusters; 3) distributing each sample to the cluster closest to the sample; 4) calculating a new cluster center of the cluster; and 5) if the cluster center changes, then returning to step 3).
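The horizontal steps in the abstract can be illustrated with a minimal sketch. It assumes each participant holds its own rows with the same feature columns, and it uses a plain sum as a stand-in for the secure aggregation step, which the text does not specify in detail. The names `secure_sum`, `local_stats`, and `horizontal_kmeans`, the iteration cap, and the toy (income, age) data are all hypothetical and are not taken from the patent.

```python
import numpy as np

def secure_sum(values):
    # Stand-in (assumption): a plain sum. A real secure aggregation
    # protocol would hide each participant's individual contribution.
    return sum(values)

def local_stats(X, centers):
    # Squared Euclidean distance from every local sample to every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sums = np.zeros_like(centers)
    counts = np.zeros(centers.shape[0])
    np.add.at(sums, labels, X)      # per-cluster feature sums
    np.add.at(counts, labels, 1)    # per-cluster sample counts
    return sums, counts

def horizontal_kmeans(parties, centers, max_iter=100):
    centers = np.asarray(centers, dtype=float)
    for _ in range(max_iter):
        stats = [local_stats(X, centers) for X in parties]
        total_sums = secure_sum(s for s, _ in stats)
        total_counts = secure_sum(c for _, c in stats)
        new_centers = centers.copy()
        nonempty = total_counts > 0   # empty clusters keep their old center
        new_centers[nonempty] = (
            total_sums[nonempty] / total_counts[nonempty][:, None]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

# Toy data: two banks, columns are (income, age).
bank_a = np.array([[30.0, 25.0], [32.0, 27.0], [90.0, 50.0]])
bank_b = np.array([[28.0, 24.0], [95.0, 55.0], [88.0, 52.0]])
centers = horizontal_kmeans([bank_a, bank_b], [[30.0, 25.0], [90.0, 50.0]])
```

Only per-cluster sums and counts leave each participant in this sketch, so the aggregated centers equal what pooled k-means would compute on the combined rows.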
Opening claim text (preview).
What is claimed is:

1. A federated learning method for a k-means clustering algorithm, applied to distributed clustering scenarios of user data among banks, wherein the method mainly comprises two parts, vertical federated learning and horizontal federated learning; participants are banks holding user data, samples are bank users, and features of the samples include users' income and age;

the horizontal federated learning comprises the following steps:

(i) initializing K cluster centers, wherein the cluster centers are two-dimensional arrays containing income and age dimensions, and sending the K cluster centers to all banks holding user data, wherein each participant is a database with same income and age and different bank users, and all the banks holding user data together constitute a total database; each sample refers to a piece of data in the database;

(ii) calculating a square of a Euclidean distance between each sample of all the banks holding user data and the cluster centers, respectively, finding the cluster center with the smallest square of the Euclidean distance for each bank user, and distributing the bank users into a cluster corresponding to the cluster center; and

(iii) counting a quantity of the bank users and a sum of incomes, a sum of ages, and a sum of bank users of each cluster in the banks holding user data locally, and then calculating the quantity of samples, a total sum of incomes, a total sum of ages of each cluster in the total database by using a secure aggregation method, taking the average value obtained by calculation as a new cluster center of each cluster; an income dimension of the new cluster center is (the total sum of incomes/quantity of samples), and an age dimension is (the total sum of ages/total quantity of samples); if the new cluster centers are different from original cluster centers and a count of iterations is less than a set count, then returning to step (ii), and increasing the count of iterations by one; wherein the sum of the samples refers to corresponding summation of several pieces of data corresponding to the samples according to the characteristics, without changing a dimension of the samples; and

the vertical federated learning comprises the following steps:

(iv) each bank holding user data being the database with the same bank users and different characteristics, all the banks holding user data together constituting the total database, wherein each sample in each bank holding user data refers to a piece of data in the database; running, by L banks holding user data respectively, the k-means clustering algorithm locally to obtain T local clusters and corresponding centers of the banks holding user data and sending, by each bank holding user data, labels of bank users in the T clusters and corresponding cluster labels to the last bank holding user data, or running, by L banks holding user data, respectively, an AP clustering algorithm locally to obtain some local clusters and corresponding centers of the banks holding user data, determining the quantity of the clusters by the algorithm and denoting as $T_i$; and then sending, by each bank holding user data, labels of the bank users in the clusters and corresponding center labels to the last bank holding user data;

(v) in the last bank holding user data, intersecting the clusters obtained by each bank holding user data to obtain $T^L$ or $\prod_{i=1}^{L} T_i$ new clusters, sending $T^L$ or $\prod_{i=1}^{L} T_i$ new cluster results, namely a new cluster label to which each bank user belongs, to all the banks holding user data, calculating the quantity of the bank users, a sum of the features and an average value of each cluster on each bank holding user data, under the features held locally by each bank holding user data, and taking the average value obtained by calculation as a cluster center of each cluster on the characteristics held by the current bank holding user data, so as to obtain cluster centers of the $T^L$ or $\prod_{i=1}^{L} T_i$ clusters, wherein at the moment, the characteristics of the cluster centers being all stored on different banks holding user data, and wherein the sum of the sample refers to corresponding summation of several pieces of data corresponding to the samples according to the characteristics, without changing the dimension of the samples;

(vi) taking the $T^L$ or $\prod_{i=1}^{L} T_i$ new cluster centers as a new database, the bank users being all bank users in the new database, and at the same time, taking the quantity of the bank users in the $T^L$ or $\prod_{i=1}^{L} T_i$ clusters as a weight, and initializing the K clusters and K cluster centers thereof;

(vii) calculating a square of a Euclidean distance of each sample to a corresponding characteristic of each cluster center stored in the current bank holding user data in each bank holding user data, and then calculating the square of the Euclidean distance between each sample and the cluster center by using secure aggregation, and taking a cluster corresponding to the cluster center with the smallest square of the Euclidean distance from an input sample as a cluster to which the sample belongs; and

(viii) calculating a weighted average value

$$\text{weighted average value} = \frac{\sum_{x \,\in\, \text{samples in cluster}} (\text{weight of } x) \cdot x}{\text{quantity of samples in cluster}}$$

of the corresponding characteristic of each cluster on different banks holding user data, taking it as the corresponding characteristic of each new cluster center, and if the new cluster centers are different from the original cluster centers and a count of iterations is less than a set count, then returning to step (vii)
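The vertical steps of the claim (intersecting local clusters, computing per-bank centers, then running a weighted k-means over those centers) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: local cluster labels are supplied directly instead of running k-means or AP clustering, a plain sum stands in for secure aggregation, and the denominator of the weighted average is read as the total weight of the samples in the cluster (the number of original bank users they represent). All function names and the toy data are hypothetical.

```python
import numpy as np

def intersect_labels(label_lists):
    # Users with the same tuple of local labels form one intersected cluster.
    ids = {}
    new_labels = np.array(
        [ids.setdefault(t, len(ids)) for t in zip(*label_lists)]
    )
    return new_labels, len(ids)

def local_centers(X, new_labels, n_clusters):
    # Center of each intersected cluster on the features this bank holds.
    sums = np.zeros((n_clusters, X.shape[1]))
    counts = np.zeros(n_clusters)
    np.add.at(sums, new_labels, X)
    np.add.at(counts, new_labels, 1)
    return sums / counts[:, None], counts

def vertical_kmeans(party_points, weights, party_centers, max_iter=100):
    # party_points[p]: intersected-cluster centers on bank p's features.
    # party_centers[p]: the K current centers on those same features.
    party_centers = [np.asarray(c, dtype=float) for c in party_centers]
    for _ in range(max_iter):
        partial = [
            ((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            for P, C in zip(party_points, party_centers)
        ]
        d2 = sum(partial)  # stand-in (assumption) for secure aggregation
        assign = d2.argmin(axis=1)
        new_centers = []
        for P, C in zip(party_points, party_centers):
            new_C = C.copy()
            for k in range(C.shape[0]):
                m = assign == k
                if m.any():
                    w = weights[m][:, None]
                    new_C[k] = (w * P[m]).sum(axis=0) / weights[m].sum()
            new_centers.append(new_C)
        if all(np.allclose(a, b) for a, b in zip(new_centers, party_centers)):
            break
        party_centers = new_centers
    return party_centers, assign

# Toy data: the same four users; bank A holds income, bank B holds age.
income = np.array([[30.0], [32.0], [90.0], [94.0]])
age = np.array([[25.0], [27.0], [50.0], [54.0]])
new_labels, n_new = intersect_labels([[0, 0, 1, 1], [0, 1, 1, 1]])
pts_a, weights = local_centers(income, new_labels, n_new)
pts_b, _ = local_centers(age, new_labels, n_new)
final, assign = vertical_kmeans(
    [pts_a, pts_b], weights, [[[30.0], [90.0]], [[25.0], [50.0]]]
)
```

Each bank only ever computes distances and averages on its own feature columns in this sketch; only the partial squared distances are combined across banks.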
with fixed number of clusters, e.g. K-means clustering · CPC title
Machine learning · CPC title