Selectively shared condensed data for efficient federated learning

US2025139495A1 · US · A1

Patent metadata
Publication number: US-2025139495-A1
Application number: US-202318494920-A
Country: US
Kind code: A1
Filing date: Oct 26, 2023
Priority date: Oct 26, 2023
Publication date: May 1, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and related system for training a machine learning model using a federated learning structure by selectively sharing condensed data between devices includes operations to obtain datasets from client devices comprising a first client device and a second client device and updating a datasets subset to comprise a first dataset. The method further includes updating the datasets subset to comprise a second dataset based on a result indicating that a feature space distance between the first dataset and the second dataset satisfies a set of criteria and sending, to a third client device, the datasets subset comprising the first dataset and the second dataset. The method further includes obtaining, from the third client device, a set of model parameters that is derived from training based on the datasets subset and updating a server version of a machine learning model based on the set of model parameters.
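The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the mean-embedding summary, the Euclidean metric, the comparison against every already-selected dataset, and all names (`mean_embedding`, `feature_distance`, `select_subset`, the client IDs, the threshold value) are assumptions made for illustration. The abstract only requires that a feature space distance between two datasets satisfy a set of criteria.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_embedding(dataset: Sequence[Vector]) -> List[float]:
    """Summarize a condensed dataset by the mean of its feature vectors (assumed summary)."""
    n = len(dataset)
    return [sum(column) / n for column in zip(*dataset)]


def feature_distance(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Euclidean distance between the two dataset summaries (assumed metric)."""
    return math.dist(mean_embedding(a), mean_embedding(b))


def select_subset(condensed: Dict[str, Sequence[Vector]], threshold: float) -> List[str]:
    """Greedily keep datasets that are far enough from everything kept so far."""
    client_ids = list(condensed)
    if not client_ids:
        return []
    # The first dataset is taken unconditionally, mirroring "updating a
    # datasets subset to comprise a first dataset".
    selected = [client_ids[0]]
    for cid in client_ids[1:]:
        # Assumption: compare against every selected dataset, not only the first.
        if all(feature_distance(condensed[cid], condensed[s]) > threshold for s in selected):
            selected.append(cid)
    return selected


# Hypothetical condensed datasets, each a small list of 2-D feature vectors.
condensed = {
    "client_a": [[0.0, 0.0], [0.2, 0.0]],
    "client_b": [[0.1, 0.1], [0.1, -0.1]],  # same summary as client_a, so redundant
    "client_c": [[3.0, 4.0], [3.2, 4.0]],   # far from client_a in feature space
}
subset = select_subset(condensed, threshold=1.0)
print(subset)
```

In this sketch the server would forward only the datasets named in `subset` to the third client device, so near-duplicate condensed data is not transmitted.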

First claim

Opening claim text (preview).

What is claimed is:

1. A system for training a machine learning model using a federated learning structure by selectively sharing condensed data between devices to reduce energy consumption, the system comprising one or more processors and one or more non-transitory, machine-readable media storing program instructions that, when executed by the one or more processors, cause operations comprising: sending a machine learning model to data-sending devices of a federated learning structure; obtaining, from the data-sending devices, a plurality of condensed datasets at a server of the federated learning structure, wherein each respective condensed dataset of the plurality of condensed datasets is an input for a respective local version of the machine learning model; selecting a first condensed dataset for inclusion in a subset of condensed datasets provided by a first client device; determining that a feature space distance between the first condensed dataset and a second condensed dataset provided by a second client device exceeds a difference threshold; selecting the second condensed dataset for inclusion in the subset of condensed datasets in response to a determination that the feature space distance exceeds the difference threshold; sending the subset of condensed datasets comprising the first condensed dataset and the second condensed dataset to a data-receiving device, wherein the data-receiving device generates a set of learning model parameters from training based on the subset of condensed datasets; obtaining, from the data-receiving device, the set of learning model parameters derived from the training based on the subset of condensed datasets; and updating a server-side version of the machine learning model based on the set of learning model parameters.

2. A method comprising: providing parameters of a machine learning model to a plurality of client devices comprising a first client device, a second client device, and a third client device; obtaining a plurality of condensed datasets comprising a first condensed dataset provided by the first client device and a second condensed dataset provided by the second client device; updating a subset of condensed datasets to comprise the first condensed dataset; determining a result indicating that a feature space distance between the first condensed dataset and the second condensed dataset exceeds a threshold; updating the subset of condensed datasets to comprise the second condensed dataset in response to determining the result indicating that the feature space distance exceeds the threshold; sending, to the third client device, the subset of condensed datasets comprising the first condensed dataset and the second condensed dataset; obtaining, from the third client device, a set of model parameters that is derived from training based on the subset of condensed datasets; and updating a server-side version of the machine learning model based on the set of model parameters.

3. The method of claim 2, wherein: the first condensed dataset is provided in association with a model result difference indicating a difference in a learning model result based on the first condensed dataset and a second learning model result based on initial client data stored in the first client device; the method further comprises determining whether the model result difference satisfies a model result threshold; and updating the subset of condensed datasets to comprise the first condensed dataset comprises selecting the first condensed dataset based on a result indicating that the model result difference satisfies the model result threshold.

4. The method of claim 2, wherein providing the parameters of the machine learning model causes the first client device to condense local data stored in the first client device to generate the first condensed dataset.

5. The method of claim 2, further comprising adding noise to the subset of condensed datasets before sending the subset of condensed datasets to the third client device.

6. The method of claim 2, further comprising: generating a centroid of a cluster in a feature space based on the first condensed dataset and the second condensed dataset; and selecting an additional condensed dataset for inclusion in the subset of condensed datasets in response to determining a set of results indicating that the subset of condensed datasets is part of the cluster and is at least a threshold distance away from the centroid in the feature space, wherein sending the subset of condensed datasets to the third client device comprises sending the additional condensed dataset to the third client device.

7. The method of claim 2, further comprising sending, to the first client device, first data indicating a first data condensation algorithm, wherein receiving the first data indicating the first data condensation algorithm causes the first client device to generate the first condensed dataset using the first data condensation algorithm.

8. The method of claim 2, wherein: the first client device generates the first condensed dataset based on a first local dataset and a second local dataset; the first client device obtains the first local dataset during a first time interval; and the first client device obtains the second local dataset during a second time interval that is different from the first time interval.

9. The method of claim 2, wherein the first condensed dataset is selected based on a randomly generated value.

10. The method of claim 2, further comprising sending the subset of condensed datasets to at least one device of the first client device or the second client device.

11. One or more non-transitory, machine-readable media storing program instructions that, when executed by one or more processors, perform operations comprising: obtaining a plurality of condensed datasets from a plurality of client devices comprising a first client device and a second client device, wherein the plurality of condensed datasets comprises a first condensed dataset provided by the first client device and a second condensed dataset provided by the second client device; updating a subset of condensed datasets to comprise the first condensed dataset; updating the subset of condensed datasets to comprise the second condensed dataset based on a result indicating that a feature space distance between the first condensed dataset and the second condensed dataset satisfies a set of criteria; sending, to a third client device, the subset of condensed datasets comprising the first condensed dataset and the second condensed dataset; obtaining, from the third client device, a set of model parameters that is derived from training based on the subset of condensed datasets; and updating a server version of a machine learning model based on the set of model parameters.

12. The one or more non-transitory, machine-readable media of claim 11, wherein the subset of condensed datasets comprises a third condensed dataset, the operations further comprising selecting the third condensed dataset based on a randomly generated value.

13. The one or more non-transitory, machine-readable media of claim 11, wherein updating the subset of condensed datasets to comprise the second condensed dataset comprises: performing a clustering operation based on the plurality of condensed datasets to generate a first cluster; determining that the second condensed dataset and the second condensed dataset are in the first cluster; and selecting the second condensed dataset comprises selecting the second condensed dataset in response to a result indicating that the second condense
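Claims 5 and 6 above add two optional steps: perturbing the selected subset with noise, and picking an additional dataset that belongs to a cluster but lies at least a threshold distance from its centroid. The sketch below is one possible reading under stated assumptions, not the claimed implementation. It assumes each condensed dataset is already represented by a single embedding point, models cluster membership with a hypothetical `cluster_radius`, and uses Gaussian noise; the claims do not specify any of these choices, and every function and variable name is hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Coordinate-wise mean of the given embedding points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def pick_additional(
    candidates: Dict[str, Point],
    center: Point,
    min_distance: float,
    cluster_radius: float,
) -> List[str]:
    """Keep candidates inside the cluster (assumed: within cluster_radius of the
    centroid) that are also at least min_distance away from the centroid."""
    return [
        name
        for name, point in candidates.items()
        if min_distance <= math.dist(point, center) <= cluster_radius
    ]


def add_gaussian_noise(
    dataset: Sequence[Point], sigma: float, rng: random.Random
) -> List[List[float]]:
    """Add zero-mean Gaussian noise to every feature value (assumed noise model)."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in dataset]


# Hypothetical embeddings for the first and second condensed datasets.
center = centroid([[0.0, 0.0], [2.0, 0.0]])  # centroid at (1.0, 0.0)

candidates = {
    "d3": [1.1, 0.0],    # in the cluster but too close to the centroid
    "d4": [1.0, 1.5],    # in the cluster and far enough from the centroid
    "d5": [10.0, 10.0],  # outside the assumed cluster radius
}
extra = pick_additional(candidates, center, min_distance=1.0, cluster_radius=3.0)
print(extra)

rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = add_gaussian_noise([[1.0, 2.0], [3.0, 4.0]], sigma=0.01, rng=rng)
```

Selecting points away from the centroid favors datasets that add variety within a cluster, and the noise step would run on the final subset just before it is sent to the third client device.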

Assignees

Capital One Services LLC

Inventors

Classifications

  • G06N20/00 (Primary)

    Machine learning (CPC title)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025139495A1 cover?
A method and related system for training a machine learning model using a federated learning structure by selectively sharing condensed data between devices includes operations to obtain datasets from client devices comprising a first client device and a second client device and updating a datasets subset to comprise a first dataset. The method further includes updating the datasets subset to c…
Who is the assignee on this patent?
Capital One Services LLC
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 1, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).