Bayesian nonparametric learning of neural networks

US2021089878A1 · US · A1

Patent metadata

Publication number: US-2021089878-A1
Application number: US-201916576927-A
Country: US
Kind code: A1
Filing date: Sep 20, 2019
Priority date: Sep 20, 2019
Publication date: Mar 25, 2021
Grant date: not listed (kind code A1 is a pre-grant application publication)

Abstract

Official abstract text for this publication.

In federated learning problems, data is scattered across different servers and exchanging or pooling it is often impractical or prohibited. A Bayesian nonparametric framework is presented for federated learning with neural networks. Each data server is assumed to provide local neural network weights, which are modeled through our framework. An inference approach is presented that allows us to synthesize a more expressive global network without additional supervision or data pooling, and with as few as a single communication round. The efficacy of the present invention is shown on federated learning problems simulated from two popular image classification datasets.
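The abstract does not spell out why the server cannot simply average the local weights coordinate by coordinate. A standard reason is that the hidden units of a network can be reordered without changing what it computes, which is why claim 2 recites permutation-invariant matching. The NumPy sketch below illustrates this symmetry on a hypothetical one-hidden-layer ReLU network (`mlp`, with random weights); it is an illustration only, not the framework described in this patent.

```python
import numpy as np

def mlp(x, W1, b1, W2):
    """Hypothetical one-hidden-layer ReLU network: W1 is (H, D), b1 is (H,), W2 is (K, H)."""
    hidden = np.maximum(0.0, W1 @ x + b1)
    return W2 @ hidden

rng = np.random.default_rng(0)
D, H, K = 4, 3, 2
W1 = rng.normal(size=(H, D))
b1 = rng.normal(size=H)
W2 = rng.normal(size=(K, H))
x = rng.normal(size=D)

# Reorder the hidden units: rows of W1 and b1, columns of W2.
perm = np.array([2, 0, 1])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

# Same function, different weight arrays.
print(np.allclose(mlp(x, W1, b1, W2), mlp(x, W1p, b1p, W2p)))  # True

# Coordinate-wise averaging ignores the reordering and in general changes the function.
naive = mlp(x, (W1 + W1p) / 2, (b1 + b1p) / 2, (W2 + W2p) / 2)
print(naive, mlp(x, W1, b1, W2))  # in general these differ

# Undoing the permutation first (i.e. matching the units) and then averaging
# recovers the original network.
inv = np.argsort(perm)
matched = mlp(x, (W1 + W1p[inv]) / 2, (b1 + b1p[inv]) / 2, (W2 + W2p[:, inv]) / 2)
print(np.allclose(matched, mlp(x, W1, b1, W2)))  # True
```

Real client models are trained on different data, so their units are never exact permutations of each other; that gap is what a probabilistic matching has to handle.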

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for managing efficient machine learning, the method comprising: operating a network in which a plurality of client computing devices are communicatively coupled with a centralized computing device, wherein each of the plurality of client computing devices includes a local machine learning model that is pre-trained on locally accessible data, and wherein the locally accessible data has a common structure across all the plurality of client computing devices; accessing, by the centralized computing device, a plurality of artificial local neurons from each of the local machine learning models; clustering each of the plurality of artificial local neurons into a plurality of specific groups as part of a set of global neurons; and forming a global machine learning model layer by averaging the plurality of artificial local neurons previously clustered into one of a plurality of specific groups as part of a set of global neurons.

2. The computer-implemented method of claim 1, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is performed with permutation-invariant probabilistic matching each of the plurality of artificial local neurons using Bayesian nonparametrics.

3. The computer-implemented method of claim 1, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is performed with groups of weight vectors, bias vectors, or a combination of weight vectors and bias vectors associated with each of the plurality of artificial local neurons.

4. The computer-implemented method of claim 1, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is controlled by hyperparameters.

5. The computer-implemented method of claim 1, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons results in one or more of the plurality of artificial local neurons being left unmatched.

6. The computer-implemented method of claim 1, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons results in a number of neurons in the set of global neurons being smaller than a numeric sum of all of the plurality of artificial local neurons.

7. The computer-implemented method of claim 1, wherein the accessing, by the centralized computing device, the plurality of artificial local neurons from each of the plurality of client computing devices requires only a single read communication between the centralized computing device and each of the plurality of client computing devices.

8. The computer-implemented method of claim 1, wherein each of the plurality of client computing devices includes a local machine learning model that is a multilayer artificial neural network.

9. The computer-implemented method of claim 1, wherein each of the plurality of client computing devices includes the local machine learning model that is pre-trained on locally accessible data in which the data changes over time.

10. The computer-implemented method of claim 1, wherein the locally accessible data has a common structure that is both heterogeneous and overlapping across all the plurality of client computing devices.

11. A computer system for managing efficient machine learning, the computer system comprising: a processor device; and a memory operably coupled to the processor device and storing computer-executable instructions causing: operating a network in which a plurality of client computing devices are communicatively coupled with a centralized computing device, wherein each of the plurality of client computing devices includes a local machine learning model that is pre-trained on locally accessible data, and wherein the locally accessible data has a common structure across all the plurality of client computing devices; accessing, by the centralized computing device, a plurality of artificial local neurons from each of the local machine learning models; clustering each of the plurality of artificial local neurons into a plurality of specific groups as part of a set of global neurons; and forming a global machine learning model layer by averaging the plurality of artificial local neurons previously clustered into one of a plurality of specific groups as part of a set of global neurons.

12. The computer system of claim 11, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is performed with permutation-invariant probabilistic matching each of the plurality of artificial local neurons using Bayesian nonparametrics.

13. The computer system of claim 11, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is performed with groups of weight vectors, bias vectors, or a combination of weight vectors and bias vectors associated with each of the plurality of artificial local neurons.

14. The computer system of claim 11, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons is controlled by hyperparameters.

15. The computer system of claim 11, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons results in one or more of the plurality of artificial local neurons being left unmatched.

16. The computer system of claim 11, wherein the clustering each of the plurality of artificial local neurons into the plurality of specific groups as part of the set of global neurons results in a number of neurons in the set of global neurons being smaller than a numeric sum of all of the plurality of artificial local neurons.

17. The computer system of claim 11, wherein the accessing, by the centralized computing device, the plurality of artificial local neurons from each of the plurality of client computing devices requires only a single read communication between the centralized computing device and each of the plurality of client computing devices.

18. The computer system of claim 11, wherein each of the plurality of client computing devices includes a local machine learning model that is a multilayer artificial neural network.

19. The computer system of claim 11, wherein each of the plurality of client computing devices includes the local machine learning model that is pre-trained on locally accessible data in which the data changes over time.

20. A computer program product for managing efficient machine learning, the computer program product comprising: a non-transitory computer readable storage medium readable by a processing device and storing program instructions for execution by the processing device, said program instructions comprising: operating a network in which a plurality of client computing devices are communicatively coupled with a centralized computing device, wherein each of the plurality of client computing devices includes a local machine learning model that is pre-trained on locally accessible data, and wherein the locally accessible data
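Claims 1 and 4 to 6 describe clustering local neurons into global neurons, averaging each cluster, and ending with fewer global neurons than the total number of local ones. This page does not give the Bayesian nonparametric inference of claim 2, so the following is only a simplified sketch under stated assumptions: each neuron is a row vector (its incoming weights, optionally with the bias appended, in the spirit of claim 3); clients are processed one at a time; each local neuron is greedily matched to the nearest unused global neuron if their Euclidean distance is at most a hypothetical `threshold` (a stand-in for the hyperparameters of claim 4); and a local neuron that matches nothing opens a new global neuron. The greedy rule replaces the patent's probabilistic matching, and the name `match_and_average` is illustrative only.

```python
import numpy as np

def match_and_average(local_layers, threshold):
    """Greedy, threshold-based matching of local neurons into global neurons.

    local_layers: list of arrays, one per client, each of shape (J_j, D);
                  row i is the weight vector of that client's local neuron i.
    threshold:    hypothetical hyperparameter; maximum Euclidean distance at
                  which a local neuron may join an existing global neuron.
    Returns (global_neurons, assignments): global_neurons has shape (L, D) and
    assignments[j][i] is the global index of client j's neuron i.
    """
    sums, counts, assignments = [], [], []
    for layer in local_layers:
        # Current global neurons are the means of the local neurons matched so far.
        means = [s / c for s, c in zip(sums, counts)]
        # All (distance, local index, global index) candidate pairs, closest first.
        pairs = sorted(
            (float(np.linalg.norm(layer[i] - means[g])), i, g)
            for i in range(len(layer))
            for g in range(len(means))
        )
        assign = [-1] * len(layer)
        taken = set()  # each global neuron takes at most one neuron per client
        for dist, i, g in pairs:
            if dist > threshold:
                break  # all remaining pairs are even farther apart
            if assign[i] == -1 and g not in taken:
                assign[i] = g
                taken.add(g)
        for i, g in enumerate(assign):
            if g == -1:
                # Unmatched local neuron: open a new global neuron for it.
                sums.append(np.array(layer[i], dtype=float))
                counts.append(1)
                assign[i] = len(sums) - 1
            else:
                sums[g] = sums[g] + layer[i]
                counts[g] += 1
        assignments.append(assign)
    # Average the local neurons clustered into each global neuron.
    global_neurons = np.array([s / c for s, c in zip(sums, counts)])
    return global_neurons, assignments


# Toy example: client B's first two neurons are client A's, reordered and perturbed.
client_a = np.array([[1.0, 0.0], [0.0, 1.0]])
client_b = np.array([[0.0, 1.1], [0.9, 0.0], [5.0, 5.0]])
global_neurons, assignments = match_and_average([client_a, client_b], threshold=0.5)
print(assignments)     # [[0, 1], [1, 0, 2]]
print(global_neurons)  # rows approximately [0.95, 0], [0, 1.05], [5, 5]
```

In this toy run the five local neurons collapse into three global neurons, and client B's outlying neuron stays unmatched and opens its own. A multilayer version would also have to reorder the next layer's incoming weights to follow each assignment, which this single-layer sketch omits.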

Assignees

IBM

Inventors

Classifications

  • G06N3/047 (primary)

    Probabilistic or stochastic networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

  • Supervised learning · CPC title

  • Feedforward networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021089878A1 cover?
A Bayesian nonparametric framework for federated learning with neural networks, for settings where data is scattered across servers and cannot be pooled. A central server collects neurons from locally pre-trained client models, clusters them into a set of global neurons, and averages each cluster to form a global model layer, with as few as a single communication round.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/047. Mapped technology areas include Physics.
When was this patent published?
Published March 25, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).