Model training method and apparatus based on data sharing
US-2020125737-A1 · Apr 23, 2020 · US
US11106802B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11106802-B2 |
| Application number | US-201816053606-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 2, 2018 |
| Priority date | Aug 2, 2017 |
| Publication date | Aug 31, 2021 |
| Grant date | Aug 31, 2021 |
Official abstract text for this publication.
Techniques for data sharing between a data miner and a data provider are provided. A set of public parameters is downloaded from the data miner. The public parameters are data miner parameters associated with a feature set of training sample data. A set of private parameters in the data provider can be replaced with the set of public parameters. The private parameters are data provider parameters associated with the feature set of training sample data. The private parameters are updated to provide a set of update results. The private parameters are updated based on a model parameter update algorithm associated with the data provider. The update results are uploaded to the data miner.
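The round trip described in the abstract can be sketched from the data provider's side. This is a minimal illustration, not the patented implementation: the names `provider_round` and `sgd_step`, the dict-of-floats parameter layout, and the choice of a linear model trained by stochastic gradient descent are all assumptions made for the example. The abstract says only that the provider applies its own model parameter update algorithm.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]                 # feature name -> weight (assumed layout)
Sample = Tuple[Dict[str, float], float]   # (feature values, label)


def sgd_step(weights: Params, samples: List[Sample], lr: float = 0.1) -> None:
    """Hypothetical local update: one SGD pass for a linear model, in place."""
    for features, label in samples:
        pred = sum(weights[name] * x for name, x in features.items())
        err = pred - label
        for name, x in features.items():
            weights[name] -= lr * err * x


def provider_round(
    public: Params,
    private: Params,
    samples: List[Sample],
    update_fn: Callable[[Params, List[Sample]], None] = sgd_step,
) -> Params:
    """One provider-side iteration; returns parameter changes only."""
    # 1. Replace the local (private) parameters with the downloaded public ones.
    for name, value in public.items():
        private[name] = value
    replaced = dict(private)  # snapshot of the replaced values

    # 2. Update the private parameters on local data, which never leaves the provider.
    update_fn(private, samples)

    # 3. The upload carries the changes, not the private parameters themselves.
    return {name: private[name] - replaced[name] for name in public}
```

A provider would call `provider_round(public, private, local_samples)` once per iteration and upload the returned dict. How the data miner merges the uploads from several providers is not stated in the abstract, so it is left out of the sketch.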
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for data sharing between a data miner and data providers, each of the data providers comprising one or more processors to execute the computer-implemented method, the computer-implemented method comprising:

downloading, by a first data provider, a first set of public parameters from the data miner to generate training sample data, wherein the first set of public parameters comprises parameters maintained by the data miner that are obtained after the data providers jointly participate in an update and the first set of public parameters is associated with a first feature set of training sample data, wherein the training sample data comprises a plurality of features, and the first feature set is a first subset of the plurality of features allocated by the data miner to the first data provider to be downloaded in parallel with a second subset of the plurality of features allocated by the data miner to a second data provider to satisfy a convergence condition, wherein the first subset of the plurality of features is disjoint from the second subset of the plurality of features;

replacing, by the first data provider, a set of private parameters of the first data provider with the first set of public parameters to generate a set of replaced values of the set of private parameters, wherein the set of private parameters comprises data provider parameters maintained solely by the first data provider and wherein the set of private parameters is associated with the first feature set of training sample data;

updating, by the first data provider, the set of private parameters to provide a set of update results comprising parameter change values to the set of private parameters and excluding all private parameters of the set of private parameters, the set of private parameters being updated based on a model parameter update algorithm associated with the first data provider, wherein the parameter change values comprise differences between the replaced values of the set of private parameters and the set of update results;

sorting, by the first data provider, the set of update results to generate a first sorted set of update results comprising a predetermined number of parameters of the set of update results with the parameter change values that are greater than a predetermined value provided by the data miner;

generating, by the first data provider, a first truncated set of update results by truncating the first sorted set of update results based on a predetermined value range; and

uploading, by the first data provider, the first truncated set of update results to the data miner to be processed with a second truncated set of update results uploaded by the second data provider to verify the convergence condition.

2. The computer-implemented method of claim 1, wherein the set of update results are provided based on a plurality of parameter changes, each parameter change indicating a change in a private parameter of the set of private parameters before and after replacing a respective private parameter with a public parameter.

3. The computer-implemented method of claim 2, further comprising adding noise to the set of update results.

4. The computer-implemented method of claim 2, wherein the set of update results comprise one or more parameter changes with greatest parameter changes associated with the set of private parameters.

5. The computer-implemented method of claim 1, wherein a number of public parameters in the first set of public parameters is less than number of features in the training sample data.

6. The computer-implemented method of claim 1, wherein a first number of private parameters in the set of private parameters is less than a second number of features in the training sample data.

7. The computer-implemented method of claim 1, wherein the first data provider downloads a first set of public parameters using a first data parameter, and the first data provider downloads a second set of public parameters different from the first set of public parameters using a second data parameter.

8. The computer-implemented method of claim 1, wherein the computer-implemented method is repeated for a plurality of iterations until it is determined, by the one or more processors of the data miner, that a predetermined training condition is satisfied.

9. The computer-implemented method of claim 8, wherein the first data provider downloads the first set of public parameters in a first iteration of the plurality of iterations that differs from a second set of public parameters in a second iteration of the plurality of iterations.

10. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system comprising one or more processors to perform operations for data sharing between a data miner and data providers, the operations comprising:

downloading a first set of public parameters from the data miner to generate training sample data, wherein the first set of public parameters comprises parameters maintained by the data miner that are obtained after the data providers jointly participate in an update and the first set of public parameters is associated with a first feature set of training sample data, wherein the training sample data comprises a plurality of features, and the first feature set is a first subset of the plurality of features allocated by the data miner to a first data provider to be downloaded in parallel with a second subset of the plurality of features allocated by the data miner to a second data provider to satisfy a convergence condition, wherein the first subset of the plurality of features is disjoint from the second subset of the plurality of features;

replacing a set of private parameters of the first data provider with the first set of public parameters to generate a set of replaced values of the set of private parameters, wherein the set of private parameters comprises data provider parameters maintained solely by the first data provider and wherein the set of private parameters is associated with the first feature set of training sample data;

updating the set of private parameters to provide a set of update results comprising parameter change values to the set of private parameters and excluding all private parameters of the set of private parameters, the set of private parameters being updated based on a model parameter update algorithm associated with the first data provider, wherein the parameter change values comprise differences between the replaced values of the set of private parameters and the set of update results;

sorting the set of update results to generate a first sorted set of update results comprising a predetermined number of parameters of the set of update results with the parameter change values that are greater than a predetermined value provided by the data miner;

generating a first truncated set of update results by truncating the first sorted set of update results based on a predetermined value range; and

uploading the first sorted truncated set of update results to the data miner to be processed with a second truncated set of update results uploaded by the second data provider to verify the convergence condition.

11. The non-transitory, computer-readable medium of claim 10, wherein the set of update results are provided based on a plurality of parameter changes, each parameter change indicating a change in a private parameter of the set of private parameters before and after replacing a respective private parameter with a public parameter.

12. The non-transitory, computer-readable medium of claim 11, further comprising adding noise to the set of update
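The post-processing recited in the claims (sort the parameter changes, keep a predetermined number that exceed a threshold, truncate to a value range, optionally add noise) can be sketched as follows. This is one possible reading under stated assumptions rather than the claimed implementation: the function name `select_updates`, ranking by absolute change, clipping as the form of truncation, and Gaussian noise are all choices made for illustration. The claims do not specify any of them.

```python
import random
from typing import Dict, Optional, Tuple


def select_updates(
    deltas: Dict[str, float],
    top_k: int,                        # "predetermined number" of parameters
    min_change: float,                 # "predetermined value" from the data miner
    value_range: Tuple[float, float],  # "predetermined value range"
    noise_scale: float = 0.0,          # 0 disables the optional noise step
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Filter a provider's parameter changes before upload (illustrative only)."""
    rng = rng or random.Random()
    low, high = value_range

    # Sort by magnitude of change, largest first (assumption: absolute value).
    ranked = sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

    # Keep at most top_k entries whose change exceeds the threshold.
    kept = [(name, d) for name, d in ranked if abs(d) > min_change][:top_k]

    result = {}
    for name, d in kept:
        # Truncate each change into the allowed range (assumption: clipping).
        d = max(low, min(high, d))
        # Optional noise, as in the dependent claims (assumption: Gaussian).
        if noise_scale > 0:
            d += rng.gauss(0.0, noise_scale)
        result[name] = d
    return result
```

Uploading only a few bounded changes limits how much each round reveals about the provider's local data. The threshold, count, and range would in practice be supplied by the data miner.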
Risk analysis of enterprise or organisation activities · CPC title
using electronic means · CPC title
Machine learning · CPC title
Protecting data · CPC title
Office automation; Time management · CPC title