Model training method and apparatus based on data sharing

US11106802B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11106802-B2
Application number: US-201816053606-A
Country: US
Kind code: B2
Filing date: Aug 2, 2018
Priority date: Aug 2, 2017
Publication date: Aug 31, 2021
Grant date: Aug 31, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for data sharing between a data miner and a data provider are provided. A set of public parameters is downloaded from the data miner. The public parameters are data miner parameters associated with a feature set of training sample data. A set of private parameters in the data provider can be replaced with the set of public parameters. The private parameters are data provider parameters associated with the feature set of training sample data. The private parameters are updated to provide a set of update results. The private parameters are updated based on a model parameter update algorithm associated with the data provider. The update results are uploaded to the data miner.
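The round described in the abstract (download, replace, update locally, upload the results) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `provider_round`, the dictionary representation of parameters, and the choice of one pass of stochastic gradient descent on a linear model as the "model parameter update algorithm" are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Params = Dict[str, float]
Sample = Tuple[Dict[str, float], float]  # (feature values, label)


def provider_round(public: Params, private: Params,
                   samples: List[Sample], lr: float = 0.1) -> Params:
    """Run one round at a data provider and return parameter changes only.

    Hypothetical helper: the update rule (SGD on squared error for a
    linear model) is an assumed stand-in for the provider's own algorithm.
    """
    # Step 1: replace the local private parameters with the downloaded
    # public parameters for the same feature set.
    private.update(public)
    replaced = {name: private[name] for name in public}

    # Step 2: update the private parameters on local training samples.
    for features, label in samples:
        prediction = sum(w * features.get(name, 0.0)
                         for name, w in private.items())
        error = prediction - label
        for name in private:
            private[name] -= lr * error * features.get(name, 0.0)

    # Step 3: the upload contains the changes, not the parameters themselves.
    return {name: private[name] - replaced[name] for name in public}


public = {"w1": 0.0, "w2": 0.0}       # downloaded from the data miner
private = {"w1": 5.0, "w2": -3.0}     # stale local values
samples = [({"w1": 1.0, "w2": 0.0}, 1.0)]
delta = provider_round(public, private, samples, lr=0.5)
```

In this sketch the raw samples and the updated private parameters stay at the provider; only `delta` would be sent to the data miner.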

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for data sharing between a data miner and data providers, each of the data providers comprising one or more processors to execute the computer-implemented method, the computer-implemented method comprising: downloading, by a first data provider, a first set of public parameters from the data miner to generate training sample data, wherein the first set of public parameters comprises parameters maintained by the data miner that are obtained after the data providers jointly participate in an update and the first set of public parameters is associated with a first feature set of training sample data, wherein the training sample data comprises a plurality of features, and the first feature set is a first subset of the plurality of features allocated by the data miner to the first data provider to be downloaded in parallel with a second subset of the plurality of features allocated by the data miner to a second data provider to satisfy a convergence condition, wherein the first subset of the plurality of features is disjoint from the second subset of the plurality of features; replacing, by the first data provider, a set of private parameters of the first data provider with the first set of public parameters to generate a set of replaced values of the set of private parameters, wherein the set of private parameters comprises data provider parameters maintained solely by the first data provider and wherein the set of private parameters is associated with the first feature set of training sample data; updating, by the first data provider, the set of private parameters to provide a set of update results comprising parameter change values to the set of private parameters and excluding all private parameters of the set of private parameters, the set of private parameters being updated based on a model parameter update algorithm associated with the first data provider, wherein the parameter change values comprise differences between the replaced values of the set of private parameters and the set of update results; sorting, by the first data provider, the set of update results to generate a first sorted set of update results comprising a predetermined number of parameters of the set of update results with the parameter change values that are greater than a predetermined value provided by the data miner; generating, by the first data provider, a first truncated set of update results by truncating the first sorted set of update results based on a predetermined value range; and uploading, by the first data provider, the first truncated set of update results to the data miner to be processed with a second truncated set of update results uploaded by the second data provider to verify the convergence condition.

2. The computer-implemented method of claim 1, wherein the set of update results are provided based on a plurality of parameter changes, each parameter change indicating a change in a private parameter of the set of private parameters before and after replacing a respective private parameter with a public parameter.

3. The computer-implemented method of claim 2, further comprising adding noise to the set of update results.

4. The computer-implemented method of claim 2, wherein the set of update results comprise one or more parameter changes with greatest parameter changes associated with the set of private parameters.

5. The computer-implemented method of claim 1, wherein a number of public parameters in the first set of public parameters is less than number of features in the training sample data.

6. The computer-implemented method of claim 1, wherein a first number of private parameters in the set of private parameters is less than a second number of features in the training sample data.

7. The computer-implemented method of claim 1, wherein the first data provider downloads a first set of public parameters using a first data parameter, and the first data provider downloads a second set of public parameters different from the first set of public parameters using a second data parameter.

8. The computer-implemented method of claim 1, wherein the computer-implemented method is repeated for a plurality of iterations until it is determined, by the one or more processors of the data miner, that a predetermined training condition is satisfied.

9. The computer-implemented method of claim 8, wherein the first data provider downloads the first set of public parameters in a first iteration of the plurality of iterations that differs from a second set of public parameters in a second iteration of the plurality of iterations.

10. A non-transitory, computer-readable medium storing one or more instructions executable by a computer system comprising one or more processors to perform operations for data sharing between a data miner and data providers, the operations comprising: downloading a first set of public parameters from the data miner to generate training sample data, wherein the first set of public parameters comprises parameters maintained by the data miner that are obtained after the data providers jointly participate in an update and the first set of public parameters is associated with a first feature set of training sample data, wherein the training sample data comprises a plurality of features, and the first feature set is a first subset of the plurality of features allocated by the data miner to a first data provider to be downloaded in parallel with a second subset of the plurality of features allocated by the data miner to a second data provider to satisfy a convergence condition, wherein the first subset of the plurality of features is disjoint from the second subset of the plurality of features; replacing a set of private parameters of the first data provider with the first set of public parameters to generate a set of replaced values of the set of private parameters, wherein the set of private parameters comprises data provider parameters maintained solely by the first data provider and wherein the set of private parameters is associated with the first feature set of training sample data; updating the set of private parameters to provide a set of update results comprising parameter change values to the set of private parameters and excluding all private parameters of the set of private parameters, the set of private parameters being updated based on a model parameter update algorithm associated with the first data provider, wherein the parameter change values comprise differences between the replaced values of the set of private parameters and the set of update results; sorting the set of update results to generate a first sorted set of update results comprising a predetermined number of parameters of the set of update results with the parameter change values that are greater than a predetermined value provided by the data miner; generating a first truncated set of update results by truncating the first sorted set of update results based on a predetermined value range; and uploading the first sorted truncated set of update results to the data miner to be processed with a second truncated set of update results uploaded by the second data provider to verify the convergence condition.

11. The non-transitory, computer-readable medium of claim 10, wherein the set of update results are provided based on a plurality of parameter changes, each parameter change indicating a change in a private parameter of the set of private parameters before and after replacing a respective private parameter with a public parameter.

12. The non-transitory, computer-readable medium of claim 11, further comprising adding noise to the set of update
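The feature allocation and the sort, threshold, and truncate steps recited in the claims can be sketched as below. This is one possible reading under stated assumptions, not the claimed implementation: the names `allocate_features` and `select_updates` are hypothetical, changes are ranked by absolute magnitude, "truncating based on a predetermined value range" is interpreted as clipping each change into `[-bound, bound]`, and the optional noise of claim 3 is modeled as Gaussian noise.

```python
import random
from typing import Dict, List, Optional


def allocate_features(features: List[str], n_providers: int) -> List[List[str]]:
    """Miner side (assumed scheme): split features into disjoint subsets."""
    return [features[i::n_providers] for i in range(n_providers)]


def select_updates(deltas: Dict[str, float], top_k: int, min_change: float,
                   bound: float, noise_std: float = 0.0,
                   rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Provider side: pick which parameter changes to upload.

    Assumptions: magnitude-based ranking, clipping as the truncation step,
    Gaussian noise as the optional perturbation.
    """
    # Keep only changes larger in magnitude than the miner-provided value.
    large = [(name, d) for name, d in deltas.items() if abs(d) > min_change]
    # Sort by magnitude, largest first, and keep a predetermined number.
    large.sort(key=lambda item: abs(item[1]), reverse=True)
    kept = large[:top_k]
    # Truncate each kept change into the predetermined value range.
    result = {name: max(-bound, min(bound, d)) for name, d in kept}
    # Optionally perturb the values before upload.
    if noise_std > 0.0:
        rng = rng or random.Random()
        result = {name: v + rng.gauss(0.0, noise_std)
                  for name, v in result.items()}
    return result


subsets = allocate_features(["f1", "f2", "f3", "f4", "f5"], 2)
deltas = {"a": 0.9, "b": -2.5, "c": 0.01, "d": 0.4}
upload = select_updates(deltas, top_k=2, min_change=0.1, bound=1.0)
```

Here `c` is dropped by the threshold, `d` is dropped by the top-2 cut, and `b` is clipped from -2.5 to -1.0, so the upload reveals only a bounded, partial view of the local update.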

Assignees

Advanced New Technologies Co Ltd

Inventors

Classifications

  • Risk analysis of enterprise or organisation activities · CPC title

  • using electronic means · CPC title

  • Machine learning · CPC title

  • G06F21/60 · Primary

    Protecting data · CPC title

  • Office automation; Time management · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11106802B2 cover?
Techniques for data sharing between a data miner and a data provider are provided. A set of public parameters is downloaded from the data miner. The public parameters are data miner parameters associated with a feature set of training sample data. A set of private parameters in the data provider can be replaced with the set of public parameters. The private parameters are data provider paramete…
Who is the assignee on this patent?
Advanced New Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06Q10/0635. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 31, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).