Distributed privacy-preserving computing on protected data
US-11531904-B2 · Dec 20, 2022 · US
US11748633B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11748633-B2 |
| Application number | US-202217988664-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 16, 2022 |
| Priority date | Mar 26, 2019 |
| Publication date | Sep 5, 2023 |
| Grant date | Sep 5, 2023 |
Official abstract text for this publication.
The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy protected, harmonized data. Particularly, aspects are directed to a computer implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a data host based on the input data requirements, curating the data assets within a data storage structure that is within infrastructure of the data host, and integrating the algorithm into a secure capsule computing framework. The secure capsule computing framework serves the algorithm to the data assets within the data storage structure in a secure manner that preserves privacy of the data assets and the algorithm. The computer implemented method further includes running the data assets through the algorithm to obtain an inference.
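The flow in the abstract (receive an algorithm with its input data requirements, identify matching data assets at a data host, then run those assets through the algorithm inside the host) can be illustrated with a minimal sketch. All names here (`DataAsset`, `InputDataRequirements`, `identify_assets`, `run_in_host`) and the two selection criteria (modality and minimum record count) are hypothetical and are not taken from the patent; the sketch also does not model the secure capsule computing framework itself, only the rule that records stay with the host and just the inference is returned.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DataAsset:
    """A data asset curated inside a data host's own infrastructure (hypothetical)."""
    name: str
    modality: str
    records: Tuple[float, ...]


@dataclass(frozen=True)
class InputDataRequirements:
    """Selection criteria supplied with the algorithm (hypothetical fields)."""
    modality: str
    min_records: int


def identify_assets(
    assets: Sequence[DataAsset], req: InputDataRequirements
) -> List[DataAsset]:
    """Return the host's assets that satisfy the input data requirements."""
    return [
        a for a in assets
        if a.modality == req.modality and len(a.records) >= req.min_records
    ]


def run_in_host(
    algorithm: Callable[[Sequence[float]], float], assets: Sequence[DataAsset]
) -> Dict[str, float]:
    """Run each asset through the algorithm; only inferences are returned,
    never the underlying records."""
    return {a.name: algorithm(a.records) for a in assets}


if __name__ == "__main__":
    host_assets = [
        DataAsset("site_a_ct", "ct", (1.0, 2.0, 3.0)),
        DataAsset("site_a_mri", "mri", (5.0, 7.0)),
        DataAsset("site_b_ct", "ct", (9.0,)),
    ]
    req = InputDataRequirements(modality="ct", min_records=2)
    selected = identify_assets(host_assets, req)
    # A stand-in "algorithm": the mean of the records.
    print(run_in_host(lambda r: sum(r) / len(r), selected))
```

In a real deployment the algorithm would also be protected from the data host; this sketch makes no attempt to show that half of the privacy guarantee.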
Opening claim text (preview).
What is claimed is:

1. A method comprising:
identifying a plurality of instances of an algorithm, wherein each instance of the algorithm is integrated into one or more secure capsule computing frameworks, wherein the one or more secure capsule computing frameworks serve each instance of the algorithm to training data assets within one or more data storage structures of one or more data hosts in a secure manner that preserves privacy of the training data assets and each instance of the algorithm;
executing, by a data processing system, a federated training workflow on each instance of the algorithm, wherein the federated training workflow takes as input the training data assets, maps features of the training data assets to a target inference using parameters, computes a loss or error function, updates the parameters to learned parameters in order to minimize the loss or error function, and outputs one or more trained instances of the algorithm;
integrating, by the data processing system, the learned parameters for each trained instance of the algorithm into a fully federated algorithm, wherein the integrating comprises aggregating the learned parameters to obtain aggregated parameters and updating learned parameters of the fully federated algorithm with the aggregated parameters;
executing, by the data processing system, a testing workflow on the fully federated algorithm, wherein the testing workflow takes as input testing data, finds patterns in the testing data using the updated learned parameters, and outputs an inference;
calculating, by the data processing system, performance of the fully federated algorithm in providing the inference;
determining, by the data processing system, whether the performance of the fully federated algorithm satisfies an algorithm termination criteria;
when the performance of the fully federated algorithm does not satisfy the algorithm termination criteria, replacing, by the data processing system, each instance of the algorithm with the fully federated algorithm and re-executing the federated training workflow on each instance of the fully federated algorithm; and
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, providing, by the data processing system, the performance of the fully federated algorithm and the aggregated parameters to an algorithm developer of the algorithm.

2. The method of claim 1, wherein the identifying the plurality of instances of the algorithm, comprises:
receiving, at the data processing system, the algorithm and input data requirements associated with the algorithm, wherein the input data requirements include optimization and/or validation selection criteria for data assets to be run on the algorithm;
identifying, by the data processing system, the data assets as being available from the one or more data hosts based on the optimization and/or validation selection criteria for the data assets;
curating, by the data processing system, the data assets within a data storage structure that is within infrastructure of each data host of the one or more data hosts; and
splitting at least a portion of the data assets into the training data assets within the data storage structure that is within the infrastructure of each data host of the one or more data hosts.

3. The method of claim 2, wherein the algorithm and input data requirements are received from the algorithm developer, which is a different entity from the one or more data hosts, and the optimization and/or validation selection criteria define characteristics, formats and requirements for data assets to be run on the algorithm.

4. The method of claim 1, wherein the federated training workflow further comprises encrypting the training gradients, and the integrating comprises decrypting the training gradients.

5. The method of claim 1, further comprising:
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, transmitting, by the data processing system, aggregated parameters to each instance of the algorithm; and
executing, by the data processing system, an update training workflow on each instance of the algorithm, wherein the update training workflow updates the learned parameters with the aggregated parameters, and outputs one or more updated and trained instances of the algorithm.

6. The method of claim 5, further comprising running, by the data processing system, a remainder of the data assets through each instance of the algorithm.

7. The method of claim 5, wherein the running the data assets through each instance of the algorithm comprises executing a validation workflow that includes: further splitting at least a portion of the data assets into one or more sets of validation data, running the one or more sets of validation data through each instance of the algorithm, and computing performance of each instance of the algorithm based on the running of the one or more sets of validation data.

8. A system comprising:
one or more data processors; and
a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform actions comprising:
identifying a plurality of instances of an algorithm, wherein each instance of the algorithm is integrated into one or more secure capsule computing frameworks, wherein the one or more secure capsule computing frameworks serve each instance of the algorithm to training data assets within one or more data storage structures of one or more data hosts in a secure manner that preserves privacy of the training data assets and each instance of the algorithm;
executing a federated training workflow on each instance of the algorithm, wherein the federated training workflow takes as input the training data assets, maps features of the training data assets to a target inference using parameters, computes a loss or error function, updates the parameters to learned parameters in order to minimize the loss or error function, and outputs one or more trained instances of the algorithm;
integrating, by the data processing system, the learned parameters for each trained instance of the algorithm into a fully federated algorithm, wherein the integrating comprises aggregating the learned parameters to obtain aggregated parameters and updating learned parameters of the fully federated algorithm with the aggregated parameters;
executing, by the data processing system, a testing workflow on the fully federated algorithm, wherein the testing workflow takes as input testing data, finds patterns in the testing data using the updated learned parameters, and outputs an inference;
calculating, by the data processing system, performance of the fully federated algorithm in providing the inference;
determining, by the data processing system, whether the performance of the fully federated algorithm satisfies an algorithm termination criteria;
when the performance of the fully federated algorithm does not satisfy the algorithm termination criteria, replacing, by the data processing system, each instance of the algorithm with the fully federated algorithm and re-executing the federated training workflow on each instance of the fully federated algorithm; and
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, providing, by the data processing system, the performance of the fully federated algorithm and the aggregated parameters to an algorithm developer of the algorithm.

9. The system of claim 8, wherein the identifying the plurality of instances of the algorithm, comprises: receiving, at the data processing system, the algorithm and input data r
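
The loop recited in claim 1 (local training at each data host, aggregation of learned parameters, testing, and a termination check that either re-seeds every instance with the federated algorithm or stops) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the "algorithm" is a one-parameter linear model `y ≈ w * x`, aggregation is a size-weighted average (federated averaging, chosen here for illustration; the claims do not name an aggregation rule), the termination criterion is a mean-squared-error threshold, and there is no secure capsule or gradient encryption. All function names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature x, target y)


def local_train(w: float, data: Sequence[Sample],
                lr: float = 0.05, steps: int = 5) -> float:
    """Federated training workflow on one instance: a few gradient-descent
    steps that reduce the mean squared error of y ~ w * x on local data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def aggregate(params: Sequence[float], sizes: Sequence[int]) -> float:
    """Aggregate learned parameters as an average weighted by dataset size
    (an assumed rule, used here only for illustration)."""
    return sum(p * n for p, n in zip(params, sizes)) / sum(sizes)


def mse(w: float, data: Sequence[Sample]) -> float:
    """Testing workflow: performance of the federated model on testing data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def federated_fit(hosts: List[Sequence[Sample]], test_data: Sequence[Sample],
                  tol: float = 1e-4, max_rounds: int = 100):
    """Repeat train -> aggregate -> test until the termination criterion holds.

    Returns (aggregated parameter, performance, rounds used)."""
    w = 0.0  # shared starting parameter for every instance
    perf = mse(w, test_data)
    rounds = 0
    while perf > tol and rounds < max_rounds:
        # Each instance starts from the current federated parameter, which
        # mirrors "replacing each instance with the fully federated algorithm".
        learned = [local_train(w, data) for data in hosts]
        w = aggregate(learned, [len(data) for data in hosts])
        perf = mse(w, test_data)
        rounds += 1
    return w, perf, rounds


if __name__ == "__main__":
    # Two hosts whose data follow y = 3x; raw samples never leave local_train.
    host_a = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
    host_b = [(0.5, 1.5), (1.5, 4.5)]
    test = [(2.0, 6.0), (4.0, 12.0)]
    print(federated_fit([host_a, host_b], test))
```

Only the learned parameter crosses the host boundary in this sketch; the claim 4 variant would additionally encrypt what is sent and decrypt it before aggregation.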
Protecting personal data, e.g. for financial or medical purposes · CPC title
in federated or virtual databases · CPC title
Machine learning · CPC title
Knowledge representation; Symbolic representation · CPC title
by executing in a restricted environment, e.g. sandbox or secure virtual machine · CPC title