Distributed privacy-preserving computing

US11748633B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11748633-B2
Application number  US-202217988664-A
Country             US
Kind code           B2
Filing date         Nov 16, 2022
Priority date       Mar 26, 2019
Publication date    Sep 5, 2023
Grant date          Sep 5, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy protected, harmonized data. Particularly, aspects are directed to a computer implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a data host based on the input data requirements, curating the data assets within a data storage structure that is within infrastructure of the data host, and integrating the algorithm into a secure capsule computing framework. The secure capsule computing framework serves the algorithm to the data assets within the data storage structure in a secure manner that preserves privacy of the data assets and the algorithm. The computer implemented method further includes running the data assets through the algorithm to obtain an inference.
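The asset-identification step in the abstract (matching a data host's assets against the algorithm's input data requirements) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `DataAsset` and `InputRequirements` fields, the `identify_assets` function, and the example hosts are assumptions chosen only to show the idea of filtering a catalog by declared requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """Hypothetical catalog entry describing data held by a data host."""
    host: str
    modality: str
    fmt: str
    n_records: int


@dataclass(frozen=True)
class InputRequirements:
    """Hypothetical input data requirements supplied with an algorithm."""
    modality: str
    accepted_formats: frozenset
    min_records: int


def identify_assets(catalog, req):
    """Return the assets in the catalog that satisfy every requirement."""
    return [
        asset
        for asset in catalog
        if asset.modality == req.modality
        and asset.fmt in req.accepted_formats
        and asset.n_records >= req.min_records
    ]


# Illustrative catalog; host names and counts are made up.
catalog = [
    DataAsset("hospital_a", "chest_xray", "dicom", 12000),
    DataAsset("hospital_b", "chest_xray", "png", 8000),
    DataAsset("hospital_c", "ecg", "dicom", 50000),
]
requirements = InputRequirements("chest_xray", frozenset({"dicom"}), 1000)
matches = identify_assets(catalog, requirements)
print([asset.host for asset in matches])  # prints ['hospital_a']
```

In the disclosure the matched assets stay inside the data host's infrastructure; this sketch covers only the selection logic, not curation or the secure capsule computing framework.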

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising:
identifying a plurality of instances of an algorithm, wherein each instance of the algorithm is integrated into one or more secure capsule computing frameworks, wherein the one or more secure capsule computing frameworks serve each instance of the algorithm to training data assets within one or more data storage structures of one or more data hosts in a secure manner that preserves privacy of the training data assets and each instance of the algorithm;
executing, by a data processing system, a federated training workflow on each instance of the algorithm, wherein the federated training workflow takes as input the training data assets, maps features of the training data assets to a target inference using parameters, computes a loss or error function, updates the parameters to learned parameters in order to minimize the loss or error function, and outputs one or more trained instances of the algorithm;
integrating, by the data processing system, the learned parameters for each trained instance of the algorithm into a fully federated algorithm, wherein the integrating comprises aggregating the learned parameters to obtain aggregated parameters and updating learned parameters of the fully federated algorithm with the aggregated parameters;
executing, by the data processing system, a testing workflow on the fully federated algorithm, wherein the testing workflow takes as input testing data, finds patterns in the testing data using the updated learned parameters, and outputs an inference;
calculating, by the data processing system, performance of the fully federated algorithm in providing the inference;
determining, by the data processing system, whether the performance of the fully federated algorithm satisfies an algorithm termination criteria;
when the performance of the fully federated algorithm does not satisfy the algorithm termination criteria, replacing, by the data processing system, each instance of the algorithm with the fully federated algorithm and re-executing the federated training workflow on each instance of the fully federated algorithm; and
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, providing, by the data processing system, the performance of the fully federated algorithm and the aggregated parameters to an algorithm developer of the algorithm.

2. The method of claim 1, wherein the identifying the plurality of instances of the algorithm, comprises:
receiving, at the data processing system, the algorithm and input data requirements associated with the algorithm, wherein the input data requirements include optimization and/or validation selection criteria for data assets to be run on the algorithm;
identifying, by the data processing system, the data assets as being available from the one or more data hosts based on the optimization and/or validation selection criteria for the data assets;
curating, by the data processing system, the data assets within a data storage structure that is within infrastructure of each data host of the one or more data hosts; and
splitting at least a portion of the data assets into the training data assets within the data storage structure that is within the infrastructure of each data host of the one or more data hosts.

3. The method of claim 2, wherein the algorithm and input data requirements are received from the algorithm developer, which is a different entity from the one or more data hosts, and the optimization and/or validation selection criteria define characteristics, formats and requirements for data assets to be run on the algorithm.

4. The method of claim 1, wherein the federated training workflow further comprises encrypting the training gradients, and the integrating comprises decrypting the training gradients.

5. The method of claim 1, further comprising:
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, transmitting, by the data processing system, aggregated parameters to each instance of the algorithm; and
executing, by the data processing system, an update training workflow on each instance of the algorithm, wherein the update training workflow updates the learned parameters with the aggregated parameters, and outputs one or more updated and trained instances of the algorithm.

6. The method of claim 5, further comprising running, by the data processing system, a remainder of the data assets through each instance of the algorithm.

7. The method of claim 5, wherein the running the data assets through each instance of the algorithm comprises executing a validation workflow that includes: further splitting at least a portion of the data assets into one or more sets of validation data, running the one or more sets of validation data through each instance of the algorithm, and computing performance of each instance of the algorithm based on the running of the one or more sets of validation data.

8. A system comprising:
one or more data processors; and
a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform actions comprising:
identifying a plurality of instances of an algorithm, wherein each instance of the algorithm is integrated into one or more secure capsule computing frameworks, wherein the one or more secure capsule computing frameworks serve each instance of the algorithm to training data assets within one or more data storage structures of one or more data hosts in a secure manner that preserves privacy of the training data assets and each instance of the algorithm;
executing a federated training workflow on each instance of the algorithm, wherein the federated training workflow takes as input the training data assets, maps features of the training data assets to a target inference using parameters, computes a loss or error function, updates the parameters to learned parameters in order to minimize the loss or error function, and outputs one or more trained instances of the algorithm;
integrating, by the data processing system, the learned parameters for each trained instance of the algorithm into a fully federated algorithm, wherein the integrating comprises aggregating the learned parameters to obtain aggregated parameters and updating learned parameters of the fully federated algorithm with the aggregated parameters;
executing, by the data processing system, a testing workflow on the fully federated algorithm, wherein the testing workflow takes as input testing data, finds patterns in the testing data using the updated learned parameters, and outputs an inference;
calculating, by the data processing system, performance of the fully federated algorithm in providing the inference;
determining, by the data processing system, whether the performance of the fully federated algorithm satisfies an algorithm termination criteria;
when the performance of the fully federated algorithm does not satisfy the algorithm termination criteria, replacing, by the data processing system, each instance of the algorithm with the fully federated algorithm and re-executing the federated training workflow on each instance of the fully federated algorithm; and
when the performance of the fully federated algorithm does satisfy the algorithm termination criteria, providing, by the data processing system, the performance of the fully federated algorithm and the aggregated parameters to an algorithm developer of the algorithm.

9. The system of claim 8, wherein the identifying the plurality of instances of the algorithm, comprises: receiving, at the data processing system, the algorithm and input data r
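The train, aggregate, test, and repeat loop recited in claim 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the model is a one-variable linear regression, the loss is mean squared error, aggregation is a record-count-weighted average of parameters (the claim says only "aggregating"), and the termination criterion is a test-loss threshold. All function names, the learning rate, and the toy data are hypothetical. The secure capsule computing framework and the gradient encryption of claim 4 are omitted.

```python
def local_train(params, data, lr=0.3, epochs=50):
    """Federated training workflow on one host: gradient descent on MSE.

    Starts from the given parameters and returns the learned parameters.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def aggregate(param_list, weights):
    """Weighted average of the learned parameters (an assumed rule)."""
    total = sum(weights)
    return tuple(
        sum(p[i] * wt for p, wt in zip(param_list, weights)) / total
        for i in range(len(param_list[0]))
    )


def mse(params, data):
    """Testing workflow: run test data through the model, score the output."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def federated_training(hosts, test_data, threshold=1e-6, max_rounds=100):
    """Repeat local training and aggregation until the criterion is met."""
    global_params = (0.0, 0.0)
    perf = mse(global_params, test_data)
    rounds = 0
    while perf > threshold and rounds < max_rounds:
        rounds += 1
        # Each instance is replaced by the current federated model, then
        # retrained on that host's own data, which never leaves the host.
        local = [local_train(global_params, data) for data in hosts]
        global_params = aggregate(local, [len(data) for data in hosts])
        perf = mse(global_params, test_data)
    # Reported back to the algorithm developer in the claim.
    return global_params, perf, rounds


# Toy data: two hosts hold disjoint ranges of the line y = 2x + 1.
hosts = [
    [(i / 10, 2 * (i / 10) + 1) for i in range(0, 5)],
    [(i / 10, 2 * (i / 10) + 1) for i in range(5, 11)],
]
test_data = [(0.25, 1.5), (0.75, 2.5)]
params, perf, rounds = federated_training(hosts, test_data)
print(params, perf, rounds)
```

Only parameters cross the host boundary in this sketch; the per-host training lists are read solely inside `local_train`, which mirrors the claim's separation between data hosts and the aggregating data processing system.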

Assignees

Inventors

Classifications

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • G06F16/256 (Primary)

    in federated or virtual databases · CPC title

  • Machine learning · CPC title

  • G06N5/02 (Primary)

    Knowledge representation; Symbolic representation · CPC title

  • by executing in a restricted environment, e.g. sandbox or secure virtual machine · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11748633B2 cover?
The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy protected, harmonized data. Particularly, aspects are directed to a computer implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a dat…
Who is the assignee on this patent?
Univ California
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 5, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).