Distributed privacy-preserving computing on protected data

US12001965B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12001965-B2
Application number   US-202318335053-A
Country              US
Kind code            B2
Filing date          Jun 14, 2023
Priority date        Mar 26, 2019
Publication date     Jun 4, 2024
Grant date           Jun 4, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy protected, harmonized data. Particularly, aspects are directed to a computer implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a data host based on the input data requirements, curating the data assets within a data storage structure that is within infrastructure of the data host, and integrating the algorithm into a secure capsule computing framework. The secure capsule computing framework serves the algorithm to the data assets within the data storage structure in a secure manner that preserves privacy of the data assets and the algorithm. The computer implemented method further includes running the data assets through the algorithm to obtain an inference.
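The first steps in the abstract (matching data assets against an algorithm's input data requirements, then running the algorithm where the data lives) can be illustrated with a minimal sketch. Everything named here (`DataAsset`, `InputRequirements`, `identify_assets`, `run_in_host`, and the `modality` and `harmonized` fields) is hypothetical and chosen for illustration; the patent does not specify these structures, and the encryption and isolation provided by the secure capsule computing framework are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataAsset:
    # Hypothetical record describing one asset held by a data host.
    asset_id: str
    modality: str      # e.g. "ct_scan"
    harmonized: bool   # True once converted to the common format


@dataclass(frozen=True)
class InputRequirements:
    # Hypothetical input data requirements supplied with the algorithm.
    modality: str
    require_harmonized: bool = True


def identify_assets(assets: List[DataAsset], req: InputRequirements) -> List[DataAsset]:
    """Return the assets that satisfy the input data requirements."""
    return [
        a for a in assets
        if a.modality == req.modality
        and (a.harmonized or not req.require_harmonized)
    ]


def run_in_host(algorithm: Callable[[DataAsset], int],
                assets: List[DataAsset]) -> Dict[str, int]:
    """Apply the algorithm inside the host; only inferences are returned."""
    return {a.asset_id: algorithm(a) for a in assets}


# Toy data: three assets, of which only one meets the requirements.
assets = [
    DataAsset("a1", "ct_scan", True),
    DataAsset("a2", "mri", True),
    DataAsset("a3", "ct_scan", False),
]
selected = identify_assets(assets, InputRequirements("ct_scan"))

# Stand-in "algorithm": returns the length of the asset id.
inferences = run_in_host(lambda a: len(a.asset_id), selected)
```

In this sketch the raw assets never leave `run_in_host`; only the per-asset inference does, which mirrors the intent of keeping data within the host's infrastructure without claiming to implement it.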

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: identifying an algorithm, wherein the algorithm is provided by an algorithm developer and integrated into a secure capsule computing framework, wherein the secure capsule computing framework serves the algorithm to validation data assets within a data storage structure in a secure manner that preserves privacy of the validation data assets and the algorithm; executing, by a data processing system, a validation workflow on the algorithm, wherein the validation workflow takes as input the validation data assets, applies the algorithm to the validation data assets using learned parameters, and outputs an inference; calculating, by the data processing system, performance of the algorithm in providing the inference, wherein the performance is calculated based on gold standard labels; determining, by the data processing system, whether the performance of the algorithm satisfies validation criteria defined by an algorithm developer; when the performance of the algorithm does not satisfy the validation criteria, optimizing, with the data processing system, one or more hyperparameters of the algorithm and re-executing the validation workflow on the algorithm with the optimized one or more hyperparameters; and when the performance of the algorithm does satisfy the validation criteria, providing, by the data processing system, the performance of the algorithm and the one or more hyperparameters to the algorithm developer.

2. The method of claim 1, wherein the identifying the algorithm comprises: receiving, at the data processing system, the algorithm and input data requirements associated with the algorithm, wherein the input data requirements include validation selection criteria for data assets to be run on the algorithm; identifying, by the data processing system, the data assets as being available from a data host based on the validation selection criteria for the data assets; curating, by the data processing system, the data assets within a data storage structure that is within infrastructure of the data host; and splitting at least a portion of the data assets into the validation data assets within the data storage structure that is within the infrastructure of the data host.

3. The method of claim 2, wherein the validation selection criteria includes clinical cohort criteria, demographic criteria, and/or data set class balance, and wherein the clinical cohort criteria define a group of people that the data assets are to be obtained from for a cohort study, a type of the cohort study, risk factors that the group of people may have exposure to over a period of time, question or hypothesis to be solved and associated disease or condition, other parameters that define criteria for the cohort study, or any combination thereof.

4. The method of claim 2, further comprising: onboarding, by the data processing system, the data host, wherein the onboarding comprises confirming that the use of the data assets with the algorithm is in compliance with data privacy requirements; and completing governance and compliance requirements including clearance from an institutional review board use of the data assets from the data host for purposes of validating the algorithm, wherein the curating comprises selecting the data storage structure from multiple data storage structures and provisioning the data storage structure within the infrastructure of the data host, wherein the selection of the data storage structure is based on a type of algorithm within the algorithm, a type of data within the data assets, system requirements of the data processing system, or a combination thereof.

5. The method of claim 1, further comprising when the performance of the algorithm does satisfy the validation criteria, maintaining, by the data processing system, the algorithm and the validation data assets in a secure manner that preserves privacy of the validation data assets and the algorithm.

6. The method of claim 5, wherein the validation data assets are a plurality of disjoint sets of data assets, the encrypted code is signed by the data processing system and stored in a data storage archive, and the performance of the algorithm is provided as a single validation report for validation of the algorithm aggregated from a plurality of validations performed on the plurality of disjoint sets of data assets.

7. The method of claim 1, wherein the secure capsule computing framework is provisioned within a computing infrastructure configured to accept encrypted code required to run the algorithm, and wherein the provisioning the computing infrastructure comprises instantiating the secure capsule computing framework on the computing infrastructure, depositing, by the algorithm developer, the encrypted code inside the secure capsule computing framework, and once the secure capsule computing framework is instantiated, decrypting the encrypted code.

8. A system comprising: one or more data processors; and a non-transitory computer readable storage medium containing instructions which, when executed on the one or more data processors, cause the one or more data processors to perform actions including: identifying an algorithm, wherein the algorithm is provided by an algorithm developer and integrated into a secure capsule computing framework, wherein the secure capsule computing framework serves the algorithm to validation data assets within a data storage structure in a secure manner that preserves privacy of the validation data assets and the algorithm; executing a validation workflow on the algorithm, wherein the validation workflow takes as input the validation data assets, finds patterns in the validation data assets using learned parameters, and outputs an inference; calculating performance of the algorithm in providing the inference, wherein the performance is calculated based on gold standard labels; determining whether the performance of the algorithm satisfies validation criteria defined by an algorithm developer; when the performance of the algorithm does not satisfy the validation criteria, optimizing one or more hyperparameters of the algorithm and re-executing the validation workflow on the algorithm with the optimized one or more hyperparameters; and when the performance of the algorithm does satisfy the validation criteria, providing the performance of the algorithm and the one or more hyperparameters to the algorithm developer.

9. The system of claim 8, wherein the identifying the algorithm comprises: receiving, at the data processing system, the algorithm and input data requirements associated with the algorithm, wherein the input data requirements include validation selection criteria for data assets to be run on the algorithm; identifying, by the data processing system, the data assets as being available from a data host based on the validation selection criteria for the data assets; curating, by the data processing system, the data assets within a data storage structure that is within infrastructure of the data host; and splitting at least a portion of the data assets into the validation data assets within the data storage structure that is within the infrastructure of the data host.

10. The system of claim 9, wherein the validation selection criteria includes clinical cohort criteria, demographic criteria, and/or data set class balance, and wherein the clinical cohort criteria define a group of people that the data assets are to be obtained from for a cohort study, a type of the cohort study, risk factors that the group of people may have exposure to over a period of time, question or hypothesis to be solved and associated disease or condition, other parameters that define
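The validation workflow recited in claim 1 (run the algorithm on validation data, score the inferences against gold standard labels, and either report the result or adjust hyperparameters and re-run) can be sketched as a simple loop. This is an illustrative sketch only: the claim does not say how hyperparameters are optimized, so the code assumes a plain search over a caller-supplied candidate list, assumes accuracy as the performance measure, and uses hypothetical names (`accuracy`, `validate`, `threshold_classifier`, `min_accuracy`) that do not appear in the patent.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Hyperparameters = Dict[str, float]


def accuracy(inferences: Sequence[int], gold_labels: Sequence[int]) -> float:
    """Fraction of inferences that match the gold standard labels."""
    correct = sum(p == g for p, g in zip(inferences, gold_labels))
    return correct / len(gold_labels)


def validate(
    algorithm: Callable[[float, Hyperparameters], int],
    inputs: Sequence[float],
    gold_labels: Sequence[int],
    candidates: List[Hyperparameters],
    min_accuracy: float,
) -> Optional[Tuple[float, Hyperparameters]]:
    """Re-run validation with each candidate until the criteria are met.

    Returns (performance, hyperparameters) for the first candidate that
    satisfies the criteria, or None if no candidate does.
    """
    for hyperparameters in candidates:
        # Execute the validation workflow with the current hyperparameters.
        inferences = [algorithm(x, hyperparameters) for x in inputs]
        performance = accuracy(inferences, gold_labels)
        # Validation criterion (assumed): accuracy at or above a minimum.
        if performance >= min_accuracy:
            return performance, hyperparameters
    return None


def threshold_classifier(x: float, hp: Hyperparameters) -> int:
    """Toy stand-in algorithm: label 1 when x reaches the threshold."""
    return int(x >= hp["threshold"])


# Toy validation data with gold standard labels.
inputs = [0.2, 0.4, 0.6, 0.9]
gold = [0, 0, 1, 1]
candidates = [{"threshold": 0.1}, {"threshold": 0.3}, {"threshold": 0.5}]

result = validate(threshold_classifier, inputs, gold, candidates, min_accuracy=0.9)
```

With this toy data the first two thresholds fall short of the criterion and the third satisfies it, so `result` holds the performance and the hyperparameters that would be reported back to the algorithm developer.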

Assignees

Univ California

Inventors

Classifications

  • Protecting personal data, e.g. for financial or medical purposes

  • G06F16/256 (Primary): in federated or virtual databases

  • Machine learning

  • G06N5/02 (Primary): Knowledge representation; Symbolic representation

  • by executing in a restricted environment, e.g. sandbox or secure virtual machine

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12001965B2 cover?
The present disclosure relates to techniques for developing artificial intelligence algorithms by distributing analytics to multiple sources of privacy protected, harmonized data. Particularly, aspects are directed to a computer implemented method that includes receiving an algorithm and input data requirements associated with the algorithm, identifying data assets as being available from a dat…
Who is the assignee on this patent?
Univ California
What technology area does this patent fall under?
Primary CPC classification G06F21/6245. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 4, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).