Data set scoring
US-10339147-B1 · Jul 2, 2019 · US
US11989260B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11989260-B2 |
| Application number | US-202117363871-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 30, 2021 |
| Priority date | Jun 30, 2021 |
| Publication date | May 21, 2024 |
| Grant date | May 21, 2024 |
Abstract
A data sharing system for sharing datasets of data providers to data consumers and transferring incentives from the data consumers to the data providers in response to the data sharing. The system includes a multi-angle alliance guided data valuation module for fair allocation of the incentives between the data consumers. The system also includes a flexible-scenario routed dataset comparison module for evaluating the data provided by the data providers via one of a plurality of evaluating routes. The system provides enhanced use of computer cloud and enables both data alliance and growing capacity of artificial intelligence (AI) supermodels for sustainable data sharing. Moreover, the system uses a coreset-based Shapley valuation method for efficient data valuation.
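The coreset-based Shapley valuation named in the abstract can be illustrated with a small, self-contained sketch. It is not the patented implementation. It assumes that each data provider is one Shapley player, that each provider's coreset is built with a greedy mean-matching herding rule applied per class (one common form of the herding method mentioned in claim 9), and that a coalition's utility is the accuracy of a hypothetical nearest-centroid classifier on a consumer test set. The helper names (`herding_coreset`, `nearest_centroid_accuracy`, `exact_shapley`) and the toy data are illustrative. Exact enumeration over all joining orders is only feasible for a handful of providers; a larger alliance would need sampled permutations instead.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int]  # (feature vector, class label)


def _herding_gap(mu: List[float], running: List[float], p: Point, step: int) -> float:
    # Squared distance between the class mean and the coreset mean if p were added.
    return sum((m - (r + v) / step) ** 2 for m, r, v in zip(mu, running, p))


def herding_coreset(samples: Sequence[Sample], k_per_class: int) -> List[Sample]:
    """Greedy herding per class: pick points whose running mean tracks the class mean."""
    by_class: Dict[int, List[Point]] = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    coreset: List[Sample] = []
    for y, points in by_class.items():
        dim = len(points[0])
        mu = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        running = [0.0] * dim  # sum of the features selected so far
        remaining = list(points)
        for step in range(1, min(k_per_class, len(points)) + 1):
            best = min(remaining, key=lambda p: _herding_gap(mu, running, p, step))
            remaining.remove(best)
            running = [r + v for r, v in zip(running, best)]
            coreset.append((best, y))
    return coreset


def nearest_centroid_accuracy(train: Sequence[Sample], test: Sequence[Sample]) -> float:
    """Hypothetical utility: accuracy of a nearest-centroid classifier fitted on `train`."""
    if not train:
        return 0.0  # empty coalition: no model, zero utility
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
    correct = sum(
        1 for x, y in test
        if min(centroids, key=lambda c: math.dist(x, centroids[c])) == y
    )
    return correct / len(test)


def exact_shapley(coresets: Dict[str, List[Sample]], test: Sequence[Sample]) -> Dict[str, float]:
    """Average each provider's marginal utility gain over every joining order."""
    players = list(coresets)
    totals = {p: 0.0 for p in players}
    orders = list(itertools.permutations(players))
    for order in orders:
        pool: List[Sample] = []
        prev = 0.0  # utility of the empty coalition
        for p in order:
            pool = pool + coresets[p]
            cur = nearest_centroid_accuracy(pool, test)
            totals[p] += cur - prev
            prev = cur
    return {p: t / len(orders) for p, t in totals.items()}


# Illustrative toy data: provider C's labels are deliberately swapped.
providers = {
    "A": [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((0.0, 2.0), 0),
          ((6.0, 6.0), 1), ((6.0, 7.0), 1), ((6.0, 8.0), 1)],
    "B": [((1.0, 1.0), 0), ((6.0, 5.0), 1)],
    "C": [((6.0, 6.0), 0), ((0.0, 1.0), 1)],
}
consumer_test_set = [((0.5, 0.5), 0), ((6.0, 6.5), 1), ((1.0, 0.0), 0), ((5.5, 6.0), 1)]
coresets = {name: herding_coreset(data, k_per_class=1) for name, data in providers.items()}
phi = exact_shapley(coresets, consumer_test_set)
print(phi)
```

With this toy data, the mislabeled provider C receives a negative value, and the three values sum to the utility of the full alliance, which is the Shapley efficiency property that makes the method attractive for splitting a shared incentive.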
Claims (preview)
What is claimed is:
1. A data sharing system for sharing data from one or more data providers to one or more data consumers, the data comprising one or more input datasets each provided by a respective one of the one or more data providers, the data sharing system comprising: one or more processing structures; and memory storing instructions which, when executed by the one or more processing structures, cause the one or more processing structures to perform actions comprising: obtaining one or more training datasets from the one or more input datasets, each of the one or more training datasets corresponding to a respective one of the one or more input datasets; evaluating the one or more training datasets for generating one or more quality scores, each quality score associated with a respective one of the one or more training datasets; generating a unit value for each of the one or more input datasets based on the one or more quality scores; receiving incentives from the one or more data consumers for acquiring at least a portion of the input datasets; distributing the received incentives to the one or more data providers based on the one or more unit values and the at least portion of the input datasets; and sharing the at least portion of the input datasets with the one or more data consumers; wherein said evaluating the one or more training datasets comprises: evaluating the one or more training datasets using a first evaluation method comprising: training an artificial intelligence (AI) model using the one or more training datasets and a machine learning algorithm to obtain one or more first trained models, and generating each of the one or more quality scores based on one or more first predictions generated by a corresponding one of the one or more first trained models using one or more test datasets received from the one or more data consumers.
2. The data sharing system of claim 1, wherein said generating each of the one or more quality scores comprises: calculating one or more first performance metrics based on the one or more first predictions generated by the corresponding one of the one or more first trained models using the one or more test datasets, each of the one or more first performance metrics corresponding to a respective one of the one or more test datasets; calculating a first score for the corresponding one of the one or more first trained models, the first score being a weighted summation of the one or more first performance metrics; and calculating the one or more quality scores using at least one or more first scores.
3. The data sharing system of claim 1, wherein said evaluating the one or more training datasets comprises: aggregating different subsets of the one or more training datasets to form a plurality of aggregated training datasets; training the AI model using the aggregated training datasets to obtain a plurality of second trained models; and generating each of the one or more quality scores based on the one or more first predictions generated by the corresponding one of the one or more first trained models using the one or more test datasets and a plurality of second predictions generated by the second trained models using the one or more test datasets.
4. The data sharing system of claim 3, wherein said generating each of the one or more quality scores comprises: calculating one or more first performance metrics based on the one or more first predictions generated by the corresponding one of the one or more first trained models using the one or more test datasets, each of the one or more first performance metrics corresponding to a respective one of the one or more test datasets; calculating a first score for the corresponding one of the one or more first trained models, the first score being a weighted summation of the one or more first performance metrics; calculating a plurality of second performance metrics based on the second predictions generated by the second trained models using the one or more test datasets, for each of the second trained models, each of the second performance metrics corresponding to a respective one of the one or more test datasets; combining the second performance metrics to produce a second score for each of the one or more training datasets; and calculating each of the one or more quality scores as weighted summation of a corresponding one of the first score and a corresponding one of the second score.
5. The data sharing system of claim 4, wherein said calculating the plurality of second performance metrics comprises: calculating the plurality of second performance metrics using a Shapley value method based on the second predictions generated by the second trained models using the one or more test datasets, for each of the second trained models.
6. The data sharing system of claim 1, wherein said generating the unit value for each of the one or more input datasets based on the one or more quality scores comprises: ranking the one or more quality scores; and producing the one or more unit values for the one or more input datasets based on the ranking.
7. The data sharing system of claim 1, wherein the instructions, when executed by the one or more processing structures, cause the one or more processing structures to perform further actions comprising: receiving one or more raw input datasets from the one or more data providers; and wherein said obtaining the one or more training datasets from the one or more input datasets comprises: filtering the one or more raw input datasets to obtain the one or more training datasets.
8. The data sharing system of claim 1, wherein said obtaining the one or more training datasets from the one or more input datasets comprises: constructing a coreset from each of the one or more input datasets to obtain the one or more training datasets.
9. The data sharing system of claim 8, wherein said constructing the coreset from each of the one or more input datasets comprises: constructing the coreset from each of the one or more input datasets using a herding method.
10. The data sharing system of claim 1, wherein said evaluating the one or more training datasets comprises: evaluating the one or more training datasets using a plurality of evaluation methods, the plurality of evaluation methods comprising the first evaluation method; and wherein the instructions, when executed by the one or more processing structures, cause the one or more processing structures to perform further actions comprising: selecting the first evaluation method when an input from the one or more data consumers comprises one or more task definitions associated with a target task and the one or more test datasets are associated with the target task.
11. The data sharing system of claim 10, wherein the plurality of evaluation methods comprise a second evaluation method, the second evaluation method comprising: an automated clustering function for estimating clusterability of the one or more training datasets, clustering the one or more training datasets, and estimating a number of clusters, and a clustering evaluation function for computing clustering outcome metrics to measure intra-class and inter-class relationships of the clusters and to generate the one or more quality scores; wherein the instructions, when executed by the one or more processing structures, cause the one or more processing structures to perform further actions comprising: selecting the second evaluation method when the input from the one or more data consumers comprises no task definitions.
12. The data sharing system of claim 11, wherein the intra-class and inter-class r
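Claims 2, 4 and 6 chain three computations: a weighted sum of per-test-dataset metrics (the first score), a weighted combination of that first score with a second, Shapley-based score (the quality score), and a ranking step that yields unit values. Claim 1 then distributes incentives "based on the one or more unit values and the at least portion of the input datasets". The claims leave the weights, the rank-to-value mapping and the payout formula open, so the sketch below fills them with hypothetical choices: a mixing weight `alpha`, a linear rank-to-value mapping (`top_value`, `step`), and a payout proportional to unit value times the number of records each consumer acquired. All function names and metric values are illustrative inputs, not figures from the patent.

```python
from typing import Dict, Sequence


def weighted_sum(values: Sequence[float], weights: Sequence[float]) -> float:
    if len(values) != len(weights):
        raise ValueError("expected one weight per value")
    return sum(v * w for v, w in zip(values, weights))


def quality_score(first_metrics: Sequence[float], test_weights: Sequence[float],
                  second_score: float, alpha: float = 0.5) -> float:
    # Claim 2: first score = weighted sum of the per-test-dataset metrics.
    first_score = weighted_sum(first_metrics, test_weights)
    # Claim 4: quality score = weighted sum of first and second scores (alpha is hypothetical).
    return weighted_sum([first_score, second_score], [alpha, 1.0 - alpha])


def unit_values_from_ranking(scores: Dict[str, float], top_value: float = 3.0,
                             step: float = 1.0) -> Dict[str, float]:
    # Claim 6: rank the quality scores; the linear rank-to-value mapping is an assumption.
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: max(top_value - i * step, 0.0) for i, name in enumerate(ranked)}


def distribute_incentives(total: float, unit_values: Dict[str, float],
                          records_acquired: Dict[str, int]) -> Dict[str, float]:
    # Assumed payout rule: share proportional to unit value x records acquired.
    weights = {p: unit_values[p] * records_acquired.get(p, 0) for p in unit_values}
    norm = sum(weights.values())
    if norm == 0:
        return {p: 0.0 for p in unit_values}
    return {p: total * w / norm for p, w in weights.items()}


test_weights = [0.7, 0.3]  # consumer-chosen weights for two test datasets (illustrative)
scores = {
    "A": quality_score([0.90, 0.80], test_weights, second_score=0.80),
    "B": quality_score([0.85, 0.90], test_weights, second_score=0.30),
    "C": quality_score([0.50, 0.60], test_weights, second_score=-0.10),
}
units = unit_values_from_ranking(scores)
payouts = distribute_incentives(1000.0, units, {"A": 100, "B": 200, "C": 100})
print(scores, units, payouts)
```

Under the assumed payout rule, provider B receives the largest share even though its unit value is lower than A's, because the consumer acquired twice as many of B's records. A different mapping from ranking to unit value would change the split without changing the structure of claims 1, 2, 4 and 6.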
Business processes related to social networking or social networking services · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
Clustering techniques · CPC title
Machine learning · CPC title