US12175308B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12175308-B2 |
| Application number | US-202418402937-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 3, 2024 |
| Priority date | Jul 6, 2018 |
| Publication date | Dec 24, 2024 |
| Grant date | Dec 24, 2024 |
Official abstract text for this publication.
An exemplary system, method, and computer-accessible medium can include, for example, receiving an original dataset(s), receiving a synthetic dataset(s), training a model(s) using the original dataset(s) and the synthetic dataset(s), and evaluating the synthetic dataset(s) based on the training of the model(s). The model(s) can include a first model and a second model, and the first model can be trained using the original dataset(s) and the second model can be trained using the synthetic dataset(s). The synthetic dataset(s) can be evaluated by comparing first results from the training of the first model to second results from the training of the second model.
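The comparison described in the abstract can be illustrated with a small sketch. It trains a first model on an original dataset and a second model on a synthetic dataset, collects one result per run for each, and compares the two sets of results with a one-way analysis of variance, the comparison procedure named in the claims. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy Gaussian data, the nearest-centroid classifier, held-out accuracy as the "result", and the helper names `make_rows`, `train_centroids`, `accuracy`, and `one_way_anova`.

```python
import random
from statistics import mean

def make_rows(n, shift, rng):
    """Toy two-class rows: class 1 is offset by `shift` on both features."""
    rows = []
    for i in range(n):
        label = i % 2
        point = (rng.gauss(label * shift, 1.0), rng.gauss(label * shift, 1.0))
        rows.append((point, label))
    return rows

def train_centroids(rows):
    """Fit a nearest-centroid classifier: one mean vector per label."""
    model = {}
    for label in {lab for _, lab in rows}:
        points = [p for p, lab in rows if lab == label]
        model[label] = tuple(mean(axis) for axis in zip(*points))
    return model

def accuracy(model, rows):
    """Share of rows whose nearest centroid carries the correct label."""
    def predict(point):
        return min(model, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(point, model[lab])))
    return mean(1.0 if predict(p) == lab else 0.0 for p, lab in rows)

def one_way_anova(groups):
    """Sums of squares, mean squares and F statistic for a one-way layout."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    group_means = [mean(g) for g in groups]
    # Total: every observation's deviation from the grand mean.
    ss_total = sum((x - grand) ** 2 for x in all_obs)
    # Treatment: group-mean deviations from the grand mean, times group size.
    ss_treatment = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Error: every observation's deviation from its own group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Mean square = sum of squares divided by its degrees of freedom.
    ms_treatment = ss_treatment / (len(groups) - 1)
    ms_error = ss_error / (len(all_obs) - len(groups))
    if ms_error > 0:
        f_stat = ms_treatment / ms_error
    else:
        f_stat = 0.0 if ms_treatment == 0 else float("inf")
    return {"ss_total": ss_total, "ss_treatment": ss_treatment, "ss_error": ss_error,
            "ms_treatment": ms_treatment, "ms_error": ms_error, "f": f_stat}

first_results, second_results = [], []
for seed in range(8):
    rng = random.Random(seed)
    original = make_rows(200, 1.5, rng)
    synthetic = make_rows(200, 1.5, rng)   # stand-in for a generator's output
    holdout = make_rows(200, 1.5, rng)
    first_results.append(accuracy(train_centroids(original), holdout))
    second_results.append(accuracy(train_centroids(synthetic), holdout))

anova = one_way_anova([first_results, second_results])
print("first model mean accuracy: ", round(mean(first_results), 3))
print("second model mean accuracy:", round(mean(second_results), 3))
print("F statistic:", round(anova["f"], 3))
```

In this sketch a small F statistic means the variation between the two models is no larger than the run-to-run variation within each, which is one plausible reading of a synthetic dataset that trains comparably to the original. Choosing a cutoff for F would need a reference distribution and is left out.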
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer-accessible medium having stored thereon computer-executable instructions for evaluating a synthetic dataset, wherein, when a computer hardware arrangement executes the instructions, the computer hardware arrangement is configured to perform procedures comprising: training a model using an original dataset and a synthetic dataset; determining a data similarity score including a combined score of exact-match overlap score and fuzzy-match overlap score based on the synthetic dataset and the original dataset; determining a data quality score including a combined score of row-duplicate score, repeated-value score and schema-preservation score based on the synthetic dataset and the original dataset; evaluating the synthetic dataset based on the training of the model, the data similarity score, and the data quality score; determining a region for the synthetic dataset based on evaluating the synthetic dataset, wherein the region defines a status of the synthetic dataset; and generating a suggestion based on the determined region for building predicative models on the synthetic dataset, wherein the suggestion includes at least one of (a) indicating that the at least one synthetic dataset is adequate or (b) warning that the at least one synthetic dataset potentially contains information similar to the at least one original dataset.

2. The non-transitory computer-accessible medium of claim 1, wherein the procedures further comprises generating a statistical correlation score based on the synthetic dataset and the original dataset.

3. The non-transitory computer-accessible medium of claim 2, wherein the procedures further comprises generating a univariate distribution score based on the synthetic dataset and the original dataset.

4. The non-transitory computer-accessible medium of claim 3, wherein the procedures further comprises evaluating the synthetic dataset further based on the statistical correlation score and the univariate distribution score.

5. The non-transitory computer-accessible medium of claim 1, wherein the suggestion provides information for a data application.

6. The non-transitory computer-accessible medium of claim 1, wherein the model comprises a behavior classification model.

7. The non-transitory computer-accessible medium of claim 1, wherein: training a model comprises training a first model and training a second model, the first model is trained using the original dataset, and the second model is trained using the synthetic dataset.

8. The non-transitory computer-accessible medium of claim 7, wherein the procedures further comprise evaluating the synthetic dataset by comparing first results from the training of the first model to second results from the training of the second model.

9. The non-transitory computer-accessible medium of claim 8, wherein the first results are compared to the second results using an analysis of variance procedure.

10. The non-transitory computer-accessible medium 72, wherein: the analysis of variance procedure comprises a degrees of freedom divisor and a sum of squares summation, the analysis of variance procedure results in a mean square, and the means square comprises square terms as deviations from a sample mean.

11. The non-transitory computer-accessible medium 72, wherein: the analysis of variance procedure estimates at least one of (a) a total variance based on all the observation deviations from a grand mean, (ii) an error variance based on all the observation deviations from their appropriate treatment means, or (iii) a treatment variance, and the treatment variance is based on deviations of a treatment means from the grand mean multiplied by a number of observations in each treatment.

12. A system comprising a computer hardware arrangement, wherein the computer hardware arrangement is configured to: train a model using an original dataset and a synthetic dataset; determine a data similarity score including a combined score of exact-match overlap score and fuzzy-match overlap score based on the synthetic dataset and the original dataset; evaluate the synthetic dataset based on the training of the model and the data similarity; determine a region for the synthetic dataset based on evaluating the synthetic dataset, wherein the region defines a status of the synthetic dataset; and generate a suggestion based on the determined region for building predicative models on the synthetic dataset, wherein the suggestion includes at least one of (a) indicating that the at least one synthetic dataset is adequate or (b) warning that the at least one synthetic dataset potentially contains information similar to the at least one original dataset.

13. The system of claim 12, wherein the computer hardware arrangement is further configured to: determine a data quality score including a combined score of row-duplicate score, repeated-value score and schema-preservation score based on the synthetic dataset and the original dataset; and evaluate the synthetic dataset further based on the data quality score.

14. The system of claim 12, wherein the computer hardware arrangement is further configured to generate a further synthetic dataset based on the synthetic dataset and the evaluation of the synthetic dataset.

15. The system of claim 14, wherein the computer hardware arrangement is further configured to train the second model based on the at least one further synthetic dataset, and evaluating the at least one further synthetic dataset based on the training of the at least one second model on the at least one further synthetic dataset.

16. A method performed by a computer hardware arrangement, the method comprising: training a model using an original dataset and a synthetic dataset; determining a data quality score including a combined score of row-duplicate score, repeated-value score and schema-preservation score based on the synthetic dataset and the original dataset; evaluating the synthetic dataset based on the training of the model and the data quality score; determining a region for the synthetic dataset based on evaluating the synthetic dataset, wherein the region defines a status of the synthetic dataset; and generating a suggestion based on the determined region for building predicative models on the synthetic dataset, wherein the suggestion includes at least one of (a) indicating that the at least one synthetic dataset is adequate or (b) warning that the at least one synthetic dataset potentially contains information similar to the at least one original dataset.

17. The method of claim 16, further comprising: determining a data similarity score including a combined score of exact-match overlap score and fuzzy-match overlap score based on the synthetic dataset and the original dataset; and evaluating the synthetic dataset further based on the data similarity score.

18. The method of claim 16, wherein: training a model comprises training a first model and training a second model, wherein the first model is trained using the original dataset, and the second model is trained using the synthetic dataset; and the method further comprises evaluating the synthetic dataset by comparing first results from the training of the first model to second results from the training of the second model, wherein the comparison of first results to the second results uses a threshold procedure comprising determining a further statistical correlation based on a plurality of covariance matrices.

19. The method of claim 18, wherein the threshold procedure further comprises summing first
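The independent claims name a data similarity score, a data quality score, a region, and a suggestion, but the preview text does not say how any of them is computed. The sketch below is one possible reading under loudly stated assumptions: every formula, the unweighted averaging, the `cutoff`, `max_similarity`, and `min_quality` thresholds, the region names, and the wording of the `low_quality` suggestion are hypothetical and are not taken from the patent. Fuzzy matching uses Python's standard `difflib.SequenceMatcher` on a text rendering of each row.

```python
from difflib import SequenceMatcher
from statistics import mean

def as_text(row, columns):
    """Render a row as one string so rows can be compared as text."""
    return "|".join(str(row.get(c, "")) for c in columns)

def exact_match_overlap(original, synthetic, columns):
    """Share of synthetic rows that appear verbatim in the original."""
    seen = {as_text(r, columns) for r in original}
    return sum(as_text(r, columns) in seen for r in synthetic) / len(synthetic)

def fuzzy_match_overlap(original, synthetic, columns, cutoff=0.9):
    """Share of synthetic rows at least `cutoff` similar to some original row."""
    originals = [as_text(r, columns) for r in original]
    hits = 0
    for row in synthetic:
        text = as_text(row, columns)
        if any(SequenceMatcher(None, text, o).ratio() >= cutoff for o in originals):
            hits += 1
    return hits / len(synthetic)

def row_duplicate_score(synthetic, columns):
    """1.0 when no synthetic row repeats, lower as duplicates accumulate."""
    texts = [as_text(r, columns) for r in synthetic]
    return len(set(texts)) / len(texts)

def repeated_value_score(original, synthetic, columns):
    """Per column, distinct-value share in synthetic relative to original, capped at 1."""
    def distinct_share(rows, col):
        return len({r.get(col) for r in rows}) / len(rows)
    return mean(min(1.0, distinct_share(synthetic, c) / distinct_share(original, c))
                for c in columns)

def schema_preservation_score(original, synthetic, columns):
    """Share of columns present in every synthetic row with the original's type."""
    expected = {c: type(original[0][c]) for c in columns}
    kept = sum(all(c in r and isinstance(r[c], expected[c]) for r in synthetic)
               for c in columns)
    return kept / len(columns)

def region_and_suggestion(similarity, quality, max_similarity=0.3, min_quality=0.8):
    """Map the two scores to a hypothetical region and a matching suggestion."""
    if similarity >= max_similarity:
        return "too_similar", ("Warning: the synthetic dataset potentially contains "
                               "information similar to the original dataset.")
    if quality >= min_quality:
        return "adequate", "The synthetic dataset is adequate."
    return "low_quality", "The synthetic dataset needs review before modeling."

columns = ["age", "city", "income"]
original = [
    {"age": 34, "city": "Austin", "income": 72000},
    {"age": 51, "city": "Boston", "income": 88000},
    {"age": 27, "city": "Denver", "income": 54000},
    {"age": 45, "city": "Austin", "income": 91000},
]
synthetic = [
    {"age": 34, "city": "Austin", "income": 72000},    # verbatim copy of an original row
    {"age": 51, "city": "Boston", "income": 88001},    # near copy of an original row
    {"age": 30, "city": "Denver", "income": 60500},
    {"age": 62, "city": "Seattle", "income": 47250},
]

exact = exact_match_overlap(original, synthetic, columns)
fuzzy = fuzzy_match_overlap(original, synthetic, columns)
similarity = (exact + fuzzy) / 2                        # assumed: unweighted mean
quality = mean([row_duplicate_score(synthetic, columns),
                repeated_value_score(original, synthetic, columns),
                schema_preservation_score(original, synthetic, columns)])
region, suggestion = region_and_suggestion(similarity, quality)
print(region, "-", suggestion)
```

The region check tests similarity first so that a dataset that copies original rows is flagged even when its quality score is high. A fuller sketch would also feed in the model-training comparison that claim 1 lists alongside the two scores.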
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
Supervised learning · CPC title
Adversarial learning · CPC title