Utilizing machine learning models to identify insights in a document
US-10303771-B1 · May 28, 2019 · US
US11900178B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11900178-B2 |
| Application number | US-202217845786-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 21, 2022 |
| Priority date | Jul 6, 2018 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
An exemplary system, method, and computer-accessible medium can include, for example, receiving an original dataset(s), receiving a synthetic dataset(s), training a model(s) using the original dataset(s) and the synthetic dataset(s), and evaluating the synthetic dataset(s) based on the training of the model(s). The model(s) can include a first model and a second model, and the first model can be trained using the original dataset(s) and the second model can be trained using the synthetic dataset(s). The synthetic dataset(s) can be evaluated by comparing first results from the training of the first model to second results from the training of the second model.
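The comparison described in the abstract can be illustrated with a minimal sketch. It assumes a one-feature least-squares line as the "model" and a summed absolute error on held-out original data as the "results"; the patent does not prescribe either choice. The function names, the `tolerance` parameter, and the sample values are hypothetical and serve only as illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def summed_abs_error(model, xs, ys):
    """Sum of absolute prediction errors of a fitted line on (xs, ys)."""
    a, b = model
    return sum(abs(y - (a * x + b)) for x, y in zip(xs, ys))


def compare_models(original, synthetic, holdout, tolerance):
    """Train one model per dataset and compare their held-out errors.

    `tolerance` is a hypothetical absolute threshold on the error gap;
    the source does not state how a threshold would be chosen.
    """
    first_model = fit_line(*original)    # trained on the original dataset
    second_model = fit_line(*synthetic)  # trained on the synthetic dataset
    first_error = summed_abs_error(first_model, *holdout)
    second_error = summed_abs_error(second_model, *holdout)
    gap = abs(second_error - first_error)
    return {
        "first_error": first_error,
        "second_error": second_error,
        "gap": gap,
        "within_tolerance": gap <= tolerance,
    }


# Illustrative values only: the original data follows y = 2x + 1 exactly,
# and the synthetic data is a slightly perturbed copy.
original = ([0, 1, 2, 3], [1, 3, 5, 7])
synthetic = ([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
holdout = ([4, 5], [9, 11])

result = compare_models(original, synthetic, holdout, tolerance=0.5)
```

A small gap between the two errors suggests that the synthetic dataset supports training about as well as the original one. A real evaluation would use the actual model family and a metric suited to the task.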
Opening claim text (preview).
What is claimed is:

1. A non-transitory computer-accessible medium having stored thereon computer-executable instructions for evaluating a synthetic dataset, wherein, when a computer hardware arrangement executes the instructions, the computer hardware arrangement is configured to perform procedures comprising: training a model using an original dataset and a synthetic dataset; generating a statistical correlation score based on the synthetic dataset and the original dataset; generating a univariate distribution score based on the synthetic dataset and the original dataset; generating an evaluation score by evaluating the synthetic dataset based on the training of the model, wherein the evaluation score includes the statistical correlation score and the univariate distribution score; determining a region for the synthetic dataset based on the evaluation score, wherein the region defines the status of the synthetic dataset; and generating a suggestion based on the evaluation score and the determined region, wherein the suggestion provides information for a data application.

2. The non-transitory computer-accessible medium of claim 1, wherein the model comprises a behavior classification model.

3. The non-transitory computer-accessible medium of claim 1, wherein: training a model comprises training a first model and training a second model, the first model is trained using the original dataset, and the second model is trained using the synthetic dataset.

4. The non-transitory computer-accessible medium of claim 3, wherein the procedures further comprise evaluating the synthetic dataset by comparing first results from the training of the first model to second results from the training of the second model.

5. The non-transitory computer-accessible medium of claim 4, wherein the first results are compared to the second results using an analysis of variance procedure.

6. The non-transitory computer-accessible medium of claim 5, wherein: the analysis of variance procedure comprises a degrees of freedom divisor and a sum of squares summation, the analysis of variance procedure results in a mean square, and the means square comprises square terms as deviations from a sample mean.

7. The non-transitory computer-accessible medium of claim 5, wherein the analysis of variance procedure estimates at least one of (a) a total variance based on all the observation deviations from a grand mean, (ii) an error variance based on all the observation deviations from their appropriate treatment means, or (iii) a treatment variance.

8. The non-transitory computer-accessible medium of claim 7, wherein the treatment variance is based on deviations of a treatment means from the grand mean multiplied by a number of observations in each treatment.

9. The non-transitory computer-accessible medium of claim 4, wherein the procedures further comprise generating a further synthetic dataset based on the synthetic dataset and the evaluation of the synthetic dataset.

10. The non-transitory computer-accessible medium of claim 9, wherein the procedures further comprise: training the second model based on the at least one further synthetic dataset, and evaluating the at least one further synthetic dataset based on the training of the at least one second model on the at least one further synthetic dataset.

11. A system, comprising: a computer hardware arrangement configured to: train a model using an original dataset and a synthetic dataset; generate a statistical correlation score based on the synthetic dataset and the original dataset; generate a univariate distribution score based on the synthetic dataset and the original dataset; generate an evaluation score by evaluating the synthetic dataset based on the training of the model, wherein the evaluation score includes the statistical correlation score and the univariate distribution score; determine a region for the synthetic dataset based on the evaluation score, wherein the region defines the status of the synthetic dataset; and generate a suggestion based on the evaluation score and the determined region, wherein the suggestion provides information for a data application.

12. The system of claim 11, wherein the suggestion includes at least one of (a) indicating that the at least one synthetic dataset is adequate or (b) warning that the at least one synthetic dataset potentially contains information similar to the at least one original dataset.

13. The system of claim 11, wherein the region includes one of (i) a normal region where the synthetic dataset is unlikely to contain synthetic data that is similar to original data within the original dataset, (ii) a warning region where the synthetic dataset at least one of (a) potentially contains the synthetic data that is similar to the original data or (b) the synthetic data does not substantially match a schema of the original dataset, or (iii) a red flag region where the synthetic dataset is likely to contain the synthetic data that is similar to the original data.

14. The non-transitory computer-accessible medium of claim 1, wherein the region includes one of (i) a normal region where the synthetic dataset is unlikely to contain synthetic data that is similar to original data within the original dataset, (ii) a warning region where the synthetic dataset at least one of (a) potentially contains the synthetic data that is similar to the original data or (b) the synthetic data does not substantially match a schema of the original dataset, or (iii) a red flag region where the synthetic dataset is likely to contain the synthetic data that is similar to the original data.

15. A method performed by a computer hardware arrangement, the method comprising: training a model using an original dataset and a synthetic dataset; generating a statistical correlation score based on the synthetic dataset and the original dataset; generating a univariate distribution score based on the synthetic dataset and the original dataset; generating an evaluation score by evaluating the synthetic dataset based on the training of the model, wherein the evaluation score includes the statistical correlation score and the univariate distribution score; determining a region for the synthetic dataset based on the evaluation score, wherein the region defines the status of the synthetic dataset; and generating a suggestion based on the evaluation score and the determined region, wherein the suggestion provides information for a data application.

16. The method of claim 15 wherein: training a model comprises training a first model and training a second model, the first model is trained using the original dataset, and the second model is trained using the synthetic dataset.

17. The method of claim 16, wherein the method further comprises evaluating the synthetic dataset by comparing first results from the training of the first model to second results from the training of the second model.

18. The method of claim 17, wherein the comparison of first results to the second results uses a threshold procedure comprising: summing first errors from the first results, summing second errors from the second results, and comparing the summed first errors to the summed second errors.

19. The method of claim 18, wherein the threshold procedure includes determining a further statistical correlation based on a plurality of covariance matrices.

20. The method of claim 15, wherein the region includes one of (i) a normal region where sy
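The analysis of variance quantities named in claims 6 to 8 (sums of squares, degrees of freedom divisors, mean squares, and total, error, and treatment variance) can be sketched as a standard one-way decomposition. This is a textbook formulation, not necessarily the procedure used in the patent. Treating the first and second models' results as two "treatments" is an assumption made for illustration, and `one_way_anova` and the sample values are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA decomposition for a list of observation groups."""
    observations = [x for g in groups for x in g]
    grand_mean = mean(observations)

    # Total sum of squares: deviations of every observation from the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in observations)
    # Error sum of squares: deviations from each observation's own treatment mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Treatment sum of squares: deviation of each treatment mean from the
    # grand mean, multiplied by the number of observations in that treatment.
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # Degrees of freedom act as the divisors that turn sums of squares
    # into mean squares.
    df_treatment = len(groups) - 1
    df_error = len(observations) - len(groups)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error

    f_ratio = ms_treatment / ms_error if ms_error > 0 else float("inf")
    return {
        "ss_total": ss_total,
        "ss_error": ss_error,
        "ss_treatment": ss_treatment,
        "ms_treatment": ms_treatment,
        "ms_error": ms_error,
        "f_ratio": f_ratio,
    }


# Illustrative values only: per-sample errors from the first model
# (original data) and the second model (synthetic data).
first_results = [1.0, 2.0, 3.0]
second_results = [2.0, 3.0, 4.0]
anova = one_way_anova([first_results, second_results])
```

In this decomposition the total sum of squares equals the error plus treatment sums of squares. A small F ratio indicates that the two sets of results differ little relative to their within-group spread.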
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
Supervised learning · CPC title
Adversarial learning · CPC title