Systems and techniques for predictive data analytics
US-9489630-B2 · Nov 8, 2016 · US
US10409789B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10409789-B2 |
| Application number | US-201715707500-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 18, 2017 |
| Priority date | Sep 16, 2016 |
| Publication date | Sep 10, 2019 |
| Grant date | Sep 10, 2019 |
Official abstract text for this publication.
Described is an approach that provides an adaptive solution to missing data for machine learning systems. A gradient solution is provided that is attentive to imputation needs at each of several missingness levels. This multilevel approach treats data missingness at any of multiple severity levels while utilizing, as much as possible, the actual observed data.
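To make the idea of several missingness levels concrete, the following is a minimal sketch of how such levels might be measured, not the patent's implementation. It assumes the collected data is a list of dictionaries with `None` marking a missing value, and it interprets a "signal pattern" as the set of signals missing together in one row. The function name `missingness_levels` and the data layout are hypothetical.

```python
from collections import Counter


def missingness_levels(rows):
    """Return (overall, per_signal, per_pattern) missingness fractions.

    rows: list of dicts mapping signal name -> value, where None marks
    a missing value (an assumed layout, not taken from the patent).
    """
    signals = sorted({name for row in rows for name in row})
    total_cells = len(rows) * len(signals)

    missing_by_signal = Counter()
    pattern_counts = Counter()
    for row in rows:
        # The pattern of a row is the tuple of signals it is missing.
        missing = tuple(s for s in signals if row.get(s) is None)
        pattern_counts[missing] += 1
        missing_by_signal.update(missing)

    # Fraction of all cells that are missing.
    overall = sum(missing_by_signal.values()) / total_cells if total_cells else 0.0
    # Fraction of rows in which each individual signal is missing.
    per_signal = {s: missing_by_signal[s] / len(rows) for s in signals}
    # Fraction of rows that show each missingness pattern.
    per_pattern = {p: c / len(rows) for p, c in pattern_counts.items()}
    return overall, per_signal, per_pattern


if __name__ == "__main__":
    sample = [
        {"cpu": 0.5, "io": None},
        {"cpu": None, "io": None},
        {"cpu": 0.7, "io": 1.0},
        {"cpu": 0.9, "io": 2.0},
    ]
    print(missingness_levels(sample))
```

The three returned values loosely correspond to an overall degree of missingness, a degree per individual signal, and a degree per missingness pattern.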
Opening claim text (preview).
What is claimed is:

1. A method for imputing data for a learning system, comprising: collecting data from a monitored target system; determining one or more levels of missingness for the data collected from the monitored target system; selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system; imputing missing data using the selected imputation technique to generate training data; and performing model training with the training data.

2. The method of claim 1, wherein the one or more levels of missingness for the data comprise a first factor corresponding to an overall degree of missingness for the data, a second factor corresponding to one or more degrees of missingness for individual signals within a dataset, and a third factor corresponding to missingness degrees for different signal patterns in the data.

3. The method of claim 1, wherein the plurality of imputation techniques comprises some or all of a first imputation technique that performs expectation maximization to impute the missing data at a first level of missingness, a second imputation technique that performs the expectation maximization with external data at a second level of missingness, a third imputation technique that generates the training data using predicted values from a predictive model at a third level of missingness, or a fourth imputation technique that performs simulation to generate the training data at a fourth level of missingness.

4. The method of claim 1, wherein a second imputation technique is selected to impute the missing data when a first imputation technique does not successfully generate the missing data.

5. The method of claim 1, wherein the model training generates a predictive model that is employed for health monitoring of a database system.

6. A system for imputing data for a machine learning system, comprising: a processor; a memory for holding programmable code; and wherein the programmable code includes instructions for collecting data from a monitored target system; determining one or more levels of missingness for the data collected from the monitored target system; selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system; imputing missing data using the selected imputation technique to generate training data; and performing model training with the training data.

7. The system of claim 6, wherein the one or more levels of missingness for the data comprise a first factor corresponding to an overall degree of missingness for the data, a second factor corresponding to one or more degrees of missingness for individual signals within a dataset, and a third factor corresponding to missingness degrees for different signal patterns in the data.

8. The system of claim 6, wherein the plurality of imputation techniques comprises some or all of a first imputation technique that performs expectation maximization to impute the missing data at a first level of missingness, a second imputation technique that performs the expectation maximization with external data at a second level of missingness, a third imputation technique that generates the training data using predicted values from a predictive model at a third level of missingness, or a fourth imputation technique that performs simulation to generate the training data at a fourth level of missingness.

9. The system of claim 6, wherein a second imputation technique is selected to impute the missing data when a first imputation technique does not successfully generate the missing data.

10. The system of claim 6, wherein the model training generates a predictive model that is employed for health monitoring of a database system.

11. A computer program product embodied on a non-transitory computer readable medium, the non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, executes a method comprising: collecting data from a monitored target system; determining one or more levels of missingness for the data collected from the monitored target system; selecting, from among a plurality of imputation techniques, a selected imputation technique based at least in part upon the one or more levels of missingness for the data, wherein expectation maximization (EM) is selected as the selected imputation technique if it is determined that both an overall level of missing data and individual levels of missing data for signals are at one or more designated thresholds, and an external data source is accessed to generate an EM seed for the expectation maximization when insufficient seed data exists within the data collected from the monitored target system; imputing missing data using the selected imputation technique to generate training data; and performing model training with the training data.

12. The computer program product of claim 11, wherein the one or more levels of missingness for the data comprise a first factor corresponding to an overall degree of missingness for the data, a second factor corresponding to one or more degrees of missingness for individual signals within a dataset, and a third factor corresponding to missingness degrees for different signal patterns in the data.

13. The computer program product of claim 11, wherein the plurality of imputation techniques comprises some or all of a first imputation technique that performs expectation maximization to impute the missing data at a first level of missingness, a second imputation technique that performs the expectation maximization with external data at a second level of missingness, a third imputation technique that generates the training data using predicted values from a predictive model at a third level of missingness, or a fourth imputation technique that performs simulation to generate the training data at a fourth level of missingness.

14. The computer program product of claim 11, wherein a second imputation technique is selected to impute the missing data when a first imputation technique does not successfully generate the missing data.

15. The computer program product of claim 11, wherein the model training generates a predictive model that is employed for health monitoring of a database system.
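The selection and fallback logic recited in claims 1, 3, and 4 can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the numeric cut-offs (the claims say only "designated thresholds"), the reading of "at" a threshold as "at or below", the escalation order used for fallback, and the convention that an imputer returns `None` when it fails.

```python
from enum import Enum


class Technique(Enum):
    # Ordered from lightest to heaviest missingness, following claim 3.
    EM = "expectation maximization"
    EM_EXTERNAL_SEED = "expectation maximization seeded from external data"
    MODEL_PREDICTION = "predicted values from a predictive model"
    SIMULATION = "simulation"


# Hypothetical cut-offs; the patent text does not give numeric values.
OVERALL_MAX = 0.2   # overall missingness at or below this allows EM
SIGNAL_MAX = 0.4    # every individual signal must be at or below this
HEAVY_MAX = 0.8     # above this, fall through to simulation


def select_technique(overall, per_signal, has_seed_data):
    """Pick a technique from overall and per-signal missingness fractions."""
    worst_signal = max(per_signal.values(), default=0.0)
    if overall <= OVERALL_MAX and worst_signal <= SIGNAL_MAX:
        # EM needs seed data; use an external source when it is lacking.
        return Technique.EM if has_seed_data else Technique.EM_EXTERNAL_SEED
    if overall <= HEAVY_MAX:
        return Technique.MODEL_PREDICTION
    return Technique.SIMULATION


def impute_with_fallback(data, first, imputers):
    """Try `first`, then each heavier technique until one succeeds.

    imputers maps Technique -> callable(data) that returns training data,
    or None on failure (an assumed convention).
    """
    members = list(Technique)
    for technique in members[members.index(first):]:
        result = imputers[technique](data)
        if result is not None:
            return technique, result
    raise RuntimeError("no imputation technique produced training data")
```

A caller would compute the missingness fractions first, call `select_technique`, and pass the result to `impute_with_fallback` together with its own imputer callables; the training step then consumes the returned data.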
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Ensemble learning · CPC title
using statistical or mathematical methods · CPC title
characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability (for optimising operational conditions of wireless networks H04W24/02) · CPC title