Missing value imputation technique to facilitate prognostic analysis of time-series sensor data
US-2019378022-A1 · Dec 12, 2019 · US
US11113337B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11113337-B2 |
| Application number | US-201715698911-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2017 |
| Priority date | Sep 8, 2016 |
| Publication date | Sep 7, 2021 |
| Grant date | Sep 7, 2021 |
Official abstract text for this publication.
Embodiments herein provide a method for imputing sensor data in a sensor data sequence with missing data, based on semantics learning, where the semantics are defined by the constraints of the sensor data features. A candidate value for imputation is determined from the sensor data at corresponding time instants of the sensor data sequence, using learning based on the semantics of the features of the sensor data sequence with missing data. A nearest-neighbors search is applied in a similar response data sequence, using the data value corresponding to the time instant of the missing data in the sensor data sequence. If no similar response data sequence is available, imputation is performed based on the distribution pattern of the missing data.
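The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `pearson` and `impute_from_similar`, the use of NaN to mark missing values, the Pearson coefficient as the correlation measure, and the plain mean used to combine the target values are all assumptions made for illustration; the patent instead derives the candidate through "learning based on semantics of features".

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if not (math.isnan(x) or math.isnan(y))]
    if len(pairs) < 2:
        return 0.0
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def impute_from_similar(target, others, t, nearness):
    """Return a candidate for target[t], or None if no neighbors are found.

    target   : sequence with a missing value (NaN) at index t
    others   : candidate response sequences of the same length
    nearness : hypothetical nearness threshold on value distance
    """
    if not others:
        return None
    # Similar response sequence: the one most correlated with the target.
    similar = max(others, key=lambda s: pearson(target, s))
    ref = similar[t]
    if math.isnan(ref):
        return None
    # Time instants whose value in the similar sequence lies within the
    # threshold of the reference value and whose target value is observed.
    neighbors = [
        i for i, v in enumerate(similar)
        if i != t
        and not math.isnan(v)
        and abs(v - ref) <= nearness
        and not math.isnan(target[i])
    ]
    if not neighbors:
        return None
    # Assumption: combine the target values at those instants with a mean.
    return mean(target[i] for i in neighbors)


nan = math.nan
target = [10.0, 12.0, nan, 12.0, 10.0, 14.0]
similar = [1.0, 2.0, 2.1, 1.9, 1.0, 3.0]
other = [5.0, 1.0, 4.0, 2.0, 6.0, 0.0]
candidate = impute_from_similar(target, [other, similar], t=2, nearness=0.25)
```

In this toy input the neighbors of the reference value 2.1 are the instants holding 2.0 and 1.9, so the candidate is built from the target values at those two instants. A `None` result signals that the caller should fall back to another strategy, such as one based on the distribution pattern of the missing data.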
Opening claim text (preview).
What is claimed is:
1. A method of imputing missing sensor data values in a sensor data sequence, the method comprising: generating the sensor data sequence over a period of time to determine behavior of a system over the period of time; identifying an instance of time instant associated with a missing sensor data value associated with the sensor data sequence; identifying a similar response data sequence corresponding to the sensor data sequence with the missing sensor data value, wherein the similar response data sequence is identified based on highest correlation with the sensor data sequence with the missing sensor data value; identifying a data value at an instance of time instant associated with the similar response data sequence corresponding to the instance of time instant associated with the missing sensor data value associated with the sensor data sequence; determining nearest neighbors of the identified data value in the similar response data sequence, by searching neighbors within a predetermined nearness threshold; determining instances of time instants of the determined nearest neighbors in the similar response data sequence; identifying instances of time instants in the sensor data sequence with the missing sensor data value corresponding to the instances of time instants of the determined nearest neighbors in the similar response data sequence; determining a candidate value for imputing, based on sensor data values of the identified instances of time instants in the sensor data sequence through learning based on semantics of features of the sensor data values in the sensor data sequence; imputing the candidate value, in the instance of time instant associated with the missing sensor data value associated with the sensor data sequence using the sensor data values of the identified instances of time instants in the sensor data sequence; and determining the behavior of the system using the sensor data sequence with the imputed candidate value.
2. The method of claim 1, wherein the nearest neighbors in the similar response data sequence are determined based on a distance of data values between the nearest neighbors in the similar response data sequence and the identified data value at the instance of time instant associated with the similar response data sequence.
3. The method of claim 2, wherein a number of nearest neighbors is below a predetermined nearness threshold.
4. The method of claim 1, wherein the candidate value is determined through the learning based on semantics of features of sensor data values, wherein the semantics are defined based on constraint imposed on the features of the candidate value as well as the sensor data sequences.
5. The method of claim 4, wherein the candidate value lies within a maximum possible value and a minimum possible value associated with the sensor data values of the sensor data sequence with the missing sensor data value, and a range associated with the maximum possible value and the minimum possible value.
6. The method of claim 4, wherein the features of the candidate value, for imputing, corresponds to a distribution pattern and the constraint imposed on the distribution pattern.
7. The method of claim 6, wherein at least three consecutive sensor data values of the sensor data sequence does not have a minimum sensor data value of the sensor data sequence in between.
8. The method of claim 6, wherein the imputing of the candidate value comprises: determining a measure of convergence of the candidate value, associated with an instance of time instant, with the sensor data values of the preceding and succeeding instances of time instants of missed value; and imputing the candidate value in the instance of time instant associated with the sensor data sequence with the missing sensor data value.
9. The method of claim 6, wherein the imputing of the candidate value comprises: determining whether the candidate value is identical to a minimum sensor data value of the sensor data sequence with the missing sensor data value; computing a best fit threshold by computing an average of at least the sensor data value of preceding instance of time instant, corresponding to the instance of time instant of the missed value, and the sensor data value of succeeding instance of time instant, corresponding to the time instance of the missed value, in response to determining that the candidate value for imputing is identical to the minimum sensor data value of the sensor data sequence with the missing sensor data value; determining the candidate value, wherein the candidate value is the average of the sensor data value of the preceding instance of time instant and the sensor data value of the succeeding instance of time instant; and imputing the candidate value at the instance of time instant associated with the missing data value in the sensor data sequence.
10. A computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium where the computer executable program code when executed causing the actions which include: generating the sensor data sequence over a period of time to determine behavior of a system over the period of time; identifying an instance of time instant associated with a missing sensor data value associated with the sensor data sequence; identifying a similar response data sequence corresponding to the sensor data sequence with the missing sensor data value, wherein the similar response data sequence is identified based on highest correlation with the sensor data sequence with the missing sensor data value; identifying a data value at an instance of time instant associated with the similar response data sequence corresponding to the instance of time instant associated with the missing sensor data value associated with the sensor data sequence; determining nearest neighbors of the identified data value in the similar response data sequence, by searching neighbors within a predetermined nearness threshold; determining instances of time instants of the determined nearest neighbors in the similar response data sequence; identifying instances of time instants in sensor data sequence with the missing sensor data value corresponding to the instances of time instants of the determined nearest neighbors in the similar response data sequence; determining a candidate value for imputing, based on sensor data values of the identified instances of time instants in the sensor data sequence through learning based on semantics of features of the sensor data values in the sensor data sequence; imputing the candidate value, in the instance of time instant associated with the missing sensor data value associated with the sensor data sequence using the sensor data values of the identified instances of time instants in the sensor data sequence; determining the behavior of the system using the sensor data sequence with imputed candidate value.
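The range constraint of claim 5 and the minimum-value check of claim 9 can be sketched as a small post-processing step on a candidate value. This is only an illustration under stated assumptions: the function name `refine_candidate` is hypothetical, missing values are assumed to be marked with NaN, and clamping an out-of-range candidate to the observed minimum and maximum is an assumed way of enforcing the range, since the claims only state that the candidate lies within it.

```python
import math


def refine_candidate(seq, t, candidate):
    """Apply a range constraint and a minimum-value fallback to a candidate.

    seq       : sensor data sequence with NaN at the missing instant t
    candidate : value proposed for seq[t] by an earlier step
    """
    observed = [v for v in seq if not math.isnan(v)]
    lo, hi = min(observed), max(observed)

    # Assumption: enforce the claim 5 style range by clamping.
    candidate = min(max(candidate, lo), hi)

    # Claim 9 style fallback: if the candidate equals the sequence minimum,
    # replace it with the average of the preceding and succeeding values.
    if candidate == lo and 0 < t < len(seq) - 1:
        prev_value, next_value = seq[t - 1], seq[t + 1]
        if not (math.isnan(prev_value) or math.isnan(next_value)):
            candidate = (prev_value + next_value) / 2
    return candidate


nan = math.nan
seq = [0.0, 5.0, nan, 7.0, 0.0]
from_minimum = refine_candidate(seq, 2, 0.0)   # candidate equals the minimum
from_overshoot = refine_candidate(seq, 2, 9.0)  # candidate above the maximum
unchanged = refine_candidate(seq, 2, 4.0)       # candidate already in range
```

With this input, a candidate equal to the sequence minimum is replaced by the average of its two neighbors, a candidate above the observed maximum is pulled back to that maximum, and an in-range candidate passes through untouched.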
Knowledge engineering; Knowledge acquisition · CPC title
using digital techniques · CPC title
Digital input using the sampling of an analogue quantity at regular intervals of time {, input from a/d converter or output to d/a converter} · CPC title
Machine learning · CPC title
by searching ordered data, e.g. alpha-numerically ordered data · CPC title