Method for imputing missed data in sensor data sequence with missing data

US11113337B2 · US · B2

Patent metadata
Publication number: US-11113337-B2
Application number: US-201715698911-A
Country: US
Kind code: B2
Filing date: Sep 8, 2017
Priority date: Sep 8, 2016
Publication date: Sep 7, 2021
Grant date: Sep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments herein provide a method for imputing sensor data, in a sensor data sequence with missing data based on the semantics learning, where semantics is defined by the constraints of the sensor data features. A candidate value for imputation is determined based on sensor data of corresponding instances of time instants of the sensor data sequence using learning based on semantics of features of the sensor data sequence with missing data. The nearest neighbors search has been applied in similar response data sequence using the data values corresponding to the time instant of missing data in sensor data sequence. In case similar response data sequence is not available imputation is performed based on the distribution pattern of missing data.
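The neighbor-based path described in the abstract can be sketched roughly as follows. This is an illustrative reading, not the patented implementation: the function names `pearson` and `impute_from_similar`, the use of signed Pearson correlation to pick the similar response sequence, and the plain mean used as the candidate value are all assumptions. The patent describes the candidate as coming from "learning based on semantics", which this sketch does not attempt to model. A return value of `None` stands for the case where the fallback based on the distribution pattern would be needed.

```python
import math
from typing import List, Optional, Sequence

Series = Sequence[Optional[float]]  # None marks a missing reading


def pearson(a: Series, b: Series) -> float:
    """Pearson correlation over the instants where both series have a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def impute_from_similar(
    target: Series, others: List[Series], t: int, nearness: float
) -> Optional[float]:
    """Return a candidate for target[t], or None if no usable neighbors exist."""
    if not others:
        return None
    # Similar response sequence: the one with the highest correlation (assumed Pearson).
    similar = max(others, key=lambda s: pearson(target, s))
    ref = similar[t]
    if ref is None:
        return None
    # Nearest neighbors: instants whose value in the similar sequence
    # lies within the nearness threshold of the reference value.
    neighbor_times = [
        i for i, v in enumerate(similar)
        if i != t and v is not None and abs(v - ref) <= nearness
    ]
    # Read the target sequence at those same instants.
    values = [target[i] for i in neighbor_times if target[i] is not None]
    if not values:
        return None
    # Assumption: a plain mean stands in for the semantics-based learning step.
    return sum(values) / len(values)
```

Calling `impute_from_similar(target, [seq_a, seq_b], t, nearness)` returns a candidate for the gap at index `t`. The choice of the mean is only a placeholder; any estimator that respects the constraints on the sensor features could be swapped in.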

First claim

Opening claim text (preview).

What is claimed is:

1. A method of imputing missing sensor data values in a sensor data sequence, the method comprising: generating the sensor data sequence over a period of time to determine behavior of a system over the period of time; identifying an instance of time instant associated with a missing sensor data value associated with the sensor data sequence; identifying a similar response data sequence corresponding to the sensor data sequence with the missing sensor data value, wherein the similar response data sequence is identified based on highest correlation with the sensor data sequence with the missing sensor data value; identifying a data value at an instance of time instant associated with the similar response data sequence corresponding to the instance of time instant associated with the missing sensor data value associated with the sensor data sequence; determining nearest neighbors of the identified data value in the similar response data sequence, by searching neighbors within a predetermined nearness threshold; determining instances of time instants of the determined nearest neighbors in the similar response data sequence; identifying instances of time instants in the sensor data sequence with the missing sensor data value corresponding to the instances of time instants of the determined nearest neighbors in the similar response data sequence; determining a candidate value for imputing, based on sensor data values of the identified instances of time instants in the sensor data sequence through learning based on semantics of features of the sensor data values in the sensor data sequence; imputing the candidate value, in the instance of time instant associated with the missing sensor data value associated with the sensor data sequence using the sensor data values of the identified instances of time instants in the sensor data sequence; and determining the behavior of the system using the sensor data sequence with the imputed candidate value.

2. The method of claim 1, wherein the nearest neighbors in the similar response data sequence are determined based on a distance of data values between the nearest neighbors in the similar response data sequence and the identified data value at the instance of time instant associated with the similar response data sequence.

3. The method of claim 2, wherein a number of nearest neighbors is below a predetermined nearness threshold.

4. The method of claim 1, wherein the candidate value is determined through the learning based on semantics of features of sensor data values, wherein the semantics are defined based on constraint imposed on the features of the candidate value as well as the sensor data sequences.

5. The method of claim 4, wherein the candidate value lies within a maximum possible value and a minimum possible value associated with the sensor data values of the sensor data sequence with the missing sensor data value, and a range associated with the maximum possible value and the minimum possible value.

6. The method of claim 4, wherein the features of the candidate value, for imputing, corresponds to a distribution pattern and the constraint imposed on the distribution pattern.

7. The method of claim 6, wherein at least three consecutive sensor data values of the sensor data sequence does not have a minimum sensor data value of the sensor data sequence in between.

8. The method of claim 6, wherein the imputing of the candidate value comprises: determining a measure of convergence of the candidate value, associated with an instance of time instant, with the sensor data values of the preceding and succeeding instances of time instants of missed value; and imputing the candidate value in the instance of time instant associated with the sensor data sequence with the missing sensor data value.

9. The method of claim 6, wherein the imputing of the candidate value comprises: determining whether the candidate value is identical to a minimum sensor data value of the sensor data sequence with the missing sensor data value; computing a best fit threshold by computing an average of at least the sensor data value of preceding instance of time instant, corresponding to the instance of time instant of the missed value, and the sensor data value of succeeding instance of time instant, corresponding to the time instance of the missed value, in response to determining that the candidate value for imputing is identical to the minimum sensor data value of the sensor data sequence with the missing sensor data value; determining the candidate value, wherein the candidate value is the average of the sensor data value of the preceding instance of time instant and the sensor data value of the succeeding instance of time instant; and imputing the candidate value at the instance of time instant associated with the missing data value in the sensor data sequence.

10. A computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium where the computer executable program code when executed causing the actions which include: generating the sensor data sequence over a period of time to determine behavior of a system over the period of time; identifying an instance of time instant associated with a missing sensor data value associated with the sensor data sequence; identifying a similar response data sequence corresponding to the sensor data sequence with the missing sensor data value, wherein the similar response data sequence is identified based on highest correlation with the sensor data sequence with the missing sensor data value; identifying a data value at an instance of time instant associated with the similar response data sequence corresponding to the instance of time instant associated with the missing sensor data value associated with the sensor data sequence; determining nearest neighbors of the identified data value in the similar response data sequence, by searching neighbors within a predetermined nearness threshold; determining instances of time instants of the determined nearest neighbors in the similar response data sequence; identifying instances of time instants in sensor data sequence with the missing sensor data value corresponding to the instances of time instants of the determined nearest neighbors in the similar response data sequence; determining a candidate value for imputing, based on sensor data values of the identified instances of time instants in the sensor data sequence through learning based on semantics of features of the sensor data values in the sensor data sequence; imputing the candidate value, in the instance of time instant associated with the missing sensor data value associated with the sensor data sequence using the sensor data values of the identified instances of time instants in the sensor data sequence; determining the behavior of the system using the sensor data sequence with imputed candidate value.
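Claims 5 and 9 place two checks on a candidate value before it is written into the gap. The sketch below shows one way they might be combined. The function name `refine_candidate` and the parameters `lower` and `upper` are hypothetical, and clamping is only an assumed way of keeping the candidate inside the permitted range; the claims state the constraint but not how it is enforced.

```python
from typing import Optional, Sequence


def refine_candidate(
    series: Sequence[Optional[float]],
    t: int,
    candidate: float,
    lower: float,
    upper: float,
) -> float:
    """Apply the claim 9 check, then an assumed claim 5 range clamp."""
    observed = [v for v in series if v is not None]

    # Claim 9: if the candidate is identical to the minimum observed value,
    # replace it with the average of the preceding and succeeding readings.
    if observed and candidate == min(observed):
        prev_v = series[t - 1] if t > 0 else None
        next_v = series[t + 1] if t + 1 < len(series) else None
        if prev_v is not None and next_v is not None:
            candidate = (prev_v + next_v) / 2

    # Claim 5: keep the candidate between the minimum and maximum possible
    # values (clamping is an assumption made for this sketch).
    return min(max(candidate, lower), upper)
```

If either adjacent reading is itself missing, this sketch leaves the candidate unchanged; the claims do not say what should happen in that case.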

Assignees

Indian Inst Technology Bombay, Tata Consultancy Services

Inventors

Classifications

  • Knowledge engineering; Knowledge acquisition · CPC title

  • using digital techniques · CPC title

  • Digital input using the sampling of an analogue quantity at regular intervals of time {, input from a/d converter or output to d/a converter} · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • by searching ordered data, e.g. alpha-numerically ordered data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113337B2 cover?
Embodiments herein provide a method for imputing sensor data, in a sensor data sequence with missing data based on the semantics learning, where semantics is defined by the constraints of the sensor data features. A candidate value for imputation is determined based on sensor data of corresponding instances of time instants of the sensor data sequence using learning based on semantics of featur…
Who is the assignee on this patent?
Indian Inst Technology Bombay, Tata Consultancy Services
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 7, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).