Automatic detection of outliers in multivariate data
US-2018046599-A1 · Feb 15, 2018 · US
US10733264B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10733264-B2 |
| Application number | US-201615184494-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 16, 2016 |
| Priority date | Jun 17, 2015 |
| Publication date | Aug 4, 2020 |
| Grant date | Aug 4, 2020 |
Official abstract text for this publication.
Disclosed is a method and system for detecting outliers in real-time for a univariate time-series signal. The system may receive the univariate time-series signal, comprising a plurality of datasets, from a data source. The system may compute a standard deviation of a dataset of the plurality of datasets. Subsequently, the system may compute the optimal sample block size and the critical sample size of the dataset. Further, the system may determine the optimal operational block size of the dataset. The system may segment the plurality of datasets into blocks based upon the optimal operational block size. The system may detect the outliers by performing an outlier detection technique on the blocks, thereby ensuring improved execution time while minimally affecting precision and accuracy of the outcome of the outlier detection method.
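The sizing and segmentation steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes the sample block size takes the form σ / ((1 − δ) ε²), caps the critical size at half the dataset length, and treats the operational size as the number of elements per block. The rule for choosing the operational size (the smallest divisor of the dataset length that is at least the critical size) is an assumption made here so that blocks come out equal, and the function names `block_sizes` and `segment` are hypothetical.

```python
import math
import statistics


def block_sizes(data, delta, eps):
    """Return (sample, critical, operational) block sizes for one dataset.

    delta is the accuracy and eps the precision loss; both are assumed
    to lie strictly between 0 and 1 in practice.
    """
    n = len(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    # Assumed reading of the formula: sigma / ((1 - delta) * eps^2).
    sample = sigma / ((1.0 - delta) * eps ** 2)
    # The critical size never exceeds half the dataset length.
    critical = min(sample, n / 2)
    # Assumption: take the smallest divisor of n that is >= critical,
    # so the dataset splits into equal blocks with no remainder.
    start = max(1, math.ceil(critical))
    operational = next(b for b in range(start, n + 1) if n % b == 0)
    return sample, critical, operational


def segment(data, block_size):
    """Split data into consecutive blocks of block_size elements."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

For a 12-element dataset whose critical size works out to 6, `segment` yields two blocks of six elements, and each block can then be passed to an outlier detector independently.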
Opening claim text (preview).
What is claimed is:

1. A method for detecting outliers in real-time for a univariate time-series signal, the method comprising:
receiving, by a processor 210, a univariate time-series signal from a data source having stored data captured by sensors, wherein the univariate time-series signal comprises a plurality of datasets, and wherein each dataset of the plurality of datasets comprises $\mathbb{N}$ number of univariate time-series data elements;
computing, by the processor 210, a standard deviation ($\sigma$) of a dataset of the plurality of datasets;
computing, by the processor 210, an optimal sample block size ($\mathbb{B}$) of the dataset by using the standard deviation ($\sigma$);
computing, by the processor 210, a critical sample size ($\mathbb{B}_{\text{critical}}$) of the dataset based on the standard deviation ($\sigma$), the $\mathbb{N}$ number of univariate time-series data elements, an accuracy ($\delta$) in minimizing false alarms, and a precision ($1-\epsilon$) of outcome related to the critical sample size $\mathbb{B}_{\text{critical}}$;
determining, by the processor 210, an optimal operational block size ($\mathbb{B}_{\text{operational}}$) of the dataset using $\mathbb{B}_{\text{operational}} = \{\,|\mathbb{N}| \bmod \mathbb{B}_{\text{critical}} = 0\,\}$, wherein $\mathbb{N}$ indicates number of univariate time-series data elements and $\mathbb{B}_{\text{critical}}$ indicates the critical sample size;
segmenting, by the processor 210, the plurality of datasets into blocks based upon the optimal operational block size ($\mathbb{B}_{\text{operational}}$), wherein each block comprises $\mathbb{N}/\mathbb{B}_{\text{operational}}$ data elements of the $\mathbb{N}$ number of univariate time-series data elements; and
detecting, by the processor 210, outliers in real-time by performing an outlier detection technique on the blocks of the segmented plurality of datasets, wherein the outlier detection technique comprises one or more unsupervised techniques including at least one of a Rosner filtering technique to minimize swamping effects or a Hampel filtering technique to minimize masking effects.

2. The method of claim 1, wherein the optimal sample block size $\mathbb{B} = \dfrac{\sigma}{(1-\delta)\,\epsilon^{2}}$, wherein $\sigma$ indicates the standard deviation, $\epsilon$ indicates precision loss and $\delta$ indicates the accuracy.

3. The method of claim 1, wherein the critical sample size ($\mathbb{B}_{\text{critical}}$) is computed using $\mathbb{B}_{\text{critical}} = \min\left\{\dfrac{\sigma}{(1-\delta)\,\epsilon^{2}},\ \dfrac{\mathbb{N}}{2}\right\}$, wherein $\sigma$ indicates the standard deviation, $\epsilon$ indicates precision loss, $\delta$ indicates the accuracy and $\mathbb{N}$ indicates number of univariate time-series data elements.

4. A system implemented on a cloud-based environment for detecting outliers in real-time for a univariate time-series signal, the system comprises:
a processor 210;
a memory 212 coupled to the processor 210, wherein the processor is capable for executing programmed instructions stored in the memory 212 to:
receive a univariate time-series signal from a data source having stored data captured by sensors, wherein the univariate time-series signal comprises a plurality of datasets, and wherein each dataset of the plurality of datasets comprises $\mathbb{N}$ number of univariate time-series data elements;
compute a standard deviation ($\sigma$) of a dataset of the plurality of datasets;
compute an optimal sample block size ($\mathbb{B}$) of the dataset by using the standard deviation ($\sigma$);
compute a critical sample size ($\mathbb{B}_{\text{critical}}$) of the dataset based on the standard deviation ($\sigma$), the $\mathbb{N}$ number of univariate time-series data elements, an accuracy ($\delta$) in minimizing false alarms, and a precision ($1-\epsilon$) of outcome related to the critical sample size $\mathbb{B}_{\text{critical}}$;
determine an optimal operational block size ($\mathbb{B}_{\text{operational}}$) of the dataset using $\mathbb{B}_{\text{operational}} = \{\,|\mathbb{N}| \bmod \mathbb{B}_{\text{critical}} = 0\,\}$, wherein $\mathbb{N}$ indicates number of univariate time-series data elements and $\mathbb{B}_{\text{critical}}$ indicates the critical sample size;
segment the plurality of datasets into blocks based upon the optimal operational block size ($\mathbb{B}_{\text{operational}}$), wherein each block comprises $\mathbb{N}/\mathbb{B}_{\text{operational}}$ data elements of the $\mathbb{N}$ number of univariate time-series data elements; and
detect outliers in real-time by performing an outlier detection technique on the blocks of the segmented plurality of datasets, wherein the outlier detection technique comprises one or more unsupervised techniques including at least one of a Rosner filtering technique to minimize swamping effects or a Hampel filtering technique to minimize masking effects.

5. The system of claim 4, wherein the sample block size ($\mathbb{B}$) is computed using $\mathbb{B} = \dfrac{\sigma}{(1-\delta)\,\epsilon^{2}}$, wherein $\sigma$ indicates the standard deviation, $\epsilon$ indicates precision loss and $\delta$ indicates the accuracy.

6. The system of claim 4, wherein the critical sample size ($\mathbb{B}_{\text{critical}}$) is computed using $\mathbb{B}_{\text{critical}}$ = min { σ
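The Hampel filtering named in the claims can be illustrated with the standard Hampel identifier, applied here once per block rather than over a sliding window. This is a hedged sketch only: the threshold `k = 3.0`, the Gaussian scale factor 1.4826, the block-wise application, and the function names `hampel_outliers` and `detect` are assumptions for illustration and are not taken from the patent text.

```python
import statistics


def hampel_outliers(block, k=3.0):
    """Return indices within block flagged by the Hampel identifier."""
    med = statistics.median(block)
    # Median absolute deviation (MAD) around the block median.
    mad = statistics.median(abs(x - med) for x in block)
    # 1.4826 scales the MAD to estimate sigma for Gaussian data.
    # If mad is 0, every value that differs from the median is flagged.
    threshold = k * 1.4826 * mad
    return [i for i, x in enumerate(block) if abs(x - med) > threshold]


def detect(blocks, k=3.0):
    """Run the Hampel identifier on each block; return global indices."""
    flagged = []
    offset = 0
    for block in blocks:
        flagged.extend(offset + i for i in hampel_outliers(block, k))
        offset += len(block)
    return flagged
```

Because the median and MAD are robust to a few extreme values, one large outlier in a block does not inflate the threshold enough to hide another, which is the masking effect the claims refer to.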
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Services specially adapted for wireless communication networks; Facilities therefor · CPC title