Data stream processing

US9613123B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9613123-B2
Application number: US-42287509-A
Country: US
Kind code: B2
Filing date: Apr 13, 2009
Priority date: Apr 13, 2009
Publication date: Apr 4, 2017
Grant date: Apr 4, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of processing a stream of raw data from a plurality of distributed data producing devices includes reducing the raw data to a plurality of representative synopsis coefficients, organizing the synopsis coefficients into a data structure with at least three dimensions, including a time window dimension and an accuracy dimension. Responsive to a detected anomaly in the data structure, at least one of a predetermined autonomous action and an action directed by a user is performed.
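
The data structure described in the abstract can be pictured with a small Python sketch. This is an illustration only, not the patented implementation: the class name `SynopsisCube`, its methods, the string location keys, and the magnitude threshold are all hypothetical. It assumes that a cell is addressed by a location and a time window, and that the accuracy level is the number of largest-magnitude coefficients examined in each cell.

```python
from collections import defaultdict


class SynopsisCube:
    """Hypothetical cube: (location, time window) cells of coefficients."""

    def __init__(self):
        # Maps (location, window) -> list of synopsis coefficients.
        self._cells = defaultdict(list)

    def add(self, location, window, coefficients):
        """Store coefficients for one location and one time window."""
        self._cells[(location, window)].extend(coefficients)

    def view(self, location, window, accuracy):
        """Return the `accuracy` largest-magnitude coefficients of a cell."""
        ranked = sorted(self._cells[(location, window)], key=abs, reverse=True)
        return ranked[:accuracy]

    def anomalies(self, accuracy, threshold):
        """Flag examined coefficients whose magnitude exceeds `threshold`.

        The threshold rule is an assumption made for this sketch.
        """
        hits = []
        for location, window in sorted(self._cells):
            for coefficient in self.view(location, window, accuracy):
                if abs(coefficient) > threshold:
                    hits.append((location, window, coefficient))
        return hits


cube = SynopsisCube()
cube.add("rack-1", 0, [0.1, -1.25, 0.5])
cube.add("rack-2", 0, [0.2, 0.05])
print(cube.anomalies(accuracy=1, threshold=1.0))
```

Raising `accuracy` examines more coefficients per cell, trading extra work for finer detail. A caller could react to each returned tuple with a notification or a control signal.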

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing a stream of raw data from a plurality of distributed data producing devices, said method comprising: reducing said raw data to a plurality of representative synopsis coefficients, each synopsis coefficient being a numeric value representative of two or more different values from said raw data; organizing said synopsis coefficients into a data structure comprising at least three dimensions, wherein said dimensions comprise a location dimension, a time window dimension and an accuracy dimension, and wherein the location dimension organizes the synopsis coefficients topologically, the time window dimension organizes the synopsis coefficients according to a time when the raw data was received, and the accuracy dimension determines how many of the synopsis coefficients are examined to identify two or more most significant anomalies; identifying the two or more most significant anomalies in a computational cube associated with the data structure; and responsive to the identification of the two or more most significant anomalies in the computational cube, performing a remedial action to remedy one or more problems associated with the two or more most significant anomalies.

2. The method of claim 1, further comprising providing an interactive visual analytics display to a user of at least one of: said stream of raw data, said synopsis coefficients, said data structure, said two or more most significant anomalies, and said predetermined autonomous action.

3. The method of claim 2, further comprising providing said interactive visual analytics display to said user in accordance with at least one level of abstraction determined by said user.

4. The method of claim 2, wherein said interactive visual analytics display comprises a correlation map between at least two selected measurements in said raw data.

5. The method of claim 1, wherein said data producing devices comprise sensors.

6. The method of claim 5, wherein said sensors are distributed throughout a data center.

7. The method of claim 1, wherein said synopsis coefficients are calculated by: forming a binary tree out of said raw data in said stream, said binary tree comprising hierarchical nodes and leaves, wherein said hierarchical nodes correspond to said synopsis coefficients and said leaves correspond to elements of said raw data; and computing a weighted difference between the average values of the child leaves of each parent node in said binary tree, wherein said computed weighted differences are said synopsis coefficients.

8. The method of claim 1, wherein said accuracy dimension of said data structure is dependent on a level of abstraction of said synopsis coefficients.

9. The method of claim 1, wherein said predetermined autonomous action comprises at least one of a notification to a user and a feedback signal sent to a controller.

10. The method of claim 1, wherein said accuracy dimension determines how many of said synopsis coefficients are used to detect for said anomaly.

11. A system, comprising: a plurality of distributed data producing devices; and at least one computing device communicatively coupled to said distributed data producing devices; wherein said computing device is configured to: reduce a stream of raw data received from said distributed data producing devices to a plurality of representative synopsis coefficients, wherein reducing a stream of raw data comprises aggregating the synopsis coefficients using one of a count function, an average function, and a standard deviation function; organize said synopsis coefficients into a data structure comprising at least three dimensions, wherein said dimensions comprise a location dimension, a time window dimension and an accuracy dimension, and wherein the location dimension organizes the synopsis coefficients topologically, the time window dimension organizes the synopsis coefficients according to a time when the raw data was received, and the accuracy dimension determines how many of the synopsis coefficients are examined to identify two or more most significant anomalies; and perform at least one of a predetermined autonomous response and a user directed action responsive to a detected anomaly in said data structure.

12. The system of claim 11, wherein said computing device is further configured to provide an interactive visual analytics display to said user, wherein said interactive visual analytics display comprises at least one of: said stream of raw data, said synopsis coefficients, said data structure, said detected anomaly, and said predefined rule.

13. The system of claim 11, wherein said computing device is further configured to calculate said synopsis coefficients by: forming a binary tree out of said raw data in said stream, said binary tree comprising hierarchical nodes and leaves, wherein said hierarchical nodes correspond to said synopsis coefficients and said leaves correspond to elements of said raw data; and computing a weighted difference between the average values of the child leaves of each parent node in said binary tree, wherein said computed weighted differences are said synopsis coefficients.

14. The system of claim 11, wherein said data producing devices comprise sensors.

15. The system of claim 14, wherein said sensors are distributed throughout a data center.

16. The system of claim 11, wherein said predetermined autonomous action comprises at least one of a notification to a user and a feedback signal to a controller.

17. The system of claim 11, wherein said accuracy dimension determines how many of said synopsis coefficients are used to detect for said anomaly.

18. The system of claim 11, wherein said computing device detects said anomaly in said data structure using at least one of sequence patterns, semantic correlations or threshold violations.

19. The system of claim 18, wherein said semantic correlation compares two different event types from different types of sensors among said data producing devices.

20. The system of claim 11, wherein said at least one computing device comprises a data stream management system wherein continuous queries are authored using a Continuous Query Language and run against said synopsis coefficients in said data structure to detect for anomalies.

21. A computer program product for processing a stream of raw data from a plurality of distributed data producing devices, said computer program product comprising: a non-transitory computer usable medium having computer usable program code embodied therewith, said computer usable program code comprising: computer usable program code configured to cause a computer to reduce said raw data to a plurality of representative synopsis coefficients using one of a count function, an average function, and a standard deviation function; computer usable program code configured to cause said computer to organize said synopsis coefficients into a data structure comprising at least three dimensions, wherein said dimensions comprise a location dimension, a time window dimension and an accuracy dimension, and wherein the location dimension organizes the synopsis coefficients topologically, the time window dimension organizes the synopsis coefficients according to a time when the raw data was received, and the accuracy dimension determines how many of the synopsis coefficients are examined to identify two or more most significant anomalies; and computer usable program code configured to, responsive to a detected anomaly
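
The binary-tree calculation recited in claims 7 and 13 resembles a Haar-style decomposition, and a minimal Python sketch of that reading follows. It rests on assumptions not stated in the claims: the function name `synopsis_coefficients` is hypothetical, the weight applied to each difference is taken to be 1/2, and the input length is required to be a power of two so that the tree is complete.

```python
def synopsis_coefficients(values):
    """Return (overall_average, coefficients) for a power-of-two-length list.

    Each coefficient is half the difference between the average of the
    leaves under a node's left child and the average under its right child.
    The 1/2 weight is an assumption made for this sketch.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")

    averages = [float(v) for v in values]  # leaves of the binary tree
    levels = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # One coefficient per parent node at this level of the tree.
        levels.append([(left - right) / 2 for left, right in pairs])
        # Parent averages become the children of the next level up.
        averages = [(left + right) / 2 for left, right in pairs]

    # List coefficients from the root down to the parents of the leaves.
    coefficients = [c for level in reversed(levels) for c in level]
    return averages[0], coefficients


average, coefficients = synopsis_coefficients([2, 2, 0, 2, 3, 5, 4, 4])
print(average)       # 2.75
print(coefficients)  # [-1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
```

In this example the root coefficient -1.25 shows that the second half of the window averages higher than the first half. Keeping only the largest-magnitude coefficients gives a coarser summary of the same window.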

Assignees

Inventors

Classifications

  • G06F16/283 · Primary

    Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

  • Browsing; Visualisation therefor (for navigating the web G06F16/954; browsing optimisation for the web G06F16/957) · CPC title

  • Clustering; Classification · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9613123B2 cover?
A method of processing a stream of raw data from a plurality of distributed data producing devices includes reducing the raw data to a plurality of representative synopsis coefficients, organizing the synopsis coefficients into a data structure with at least three dimensions, including a time window dimension and an accuracy dimension. Responsive to a detected anomaly in the data structure, at …
Who is the assignee on this patent?
Gupta Chetan Kumar, Wang Song, Ari Ismail, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F16/283. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).