Optimized query execution in a distributed data stream processing environment
US-2016004751-A1 · Jan 7, 2016 · US
US9697262B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9697262-B2 |
| Application number | US-201314109643-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 17, 2013 |
| Priority date | Dec 17, 2013 |
| Publication date | Jul 4, 2017 |
| Grant date | Jul 4, 2017 |
Official abstract text for this publication.
Some examples include high-performance query processing of real-time and offline temporal-relational data. Further, some implementations include processing streaming data events by annotating individual events with a first timestamp (e.g., a “sync-time”) and second timestamp that may identify additional event information. The stream of incoming data events may be organized into a sequence of data batches that each include multiple data events. The individual data batches in the sequence may be processed in a non-decreasing “sync-time” order.
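The batching step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Event` class, the `batch_events` function, and the fixed `batch_size` cutoff are hypothetical names and choices made for illustration only. The sketch assumes events already arrive in non-decreasing sync-time order and simply rejects any event that violates that order.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List


@dataclass
class Event:
    sync_time: int   # first timestamp ("sync-time")
    other_time: int  # second timestamp carrying additional event information
    payload: Any


def batch_events(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group a stream into batches of up to batch_size events.

    Assumption: the input is already ordered by sync_time, so the emitted
    batches are also in non-decreasing sync-time order.
    """
    batch: List[Event] = []
    last_sync = None
    for ev in events:
        if last_sync is not None and ev.sync_time < last_sync:
            raise ValueError("events must arrive in non-decreasing sync-time order")
        last_sync = ev.sync_time
        batch.append(ev)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch.
        yield batch
```

Each yielded batch holds multiple events, and a downstream consumer that reads the batches in order never sees sync-time move backwards.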
Opening claim text (preview).
The invention claimed is:
1. A method comprising: under control of one or more computing devices: receiving a real-time or near real-time query for a stream of incoming data events; annotating each individual data event of the incoming data events with a first timestamp and a second timestamp, wherein: the first timestamp identifies when the individual data event is received; and the second timestamp identifies additional information associated with the individual data event; organizing the stream of the incoming data events into a sequence of data batches based at least in part on the first timestamp of each individual data event of the incoming data events, wherein individual data batches, of the sequence of data batches, include multiple data events; and processing the individual data batches of the sequence of data batches in a non-decreasing time order, wherein each individual data batch stores: a key array that includes an array of grouping key values, the grouping key values representing a logic group of a data event, and control parameters that include: a synctime array that includes synctime values of at least some events in an individual data batch, and an othertime array that includes othertime values that indicate known future times that individual ones of the events in the individual data batch are expected to end; based on at least one of the key array, the synctime array, and the othertime array, presenting a result comprising one or more values from the stream of incoming data events in response to the received real-time or near real-time query.
2. The method of claim 1, wherein the particular data batch of the individual data batches stores a payload array of all payloads within the particular data batch.
3. The method of claim 2, wherein the payloads are arranged in a columnar format.
4. The method of claim 1, wherein the control parameters are stored in a columnar format.
5. The method of claim 4, wherein the control parameters stored in the columnar format further include: a bitvector that includes an occupancy vector representing an array with one bit; and a hash array that includes an array of hash values.
6. The method of claim 1, wherein each incoming data event is not annotated with a third timestamp.
7. The method of claim 1, wherein for a particular incoming data event: the particular incoming data event includes an interval event; the first timestamp includes a start time of the interval event; and the additional information identified by the second timestamp includes a known future time that the interval event is to end.
8. The method of claim 1, wherein particular incoming data events include a start-edge event and an end-edge event.
9. A computing system comprising: one or more processors; one or more computer readable media maintaining instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising: receiving a real-time or near real-time query for a stream of incoming data events; annotating each individual data event of the incoming data events with a first timestamp and a second timestamp, wherein: the first timestamp identifies when the individual data event is received; and the second timestamp identifies additional information associated with the individual data event; organizing the stream of the incoming data events into a sequence of data batches based at least in part on the first timestamp of each individual data event of the incoming data events, wherein individual data batches store a key array that includes an array of grouping key values, the grouping key values representing a logic group of a data event; processing the individual data batches in the sequence of data batches in a non-decreasing time order, wherein each individual data batch stores: a key array that includes an array of grouping key values, the grouping key values representing a logic group of a data event, and control parameters that include: a synctime array that includes synctime values of at least some events in an individual data batch, and an othertime array that includes othertime values that indicate known future times that individual ones of the events in the individual data batch are expected to end; and based on at least one of the key array, the synctime array, or the othertime array, presenting a result comprising one or more values from the stream of incoming data events in response to the received real-time or near real-time query.
10. The computing system of claim 9, the acts further comprising adding a punctuation to a first event to prompt an output of a partially filled data batch to a downstream operator, wherein the punctuation is further used to indicate when no incoming data is detected over a period of time.
11. The computing system of claim 9, wherein: the individual data batches store control parameters in individual arrays; particular ones of the individual data batches store a corresponding array of all associated payloads; and processing the individual data batches in the sequence includes processing entire uninterrupted arrays without performing per-row encoding or per-row decoding.
12. The computing system of claim 9, wherein: a particular data batch stores control parameters in individual arrays; the particular data batch stores an array of all payloads within the particular data batch; when a particular payload is of a string type, individual strings are stored within the particular data batch end-to-end in a single character array with additional information associated with starts and offsets of the individual strings; and processing the particular data batch includes: performing string operations directly on the single character array; or performing string operations includes copying individual strings to a buffer and performing string operations on the buffer.
13. The computing system of claim 9, wherein: a particular stream is logically grouped by a key that logically represents multiple distinct sub-streams; and each distinct sub-stream is associated with a distinct value of a grouping key.
14. The computing system of claim 13, wherein a single timestamp domain is associated with all groups such that passage of time occurs across all groups and not on a per-group basis.
15. The computing system of claim 13, wherein a grouping key and a hash value associated with the grouping key are stored as part of a data event.
16. The computing system of claim 9, the acts further comprising: receiving a query that corresponds to a row-oriented view of data; in response to receiving the query, dynamically generating custom code corresponding to columnar representation of the data; incorporating the custom code to generate a custom operator; and executing the custom operator against the columnar representation of the data to determine query results to be provided in response to the query.
17. The computing system of claim 16, wherein dynamically generating the custom code includes replacing references to a particular field having a particular value with references to a particular row in a column corresponding to the particular field.
18. One or more computer-readable media maintaining instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: receiving a real-time or near real-time query for a stream of incoming data events; annotating each individual data event of the stream of the incoming data events with a fi
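The batch layout recited in the claims (a key array, synctime and othertime arrays, an occupancy bitvector, a hash array, and string payloads stored end-to-end in one character array with offsets) might be pictured with the following Python sketch. It is illustrative only and makes several assumptions that are not in the claims: the `ColumnarBatch` class and its method names are hypothetical, a set bit is taken to mean "row is live", Python's built-in `hash` stands in for whatever hash function a real system would use, and strings are stored as UTF-8 bytes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnarBatch:
    keys: List[int] = field(default_factory=list)         # grouping key per row
    hashes: List[int] = field(default_factory=list)       # hash of each grouping key
    sync_times: List[int] = field(default_factory=list)   # synctime array
    other_times: List[int] = field(default_factory=list)  # othertime array
    occupancy: int = 0                                     # bit i set => row i is live (assumed convention)
    chars: bytearray = field(default_factory=bytearray)   # all strings, end-to-end
    offsets: List[int] = field(default_factory=lambda: [0])  # string i spans offsets[i]:offsets[i+1]

    def append(self, key: int, sync_time: int, other_time: int, text: str) -> None:
        row = len(self.keys)
        self.keys.append(key)
        self.hashes.append(hash(key) & 0xFFFFFFFF)  # placeholder hash function
        self.sync_times.append(sync_time)
        self.other_times.append(other_time)
        self.chars.extend(text.encode("utf-8"))
        self.offsets.append(len(self.chars))
        self.occupancy |= 1 << row

    def drop(self, row: int) -> None:
        # Filter a row out by clearing its bit; no column data is moved.
        self.occupancy &= ~(1 << row)

    def is_live(self, row: int) -> bool:
        return bool((self.occupancy >> row) & 1)

    def text(self, row: int) -> str:
        # Copy one string out of the shared character array.
        return self.chars[self.offsets[row]:self.offsets[row + 1]].decode("utf-8")

    def count_prefix(self, prefix: str) -> int:
        # String test performed directly on the shared character array,
        # without materializing per-row string objects.
        p = prefix.encode("utf-8")
        return sum(
            1
            for row in range(len(self.keys))
            if self.is_live(row)
            and self.chars.startswith(p, self.offsets[row], self.offsets[row + 1])
        )
```

In this sketch `count_prefix` corresponds loosely to operating directly on the single character array, while `text` corresponds to copying an individual string out to a buffer first. Dropping a row only flips a bit, so the column arrays stay contiguous for whole-array processing.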
Temporal data queries · CPC title
Physics · mapped topic