Fast query processing in columnar databases with GPUs
US-2016378751-A1 · Dec 29, 2016 · US
US11138170B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11138170-B2 |
| Application number | US-201715404152-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2017 |
| Priority date | Jan 11, 2016 |
| Publication date | Oct 5, 2021 |
| Grant date | Oct 5, 2021 |
Official abstract text for this publication.
The current document is directed to a query-as-a-service system (“QAAS system”) that collects enormous volumes of data from network-connected entities, referred to as “Things” in the phrase “Internet of Things,” persistently stores the collected data and provides a distributed-query-execution engine that allows remote clients to continuously execute queries against the collected data. In a described implementation, both the raw data and query results are persistently stored in the QAAS system, with the raw data stored for significantly longer periods of time. Query results generated by the query-processing engine are securely transmitted to QAAS remote clients for distribution to file systems, storage appliances, applications, and other data sinks within client systems.
Opening claim text (preview).
The invention claimed is:
1. A query-as-a-service system comprising: a distributed data-streaming service that comprises one or more first computer systems, wherein the one or more first computer systems include: one or more first processors; and one or more first memories that include one or more instructions that when executed by the one or more first processors, cause the one or more first processors to perform operations including: receiving, from each of a plurality of internet-of-things (IoT) devices, one or more communications through one or more networks, each communication of the one or more communications including unstructured data associated with a function of the IoT device, the unstructured data including key/value pairs and data of a plurality of data types; aggregating the one or more communications; processing the aggregated communications to generate a set of processed data for use in generating one or more data streams; and receiving a first query from a first remote client computer, the first query including an identification of a first stream type from a plurality of stream types and specifying a first geographical filter that identifies a geographical location; generating a first IoT data stream using the set of processed data, wherein the first IoT data stream includes: the first stream type, the first stream type indicating a first subset of data types of the plurality of data types included in the first IoT data stream; and a first plurality of data messages, wherein one or more data messages of the first plurality of data messages include one or more first key/value pairs that are common to the first plurality of data messages and one or more second key/value pairs that are unique to the first stream type, wherein the first plurality of data messages correspond to a first portion of the unstructured data received from a first IoT device of the plurality of IoT devices, and wherein the first geographical filter filters out data messages of the first plurality of data messages other than data messages that contain a data value that corresponds to the geographical location; receiving a second query from a second remote client computer, the second query including an identification of a second stream type from the plurality of stream types; generating a second IoT data stream using the set of processed data, wherein the second IoT data stream is generated in parallel with generation of the first IoT data stream, and wherein the second IoT data stream includes: the second stream type, the second stream type indicating a second subset of data types of the plurality of data types included in the second IoT data stream, the second subset of data types including at least some data types that are different from data types included in the first subset of data types; and a second plurality of data messages, wherein one or more data messages of the second plurality of data messages include one or more third key/value pairs that are common to the second plurality of data messages and one or more fourth key/value pairs that are unique to the second stream type, wherein the second plurality of data messages that correspond to a second portion of the unstructured data received from the first IoT device of the plurality of IoT devices; transferring, in response to receiving the first query and to the first remote client computer, a first query result that includes the first IoT data stream; and transferring, in response to receiving the second query and to the second remote client computer, a second query result that includes the second IoT data stream.
2. The query-as-a-service system of claim 1, wherein the IoT devices include one or more of: network-connected processor-controlled computers; network-connected processor-controlled devices; network-connected processor-controlled appliances; and network-connected devices controlled by logic circuitry.
3. The query-as-a-service system of claim 1, wherein: each of the one or more communications comprises event messages that each includes data values associated with one or more data fields; the event messages are enriched, by the distributed data-streaming service, to include additional data values corresponding to additional fields; and the enriched event messages are assembled into session messages by the distributed data-streaming service, each session message including data values corresponding to one or more event messages that are each associated with a particular session identifier.
4. The query-as-a-service system of claim 3, wherein the first IoT data stream includes session messages associated with a session identifier included in a list of session identifiers associated with the first IoT data stream, the session identifiers in each list of session identifiers being different from the session identifiers in other lists of session identifiers associated with the second IoT data stream.
5. The query-as-a-service system of claim 4, wherein each IoT data stream is partitioned based on time into one or more time partitions, with the data streamed during a particular time partition of the one or more time partitions being stored in a mass-storage device that is associated with the particular time partition.
6. The query-as-a-service system of claim 5, wherein the data stored in a mass-storage device is stored as separated compressed columns, each column containing the data values for a particular data field of a particular session message of the session messages.
7. The query-as-a-service system of claim 6, wherein the query-as-a-service system does not create and maintain indexes for the data stored in compressed columns.
8. The query-as-a-service system of claim 1, further comprising one or more second computer systems that include a driver computer system and multiple worker computer systems, the driver computer system executing instructions that cause the driver computer system to perform operations including: parsing the first query from the remote client computer; determining a plurality of data sources to be processed by worker systems in order to execute the first query, each data source of the plurality of data sources comprising one or more of a mass-storage device that stores data streamed from a first data substream during a particular time partition and the first IoT data stream; allocating a number of worker computer systems based on, at least in part, the determined plurality of data sources; configuring each worker computer system of the number of worker computer systems to execute the first query; and continuously: assigns one or more data sources to each worker computer system without a current data-source assignment; and receives query-execution results generated by worker computer systems until the plurality of data sources have been processed by the number of worker computer systems.
9. The query-as-a-service system of claim 8, wherein the driver computer system configures the worker computer systems by transmitting, to each worker computer system of the number of worker computer systems, query execution information that includes indications of one or more columns relevant to query execution and a query plan.
10. The query-as-a-service system of claim 9, wherein a worker computer system processes a data source by: determining one or more columns of the one or more columns to access; uncompressing data included in the determined one or more columns; executing the query plan using the uncompressed data to generate local query results; and returning the local query results to the driver computer system.
11. The query-as-a-service system of claim 9, wherein the dr
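As a rough illustration of how claims 6 through 10 fit together, the Python sketch below stores each time partition as separately compressed columns, has a driver hand data sources to workers that have no current assignment, and has each worker uncompress only the columns relevant to the query before running the query plan. It is a sketch under stated assumptions, not the patented implementation: every name in it (`compress_columns`, `worker_process`, `driver_execute`, the sample fields and values) is hypothetical, `zlib` over JSON stands in for a compression format that the claims do not specify, and the workers are simulated sequentially in one process rather than running on separate computer systems.

```python
import json
import zlib
from collections import deque


def compress_columns(session_messages, fields):
    """Store each data field as its own compressed column (cf. claim 6).

    zlib over JSON is an assumption; the claims name no format.
    """
    return {
        field: zlib.compress(
            json.dumps([m.get(field) for m in session_messages]).encode("utf-8")
        )
        for field in fields
    }


def worker_process(data_source, relevant_columns, query_plan):
    """Worker step (cf. claim 10): uncompress only the relevant columns,
    execute the query plan on them, and return the local results."""
    columns = {
        name: json.loads(zlib.decompress(data_source[name]).decode("utf-8"))
        for name in relevant_columns
    }
    return query_plan(columns)


def driver_execute(data_sources, relevant_columns, query_plan, num_workers):
    """Driver loop (cf. claims 8 and 9): keep assigning pending data sources
    to workers without an assignment until every source is processed.

    Workers are simulated sequentially here, which is a simplification.
    """
    pending = deque(data_sources)
    results = []
    while pending:
        # One round: each simulated idle worker receives one data source.
        batch = [pending.popleft() for _ in range(min(num_workers, len(pending)))]
        for source in batch:
            results.extend(worker_process(source, relevant_columns, query_plan))
    return results


def plan(columns):
    """Hypothetical query plan: a full column scan with no index (cf. claim 7)
    that returns device ids whose city value matches a geographic filter."""
    return [
        device
        for device, city in zip(columns["device_id"], columns["city"])
        if city == "Seattle"
    ]


# Two hypothetical time partitions of session messages.
FIELDS = ["device_id", "city", "temp"]
hour_0 = [
    {"device_id": "a", "city": "Seattle", "temp": 20},
    {"device_id": "b", "city": "Boston", "temp": 18},
]
hour_1 = [{"device_id": "c", "city": "Seattle", "temp": 21}]

sources = [compress_columns(partition, FIELDS) for partition in (hour_0, hour_1)]
matches = driver_execute(sources, ["device_id", "city"], plan, num_workers=2)
print(matches)
```

In this sketch the `temp` column is never uncompressed, because the driver lists only `device_id` and `city` as relevant to the query. Skipping untouched columns is the usual motivation for columnar layouts, though the claims themselves do not state a performance rationale.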
for unicast · CPC title
Reuse of stored results of previous queries · CPC title
for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title
wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption (cryptographic mechanisms or cryptographic arrangements for public-key encryption H04L9/30) · CPC title
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title