External dataset capability compensation

US11163758B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11163758-B2
Application number: US-201715665248-A
Country: US
Kind code: B2
Filing date: Jul 31, 2017
Priority date: Sep 26, 2016
Publication date: Nov 2, 2021
Grant date: Nov 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for processing queries against an external data source utilizing dynamically allocated partitions operating on one or more worker nodes. The external data source can include data that has not been processed by the system. To query the external data source, a query coordinator can generate a subquery for the external data source based on determined functionality of the data source. The subquery can identify data in the external data source for processing and a manner for processing the data. In addition, the query coordinator can dynamically allocate partitions operating on worker nodes to retrieve and intake results of the subquery. In some cases, the number of partitions allocated can be based on a number of partitions supported by the external data source.
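
The flow the abstract describes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `SourceCapabilities`, `QueryStep`, `split_query`, and `allocate_partitions` are hypothetical names, and the rule of pushing down only the leading run of supported commands is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class SourceCapabilities:
    """Hypothetical summary of what an external data source can do."""
    supported_commands: frozenset      # commands the source can run itself
    max_concurrent_connections: int    # parallel reads the source supports


@dataclass
class QueryStep:
    """One stage of a query pipeline (hypothetical representation)."""
    command: str    # e.g. "filter", "project", "stats"
    argument: str


def split_query(steps, caps):
    """Split a pipeline into (subquery_steps, remaining_steps).

    Assumption: only the leading run of supported commands is pushed
    down, so the order of operations is preserved. Everything from the
    first unsupported command onward stays with the worker nodes.
    """
    pushed = []
    for i, step in enumerate(steps):
        if step.command not in caps.supported_commands:
            return pushed, list(steps[i:])
        pushed.append(step)
    return pushed, []


def allocate_partitions(available_workers, caps):
    """Pick at most one worker partition per supported connection."""
    count = min(len(available_workers), caps.max_concurrent_connections)
    return list(available_workers[:count])
```

With a source that supports only `filter` and `project` and three concurrent connections, a `filter`, `project`, `stats` pipeline would send the first two steps to the source as the subquery, keep `stats` for the workers, and use three of the available partitions to read the results.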

First claim

Opening claim text (preview).

What is claimed:
1. A method, comprising: receiving, at a data intake and query system, a query identifying a set of data to be processed and a manner of processing the set of data, wherein the data intake and query system comprises a first plurality of processors; defining a query processing scheme for obtaining and processing the set of data, wherein defining the query processing scheme comprises: identifying, by the data intake and query system, an external data source that stores at least a portion of the set of data, wherein the external data source is distinct from the data intake and query system, determining, by the data intake and query system, data handling capabilities of the external data source, generating, by the data intake and query system, a subquery for the external data source based on the data handling capabilities of the external data source, the subquery identifying the at least a portion of the set of data stored in the external data source and a manner of processing the at least a portion of the set of data, determining, by the data intake and query system, prior to execution of the query, that the external data source is configured to support a number of concurrent data connections for parallel data transport of results of the subquery from the external data source to a second plurality of processors in communication with a component of the data intake and query system, and dynamically allocating, by the data intake and query system, a subset of the second plurality of processors for the subquery based at least in part on the number of concurrent data connections that the external data source is configured to support; generating instructions, by the data intake and query system, for the subset of the second plurality of processors to: provide the subquery to the external data source, receive the results of the subquery in parallel from the external data source based at least in part on the number of concurrent data connections that the external data source is configured to support, process the results of the subquery to generate processed results, and provide the processed results to the component of the data intake and query system; and executing the query based at least in part on the query processing scheme, wherein executing the query comprises communicating the instructions to the subset of the second plurality of processors.
2. The method of claim 1, wherein defining the query processing scheme further comprises monitoring the external data source for activity and accessibility.
3. The method of claim 1, wherein the number of concurrent data connections that the external data source is configured to support indicates that the external data source is configured to support a single parallel read, wherein the subquery comprises a plurality of subqueries, and wherein each processor of the second plurality of processors is configured to communicate a subquery of the plurality of subqueries to the external data source using the single parallel read.
4. The method of claim 1, wherein the second plurality of processors comprise a processor for each concurrent data connection that the external data source is configured to support.
5. The method of claim 1, wherein the data handling capabilities of the external data source comprises an identification of query commands supported by the external data source.
6. The method of claim 1, wherein defining the data handling capabilities of the external data source comprises an identification of a location of processing nodes of the external data source.
7. The method of claim 1, wherein generating the subquery comprises translating at least a portion of the query into commands understood by the external data source.
8. The method of claim 1, wherein the subquery is generated based on an identification of commands supported by the external data source.
9. The method of claim 1, wherein executing the query comprises monitoring the external data source.
10. The method of claim 1, wherein executing the query comprises monitoring the external data source and allocating an additional processor to the subset of the second plurality of processors based on a determination that an additional concurrent data connection for parallel data transport is available on the external data source.
11. The method of claim 1, wherein executing the query comprises monitoring the external data source and deallocating a processor of the subset of the second plurality of processors based on a determination that a concurrent data connection for parallel data transport of the external data source is not available.
12. The method of claim 1, wherein executing the query further comprises monitoring the second plurality of processors.
13. The method of claim 1, wherein defining the query processing scheme further comprises dynamically allocating the first plurality of processors to receive and process data from the subset of the second plurality of processors.
14. The method of claim 1, wherein defining the query processing scheme further comprises allocating the first plurality of processors to receive data from the subset of the second plurality of processors based on the number of concurrent data connections that the external data source is configured to support.
15. The method of claim 1, wherein defining the query processing scheme further comprises dynamically allocating the first a plurality of processors to receive data from the subset of the second plurality of processors and generating additional instructions for execution by the first plurality of processors.
16. The method of claim 1, wherein defining the query processing scheme further comprises dynamically allocating the first plurality of processors to receive data from the subset of the second plurality of processors and generating additional instructions for execution by the first plurality of processors, and wherein executing the query further comprises communicating the additional instructions to the first plurality of processors.
17. The method of claim 1, wherein defining the query processing scheme further comprises dynamically allocating the first plurality of processors to receive data from the subset of the second plurality of processors, and wherein executing the query further comprises monitoring the second plurality of processors and the first plurality of processors during execution of the query.
18. The method of claim 1, wherein defining the query processing scheme further comprises dynamically allocating the first plurality of processors to receive and process data from the subset of the second plurality of processors and dynamically allocating a third plurality of processors to collect data from the first plurality of processors.
19. The method of claim 1, wherein defining the query processing scheme comprises generating directed acyclic graph instructions for the second plurality of processors.
20. The method of claim 1, wherein defining the query processing scheme comprises generating directed acyclic graph instructions for the second plurality of processors, and wherein executing the query further comprises communicating the directed acyclic graph instructions to the second plurality of processors.
21. A computing system, comprising: one or more processing devices configured to: receive a query identifying a set of data to be processed and a manner of processing the set of data; define a query processing scheme for obtaining and processing the set of data,
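
Claims 10 and 11 describe growing or shrinking the set of allocated processors as the external data source gains or loses concurrent connections during execution. A rough Python sketch of that adjustment step is shown here. The function name `rebalance`, the representation of workers as list entries, and the choice of which worker to release first are all assumptions for illustration; the claims do not specify them.

```python
def rebalance(allocated, idle, connections_available):
    """Match the allocated workers to the connections currently available.

    allocated: workers currently reading from the external source
    idle: workers that could be added
    connections_available: concurrent connections the monitor last
        observed the source to support (hypothetical monitor output)

    Returns new (allocated, idle) lists; the inputs are not modified.
    """
    allocated = list(allocated)
    idle = list(idle)

    # Claim 10 behavior: a connection has opened up, so add a worker.
    while len(allocated) < connections_available and idle:
        allocated.append(idle.pop(0))

    # Claim 11 behavior: a connection is no longer available, so
    # release a worker (most recently added first, by assumption).
    while len(allocated) > connections_available:
        idle.append(allocated.pop())

    return allocated, idle
```

A coordinator could call a function like this each time its monitor reports a new connection count, then send updated instructions to the affected workers.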

Assignees

Splunk Inc

Inventors

Classifications

  • Tablespace storage structures; Management thereof

  • Iterative querying; Query formulation based on the results of a preceding query

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11163758B2 cover?
Systems and methods are disclosed for processing queries against an external data source utilizing dynamically allocated partitions operating on one or more worker nodes. The external data source can include data that has not been processed by the system. To query the external data source, a query coordinator can generate a subquery for the external data source based on determined functionality…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/2425. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 2, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).