Multi-partitioning determination for combination operations

US10896182B2 · US · B2

Patent metadata
Publication number: US-10896182-B2
Application number: US-201715714029-A
Country: US
Kind code: B2
Filing date: Sep 25, 2017
Priority date: Sep 25, 2017
Publication date: Jan 19, 2021
Grant date: Jan 19, 2021

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for processing and executing queries against one or more datasets. As part of processing the query, the system determines whether the query is susceptible to a significantly imbalanced partition. In the event the query is susceptible to an imbalanced partition, the system monitors the query and determines whether to perform a multi-partitioning determination to avoid a significantly imbalanced partition.
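
The flow in the abstract (monitor the size of each partition, then act when one grows too large) can be illustrated with a minimal Python sketch. This page does not define the multi-partition operation, so the sketch assumes it means spreading the entries of one oversized field value across several sub-partitions, a technique commonly called key salting with replication. The names `find_hot_keys`, `multi_partition_join`, `THRESHOLD`, and `FANOUT` are hypothetical and do not come from the patent.

```python
from collections import Counter

THRESHOLD = 4  # hypothetical data entries quantity threshold
FANOUT = 3     # hypothetical number of sub-partitions for an oversized key


def find_hot_keys(left, right, field, threshold=THRESHOLD):
    """Return field values whose combined count across both datasets meets the threshold."""
    counts = Counter(e[field] for e in left) + Counter(e[field] for e in right)
    return {value for value, n in counts.items() if n >= threshold}


def multi_partition_join(left, right, field, hot_keys, fanout=FANOUT):
    """Join two lists of dicts on `field`, splitting hot keys across sub-partitions.

    Assumption: left entries of a hot key are spread round-robin over `fanout`
    sub-partitions, and the matching right entries are copied into each one,
    so every (left, right) pair is still produced exactly once.
    """
    partitions = {}
    for i, entry in enumerate(left):
        key = entry[field]
        sub = i % fanout if key in hot_keys else 0
        partitions.setdefault((key, sub), ([], []))[0].append(entry)
    for entry in right:
        key = entry[field]
        subs = range(fanout) if key in hot_keys else (0,)
        for sub in subs:
            partitions.setdefault((key, sub), ([], []))[1].append(entry)

    joined = []
    for lefts, rights in partitions.values():
        joined.extend({**l, **r} for l in lefts for r in rights)
    return partitions, joined
```

Calling `multi_partition_join` with an empty `hot_keys` set gives an ordinary partitioned join, which makes it easy to compare the largest partition with and without the split.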

First claim

Opening claim text (preview).

What is claimed:

1. A method comprising: determining, based on syntax of a search query associated with a first dataset and a second dataset, that the search query is susceptible to generating a first partition that includes more data entries than a second partition; based on the determining, monitoring execution of the search query; and performing, based on the monitoring, a multi-partition operation on a first group of data entries of the first dataset and a second group of data entries of the second dataset.

2. The method of claim 1, wherein the first dataset comprises a first plurality of data entries and the second dataset comprises a second plurality of data entries, the method further comprising: assigning the first plurality of data entries to a plurality of partitions based on first field-value pairs associated with the first plurality of data entries; and assigning the second plurality of data entries to the plurality of partitions based on second field-value pairs associated with the second plurality of data entries, wherein data entries assigned to a particular partition of the plurality of partitions are associated with a same field-value pair, wherein the first group of data entries and the second group of data entries are assigned to the particular partition, wherein the monitoring comprises determining that the data entries assigned to the particular partition satisfy a data entries quantity threshold, and wherein the performing the multi-partition operation is based on the determining.

3. The method of claim 1, wherein the first group of data entries is a first group of events, the second group of data entries is a second group of events, the first dataset comprises a first plurality of events and the second dataset comprises a second plurality of events, the method further comprising: assigning the first plurality of events to a plurality of partitions based on first field-value pairs associated with the first plurality of events; and assigning the second plurality of events to the plurality of partitions based on second field-value pairs associated with the second plurality of events, wherein events assigned to a particular partition of the plurality of partitions are associated with a same field-value pair, wherein the first group of events and the second group of events are assigned to the particular partition, wherein the monitoring comprises determining that the events in the particular partition satisfy a data entries quantity threshold, and wherein the performing the multi-partition operation is based on the determining.

4. The method of claim 1, wherein the first dataset corresponds to a first dataset source and the second dataset corresponds to a second dataset source.

5. The method of claim 1, wherein the first dataset and the second dataset correspond to a same dataset source.

6. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises parsing the search query upon receipt of the search query.

7. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises identifying expansion operations to be executed as part of the search query.

8. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises determining that an expansion operation of the search query is to be performed prior to a combination operation of the search query.

9. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises determining that an expansion operation of the search query is to be performed prior to a combination operation of the search query, and wherein an output of the expansion operation corresponds to an input of the combination operation.

10. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises determining that no reduction operation is to be performed prior to a combination operation of the search query.

11. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises determining that a field to be used in a combination operation of the search query is different from a field to be used in a reduction operation of the search query prior to the combination operation.

12. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises: identifying a field to be used in a combination operation; analyzing an inverted index that comprises a plurality of field-value pairs associated with the field; and identifying at least one field-value pair of the plurality of field-value pairs that satisfies a data entries quantity threshold.

13. The method of claim 1, wherein determining that the search query is susceptible to generating the first partition that includes more data entries than the second partition comprises: identifying a field to be used in a combination operation that involves the first group of data entries and the second group of data entries; identifying a quantity of a first field-value pair of a first plurality of field-value pairs in a first inverted index associated with the first dataset, wherein the first field-value pair is associated with the field; identifying a quantity of a second field-value pair of a second plurality of field-value pairs in a second inverted index associated with the second dataset, wherein the second field-value pair is associated with the field; and determining that a combination of the quantity of the first field-value pair and the quantity of the second field-value pair satisfies a data entries quantity threshold.

14. The method of claim 1, wherein monitoring execution of the search query comprises generating instructions for one or more processing devices to monitor a quantity of the first group of data entries and the second group of data entries.

15. The method of claim 1, wherein monitoring execution of the search query comprises monitoring a quantity of the first group of data entries and the second group of data entries.

16. The method of claim 1, wherein each data entry of the first group of data entries includes a same field value pair and each data entry of the second group of data entries includes the same field-value pair.

17. The method of claim 1, further comprising determining, based on the monitoring, that a quantity of the first group of data entries and the second group of data entries satisfies a data entries quantity threshold, wherein performing the multi-partition operation is based on the determining.

18. The method of claim 1, further comprising determining, based on the monitoring that a combination of the first group of data entries and the second group of data entries satisfies a data entries quantity threshold, wherein performing the multi-partition operation is based on the determining.

19. The method of claim 1, further comprising continuing execution of the search query following the multi-partition
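
The susceptibility checks recited in claims 8 to 13 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The `Step` record, the operation kinds, and the functions `is_susceptible` and `hot_values` are hypothetical. The sketch treats the separately claimed conditions as alternatives joined by "or", reads "satisfies a threshold" as "greater than or equal to", and models an inverted index as a plain dictionary from a (field, value) pair to a list of entry identifiers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    """One operation in a parsed query pipeline (hypothetical representation)."""
    kind: str  # "expansion", "reduction", "combination", or "other"
    field: Optional[str] = None


def is_susceptible(steps: List[Step]) -> bool:
    """Return True if any combination step may produce an imbalanced partition."""
    for i, step in enumerate(steps):
        if step.kind != "combination":
            continue
        before = steps[:i]
        # Claim 8: an expansion operation runs before the combination.
        if any(s.kind == "expansion" for s in before):
            return True
        # Claims 10 and 11: no earlier reduction operation uses the combination field.
        reductions = [s for s in before if s.kind == "reduction"]
        if not any(s.field == step.field for s in reductions):
            return True
    return False


# Toy inverted index: (field, value) -> identifiers of entries with that pair.
InvertedIndex = Dict[Tuple[str, str], List[int]]


def hot_values(index_a: InvertedIndex, index_b: InvertedIndex,
               field: str, threshold: int) -> List[str]:
    """Claim 13: values of `field` whose combined quantity meets the threshold."""
    values = {v for (f, v) in list(index_a) + list(index_b) if f == field}
    return sorted(
        v for v in values
        if len(index_a.get((field, v), [])) + len(index_b.get((field, v), [])) >= threshold
    )
```

A query flagged by `is_susceptible` would then be monitored during execution, and `hot_values` shows how index counts could identify the field values most likely to overload one partition before any data is moved.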

Assignees

Splunk Inc

Inventors

Classifications

CPC titles:

  • where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems (multiprogramming arrangements G06F9/46; allocation of resources G06F9/50)

  • Timestamp

  • for load management (allocation of a server based on load conditions G06F9/505; load rebalancing G06F9/5083; redistributing the load in a network by a load balancer H04L67/1029)

  • Event-based monitoring

  • Unary operations; Data partitioning operations

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10896182B2 cover?
Systems and methods are disclosed for processing and executing queries against one or more datasets. As part of processing the query, the system determines whether the query is susceptible to a significantly imbalanced partition. In the event the query is susceptible to an imbalanced partition, the system monitors the query and determines whether to perform a multi-partitioning determination to…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F11/3006. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 19, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).