Multi-site cluster-based data intake and query systems

US11436268B2 · US · B2

Patent metadata
Publication number: US-11436268-B2
Application number: US-201815967478-A
Country: US
Kind code: B2
Filing date: Apr 30, 2018
Priority date: Sep 30, 2014
Publication date: Sep 6, 2022
Grant date: Sep 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The various embodiments describe multi-site cluster-based data intake and query systems, including cloud-based data intake and query systems. Using a hybrid search system that includes cloud-based data intake and query systems working in concert with so-called “on-premises” data intake and query systems can promote the scalability of search functionality. In addition, the hybrid search system can enable data isolation in a manner in which sensitive data is maintained “on premises” and information or data that is not sensitive can be moved to the cloud-based system. Further, the cloud-based system can enable efficient leveraging of data that may already exist in the cloud. In addition, various embodiments enable configuration data associated with search functionality to be shared amongst clusters in a manner that promotes cluster security. Specifically, a shared data store can be utilized to store configuration information such that when a particular cluster wishes to use the configuration information, it simply retrieves the configuration information from the shared data store, thus avoiding direct communication with other clusters. Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claimed invention.
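The abstract's configuration-sharing idea, in which each cluster writes configuration to a shared data store and reads it back from there instead of contacting other clusters, can be illustrated with a minimal Python sketch. `SharedConfigStore`, `Cluster`, `publish`, and `pull` are hypothetical names. An in-memory dictionary stands in for whatever shared store a real deployment would use, and the version counter is an added assumption that the abstract does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


class SharedConfigStore:
    """In-memory stand-in for a store every cluster can reach (hypothetical)."""

    def __init__(self) -> None:
        # key -> (version, configuration payload)
        self._items: Dict[str, Tuple[int, dict]] = {}

    def put(self, key: str, config: dict) -> int:
        # Bump the version on every write and keep a private copy.
        version = self._items[key][0] + 1 if key in self._items else 1
        self._items[key] = (version, dict(config))
        return version

    def get(self, key: str) -> Optional[Tuple[int, dict]]:
        item = self._items.get(key)
        if item is None:
            return None
        version, config = item
        return version, dict(config)


@dataclass
class Cluster:
    name: str
    store: SharedConfigStore
    local_config: Dict[str, dict] = field(default_factory=dict)

    def publish(self, key: str, config: dict) -> int:
        # Write to the shared store; no other cluster is contacted.
        return self.store.put(key, config)

    def pull(self, key: str) -> Optional[dict]:
        # Read from the shared store only, then cache the result locally.
        item = self.store.get(key)
        if item is None:
            return None
        _, config = item
        self.local_config[key] = config
        return config
```

An on-premises `Cluster` can call `publish("search/settings", {...})`, and a cloud `Cluster` that shares the same store can later call `pull("search/settings")`. Neither cluster ever holds a reference to the other, which mirrors the abstract's point that retrieving configuration from the shared store avoids direct communication between clusters.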

First claim

Opening claim text (preview).

Having thus described the invention, what is claimed is:

1. A computer-implemented method, comprising:
  receiving, at a first cluster, a request for information identifying a plurality of indexers of the first cluster, the first cluster being a first data intake and query system, wherein the request was transmitted from a second cluster that is a second data intake and query system;
  determining, at the first cluster, the information identifying the plurality of indexers, wherein the information identifying the plurality of indexers identifies the plurality of indexers based on at least one master node of the first cluster identifying active indexers within the first cluster and comprises a list of the active indexers;
  in response to the request, transmitting, from the first cluster to the second cluster, the information identifying the plurality of indexers, and information on how to communicate with the plurality of indexers that is used by the second cluster in distributing a distributed search query;
  receiving, at the plurality of indexers of the first cluster, the distributed search query encoded in a transmission from the second cluster, wherein the transmission of the distributed search query is distributed across the plurality of indexers based on the information identifying the plurality of indexers;
  determining, at the first cluster, a response to the distributed search query from at least one of the plurality of indexers, wherein each response from a respective indexer is generated by the respective indexer based on an evaluation, by the respective indexer, of the distributed search query; and
  providing, from the first cluster to the second cluster, the response to the distributed search query.

2. The method as described in claim 1, wherein the receiving of the request for information identifying the plurality of indexers of the first cluster is through a firewall of the first cluster and the request is transmitted from the second cluster to the first cluster based at least on the second cluster receiving the distributed search query.

3. The method as described in claim 1, wherein the transmitting of the information identifying the plurality of indexers is through a firewall of the second cluster.

4. The method as described in claim 1, wherein the first cluster is a cloud-based cluster and the second cluster is an on-premises cluster.

5. The method as described in claim 1, wherein the evaluation is on events associated with timestamps, the events comprising raw portions of machine data.

6. The method as described in claim 1, wherein the distributed search query is configured to be used with a late-binding schema.

7. The method as described in claim 1, wherein the evaluation is performed on log data.

8. The method as described in claim 1, wherein the first cluster includes a single master node that includes information about active indexers within the first cluster, and the information identifies the plurality of indexers based on the single master node of the first cluster identifying the active indexers.

9. The method as described in claim 1, wherein the first cluster and the second cluster each include a single master node that includes information about active indexers within its respective cluster, and the information identifies the plurality of indexers based on the single master node of the first cluster identifying the active indexers.

10. The method as described in claim 1, wherein the receiving of the request for the information identifying the plurality of indexers of the first cluster is through a firewall of the first cluster, the firewall configured to allow inbound communication based on an IP address of a search head of the second cluster that requests the information.

11. The method as described in claim 1, wherein the receiving of the request for the information identifying the plurality of indexers of the first cluster is through a firewall of the first cluster, the firewall configured to allow inbound communication based on an IP address of the second cluster that requests the information.

12. The method as described in claim 1, wherein the list of active indexers includes respective IP addresses of the plurality of indexers, and the receiving of the distributed search query is based on the respective IP addresses from the list of active indexers.

13. The method as described in claim 1, wherein the information includes a generation identifier to be used in distributing the distributed search query, the generation identifier identifying an indexer as a primary indexer to perform the evaluation on data and return corresponding search results when multiple indexers respectively manage a corresponding copy of the data.

14. The method as described in claim 1, wherein the information includes a list of the active indexers and a generation identifier to be used in distributing the distributed search query, the generation identifier identifying primary and secondary indexers of the first cluster.

15. The method as described in claim 1, wherein the response includes event results associated with the distributed search query.

16. A non-transitory computer readable storage media, storing software instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising:
  receiving, at a first cluster, a request for information identifying a plurality of indexers of the first cluster, the first cluster being a first data intake and query system, wherein the request was transmitted from a second cluster that is a second data intake and query system;
  determining, at the first cluster, the information identifying the plurality of indexers, wherein the information identifying the plurality of indexers identifies the plurality of indexers based on at least one master node of the first cluster identifying active indexers within the first cluster and comprises a list of the active indexers;
  in response to the request, transmitting, from the first cluster to the second cluster, the information identifying the plurality of indexers, and information on how to communicate with the plurality of indexers that is used by the second cluster in distributing a distributed search query;
  receiving, at the plurality of indexers of the first cluster, the distributed search query encoded in a transmission from the second cluster, wherein the transmission of the distributed search query is distributed across the plurality of indexers based on the information identifying the plurality of indexers;
  determining, at the first cluster, a response to the distributed search query from at least one of the plurality of indexers, wherein each response from a respective indexer is generated by the respective indexer based on an evaluation, by the respective indexer, of the distributed search query; and
  providing, from the first cluster to the second cluster, the response to the distributed search query.

17. The non-transitory computer readable storage media of claim 16, wherein the receiving of the request for information identifying the plurality of indexers of the first cluster is through a firewall of the first cluster.

18. The non-transitory computer readable storage media of claim 16, wherein the transmitting of the information identifying the plurality of indexers is through a firewall of the second cluster.

19. The non-transitory computer readable storage media of claim 16, wherein the first cluster is a cloud-based cluster and the second cluster is an on-premises cluster.
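The exchange recited in claim 1 can be sketched in Python. The sketch also models the firewall allowlist of claims 10 and 11, the IP-address list of claim 12, and the generation identifier of claims 13 and 14. It is a simplified in-process model under stated assumptions, not the patentee's implementation:

- All class and method names are hypothetical.
- Network hops are plain method calls.
- Substring matching stands in for real query evaluation.
- The generation identifier is modeled as a version number. Combined with the master node's bucket-to-primary map, it decides which copy of replicated data each indexer searches.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical event model: (timestamp, raw machine data), echoing claim 5.
Event = Tuple[int, str]


@dataclass
class Indexer:
    name: str
    ip: str
    port: int
    active: bool = True
    # bucket id -> events; several indexers may hold copies of one bucket
    buckets: Dict[str, List[Event]] = field(default_factory=dict)

    def evaluate(self, query: str, primary_buckets: Set[str]) -> List[Event]:
        # Search only the buckets this indexer is primary for, so that
        # replicated copies held by other indexers are not returned twice.
        hits: List[Event] = []
        for bucket_id, events in self.buckets.items():
            if bucket_id in primary_buckets:
                hits.extend(e for e in events if query in e[1])
        return hits


@dataclass
class MasterNode:
    indexers: List[Indexer]
    generation: int = 1
    # Primary assignment for the current generation: bucket id -> indexer name
    primaries: Dict[str, str] = field(default_factory=dict)

    def active_indexers(self) -> List[Indexer]:
        return [ix for ix in self.indexers if ix.active]


class FirstCluster:
    """Cluster that answers peer-list requests and runs delegated searches."""

    def __init__(self, master: MasterNode, allowed_ips: Set[str]) -> None:
        self.master = master
        self.allowed_ips = allowed_ips  # models the firewall allowlist

    def handle_peer_request(self, requester_ip: str) -> dict:
        # Firewall check on the requesting search head's IP (claims 10-11).
        if requester_ip not in self.allowed_ips:
            raise PermissionError(f"{requester_ip} blocked by firewall")
        # The master node's list of active indexers plus how to reach them.
        return {
            "generation": self.master.generation,
            "indexers": [
                {"name": ix.name, "ip": ix.ip, "port": ix.port}
                for ix in self.master.active_indexers()
            ],
        }

    def deliver(self, indexer_name: str, query: str, generation: int) -> List[Event]:
        # Receiving side of one search-head-to-indexer hop.
        if generation != self.master.generation:
            raise RuntimeError("stale generation; request the peer list again")
        ix = next(i for i in self.master.active_indexers() if i.name == indexer_name)
        mine = {b for b, owner in self.master.primaries.items() if owner == ix.name}
        return ix.evaluate(query, mine)


def remote_search(first: FirstCluster, search_head_ip: str, query: str) -> List[Event]:
    # Second cluster's search head: fetch the peer list, fan out, merge.
    info = first.handle_peer_request(search_head_ip)
    results: List[Event] = []
    for peer in info["indexers"]:
        # A real search head would connect to peer["ip"]:peer["port"];
        # here the hop is a direct method call.
        results.extend(first.deliver(peer["name"], query, info["generation"]))
    return sorted(results)  # merge the per-indexer responses by timestamp
```

Consider two indexers that each hold copies of buckets `b1` and `b2`, with `b1` assigned to one indexer as primary and `b2` to the other. Because each indexer searches only its primary buckets, `remote_search` returns every matching event exactly once. A request from an IP address outside `allowed_ips` raises `PermissionError`, standing in for the firewall rejecting the connection.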

Assignees

Splunk Inc

Inventors

Classifications

  • G06F16/35 (primary)

    Clustering; Classification · CPC title

  • Management therefor · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11436268B2 cover?
The various embodiments describe multi-site cluster-based data intake and query systems, including cloud-based data intake and query systems. Using a hybrid search system that includes cloud-based data intake and query systems working in concert with so-called “on-premises” data intake and query systems can promote the scalability of search functionality. In addition, the hybrid search system c…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 6, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or other publications sharing the same primary CPC).