Source detection and indexing for managed search
US-10452675-B1 · Oct 22, 2019 · US
US10909131B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10909131-B1 |
| Application number | US-201715581675-A |
| Country | US |
| Kind code | B1 |
| Filing date | Apr 28, 2017 |
| Priority date | Apr 28, 2017 |
| Publication date | Feb 2, 2021 |
| Grant date | Feb 2, 2021 |
Official abstract text for this publication.
Systems and methods are disclosed for efficiently indexing stream data to facilitate full-text search of the stream data. A stream comprises a large amount of data, only some of which is deemed useful for full-text search indexing. An administrator can specify an indexing specification for a stream. The indexing specification can specify one or more sub-streams within the stream for indexing, and/or specify one or more time intervals of stream data for indexing. A query against the stream can specify the indexing specification to use to index the stream before returning results for the query. The query can alternatively specify an indexing specification to apply to a previously indexed stream. Full-text search indexes generated using an indexing specification can return results that are more relevant to a user because the results are more narrowly focused than an index of, e.g., the entire stream.
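As a rough illustration of the abstract, the sketch below models an indexing specification that selects sub-streams and a time interval, then builds a small inverted index over only the selected records. The names `Record`, `IndexSpec`, and `build_index`, the record fields, and the whitespace tokenizer are all assumptions made for this example; the patent does not prescribe this data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One item of stream data (hypothetical shape, not from the patent)."""
    sub_stream: str
    timestamp: int  # e.g. seconds since some epoch
    text: str


@dataclass
class IndexSpec:
    """Hypothetical indexing specification: sub-streams and time window to index."""
    sub_streams: Optional[frozenset] = None  # None means every sub-stream
    start: Optional[int] = None              # inclusive; None means unbounded
    end: Optional[int] = None                # exclusive; None means unbounded

    def selects(self, rec: Record) -> bool:
        """Return True if the record falls inside the specified portion."""
        if self.sub_streams is not None and rec.sub_stream not in self.sub_streams:
            return False
        if self.start is not None and rec.timestamp < self.start:
            return False
        if self.end is not None and rec.timestamp >= self.end:
            return False
        return True


def build_index(records, spec):
    """Map each term to the positions of the selected records that contain it."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        if spec.selects(rec):
            # Naive whitespace tokenization, assumed here for brevity.
            for term in rec.text.lower().split():
                index[term].add(pos)
    return dict(index)


records = [
    Record("errors", 10, "disk failure"),
    Record("debug", 12, "disk ok"),
    Record("errors", 50, "network failure"),
]
spec = IndexSpec(sub_streams=frozenset({"errors"}), start=0, end=30)
index = build_index(records, spec)
```

Only the first record is indexed here: the second is in an unselected sub-stream and the third falls outside the time window, which is the narrowing effect the abstract describes.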
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method performed by a server for efficiently indexing a portion of a stream data, comprising: accessing a stream database that has been declared searchable; accessing a database of indexing specifications to determine whether an indexing specification exists for the stream data; in response to determining that the indexing specification exists for the stream data: receiving one or more stream index specifications associated with a stream, each of the one or more stream index specifications comprising one or more rules for indexing the stream, wherein a stream index specification specifies a portion of stream data of the stream to index and identifies a list of terms that should be indexed and a list of terms that should not be indexed; and generating one or more stream indexes for the portion of stream data in accordance with the one or more stream index specifications; otherwise, in response to determining that the indexing specification does not exist for the stream data, generating one or more stream indexes for the portion of the stream data in accordance with a default indexing scheme; wherein the portion of stream data comprises a temporal subset of the stream data defined by an interval of time; whereby only a portion of the temporal subset is indexed repeatedly for a plurality of intervals of time to generate a plurality of interval indexes to thereby reduce an amount of computing necessary to index the incoming streaming data; and, when a number of intervals of time exceeds a threshold, merging the plurality of interval indexes and marking a previously written index for deletion to thereby reduce an amount of storage required for storing resulting indexing.
2. The method of claim 1, further comprising: receiving a change to one or more of the stream index specifications; and generating one or more stream indexes for a second portion of the stream data in accordance with the one or more changed stream index specifications.
3. The method of claim 1, wherein: the stream comprises a plurality of sub-streams; and the specified portion of the stream data comprises one or more sub-streams of the stream data.
4. The method of claim 3, wherein: the stream comprises stream data over a period of time; the specified portion of the stream data comprises a sub-interval of the period of time; and the portion of the stream data indexed includes only stream data which falls within the sub-interval and the subset of the plurality of streams.
5. The method of claim 1, wherein: the stream comprises stream data over a period of time; the specified portion of the stream data comprises a sub-interval of the period of time; and the method further comprises generating a full-text search index in response to a query against a stream index in the one or more stream indexes for the portion of the stream of data.
6. A non-transitory computer readable medium, programmed with executable instructions that, when executed by a processing system, perform operations comprising: accessing a stream database to determine whether the stream data has been declared searchable; in response to determining that the stream data has not been declared searchable, storing the stream data unindexed, and in response to determining that the stream data has been declared searchable: accessing a database of indexing specifications to determine whether an indexing specification exists for the stream data; in response to determining that the indexing specification exists for the stream data: receiving one or more stream index specifications associated with a stream, each of the one or more stream index specifications comprising one or more rules for indexing the stream, wherein a stream index specification specifies a portion of stream data of the stream to index and identifies a list of terms that should be indexed and a list of terms that should not be indexed; and generating one or more stream indexes for the portion of stream data in accordance with the one or more stream index specifications; otherwise, in response to determining that the indexing specification does not exist for the stream data, generating one or more stream indexes for the portion of the stream data in accordance with a default indexing scheme; wherein the portion of stream data comprises a temporal subset of the stream data defined by an interval of time; whereby only a portion of the temporal subset is indexed repeatedly for a plurality of intervals of time to generate a plurality of interval indexes to thereby reduce an amount of computing necessary to index the incoming streaming data; and, when a number of intervals of time exceeds a threshold, merging the plurality of interval indexes and marking a previously written index for deletion to thereby reduce an amount of storage required for storing resulting indexing.
7. The medium of claim 6, further comprising: receiving a change to one or more of the stream index specifications; and generating one or more stream indexes for a second portion of the stream data in accordance with the one or more changed stream index specifications.
8. The medium of claim 6, wherein: the stream comprises a plurality of sub-streams; and the specified portion of the stream data comprises one or more sub-streams of the stream data.
9. The medium of claim 8, wherein: the stream comprises stream data over a period of time; the specified portion of the stream data comprises a sub-interval of the period of time; and the portion of the stream data indexed includes only stream data which falls within the sub-interval and the subset of the plurality of streams.
10. The medium of claim 6, wherein: the stream comprises stream data over a period of time; the specified portion of the stream data comprises a sub-interval of the period of time; and the method further comprises generating a full-text search index in response to a query against a stream index in the one or more stream indexes for the portion of the stream of data.
11. A processing system, comprising a hardware processor coupled to a memory programmed with executable instructions, that when executed by the processing system, perform operations comprising: accessing a stream database to determine whether the stream data has been declared searchable; in response to determining that the stream data has not been declared searchable, storing the stream data unindexed, and in response to determining that the stream data has been declared searchable: accessing a database of indexing specifications to determine whether an indexing specification exists for the stream data; in response to determining that the indexing specification exists for the stream data: receiving one or more stream index specifications associated with a stream, each of the one or more stream index specifications comprising one or more rules for indexing the stream, wherein a stream index specification specifies a portion of stream data of the stream to index and identifies a list of terms that should be indexed and a list of terms that should not be indexed; and generating one or more stream indexes for the portion of stream data in accordance with the one or more stream index specifications; otherwise, in response to determining that the indexing specification does not exist for the stream data, generating one or more stream indexes for the portion of the stream data in accordance with a default indexing scheme; wherein the portion of stream data comprises a temporal subset of the stream data defined by an interval of time; whereby only a portion of the temporal subset is indexed repeatedly for
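The control flow recited in the independent claims (searchable check, specification lookup with a default fallback, per-interval indexes, and a threshold-triggered merge) can be sketched roughly as follows. This is a minimal in-memory illustration, not the patented implementation: `StreamIndexer`, `index_interval`, `merge_indexes`, the dictionary-based specification format, and the treatment of an absent include list as "index every term" are all assumptions introduced for this example.

```python
from collections import defaultdict

# Hypothetical default scheme: index every term, exclude nothing.
DEFAULT_SPEC = {"include": None, "exclude": set()}


def index_interval(docs, spec):
    """Index one interval of (doc_id, text) pairs, honouring the term lists."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in text.lower().split():
            if term in spec["exclude"]:
                continue  # term is on the do-not-index list
            if spec["include"] is not None and term not in spec["include"]:
                continue  # an include list exists and the term is not on it
            index[term].add(doc_id)
    return dict(index)


def merge_indexes(indexes):
    """Union the posting sets of several interval indexes into one index."""
    merged = defaultdict(set)
    for idx in indexes:
        for term, ids in idx.items():
            merged[term] |= ids
    return dict(merged)


class StreamIndexer:
    def __init__(self, searchable, specs, merge_threshold=3):
        self.searchable = searchable        # stream names declared searchable
        self.specs = specs                  # stream name -> indexing specification
        self.merge_threshold = merge_threshold
        self.live = defaultdict(list)       # stream name -> current interval indexes
        self.marked_for_deletion = []       # superseded indexes awaiting cleanup
        self.unindexed = defaultdict(list)  # data stored without indexing

    def ingest_interval(self, stream, docs):
        """Process one time interval of data for a stream."""
        if stream not in self.searchable:
            self.unindexed[stream].extend(docs)
            return
        # Use the stream's specification if one exists, else the default scheme.
        spec = self.specs.get(stream, DEFAULT_SPEC)
        self.live[stream].append(index_interval(docs, spec))
        # Once the number of interval indexes exceeds the threshold, merge them
        # and mark the previously written indexes for deletion.
        if len(self.live[stream]) > self.merge_threshold:
            old = self.live[stream]
            self.live[stream] = [merge_indexes(old)]
            self.marked_for_deletion.extend(old)


indexer = StreamIndexer(
    searchable={"logs"},
    specs={"logs": {"include": None, "exclude": {"the"}}},
    merge_threshold=2,
)
indexer.ingest_interval("logs", [(1, "the disk failed")])
indexer.ingest_interval("logs", [(2, "disk replaced")])
indexer.ingest_interval("logs", [(3, "the network failed")])  # triggers the merge
indexer.ingest_interval("raw", [(9, "not searchable")])       # stored unindexed
```

Keeping small per-interval indexes and merging them later trades a little query-time fan-out for cheaper incremental writes, which is one plausible reading of the computing and storage reductions the claims mention.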
Indexing; Data structures therefor; Storage structures (for retrieval from the web G06F16/951) · CPC title
Indexing structures · CPC title
Data stream processing; Continuous queries · CPC title
Presentation of query results · CPC title