Advanced field extractor with modification of an extracted field

US9594814B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9594814-B2
Application number  US-201514611089-A
Country             US
Kind code           B2
Filing date         Jan 30, 2015
Priority date       Sep 7, 2012
Publication date    Mar 14, 2017
Grant date          Mar 14, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets not organized into relational structures that have not been processed by standard extraction or transformation methods. By using sample events, a focus on primary and secondary example events help formulate either a single extraction rule spanning multiple data formats, or multiple rules directed to distinct formats. Selection tools mark up the example events to indicate positive examples for the extraction rules, and to identify negative examples to avoid mistaken value selection. The extraction rules can be saved for query-time use, and can be incorporated into a data model for sets and subsets of event data.
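
As a rough illustration of what "field extraction rules that are used at query time" can mean, the sketch below keeps events as raw text and applies a regular expression only while a query runs. The sample events, the regular expression, and the names `RAW_EVENTS`, `EXTRACTION_RULE`, `extract_fields`, and `query` are assumptions made for illustration; the abstract does not prescribe this form.

```python
import re

# Hypothetical raw events: stored as-is, with no schema applied at ingest time.
RAW_EVENTS = [
    "2015-01-30T10:00:01 status=200 user=alice",
    "2015-01-30T10:00:02 status=404 user=bob",
    "2015-01-30T10:00:03 heartbeat",
]

# Hypothetical extraction rule: each named group becomes a field name.
EXTRACTION_RULE = re.compile(r"status=(?P<status>\d+) user=(?P<user>\w+)")


def extract_fields(raw, rule=EXTRACTION_RULE):
    """Apply the rule to one raw event; return {} when the rule does not match."""
    match = rule.search(raw)
    return match.groupdict() if match else {}


def query(events, field, value):
    """Late binding: fields are extracted only while the query runs."""
    return [raw for raw in events if extract_fields(raw).get(field) == value]
```

Because the rule is applied at search time, changing `EXTRACTION_RULE` changes the fields seen by later queries without reprocessing the stored events.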

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: accessing in memory a set of events, each event identified by an associated time stamp; wherein each event in the set of events includes a portion of raw data; receiving data indicating selection of a first event from among a first plurality of events and data indicating a selection of one or more portions of text within the raw data of the first event to be extracted as one or more fields; automatically determining an initial extraction rule that extracts the selected portions of text within the first event; causing display of a first interface providing tools that implement user modification of the extraction rule, including selecting a field from the one or more fields and: selecting one or more non-adjoining strings to concatenate with the selected field; selecting a portion of the selected field to be trimmed from the beginning or end of the selected field; or selecting sub-portions of text to extract from within the selected field.

2. The method of claim 1, further including: receiving further data indicating selection of the one or more non-adjoining strings to concatenate into a concatenated field; and updating the field extraction rule to combine the non-adjoining strings into the concatenated field.

3. The method of claim 1, further including: receiving further data indicating one or more trim commands to apply to the selected field; and updating the field extraction rule to include the trim commands.

4. The method of claim 1, further including: receiving further data indicating selection of sub-portions of text to extract from within the selected field; automatically determining a secondary extraction rule to extract the sub-portions of text from within the selected field; and updating the field extraction rule to include the secondary extraction rule.

5. The method of claim 1, further including: transmitting for display a second user interface providing tools that implement user selection of a sampling strategy to determine the events in a display; receiving further data indicating a selection of the sampling strategy; sampling the events to be displayed; and transmitting for display a third user interface including an annotated version of the plurality of events, wherein the annotated version indicates the portions of text within the plurality of events extracted by the initial extraction rule.

6. The method of claim 1, further including: causing display of a second user interface providing tools that implement user selection of only events that match the field extraction rule the field extraction rule; receiving further data indicating a selection of only the events that match; and sampling according to the match selection; and causing display of a third user interface including the plurality of events according to the match selection, wherein the annotated version indicates the portions of text within the plurality of events extracted by the initial extraction rule.

7. The method of claim 1, further including: receiving further data indicating a selection to validate the extraction rule; causing display of a second user interface including an annotated version of the plurality of events, wherein the annotated version indicates the portions of text within the plurality of events extracted by the field extraction rule and provides one or more user controls that implement user selection of indicated portions of the text as examples of text that should not be extracted; receiving further data indicating a selection of one or more examples of text that should not be extracted; and automatically determining an updated field extraction rule that does not extract the text that should not be extracted.

8. The method of claim 1, further including: causing display of a second user interface providing tools that implements user selection among the fields; receiving further data indicating a selection of a selected field; and causing display of a frequency table of values of the selected field extracted from a sample of the events, wherein the frequency table includes a list of values extracted and for each value in the list a frequency and an active filter control, wherein the active filter control filters events to be displayed based on a selected value.

9. The method of claim 1, further including: receiving further data indicating a selection to save the extraction rule and field names for later use in processing events; and incorporating the saved extraction rule and field names in a data model that includes a late binding schema of extraction rules applied at search time.

10. The method of claim 1, further including: causing display of a second user interface providing one or more tools that implement user entry of a filter value to determine the events in the display; receiving further data indicating entry of a keyword value to apply as a filter; resampling according to the keyword value; and updating the events to be displayed.

11. A computer-implemented system comprising: a processor, memory coupled to the processor, and instructions stored in the memory that implement the actions of: accessing in memory a set of events, each event identified by an associated time stamp; wherein each event in the set of events includes a portion of raw data from machine data; receiving data indicating selection of a first event from among a first plurality of events and data indicating a selection of one or more portions of text within the raw data of the first event to be extracted as one or more fields; automatically determining an initial extraction rule that extracts the selected portions of text within the first event; causing display of a first interface providing tools that implement user modification of the extraction rule, including selecting a field from the one or more fields and: selecting one or more non-adjoining strings to concatenate with the selected field; selecting a portion of the selected field to be trimmed from the beginning or end of the selected field; or selecting sub-portions of text to extract from within the selected field.

12. The computer-implemented system of claim 11, further including: receiving further data indicating selection of the one or more non-adjoining strings to concatenate into a concatenated field; and updating the field extraction rule to combine the non-adjoining strings into the concatenated field.

13. The computer-implemented system of claim 11, further including: receiving further data indicating one or more trim commands to apply to the selected field; and updating the field extraction rule to include the trim commands.

14. The computer-implemented system of claim 11, further including: receiving further data indicating selection of sub-portions of text to extract from within the selected field; automatically determining a secondary extraction rule to extract the sub-portions of text from within the selected field; and updating the field extraction rule to include the secondary extraction rule.

15. The computer-implemented system of claim 11, further including: causing display of a second user interface providing tools that implement user selection of a sampling strategy to determine the events in a display; receiving further data indicating a selection of the sampling strategy; sampling the events to be displayed; and causing display of a third user interface including an annotated version of the plurality of events, wherein the annotated version indicates the portions of text within the plural
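
The three field modifications named in the independent claims (concatenating non-adjoining strings, trimming the beginning or end of a field, and extracting sub-portions from within a field) can be sketched as small post-processing steps on the output of an initial rule. This is a minimal sketch under assumptions: the sample event, the regular expressions, and the helper names `extract`, `concatenate`, `trim`, and `sub_extract` are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical raw event and initial extraction rule (named groups become fields).
RAW = "2015-01-30 10:00:01 host=web-01.example.com path=/cart/checkout"
INITIAL_RULE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) host=(?P<host>\S+) path=(?P<path>\S+)"
)


def extract(raw):
    """Run the initial rule; return {} when it does not match."""
    match = INITIAL_RULE.search(raw)
    return match.groupdict() if match else {}


def concatenate(fields, names, new_name, sep=" "):
    """Combine several extracted strings, adjoining or not, into one new field."""
    out = dict(fields)
    out[new_name] = sep.join(fields[n] for n in names)
    return out


def trim(fields, name, prefix="", suffix=""):
    """Remove a given prefix and/or suffix from one field, if present."""
    out = dict(fields)
    value = out[name]
    if prefix and value.startswith(prefix):
        value = value[len(prefix):]
    if suffix and value.endswith(suffix):
        value = value[: -len(suffix)]
    out[name] = value
    return out


def sub_extract(fields, name, secondary_rule):
    """Apply a secondary rule inside one field and add its named groups."""
    out = dict(fields)
    match = secondary_rule.search(fields[name])
    if match:
        out.update(match.groupdict())
    return out


fields = extract(RAW)
# "date" and "host" are separated by "time" in the raw text (non-adjoining).
fields = concatenate(fields, ["date", "host"], "date_host")
fields = trim(fields, "host", suffix=".example.com")
fields = sub_extract(
    fields, "path", re.compile(r"^/(?P<section>[^/]+)/(?P<page>[^/]+)")
)
```

Each helper returns a new dictionary rather than mutating its input, so the result of the initial rule stays available for comparison after a modification is applied.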

Assignees

Splunk Inc

Inventors

Classifications

  • of tables; using ruled lines

  • G06F40/166 (Primary)

    Editing, e.g. inserting or deleting

  • Annotation, e.g. comment data or footnotes

  • Presentation of query results

  • Temporal data queries

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9594814B2 cover?
The technology disclosed relates to formulating and refining field extraction rules that are used at query time on raw data with a late-binding schema. The field extraction rules identify portions of the raw data, as well as their data types and hierarchical relationships. These extraction rules are executed against very large data sets not organized into relational structures that have not bee…
Who is the assignee on this patent?
Splunk Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/166. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 14, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).