Regular expression generation for negative example using context

US11941018B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11941018-B2
Application number  US-202016904298-A
Country             US
Kind code           B2
Filing date         Jun 17, 2020
Priority date       Jun 13, 2018
Publication date    Mar 26, 2024
Grant date          Mar 26, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generating regular expressions are disclosed. In some embodiments, a regular expression generator may receive input data comprising one or more character sequences. The regular expression generator may convert character sequences into sets of regular expression codes and/or span data structures. The regular expression generator may identify a longest common subsequence shared by the sets of regular expression codes and/or spans, and may generate a regular expression based upon the longest common subsequence. A negative example may be used to generate the regular expression. Context from the negative example may be determined in order to generate the regular expression.
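
The pipeline in the abstract (character sequences to codes, longest common subsequence, regular expression) can be illustrated with a minimal Python sketch. The code scheme (`L`, `N`, `S`), the helper names, and the rule that collapses a run of identical codes into `class+` are all assumptions made for illustration; the page does not reproduce the patent's actual codes or span data structures. Characters that fall outside the common subsequence are simply dropped here, so the result is only guaranteed to match inputs that differ in run lengths.

```python
import re
import string
from functools import reduce
from itertools import groupby

# Assumed code scheme: one code per character class; anything else stays literal.
CLASS_PATTERNS = {"L": "[A-Za-z]", "N": "[0-9]", "S": r"\s"}


def to_codes(text):
    """Convert a character sequence into a list of class codes."""
    codes = []
    for ch in text:
        if ch in string.ascii_letters:
            codes.append("L")
        elif ch in string.digits:
            codes.append("N")
        elif ch.isspace():
            codes.append("S")
        else:
            codes.append(ch)  # punctuation is kept as a literal code
    return codes


def lcs(a, b):
    """Longest common subsequence of two lists (standard dynamic programming)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def regex_from_codes(codes):
    """Collapse runs of class codes into `class+`; repeat escaped literals."""
    parts = []
    for code, run in groupby(codes):
        if code in CLASS_PATTERNS:
            parts.append(CLASS_PATTERNS[code] + "+")
        else:
            parts.append(re.escape(code) * len(list(run)))
    return "".join(parts)


def generate_regex(examples):
    """Fold the LCS over every example's codes, then emit a pattern."""
    common = reduce(lcs, (to_codes(e) for e in examples))
    return regex_from_codes(common)


pattern = generate_regex(["AB-12", "C-3456"])
print(pattern)
print(bool(re.fullmatch(pattern, "XYZ-9")))
```

Folding `lcs` pairwise across the examples keeps the sketch short; it is a simplification and is not guaranteed to find the longest subsequence common to three or more inputs.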

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of generating regular expressions, comprising: receiving, by a regular expression generator comprising one or more processors, a first selection comprising one or more positive character sequences in a first field of a data structure, each of the one or more positive character sequences corresponding to a positive example that is to be matched by a regular expression generated by the regular expression generator; after receiving the first selection, generating, by the regular expression generator, a first regular expression, wherein the first regular expression matches the positive example; in response to the generation of the first regular expression, displaying, by the regular expression generator, the first regular expression that is generated based on matching the positive example; receiving, by the regular expression generator, a second selection comprising one or more negative character sequences in a second field of the data structure, each of the one or more negative character sequences corresponding to a negative example that is not to be matched by the regular expression generated by the regular expression generator; in response to receiving the second selection, determining a context of the selected one or more negative character sequences in the second field of the data structure corresponding to the negative example; updating, in real-time, the first regular expression that was generated to match the positive example to include the context that was determined based on the one or more negative character sequences; and displaying, by the regular expression generator, the updated first regular expression.

2. The method according to claim 1, wherein the receiving the first selection comprises receiving, via a user interface, a selection of the one or more positive character sequences in a first data cell of a data set.

3. The method according to claim 2, further comprising automatically selecting, by the regular expression generator, character sequences in a plurality of data cells in the data set corresponding to the first selection comprising one or more positive character sequences.

4. The method according to claim 3, wherein the receiving the second selection comprises receiving, via the user interface, the selection of the one or more negative character sequences in a second data cell of the data set.

5. The method according to claim 4, further comprising automatically selecting, by the regular expression generator, character sequences in the plurality of data cells in the data set corresponding to the second selection comprising one or more negative character sequences.

6. The method according to claim 3, wherein the first selection is highlighted in a first highlight format and the second selection is highlighted in a second highlight format that is different from the first highlight format.

7. The method according to claim 6, wherein the determining the context of the one or more negative character sequences corresponding to the negative example comprises: identifying an embedded highlighting location of the second selection; determining context from data to a left of the embedded highlighting location of the second selection; and determining context from data to a right of the embedded highlighting location of the highlighted second selected.

8. The method according to claim 7, wherein the determining the context of the one or more negative character sequences corresponding to the negative example further comprises: filtering the character sequences in the plurality of data cells in the data set corresponding to the first selection comprising the one or more negative character sequences that were automatically selected based on the determined context from data to the left of the embedded highlighting location and based on the determined context from data to the right of the embedded highlighting location; and removing the filtered character sequences from the selected character sequences in the plurality of data cells in the data set corresponding to the selected one or more negative character sequences.

9. The method according to claim 8, wherein the determining the context from data to the left of an embedded highlighting location comprises identifying a first span to the left of the embedded highlighting location; and wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the selected one or more negative character sequences further comprises identifying spans in the character sequences in the plurality of data cells corresponding to the selected one or more negative character sequences that do not match the first span to the left of the embedded highlighting location.

10. The method according to claim 9, wherein the determining the context from data to the left of an embedded highlighting location further comprises identifying a second span to the left of the embedded highlighting; and wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the selected one or more negative character sequences further comprises identifying spans in the character sequences in the plurality of data cells corresponding to the selected one or more negative character sequences that do not match the second span to the left of the embedded highlighting location.

11. The method according to claim 7, wherein the determining the context from data to the right of an embedded highlighting location comprises identifying a first span to the right of the embedded highlighting location; and wherein filtering the character sequences in the plurality of data cells in the data set corresponding to the second selection comprising one or more negative character sequences further comprises identifying spans in the character sequences in the plurality of data cells corresponding to the second selection comprising one or more negative character sequences that do not match the first span to the right of the embedded highlighting location.

12. A regular expression generator server computer comprising: a processor; a memory; a computer readable medium coupled to the processor, the computer readable medium storing instructions executable by the processor for implementing a method comprising: receiving, by a regular expression generator comprising one or more processors, a first selection comprising one or more positive character sequences in a first field of a data structure, each of the one or more positive character sequences corresponding to a positive example that is to be matched by a regular expression generated by the regular expression generator; after receiving the first selection, generating, by the regular expression generator, a first regular expression, wherein the first regular expression matches the positive example; in response to the generation of the first regular expression, displaying, by the regular expression generator, the first regular expression that is generated based on matching the positive example; receiving, by the regular expression generator, a second selection comprising one or more negative character sequences in a second field of the data structure, each of the one or more negative character sequences corresponding to a negative example that is not to be matched by the regular expression generated by the regular expression generator; in response to receiving the second selection, determining a context of the selected one or more negative character sequences in the second field of the data structure corresponding to the negative example; updating, in real-time, the first regular expression that was g
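
Claims 7 through 9 and 11 describe taking the spans to the left and right of a highlighted negative example as its context, then filtering the automatically selected sequences by whether their own neighboring spans match. A rough Python sketch of that idea follows. Everything in it is an assumption for illustration: a "span" is taken to be a maximal run of same-class characters, the first regular expression is fixed to `[0-9]+`, the sample cells are invented, and the final step uses negative lookbehind as one possible way to fold the left context into the expression. The claims do not state how the update is encoded.

```python
import re


def char_class(ch):
    """Coarse class code for one character (an assumed scheme)."""
    if ch.isdigit():
        return "N"
    if ch.isalpha():
        return "L"
    if ch.isspace():
        return "S"
    return ch  # each punctuation character is its own class


def left_span(text, start):
    """Maximal same-class run ending immediately before index `start`."""
    i = start
    while i > 0 and char_class(text[i - 1]) == char_class(text[start - 1]):
        i -= 1
    return text[i:start]


def right_span(text, end):
    """Maximal same-class run beginning at index `end`."""
    j = end
    while j < len(text) and char_class(text[j]) == char_class(text[end]):
        j += 1
    return text[end:j]


def split_by_context(cells, first_regex, left, right):
    """Partition auto-selected matches by whether their context matches.

    Matches whose left and right spans equal the negative example's context
    stay negative; the others are removed from the negative selection.
    """
    negatives, remaining = [], []
    for cell in cells:
        for m in re.finditer(first_regex, cell):
            same = (left_span(cell, m.start()) == left
                    and right_span(cell, m.end()) == right)
            (negatives if same else remaining).append((cell, m.group()))
    return negatives, remaining


# Hypothetical data cells; the user highlights "123" in the second cell
# (indices 6..9) as a negative example.
cells = ["order #123", "cost $123", "ref #456 cost $78"]
first_regex = r"[0-9]+"  # stands in for the first generated expression
left = left_span(cells[1], 6)
right = right_span(cells[1], 9)

negatives, remaining = split_by_context(cells, first_regex, left, right)

# One assumed way to include the left context: refuse to start a match
# in the middle of a digit run or directly after the context span.
updated = r"(?<![0-9])(?<!" + re.escape(left) + r")[0-9]+"

print(negatives)
print(remaining)
print(re.findall(updated, "order #123 cost $123"))
```

The extra `(?<![0-9])` guard matters: without it, the engine could skip the first digit after the context span and match the rest of the number. The update line also assumes a non-empty left span, as in this example.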

Assignees

Oracle Int Corp

Inventors

Classifications

  • G06F16/258 · Primary

    Data format conversion from or to a database · CPC title

  • Inference or reasoning models · CPC title

  • Translation of natural language queries to structured queries · CPC title

  • Natural language query formulation · CPC title

  • Standardisation; Simplification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F16/258. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 26, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).