Classification by optimizing regular expression rules

US11645337B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11645337-B2
Application numberUS-202016879813-A
CountryUS
Kind codeB2
Filing dateMay 21, 2020
Priority dateMay 21, 2020
Publication dateMay 9, 2023
Grant dateMay 9, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-based system and method for optimizing execution of regular expression rules, each including one or more sub-rules, may include: testing, by a processor, the sub-rules against a data sample; measuring, by a processor and based on the testing, the probability for every sub-rule that it appears in the data sample, and the processing time of each sub-rule; and finding, by a processor, an order of execution of at least a subset of the sub-rules to shorten the total execution time of validating the regular expression rules, based to the probability and the execution time of each of the sub-rules.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer implemented method for optimizing execution of regular expression rules, each comprising a plurality of sub-rules, the method comprising: testing, by a processor, the sub-rules in a data sample that includes a plurality of regular expressions; measuring, by a processor and based on the testing, a probability for every sub-rule that it appears in the data sample, and the execution time of each sub-rule; and finding, by a processor, an order of execution of at least a subset of the sub-rules to shorten the total execution time of validating the regular expression rules, based on a probability of co-occurrence of at least one sub-rule among different regular expressions and the execution time of the sub-rules, wherein a sub-rule that requires short processing time or lower processing power is prioritized over a sub-rule that requires longer processing time or higher processing power, a sub-rule that rarely appears or matches in the data sample is prioritized over a sub-rule that appears or matches in the data sample more frequently, a sub-rule that appears in a regular expression rule that requires longer processing time or higher processing power is prioritized over a sub-rule that appears in a regular expression rule that requires shorter processing time or lower processing power, and a sub—rule that repeats in many regular expression rules is prioritized over a sub-rule that repeats in fewer regular expression rules. 2. The method of claim 1 , wherein finding the order of execution of the subset of sub-rules is performed according to frequency of the subset of sub-rules, wherein the frequency of a sub-rule equals the number of regular expression rules that include the sub-rule. 3. The method of claim 2 , comprising: measuring, by a processor and based on the testing, execution time of the regular expression rules, wherein finding the order of execution of the subset of sub-rules is performed according to the execution time of the regular expression rules. 4. The method of claim 1 , wherein finding the order of executing the subset of sub-rules is performed by solving a multidimensional dynamic programming problem. 5. The method of claim 1 , comprising: executing the sub-rules to classify examined data, wherein the subset of sub-rules are executed according to the order of execution. 6. The method of claim 5 , wherein the sub-rules are executed using nondeterministic finite automaton. 7. The method of claim 5 , wherein the examined data pertains to a computer database and the data sample comprises a portion of the database. 8. The method of claim 1 , wherein testing the sub-rules in the data sample comprises executing the sub-rules against the data sample. 9. The method of claim 1 , wherein the regular expression rules comprise sequence of characters that define a search pattern for finding a pattern of data. 10. The method of claim 1 , comprising, finding a new order of execution if the probability of the subset of sub-rules that they appear in the examined data is significantly different from the probability of the subset of sub-rules that they appear in the sample data. 11. A computer implemented method for optimizing execution of regular expressions, each comprising a plurality of sections, the method comprising: executing, by a processor, the sections in a data sample that includes the regular expressions; measuring, by a processor, a probability of matches for each section of a portion of the sections in a data sample, processing time of every section of the portion of the sections, and processing time of every regular expression; and solving, by a processor, an optimization problem to find an order of executing the sections of the portion of the sections to minimize the total run time, according to a probability of co-occurrence of at least one section among different regular expressions, the processing time of each of the sections of the portion of sections, the processing time of the regular expressions, and the number of regular expression rules that include each of the sections of the portion of sections, wherein a section that requires short processing time or lower processing power is prioritized over a section that requires longer processing time or higher processing power, a section that rarely appears or matches in the data sample is prioritized over a section that appears or matches in the data sample more frequently, a section that appears in a regular expression rule that requires longer processing time or higher processing power is prioritized over a section that appears in a regular expression rule that requires shorter processing time or lower processing power, and a section that repeats in many regular expression rules is prioritized over a section that repeats in fewer regular expression rules. 12. A system for optimizing execution of regular expression rules, each comprising a plurality of sub-rules, the system comprising: a memory; and a processor configured to: test the sub-rules in a data sample that includes regular expressions; measure, based on the testing, the probability for every sub-rule that it appears in the data sample, and the execution time of each sub-rule; and find an order of execution of at least a subset of the sub-rules to shorten the total execution time of validating the regular expression rules based on a probability of co-occurrence of at least one sub-rule among different regular expressions and the execution time of each of the sub-rules, wherein a subset of the sub-rules that requires short processing time or lower processing power is prioritized over a subset of the sub-rules that requires longer processing time or higher processing power, a subset of the sub-rules that rarely appears or matches in the data sample is prioritized over a subset of the sub-rules that appears or matches in the data sample more frequently, a subset of the sub-males that appears in a regular expression rule that requires longer processing time or higher processing power is prioritized over a subset of the sub-rules that appears in a regular expression rule that requires shorter processing time or lower processing power, and a subset of the sub-rules that repeats in many regular expression rules is prioritized over a subset of the sub-rules that repeats in fewer regular expression rules. 13. The system of claim 12 , wherein the processor is configured to find the order of execution of the sub-rules according to frequency of the sub-rules, wherein the frequency of a sub-rule equals the number of regular expression rules that include the sub-rule. 14. The system of claim 12 , wherein the processor is configured to: measure execution time of the regular expression rules; and find the order of execution of the subset of sub-rules according to the execution time of the regular expression rules. 15. The system of claim 12 , wherein the processor is configured to find the order of executing the subset of sub-rules by solving a multidimensional dynamic programming problem. 16. The system of claim 12 , wherein the processor is configured to: execute the sub-rules to classify examined data, wherein the subset of sub-rules are executed according to the order of execution. 17. The system of claim 15 , wherein the processor is configured to execute the sub-rules by using nondeterministic finite automaton. 18. The system of claim 15 , wherein the examined data pertains to a computer database and the data sample comprises a portion of the database. 19. The system of claim 12

Assignees

Inventors

Classifications

  • Machine learning · CPC title

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • by using string matching techniques · CPC title

  • based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title

  • Forward inferencing; Production systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645337B2 cover?
A computer-based system and method for optimizing execution of regular expression rules, each including one or more sub-rules, may include: testing, by a processor, the sub-rules against a data sample; measuring, by a processor and based on the testing, the probability for every sub-rule that it appears in the data sample, and the processing time of each sub-rule; and finding, by a processor, a…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/90344. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue May 09 2023 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).