Method and apparatus for generating elementary string sets for unit testing regular expressions

US9690688B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9690688-B2
Application numberUS-201313779898-A
CountryUS
Kind codeB2
Filing dateFeb 28, 2013
Priority dateFeb 28, 2013
Publication dateJun 27, 2017
Grant dateJun 27, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, apparatus, and computer program product are disclosed to generate elementary string sets for unit testing regular expressions. In the context of a method, a regular expression is received. The method also creates a deterministic finite automaton based on the regular expression. In addition, the method generates an elementary string set using the deterministic finite automaton. The elementary string is generated to test software that uses the regular expression.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: receiving a regular expression; creating a deterministic finite automaton (DFA) based on the regular expression; and using the DFA, by a processor, to generate an elementary string set, wherein each string of the elementary string set matches the regular expression, wherein using the DFA to generate the elementary string set comprises: adding a start state of the DFA to the state queue, adding an empty string to a string queue, and processing each state in the state queue, wherein processing a state in the state queue comprises: extracting the state from the state queue, extracting a string from the string queue, incrementing a counter associated with the state, updating a data structure to store the counter, in an instance in which the state is terminal, adding the extracted string to the elementary string set, retrieving a set of symbols that cause a transition from the state to another state in the DFA, and for each symbol of the set of symbols, updating the state queue and the string queue based on the symbol, wherein the data structure comprises a tree structure, each state being a key of the tree structure and the associated counter being a value of the tree structure associated with the key, and wherein (a) a second state of the plurality of states is defined by transitioning a first state of the state queue to the second state using a symbol, and (b) wherein the second state comprises no more than two instances of the symbol; and wherein (a) each string in the elementary string set will loop no more than twice while matching the regular expression, (b) the string set is maximal having no more than one string with same properties matching the regular expression, and (c) the elementary string set is generated to test software that uses the regular expression. 2. The method of claim 1 , wherein updating the state queue and the string queue based on the symbol comprises: determining a new state to which the symbol transitions the DFA; in an instance in which the counter in the data structure and associated with the new state is less than a predefined value: adding the new state to the state queue; and adding a new string to the string queue, the new string comprising the extracted string and the symbol. 3. The method of claim 1 , wherein the elementary string set includes every string that is represented by the regular expression and that does not include a symbol that repeats more than twice. 4. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: receive a regular expression; create a deterministic finite automaton (DFA) based on the regular expression; and use the DFA to generate an elementary string set, wherein each string of the elementary string set matches the regular expression, wherein using the DFA to generate the elementary string set comprises: adding a start state of the DFA to the state queue, adding an empty string to a string queue, and processing each state in the state queue, wherein processing a state in the state queue comprises: extracting the state from the state queue, extracting a string from the string queue, incrementing a counter associated with the state, updating a data structure to store the counter, in an instance in which the state is terminal, adding the extracted string to the elementary string set, retrieving a set of symbols that cause a transition from the state to another state in the DFA, and for each symbol of the set of symbols, updating the state queue and the string queue based on the symbol, wherein the data structure comprises a tree structure, each state being a key of the tree structure and the associated counter being a value of the tree structure associated with the key, and wherein (a) a second state of the plurality of states is defined by transitioning a first state of the state queue to the second state using a symbol, and (b) wherein the second state comprises no more than two instances of the symbol; and wherein (a) each string in the elementary string set will loop no more than twice while matching the regular expression, (b) the string set is maximal having no more than one string with the same properties matching the regular expression, and (c) the elementary string set is generated to test software that uses the regular expression. 5. The apparatus of claim 4 , wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to update the state queue and the string queue based on the symbol by: determining a new state to which the symbol transitions the DFA; in an instance in which the counter in the data structure and associated with the new state is less than a predefined value: adding the new state to the state queue; and adding a new string to the string queue, the new string comprising the extracted string and the symbol. 6. The apparatus of claim 4 , wherein no string in the elementary string set will include a symbol that repeats more than twice. 7. A computer program product comprising at least one non-transitory computer-readable storage medium having computer-executable program code portions stored therein, the computer-executable program code portions comprising program code instructions that, when executed, cause an apparatus to: receive a regular expression; create a deterministic finite automaton (DFA) based on the regular expression; and use the DFA to generate an elementary string, wherein each string of the elementary string set matches the regular expression, wherein using the DFA to generate the elementary string set comprises: adding a start state of the DFA to the state queue, adding an empty string to a string queue, and processing each state in the state queue, wherein processing a state in the state queue comprises: extracting the state from the state queue, extracting a string from the string queue, incrementing a counter associated with the state, updating a data structure to store the counter, in an instance in which the state is terminal, adding the extracted string to the elementary string set, retrieving a set of symbols that cause a transition from the state to another state in the DFA, and for each symbol of the set of symbols, updating the state queue and the string queue based on the symbol, wherein the data structure comprises a tree structure, each state being a key of the tree structure and the associated counter being a value of the tree structure associated with the key, and wherein (a) a second state of the plurality of states is defined by transitioning a first state of the state queue to the second state using a symbol, and (b) wherein the second state comprises no more than two instances of the symbol; wherein (a) each string in the elementary string set will loop no more than twice while matching the regular expression, (b) the string set is maximal having no more than one string with the same properties matching the regular expression, and (c) the elementary string set is generated to test software that uses the regular expression. 8. The computer program product of claim 7 , wherein the program code instructions that, when executed, cause an apparatus to update the state queue and the string queue based on the symbol comprise program code instructions that, when executed, cause the apparatus to: determine a new state to which the symbol transitions the DFA; in an instance in which the counter in the data structure and associated with the new state is less than a

Assignees

Inventors

Classifications

  • for test design, e.g. generating new test cases · CPC title

  • for coverage analysis · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9690688B2 cover?
A method, apparatus, and computer program product are disclosed to generate elementary string sets for unit testing regular expressions. In the context of a method, a regular expression is received. The method also creates a deterministic finite automaton based on the regular expression. In addition, the method generates an elementary string set using the deterministic finite automaton. The ele…
Who is the assignee on this patent?
Here Global Bv
What technology area does this patent fall under?
Primary CPC classification G06F11/3684. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jun 27 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).