Generating natural language text sentences as test cases for NLP annotators with combinatorial test design

US9606980B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9606980-B2
Application number	US-201414572691-A
Country	US
Kind code	B2
Filing date	Dec 16, 2014
Priority date	Dec 16, 2014
Publication date	Mar 28, 2017
Grant date	Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Test cases for a text annotator are generated by determining types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on selected fragments from the corpus based on the sentence types and the grammar constructs. Additional test cases are generated by replacing starting phrases in selected fragments with substitute phrases from dictionaries associated with the input types (a dictionary can include a false synonym for an input type for purposes of negative testing). The two generating approaches can be combined, i.e., performing one or more successive (different) grammar tree transformations to yield a sentence which is then subjected to phrase substitution.
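
The combined approach described in the abstract (a grammar tree transformation followed by phrase substitution) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the `DICTIONARIES` contents, the `to_question` helper (a plain string rewrite standing in for a real grammar tree transformation), and the sample fragment. Each dictionary entry carries a flag marking whether it is a true synonym or a false synonym, so that negative test cases can be labeled.

```python
from itertools import product

# Hypothetical dictionaries keyed by input type.
# Each entry is (substitute phrase, is_true_synonym); a False flag marks a
# "false synonym" intended for negative testing.
DICTIONARIES = {
    "person": [("the patient", True), ("her mother", True), ("the chair", False)],
    "date": [("on Monday", True), ("in 2014", True)],
}


def to_question(fragment: str) -> str:
    """Toy stand-in for a grammar tree transformation: statement -> question."""
    body = fragment.rstrip(".")
    return f"Is it true that {body}?"


def substitute_phrases(sentence: str, starting_phrases: dict) -> list:
    """Replace each starting phrase with every entry of its type's dictionary.

    starting_phrases maps a phrase found in the sentence to its input type.
    Returns (test sentence, expect_annotation) pairs; expect_annotation is
    False as soon as any false synonym was substituted.
    """
    phrases = list(starting_phrases)
    options = [DICTIONARIES[starting_phrases[p]] for p in phrases]
    cases = []
    for combo in product(*options):
        text = sentence
        positive = True
        for phrase, (replacement, is_true_synonym) in zip(phrases, combo):
            text = text.replace(phrase, replacement)
            positive = positive and is_true_synonym
        cases.append((text, positive))
    return cases


fragment = "John Doe was diagnosed last week."
question = to_question(fragment)
cases = substitute_phrases(question, {"John Doe": "person", "last week": "date"})
for text, positive in cases:
    print("POSITIVE" if positive else "NEGATIVE", text)
```

This sketch enumerates the full product of dictionary entries for simplicity. A combinatorial test design, as the title suggests, would presumably select a smaller subset of combinations rather than all of them.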

First claim

Opening claim text (preview).

What is claimed is:
1. A method of generating test cases for a text annotator which searches text documents and analyzes them relative to a defined set of tags comprising: receiving a corpus of text fragments without any annotations and a description of the text annotator, by executing first instructions in a computer system; determining types of inputs to the text annotator from the description, the types of inputs including at least one phrase selected from the group consisting of a person phrase, a date phrase, and a diagnosis phrase, by executing second instructions in the computer system; analyzing language structures in the corpus to identify sentence types and grammar constructs, the sentence types including at least one sentence selected from the group consisting of a question, a command, a compound sentence, and a conditional sentence, and wherein said analyzing includes performing a slot grammar parse of the corpus to determine various parse trees of the corpus including a most common parse tree, by executing third instructions in the computer system; generating a first test case by performing a grammar tree transformation on a first selected fragment of the corpus based on the sentence types and the grammar constructs wherein the first selected fragment is selected in response to a selection bias towards a sentence type which corresponds to the most common parse tree of the corpus, by executing fourth instructions in the computer system; and generating a second test case by replacing at least one starting phrase in the first test case with a substitute phrase from at least one dictionary associated with one of the types of inputs that corresponds to the starting phrase, by executing fifth instructions in the computer system.
2. The method of claim 1 wherein the first test case is generated by performing a sequence of different successive grammar tree transformations starting with the first selected fragment.
3. The method of claim 1 wherein the second test case is generated by replacing multiple starting phrases in the second selected fragment with respective substitute phrases from multiple dictionaries associated with different ones of the types of inputs that correspond to the multiple starting phrases.
4. The method of claim 1 wherein at least one of the types of inputs corresponds to multiple grammar constructs.
5. The method of claim 1 wherein the dictionary includes a false synonym for the one input type that corresponds to the starting phrase.
6. The method of claim 1 further comprising testing the text annotator using the first and second test cases.
7. A computer system comprising: one or more processors which process program instructions; a memory device connected to said one or more processors; and program instructions residing in said memory device for generating test cases for a text annotator which searches text documents and analyzes them relative to a defined set of tags by receiving a corpus of text fragments without any annotations and a description of the text annotator, determining types of inputs to the text annotator from the description wherein the types of inputs include at least one phrase selected from the group consisting of a person phrase, a date phrase, and a diagnosis phrase, analyzing language structures in the corpus to identify sentence types and grammar constructs wherein the sentence types include at least one sentence selected from the group consisting of a question, a command, a compound sentence, and a conditional sentence, and the analyzing includes performing a slot grammar parse of the corpus to determine various parse trees of the corpus including a most common parse tree, generating a first test case by performing a grammar tree transformation on a first selected fragment of the corpus based on the sentence types and the grammar constructs wherein the first selected fragment is selected in response to a selection bias towards a sentence type which corresponds to the most common parse tree of the corpus, and generating a second test case by replacing at least one starting phrase in the first test case with a substitute phrase from at least one dictionary associated with one of the types of inputs that corresponds to the starting phrase.
8. The computer system of claim 7 wherein the first test case is generated by performing a sequence of different successive grammar tree transformations starting with the first selected fragment.
9. The computer system of claim 7 wherein the second test case is generated by replacing multiple starting phrases in the second selected fragment with respective substitute phrases from multiple dictionaries associated with different ones of the types of inputs that correspond to the multiple starting phrases.
10. The computer system of claim 7 wherein at least one of the types of inputs corresponds to multiple grammar constructs.
11. The computer system of claim 7 wherein the dictionary includes a false synonym for the one input type that corresponds to the starting phrase.
12. The computer system of claim 7 wherein said program instructions further test the text annotator using the first and second test cases.
13. A computer program product comprising: a computer readable storage medium; and program instructions residing in said storage medium for generating test cases for a text annotator which searches text documents and analyzes them relative to a defined set of tags by receiving a corpus of text fragments without any annotations and a description of the text annotator, determining types of inputs to the text annotator from the description wherein the types of inputs include at least one phrase selected from the group consisting of a person phrase, a date phrase, and a diagnosis phrase, analyzing language structures in the corpus to identify sentence types and grammar constructs wherein the sentence types include at least one sentence selected from the group consisting of a question, a command, a compound sentence, and a conditional sentence, and the analyzing includes performing a slot grammar parse of the corpus to determine various parse trees of the corpus including a most common parse tree, generating a first test case by performing a grammar tree transformation on a first selected fragment of the corpus based on the sentence types and the grammar constructs wherein the first selected fragment is selected in response to a selection bias towards a sentence type which corresponds to the most common parse tree of the corpus, and generating a second test case by replacing at least one starting phrase in the first test case with a substitute phrase from at least one dictionary associated with one of the types of inputs that corresponds to the starting phrase.
14. The computer program product of claim 13 wherein the first test case is generated by performing a sequence of different successive grammar tree transformations starting with the first selected fragment.
15. The computer program product of claim 13 wherein the second test case is generated by replacing multiple starting phrases in the second selected fragment with respective substitute phrases from multiple dictionaries associated with different ones of the types of inputs that correspond to the multiple starting phrases.
16. The computer program product of claim 13 wherein at least one of the types of inputs corresponds to multiple grammar constructs.
17. The computer program product of claim 13 wherein the dictionary includes a false synonym for the one input type that corresponds to the starting phrase.
18
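
The independent claims select the first fragment "in response to a selection bias towards a sentence type which corresponds to the most common parse tree of the corpus". One possible reading of that step is frequency-weighted random selection, sketched below in Python. This is an assumption for illustration only: the claims do not specify how the bias is computed, and the hand-written sentence-type labels here stand in for the output of a slot grammar parse, which is not implemented.

```python
import random
from collections import Counter

# Hypothetical corpus: (fragment, sentence-type label). In the claimed method
# the labels would come from a slot grammar parse; here they are hand-written.
CORPUS = [
    ("The patient has a fever.", "statement"),
    ("The doctor ordered a scan.", "statement"),
    ("The nurse updated the chart.", "statement"),
    ("Does the patient have a fever?", "question"),
    ("Order a scan.", "command"),
]


def most_common_type(corpus):
    """Return the sentence-type label that occurs most often in the corpus."""
    counts = Counter(label for _, label in corpus)
    return counts.most_common(1)[0][0]


def pick_fragment(corpus, rng):
    """Pick one fragment, weighting each by how common its sentence type is."""
    counts = Counter(label for _, label in corpus)
    weights = [counts[label] for _, label in corpus]
    return rng.choices(corpus, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so repeated runs draw the same fragments
tallies = Counter(pick_fragment(CORPUS, rng)[1] for _ in range(1000))
print(most_common_type(CORPUS), dict(tallies))
```

Weighting by type frequency keeps rarer sentence types reachable while favoring the dominant one. A stricter bias, such as always choosing from the most common type, would also fit the claim wording.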

Assignees

IBM

Inventors

Classifications

G16H15/00
ICT specially adapted for medical reports, e.g. generation or transmission thereof · CPC title
G06F40/56
Natural language generation · CPC title
G06F40/169
Annotation, e.g. comment data or footnotes · CPC title
G06F40/16
Automatic learning of transformation rules, e.g. from examples · CPC title
G06F40/253 · Primary
Grammatical analysis; Style critique · CPC title

Patent family

Related publications grouped by family.

View patent family 56111332

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9606980B2 cover?: Test cases for a text annotator are generated by determining types of inputs to the annotator and analyzing language structures in a corpus to identify sentence types and grammar constructs. An input type can correspond to multiple grammar constructs. Test cases are generated by performing grammar tree transformations on selected fragments from the corpus based on the sentence types and the gra…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06F40/253. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).