Syntactic classification of natural language sentences with respect to a targeted element

US10133724B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10133724-B2
Application number: US-201615242779-A
Country: US
Kind code: B2
Filing date: Aug 22, 2016
Priority date: Aug 22, 2016
Publication date: Nov 20, 2018
Grant date: Nov 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A grammatically diverse test set of natural language sentences for a deep question answering system is provided by analyzing a given sentence to characterize its syntactical classification, and adding the sentence to the test set if its classification is sufficiently different from other sentences already in the test set. A particular sentence may be selected for inclusion according to a desired syntactic distribution. Multiple sentences having the exact same classification may be allowed subject to a maximum number of such sentences. The test set is adapted to an element of interest by characterizing each syntactical classification relative to the element of interest. The analysis derives a parse tree, identifies a particular node of the tree corresponding to the element of interest, and extracts syntactic information by traversing the tree starting at the particular node and ending at the root node of the tree according to different traversal schemes.
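The classification step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `Node` class, the `tree`, `find_terminal`, and `classify` helpers, the hand-built parse tree, and the Penn Treebank-style labels are all assumptions made for the example. It shows only one simple traversal scheme (collecting the label of every node on the path from the element of interest up to the root), whereas the abstract mentions several.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Hypothetical parse-tree node with a parent pointer for upward traversal."""
    label: str                          # linguistic identifier, e.g. "NP" or "WDT"
    word: Optional[str] = None          # set only on terminal nodes
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def tree(label: str, *children: Node, word: Optional[str] = None) -> Node:
    """Build a node and wire up the parent pointers of its children."""
    node = Node(label, word)
    for child in children:
        child.parent = node
        node.children.append(child)
    return node


def find_terminal(root: Node, word: str) -> Optional[Node]:
    """Scan top-down, left-to-right, depth-first for the first matching terminal."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.word is not None and node.word.lower() == word.lower():
            return node
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return None


def classify(root: Node, element_of_interest: str) -> Tuple[str, ...]:
    """Return the labels on the path from the element of interest to the root."""
    node = find_terminal(root, element_of_interest)
    if node is None:
        raise ValueError(f"{element_of_interest!r} not found in parse tree")
    labels = []
    while node is not None:
        labels.append(node.label)
        node = node.parent
    return tuple(labels)


# Hand-built tree for "Which river flows through Paris?" (assumed parse).
example = tree(
    "SBARQ",
    tree("WHNP", tree("WDT", word="Which"), tree("NN", word="river")),
    tree("SQ", tree("VP",
                    tree("VBZ", word="flows"),
                    tree("PP", tree("IN", word="through"),
                         tree("NP", tree("NNP", word="Paris"))))),
)

print(classify(example, "Which"))   # ('WDT', 'WHNP', 'SBARQ')
print(classify(example, "Paris"))   # ('NNP', 'NP', 'PP', 'VP', 'SQ', 'SBARQ')
```

The same sentence receives a different classification depending on which element is targeted, which is the sense in which the classification is relative to the element of interest.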

First claim

Opening claim text (preview).

What is claimed is:

1. A method of syntactically classifying a natural language sentence comprising: receiving the natural language sentence in computer-readable form, by executing first instructions in a computer system; parsing the natural language sentence to derive a parse tree having a plurality of nodes, by executing second instructions in the computer system; identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, by executing third instructions in the computer system; extracting syntactic information from the parse tree relative to the particular node corresponding to the element of interest, by executing fourth instructions in the computer system; recording the syntactic information as a classification for the natural language sentence, by executing fifth instructions in the computer system; determining that the classification for the natural language sentence is different from classifications of other natural language sentences in a test set according to at least one predetermined similarity criterion related to the syntactic information, by executing sixth instructions in the computer system, wherein the predetermined similarity criterion allows two given sentences to be deemed similar even when the two given sentences have different classifications; and responsive to said determining, adding the natural language sentence to the test set, by executing seventh instructions in the computer system.

2. The method of claim 1 wherein: the parse tree nodes include a root node, one or more interior nodes, and a plurality of terminal nodes representing linguistic elements of the natural language sentence, the particular node corresponding to the element of interest being one of the terminal nodes; each of the parse tree nodes has an associated linguistic identifier; said extracting includes traversing the parse tree along a traversal path starting at the particular terminal node corresponding to the element of interest and ending at the root node; and the syntactic information includes a sequence of linguistic identifiers associated with respective nodes of the traversal path in order of traversal.

3. The method of claim 2 wherein: the parse tree includes a first node having a linguistic identifier with a semantic argument numeric index, and includes a second node associated with a linguistic element of the natural language sentence having a semantic argument corresponding to the numeric index; and the traversal path includes a discontinuous jump of the parse tree from the first node directly to the second node.

4. The method of claim 2 wherein: the parse tree includes a first node corresponding to a linguistic element of the natural language sentence which is a form of the verb “to be” and has a semantic role, and includes a second node associated with a linguistic element of the natural language sentence having a semantic argument index corresponding to the semantic role; and the traversal path includes a discontinuous jump of the parse tree from the first node directly to the second node.

5. The method of claim 1 wherein the element of interest is an interrogative element.

6. The method of claim 1 wherein the particular node corresponding to the element of interest is identified by scanning the parse-tree for the element in a top-down, left-to-right, depth-first order.

7. The method of claim 1 wherein the predetermined similarity criterion allows the two given sentences to be deemed similar when their respective classifications are comprised of sequentialization labels having no more than a known number of different elements.

8. A computer system comprising: one or more processors which process program instructions; a memory device connected to said one or more processors; and program instructions residing in said memory device for syntactically classifying a natural language sentence by receiving the natural language sentence, parsing the natural language sentence to derive a parse tree having a plurality of nodes, identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, extracting syntactic information from the parse tree relative to the particular node corresponding to the element of interest, recording the syntactic information as a classification for the natural language sentence, determining that the classification for the natural language sentence is different from classifications of other natural language sentences in a test set according to at least one predetermined similarity criterion related to the syntactic information wherein the predetermined similarity criterion allows two given sentences to be deemed similar even when the two given sentences have different classifications, and responsively adding the natural language sentence to the test set.

9. The computer system of claim 8 wherein: the parse tree nodes include a root node, one or more interior nodes, and a plurality of terminal nodes representing linguistic elements of the natural language sentence, the particular node corresponding to the element of interest being one of the terminal nodes; each of the parse tree nodes has an associated linguistic identifier; the extracting includes traversing the parse tree along a traversal path starting at the particular terminal node corresponding to the element of interest and ending at the root node; and the syntactic information includes a sequence of linguistic identifiers associated with respective nodes of the traversal path in order of traversal.

10. The computer system of claim 9 wherein: the parse tree includes a first node having a linguistic identifier with a semantic argument numeric index, and includes a second node associated with a linguistic element of the natural language sentence having a semantic argument corresponding to the numeric index; and the traversal path includes a discontinuous jump of the parse tree from the first node directly to the second node.

11. The computer system of claim 9 wherein: the parse tree includes a first node corresponding to a linguistic element of the natural language sentence which is a form of the verb “to be” and has a semantic role, and includes a second node associated with a linguistic element of the natural language sentence having a semantic argument index corresponding to the semantic role; and the traversal path includes a discontinuous jump of the parse tree from the first node directly to the second node.

12. The computer system of claim 8 wherein the element of interest is an interrogative element.

13. The computer system of claim 8 wherein the particular node corresponding to the element of interest is identified by scanning the parse-tree for the element in a top-down, left-to-right, depth-first order.

14. The computer system of claim 8 wherein the predetermined similarity criterion allows the two given sentences to be deemed similar when their respective classifications are comprised of sequentialization labels having no more than a known number of different elements.

15. A computer program product comprising: a computer readable storage medium; and program instructions residing in said storage medium for syntactically classifying a natural language sentence by receiving the natural language sentence, parsing the natural language sentence to derive a parse tree having a plurality of nodes, identifying a particular one of the nodes that corresponds to an element of interest in the natural language sentence, extracting syntactic information from the parse tree relative to the
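The similarity test and test-set admission step recited in claims 1 and 7 might look something like the sketch below. It is an illustration under stated assumptions, not the claimed method itself: the function names, the `max_diff` threshold, and the choice to count differences position by position (with a length mismatch counting as extra differences) are all hypothetical, since the claim text only says the classifications may have "no more than a known number of different elements".

```python
from itertools import zip_longest
from typing import List, Sequence, Tuple

Classification = Tuple[str, ...]


def differing_elements(a: Sequence[str], b: Sequence[str]) -> int:
    """Count positions where two label sequences disagree (assumed metric)."""
    # zip_longest pads the shorter sequence with None, so extra labels
    # in the longer sequence each count as one difference.
    return sum(1 for x, y in zip_longest(a, b) if x != y)


def is_similar(a: Sequence[str], b: Sequence[str], max_diff: int = 1) -> bool:
    """Two classifications are deemed similar if they differ in few elements."""
    return differing_elements(a, b) <= max_diff


def maybe_add(test_set: List[Tuple[str, Classification]],
              sentence: str,
              classification: Classification,
              max_diff: int = 1) -> bool:
    """Add the sentence only if it is not similar to any existing entry."""
    for _, existing in test_set:
        if is_similar(classification, existing, max_diff):
            return False
    test_set.append((sentence, classification))
    return True


test_set: List[Tuple[str, Classification]] = []
maybe_add(test_set, "Which river flows through Paris?", ("WDT", "WHNP", "SBARQ"))
# Differs from the first entry in only one label, so it is deemed similar
# and rejected even though the two classifications are not identical.
maybe_add(test_set, "Who wrote Hamlet?", ("WP", "WHNP", "SBARQ"))
# Differs in every position, so it is admitted.
maybe_add(test_set, "Name the capital of France.",
          ("NNP", "NP", "PP", "NP", "VP", "S"))
print(len(test_set))  # 2
```

With `max_diff=0` the check reduces to rejecting only exact duplicates; a larger threshold makes the resulting test set sparser but more syntactically varied.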

Assignees

Inventors

Classifications

  • Trees

  • Tree-structured documents (parsing G06F40/205; validation G06F40/226)

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

  • using natural language analysis

  • Semantic analysis

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10133724B2 cover?
A grammatically diverse test set of natural language sentences for a deep question answering system is provided by analyzing a given sentence to characterize its syntactical classification, and adding the sentence to the test set if its classification is sufficiently different from other sentences already in the test set. A particular sentence may be selected for inclusion according to a desire…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 20, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).