Generating Written Content from Knowledge Management Systems
US-2015347901-A1 · Dec 3, 2015 · US
US10706236B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10706236-B1 |
| Application number | US-201916444649-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 18, 2019 |
| Priority date | Jun 28, 2018 |
| Publication date | Jul 7, 2020 |
| Grant date | Jul 7, 2020 |
Official abstract text for this publication.
Applied Artificial Intelligence Technology for Using Natural Language Processing and Concept Expression Templates To Train a Natural Language Generation System
Disclosed herein is computer technology that applies natural language processing (NLP) techniques to training data to generate information used to train a natural language generation (NLG) system to produce output that stylistically resembles the training data. In this fashion, the NLG system can be readily trained with training data supplied by a user so that the NLG system is adapted to produce output that stylistically resembles such training data. In an example, an NLP system detects a plurality of linguistic features in the training data. These detected linguistic features are then aggregated into a specification data structure that is arranged for training the NLG system to produce natural language output that stylistically resembles the training data. Parameters in the specification data structure can be linked to objects in an ontology used by the NLG system to facilitate the training of the NLG system based on the detected linguistic features.
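
The aggregation step described in the abstract can be pictured with a minimal sketch. The page does not show the patent's actual data layout, so everything here is an assumption made for illustration: the `DetectedFeature` class, the `build_specification` function, the feature kinds, and the ontology object names are all hypothetical. The sketch only shows detected features being grouped into one machine-readable structure, with template parameters linked to ontology objects.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DetectedFeature:
    kind: str   # hypothetical label, e.g. "concept_expression_template"
    value: str  # the detected feature itself, e.g. a template string
    # Hypothetical mapping from a parameter in the feature to an ontology object.
    ontology_links: Dict[str, str] = field(default_factory=dict)


def build_specification(features: List[DetectedFeature]) -> dict:
    """Aggregate detected features by kind, counting repeats and merging links."""
    spec: dict = {}
    for feat in features:
        entry = spec.setdefault(feat.kind, {}).setdefault(
            feat.value, {"count": 0, "ontology_links": {}}
        )
        entry["count"] += 1
        entry["ontology_links"].update(feat.ontology_links)
    return spec


# Illustrative input: two sentences yielded the same template, one yielded
# a different (hypothetical) kind of feature.
features = [
    DetectedFeature("concept_expression_template", "$VAR1 increased $VAR2",
                    {"$VAR1": "attribute", "$VAR2": "value"}),
    DetectedFeature("concept_expression_template", "$VAR1 increased $VAR2",
                    {"$VAR1": "attribute", "$VAR2": "value"}),
    DetectedFeature("decimal_precision", "1"),
]
spec = build_specification(features)
print(json.dumps(spec, indent=2))  # a plain JSON view of the specification
```

Keeping the specification as a plain dictionary makes it easy to serialize and to filter before training, which would also suit the user-driven feature selection described in claims 13 and 14.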
Opening claim text (preview).
What is claimed is:
1. A natural language processing method comprising: a processor performing natural language processing (NLP) on training data to detect a plurality of linguistic features in the training data, wherein the training data comprises a plurality of words arranged as a sentence in a natural language, and wherein the detected linguistic features include a concept expression template that models how the training data describes a defined concept, wherein the performing step comprises: the processor comparing the sentence with a plurality of anchor words to determine whether any of the anchor words are present in the sentence, each anchor word having an association with a concept; and in response to a determination that an anchor word is present in the sentence, the processor analyzing the sentence to extract an expression template for the concept associated with the present anchor word, wherein the analyzing step comprises: the processor parsing the sentence to generate a parse tree structure for the sentence; the processor identifying entities in the parse tree structure; the processor pruning the parse tree structure by removing clauses and phrases from the parse tree structure that do not contain identified entities; the processor collapsing branches of the pruned parse tree structure based on which of the branches contain identified entities; and the processor parameterizing the collapsed and pruned parse tree structure to create the concept expression template, wherein the concept expression template comprises enumerated variables in place of identified entities; and the processor generating a specification data structure based on the detected linguistic features, the specification data structure arranged for training a natural language generation (NLG) system to produce natural language output that stylistically resembles the training data.
2. The method of claim 1 wherein the specification data structure comprises a machine-readable representation of the detected linguistic features.
3. The method of claim 1 further comprising: training the NLG system based on the specification data structure to thereby configure the NLG system to produce natural language output that stylistically resembles the training data.
4. The method of claim 3 further comprising: the trained NLG system processing a data set to generate natural language output, wherein the natural language output includes an expression that is derived from the detected linguistic features.
5. The method of claim 1 wherein the concept expression template comprises a change concept expression template.
6. The method of claim 1 wherein the concept expression template comprises a compare concept expression template.
7. The method of claim 1 wherein the concept expression template comprises a driver concept expression template.
8. The method of claim 1 wherein the concept expression template comprises a rank concept expression template.
9. The method of claim 1 wherein the identifying step comprises the processor performing named entity recognition on words in the parse tree structure based on an ontology shared with the NLG system.
10. The method of claim 9 wherein the enumerated variables in the concept expression template are linked to objects in the ontology.
11. The method of claim 1 wherein the analyzing step further comprises the processor validating the concept expression template based on a plurality of rules.
12. The method of claim 1 wherein the parsing step comprises the processor constituency parsing and dependency parsing the sentence to generate the parse tree structure.
13. The method of claim 1 further comprising: the processor modifying the specification data structure to selectively choose in response to user input which of the detected linguistic features are to be used for training the NLG system.
14. The method of claim 13 further comprising: providing a user interface for presentation to a user, the user interface configured to summarize the detected linguistic features; and the processor receiving user input through the user interface, wherein the received user input includes commands that identify which of the detected linguistic features are to be used to train the NLG system.
15. The method of claim 1 further comprising: receiving the training data as text sentence input from a user.
16. The method of claim 1 further comprising: receiving the training data as a pre-existing document.
17. The method of claim 1 further comprising: receiving the training data as speech input from a user.
18. The method of claim 1 wherein the training data comprises a corpus of documents.
19. The method of claim 1 wherein the training data comprises a plurality of sentences, the method further comprising the processor performing the NLP on each of a plurality of the sentences to detect a plurality of linguistic features in the sentences.
20. The method of claim 1 wherein the processor comprises a single processor.
21. The method of claim 1 wherein the processor comprises a plurality of processors.
22. An apparatus for natural language processing, the apparatus comprising: a processor configured to (1) perform natural language processing (NLP) on training data to detect a plurality of linguistic features in the training data, wherein the training data comprises a plurality of words arranged as a sentence in a natural language, and wherein the detected linguistic features include a concept expression template that models how the training data describes a defined concept, and (2) generate a specification data structure based on the detected linguistic features, the specification data structure arranged for training a natural language generation (NLG) system to produce natural language output that stylistically resembles the training data; wherein the processor is further configured to, as part of the NLP to detect the linguistic features, (1) compare the sentence with a plurality of anchor words to determine whether any of the anchor words are present in the sentence, each anchor word having an association with a concept, and (2) in response to a determination that an anchor word is present in the sentence, analyze the sentence to extract an expression template for the concept associated with the present anchor word; and wherein the processor is further configured to, as part of the sentence analysis to extract the concept expression template, (1) parse the sentence to generate a parse tree structure for the sentence, (2) identify entities in the parse tree structure, (3) prune the parse tree structure by removing clauses and phrases from the parse tree structure that do not contain identified entities, (4) collapse branches of the pruned parse tree structure based on which of the branches contain identified entities, and (5) parameterize the collapsed and pruned parse tree structure to create the concept expression template, wherein the concept expression template comprises enumerated variables in place of identified entities.
23. The apparatus of claim 22 wherein the processor is further configured to train the NLG system based on the specification data structure to thereby configure the NLG system to produce natural language output that stylistically resembles the training data.
24. The apparatus of claim 23 wherein the specification data structure comprises a machine-readable representation of the detected linguist
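
The template-extraction steps recited in claim 1 (anchor-word check, parse, entity identification, pruning, collapsing, parameterizing) can be illustrated with a toy sketch. It is not the patented implementation. The parse tree is written by hand instead of coming from a real constituency or dependency parser, and the anchor words, the entity lookup standing in for an ontology, the set of prunable phrase labels, and the `$VARn` naming are all hypothetical. Collapsing is interpreted here as replacing a whole branch with a single marker when its text names an entity, which is one plausible reading of the claim language.

```python
# Hypothetical anchor words (word -> concept) and entity lookup (text -> ontology type).
ANCHORS = {"increased": "change", "decreased": "change"}
ENTITIES = {"sales": "ATTRIBUTE", "20%": "VALUE", "last quarter": "TIMEFRAME"}
PRUNABLE = {"PP", "SBAR", "ADVP"}  # phrase/clause labels eligible for pruning
DETERMINERS = {"the", "a", "an"}

# A tree node is (label, [children]); a leaf is a plain word string.
def leaves(node):
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

def entity_of(node):
    """Return the ontology type if the node's text (minus determiners) is an entity."""
    text = " ".join(w.lower() for w in leaves(node) if w.lower() not in DETERMINERS)
    return ENTITIES.get(text)

def has_entity(node):
    if entity_of(node):
        return True
    return not isinstance(node, str) and any(has_entity(c) for c in node[1])

def prune(node):
    """Drop prunable phrases and clauses that contain no identified entity."""
    if isinstance(node, str):
        return node
    kept = [prune(c) for c in node[1]
            if isinstance(c, str) or c[0] not in PRUNABLE or has_entity(c)]
    return (node[0], kept)

def collapse(node):
    """Flatten the tree, turning each entity branch into one ("ENT", type) marker."""
    ent = entity_of(node)
    if ent:
        return [("ENT", ent)]
    if isinstance(node, str):
        return [node]
    return [tok for child in node[1] for tok in collapse(child)]

def parameterize(tokens):
    """Replace entity markers with enumerated variables and record their types."""
    parts, links = [], {}
    for tok in tokens:
        if isinstance(tok, tuple):
            var = f"$VAR{len(links) + 1}"
            links[var] = tok[1]
            parts.append(var)
        else:
            parts.append(tok)
    return " ".join(parts), links

def extract_template(sentence, tree):
    """Return a template for the first anchored concept, or None if no anchor word."""
    concept = next((ANCHORS[w] for w in sentence.lower().split() if w in ANCHORS), None)
    if concept is None:
        return None
    template, links = parameterize(collapse(prune(tree)))
    return {"concept": concept, "template": template, "links": links}

SENTENCE = "Sales increased 20% last quarter according to analysts"
TREE = ("S", [("NP", ["Sales"]),
              ("VP", ["increased", ("NP", ["20%"]), ("NP", ["last", "quarter"]),
                      ("PP", ["according", "to", ("NP", ["analysts"])])])])
result = extract_template(SENTENCE, TREE)
print(result["template"])  # $VAR1 increased $VAR2 $VAR3
```

In this example the phrase "according to analysts" is pruned because it contains no identified entity, and the `links` mapping plays the role of tying each enumerated variable back to an ontology object, in the spirit of claim 10.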
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
using natural language analysis · CPC title
Grammatical analysis; Style critique · CPC title
Knowledge representation; Symbolic representation · CPC title