What technology area does this patent fall under?

Primary CPC classification G06F40/56. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 7, 2020 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system

US10706236B1 · US · B1

Patent metadata
Field	Value
Publication number	US-10706236-B1
Application number	US-201916444649-A
Country	US
Kind code	B1
Filing date	Jun 18, 2019
Priority date	Jun 28, 2018
Publication date	Jul 7, 2020
Grant date	Jul 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein is computer technology that applies natural language processing (NLP) techniques to training data to generate information used to train a natural language generation (NLG) system to produce output that stylistically resembles the training data. In this fashion, the NLG system can be readily trained with training data supplied by a user so that the NLG system is adapted to produce output that stylistically resembles such training data. In an example, an NLP system detects a plurality of linguistic features in the training data. These detected linguistic features are then aggregated into a specification data structure that is arranged for training the NLG system to produce natural language output that stylistically resembles the training data. Parameters in the specification data structure can be linked to objects in an ontology used by the NLG system to facilitate the training of the NLG system based on the detected linguistic features.
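
The abstract describes aggregating detected linguistic features into a specification data structure whose parameters can be linked to ontology objects. The following is a minimal sketch of what such a structure might look like, not the patent's actual format. The class names (LinguisticFeature, Specification), the field names, the ontology object identifiers, and the JSON serialization are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LinguisticFeature:
    """One detected feature. All field names here are hypothetical."""
    kind: str    # e.g. "concept_expression_template"
    value: str   # the detected pattern itself
    # Maps a template parameter to an ontology object id (assumed scheme).
    ontology_links: dict = field(default_factory=dict)


@dataclass
class Specification:
    """Machine-readable bundle of features handed to an NLG system."""
    features: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the list of nested dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def aggregate(detected):
    """Collect detected features into a single specification."""
    spec = Specification()
    for feature in detected:
        spec.features.append(feature)
    return spec


spec = aggregate([
    LinguisticFeature(
        kind="concept_expression_template",
        value="$VAR0 increased by $VAR1 in $VAR2",
        ontology_links={
            "$VAR0": "attribute:sales",
            "$VAR1": "type:number",
            "$VAR2": "type:timeframe",
        },
    ),
])
print(spec.to_json())
```

Serializing to JSON is only one way to make the structure machine-readable; the point of the sketch is that each template parameter carries a link back to an ontology object the NLG system already knows.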

First claim

Opening claim text (preview).

What is claimed is:
1. A natural language processing method comprising: a processor performing natural language processing (NLP) on training data to detect a plurality of linguistic features in the training data, wherein the training data comprises a plurality of words arranged as a sentence in a natural language, and wherein the detected linguistic features include a concept expression template that models how the training data describes a defined concept, wherein the performing step comprises: the processor comparing the sentence with a plurality of anchor words to determine whether any of the anchor words are present in the sentence, each anchor word having an association with a concept; and in response to a determination that an anchor word is present in the sentence, the processor analyzing the sentence to extract an expression template for the concept associated with the present anchor word, wherein the analyzing step comprises: the processor parsing the sentence to generate a parse tree structure for the sentence; the processor identifying entities in the parse tree structure; the processor pruning the parse tree structure by removing clauses and phrases from the parse tree structure that do not contain identified entities; the processor collapsing branches of the pruned parse tree structure based on which of the branches contain identified entities; and the processor parameterizing the collapsed and pruned parse tree structure to create the concept expression template, wherein the concept expression template comprises enumerated variables in place of identified entities; and the processor generating a specification data structure based on the detected linguistic features, the specification data structure arranged for training a natural language generation (NLG) system to produce natural language output that stylistically resembles the training data.
2. The method of claim 1 wherein the specification data structure comprises a machine-readable representation of the detected linguistic features.
3. The method of claim 1 further comprising: training the NLG system based on the specification data structure to thereby configure the NLG system to produce natural language output that stylistically resembles the training data.
4. The method of claim 3 further comprising: the trained NLG system processing a data set to generate natural language output, wherein the natural language output includes an expression that is derived from the detected linguistic features.
5. The method of claim 1 wherein the concept expression template comprises a change concept expression template.
6. The method of claim 1 wherein the concept expression template comprises a compare concept expression template.
7. The method of claim 1 wherein the concept expression template comprises a driver concept expression template.
8. The method of claim 1 wherein the concept expression template comprises a rank concept expression template.
9. The method of claim 1 wherein the identifying step comprises the processor performing named entity recognition on words in the parse tree structure based on an ontology shared with the NLG system.
10. The method of claim 9 wherein the enumerated variables in the concept expression template are linked to objects in the ontology.
11. The method of claim 1 wherein the analyzing step further comprises the processor validating the concept expression template based on a plurality of rules.
12. The method of claim 1 wherein the parsing step comprises the processor constituency parsing and dependency parsing the sentence to generate the parse tree structure.
13. The method of claim 1 further comprising: the processor modifying the specification data structure to selectively choose in response to user input which of the detected linguistic features are to be used for training the NLG system.
14. The method of claim 13 further comprising: providing a user interface for presentation to a user, the user interface configured to summarize the detected linguistic features; and the processor receiving user input through the user interface, wherein the received user input includes commands that identify which of the detected linguistic features are to be used to train the NLG system.
15. The method of claim 1 further comprising: receiving the training data as text sentence input from a user.
16. The method of claim 1 further comprising: receiving the training data as a pre-existing document.
17. The method of claim 1 further comprising: receiving the training data as speech input from a user.
18. The method of claim 1 wherein the training data comprises a corpus of documents.
19. The method of claim 1 wherein the training data comprises a plurality of sentences, the method further comprising the processor performing the NLP on each of a plurality of the sentences to detect a plurality of linguistic features in the sentences.
20. The method of claim 1 wherein the processor comprises a single processor.
21. The method of claim 1 wherein the processor comprises a plurality of processors.
22. An apparatus for natural language processing, the apparatus comprising: a processor configured to (1) perform natural language processing (NLP) on training data to detect a plurality of linguistic features in the training data, wherein the training data comprises a plurality of words arranged as a sentence in a natural language, and wherein the detected linguistics feature include a concept expression template that models how the training data describes a defined concept, and (2) generate a specification data structure based on the detected linguistic features, the specification data structure arranged for training a natural language generation (NLG) system to produce natural language output that stylistically resembles the training data; wherein the processor is further configured to, as part of the NLP to detect the linguistic features, (1) compare the sentence with a plurality of anchor words to determine whether any of the anchor words are present in the sentence, each anchor word having an association with a concept, and (2) in response to a determination that an anchor word is present in the sentence, analyze the sentence to extract an expression template for the concept associated with the present anchor word; and wherein the processor is further configured to, as part of the sentence analysis to extract the concept expression template, (1) parse the sentence to generate a parse tree structure for the sentence, (2) identify entities in the parse tree structure, (3) prune the parse tree structure by removing clauses and phrases from the parse tree structure that do not contain identified entities, (4) collapse branches of the pruned parse tree structure based on which of the branches contain identified entities, and (5) parameterize the collapsed and pruned parse tree structure to create the concept expression template, wherein the concept expression template comprises enumerated variables in place of identified entities.
23. The apparatus of claim 22 wherein the processor is further configured to train the NLG system based on the specification data structure to thereby configure the NLG system to produce natural language output that stylistically resembles the training data.
24. The apparatus of claim 23 wherein the specification data structure comprises a machine-readable representation of the detected linguist
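
The analyzing steps recited in claim 1 (anchor-word check, parse, entity identification, pruning, collapsing, parameterizing) can be illustrated with a toy sketch. This is one possible reading of the claim language, not the patented implementation. The parse tree is built by hand rather than by a real parser, and the anchor-word list, the ontology entries, the Node class, and the $VARn naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical anchor words (word -> concept) and toy ontology (phrase -> type).
ANCHORS = {"increased": "change", "decreased": "change", "higher": "compare"}
ONTOLOGY = {"total sales": "ATTRIBUTE", "10%": "NUMBER", "Q3": "TIMEFRAME"}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None  # set only on leaves

    def text(self) -> str:
        if self.word is not None:
            return self.word
        return " ".join(c.text() for c in self.children)


def find_concept(sentence):
    """Return the concept of the first anchor word present, else None."""
    for token in sentence.split():
        if token.lower() in ANCHORS:
            return ANCHORS[token.lower()]
    return None


def is_entity(node):
    return node.text() in ONTOLOGY


def contains_entity(node):
    return is_entity(node) or any(contains_entity(c) for c in node.children)


def prune(node):
    """Drop phrase/clause subtrees that hold no entity; word leaves stay."""
    kept = []
    for child in node.children:
        if child.word is None and not contains_entity(child):
            continue
        kept.append(child if child.word is not None else prune(child))
    return Node(node.label, kept, node.word)


def collapse(node):
    """Replace any branch whose whole text is an entity with a single leaf."""
    if is_entity(node):
        return Node("ENTITY", word=node.text())
    if node.word is not None:
        return node
    return Node(node.label, [collapse(c) for c in node.children])


def leaves(node):
    if node.word is not None:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def parameterize(node):
    """Swap entity leaves for enumerated variables linked to ontology types."""
    parts, links = [], {}
    for leaf in leaves(node):
        if leaf.label == "ENTITY":
            var = f"$VAR{len(links)}"
            links[var] = ONTOLOGY[leaf.word]
            parts.append(var)
        else:
            parts.append(leaf.word)
    return " ".join(parts), links


def extract_template(tree):
    concept = find_concept(tree.text())
    if concept is None:
        return None
    template, links = parameterize(collapse(prune(tree)))
    return concept, template, links


def L(label, word):
    return Node(label, word=word)


# Hand-built tree for "Surprisingly total sales increased by 10% in Q3".
TREE = Node("S", [
    Node("ADVP", [L("RB", "Surprisingly")]),
    Node("NP", [L("JJ", "total"), L("NN", "sales")]),
    Node("VP", [
        L("VBD", "increased"),
        Node("PP", [L("IN", "by"), Node("NP", [L("CD", "10%")])]),
        Node("PP", [L("IN", "in"), Node("NP", [L("NNP", "Q3")])]),
    ]),
])

print(extract_template(TREE))
```

In this sketch the adverbial phrase is pruned because it contains no entity, the noun phrase "total sales" collapses into one entity leaf, and each entity is then replaced by an enumerated variable. A production system would obtain the tree from a constituency or dependency parser and would need rules for multi-word anchors, punctuation, and template validation, none of which are modeled here.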

Assignees

Narrative Science Inc

Inventors

Classifications

G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06N5/01
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
G06F16/3344
using natural language analysis
G06F40/253
Grammatical analysis; Style critique
G06N5/02
Knowledge representation; Symbolic representation

Patent family

Related publications grouped by family.

View patent family 71408486

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10706236B1 cover?: Disclosed herein is computer technology that applies natural language processing (NLP) techniques to training data to generate information used to train a natural language generation (NLG) system to produce output that stylistically rese…
Who is the assignee on this patent?: Narrative Science Inc