Who is the assignee on this patent?

Thomson Reuters Global Resources

What technology area does this patent fall under?

Primary CPC classification G06F40/56 (Natural language generation). Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 24, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for natural language generation

US9953031B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9953031-B2
Application number	US-201615211340-A
Country	US
Kind code	B2
Filing date	Jul 15, 2016
Priority date	Nov 29, 2012
Publication date	Apr 24, 2018
Grant date	Apr 24, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving a corpus comprising a set of pre-segmented texts. The method further includes creating a plurality of modified pre-segmented texts for the set of pre-segmented texts by extracting a set of semantic terms for each pre-segmented text within the set of pre-segmented texts and applying at least one domain tag for each pre-segmented text within the set of pre-segmented texts. The method further includes clustering the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the plurality of modified pre-segmented texts.
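The steps in the abstract (tag each pre-segmented text, extract its semantic terms, then cluster the modified texts into conceptual units that carry templates) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the tag patterns, the stop-word list, the use of content words as "semantic terms", and the greedy Jaccard-overlap grouping are all hypothetical stand-ins, since the abstract does not say which extraction or clustering technique is used.

```python
import re

# Hypothetical tag lexicon (not from the patent): surface pattern -> domain tag.
TAG_PATTERNS = [
    (re.compile(r"\b\d+(?:\.\d+)?%"), "[PERCENT]"),
    (re.compile(r"\$\d+(?:\.\d+)?"), "[MONEY]"),
    (re.compile(r"\b(?:Acme|Globex)\b"), "[COMPANY]"),
]
STOP_WORDS = {"the", "a", "of", "to", "by", "in", "its"}


def apply_domain_tags(sentence):
    """Return a 'modified' text: matched spans are replaced by domain tags."""
    for pattern, tag in TAG_PATTERNS:
        sentence = pattern.sub(tag, sentence)
    return sentence


def semantic_terms(tagged):
    """Crude stand-in for semantic term extraction: lowercase content words."""
    without_tags = re.sub(r"\[[A-Z]+\]", " ", tagged).lower()
    return frozenset(w for w in re.findall(r"[a-z]+", without_tags)
                     if w not in STOP_WORDS)


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    return len(a & b) / len(a | b) if (a | b) else 1.0


def cluster(sentences, threshold=0.5):
    """Greedy grouping (an assumption): join the first unit with enough overlap."""
    units = []  # each unit: {"terms": frozenset, "templates": [str, ...]}
    for sentence in sentences:
        template = apply_domain_tags(sentence)
        terms = semantic_terms(template)
        for unit in units:
            if jaccard(terms, unit["terms"]) >= threshold:
                unit["templates"].append(template)
                break
        else:
            units.append({"terms": terms, "templates": [template]})
    return units


corpus = [
    "Acme raised its dividend by 5%.",
    "Globex raised its dividend by 12.5%.",
    "Acme reported revenue of $40.",
]
units = cluster(corpus)
for unit_id, unit in enumerate(units):
    print(unit_id, sorted(unit["terms"]), unit["templates"])
```

Each unit keeps one template per modified text, mirroring the abstract's statement that each template corresponds to one of the modified pre-segmented texts; the list index plays the role of a conceptual unit identifier.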

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising: a. a processor; b. a memory coupled to the processor; c. a program stored in the memory for execution by the processor, the program configured to: i. identify, by a ranking module, a first statistically generated template matching a set of domain tags, the first statistically generated template being stored in a memory; ii. identify, by the ranking module, a second statistically generated template matching the set of domain tags, the second statistically generated template being stored in the memory; iii. select, by the ranking module, the first statistically generated template instead of the second statistically generated template based on ranking the first statistically generated template higher than the second statistically generated template using an automatically generated statistical ranking model, the automatically generated statistical ranking model being derived from a set of automatically generated model weights and a set of automatically generated ranking features; iv. generate, by the ranking module, a set of natural language text by inserting a set of information associated with a record into the first statistically generated template, the set of natural language text being stored in memory; and v. provide, by a delivery module, the set of natural language text.

2. The system of claim 1 wherein the program is further configured to, prior to the identifying by the ranking module: i. receive, by a template module, from a content database a corpus comprising a set of pre-segmented texts; ii. create, by the template module, a plurality of modified pre-segmented texts for the set of pre-segmented texts by: 1. extraction, by the template module, of a set of semantic terms for each pre-segmented text within the set of pre-segmented texts; and 2. application, by the template module, of at least one domain tag for each pre-segmented text within the set of pre-segmented texts; iii. cluster, by the template module, the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the set of pre-segmented texts, and wherein a conceptual unit identifier is assigned to each modified pre-segmented text and pre-segmented text in the plurality of modified pre-segmented texts and set of pre-segmented texts respectively; and iv. store the plurality of modified pre-segmented texts and the set of semantic terms in the content database.

3. The system of claim 2 wherein the statistical ranking model is adapted to select a sequence of templates that best fit input data, the statistical ranking model being learned by application of historical corpus data for a given domain including conceptual-unit creation and collecting statistics.

4. A professional services resource system for processing documents and delivering hybrid statistical/template based natural language generation services, the system comprising: a. a processor module comprising one or more processors; b. a memory coupled to the processor module; c. a content database comprising a set of documents, wherein each document comprises a set of pre-segmented texts; d. a natural language text generation module executable by the processor module and configured to: i. identify, by a ranking module, a first statistically generated template matching a set of domain tags, the first statistically generated template being stored in a memory; ii. identify, by the ranking module, a second statistically generated template matching the set of domain tags, the second statistically generated template being stored in the memory; iii. select, by the ranking module, the first statistically generated template instead of the second statistically generated template based on ranking the first statistically generated template higher than the second statistically generated template using an automatically generated statistical ranking model, the automatically generated statistical ranking model being derived from a set of automatically generated model weights and a set of automatically generated ranking features; iv. generate, by the ranking module, a set of natural language text by inserting a set of information associated with a record into the first statistically generated template, the set of natural language text being stored in memory; and v. provide, by a delivery module, the set of natural language text.

5. The system of claim 4 wherein the natural language text generation module is further configured to, prior to the identifying by the ranking module: i. receive, by a template module, from the content database a corpus comprising a set of pre-segmented texts; ii. create, by the template module, a plurality of modified pre-segmented texts by: 1. extraction, by the template module, of a set of semantic terms for each pre-segmented text within the set of pre-segmented texts; and 2. application, by the template module, of at least one domain tag for each pre-segmented text within the set of pre-segmented texts; iii. cluster, by the template module, the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the set of pre-segmented texts, and wherein a conceptual unit identifier is assigned to each modified pre-segmented text and pre-segmented text in the plurality of modified pre-segmented texts and set of pre-segmented texts respectively; and iv. store the plurality of modified pre-segmented texts and the set of semantic terms in the content database.

6. The system of claim 5 wherein the domain tags include domain specific tags and domain general tags.

7. The system of claim 4 wherein the natural language text generation module is further configured to, prior to the identifying by the ranking module: i. determine a gold template; ii. identify a set of matching templates from one or more templates within a given conceptual unit, the set of matching templates associated with the gold template; iii. rank the set of matching templates and associating a ranked set of matching templates with the gold template; and iv. determine the set of automatically generated model weights based on one or more ranked sets of matching templates and set of automatically generated ranking features; and v. store the plurality of modified pre-segmented texts, the set of automatically generated model weights, and the set of semantic terms in a content database.

8. The system of claim 7 wherein the set of automatically generated ranking features includes at least one of: a. a position of each pre-segmented text; b. a type and a number of a set of content; c. an n-gram calculation; d. a template length; and e. an overlap calculation between a current template and the gold template.

9. The system of claim 7 further comprising a training module adapted to calculate the set of automatically generated ranking features for each matching template in the ranked set of matching templates.

10. The system of claim 7 wherein the natural language text generation module determines a plurality of gold templates and a plurality of ranked sets of matching templates, and wherein a set of ranking features is calculated for each ranked set of matching templates.

11. The system of claim 10 wherein the set of ranking features comprises one or more of: (1) position in text as a proportion of the total text; (2) type and number of domain tags; (3) n-grams; (4) template
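To make the selection and generation steps of claim 1 more concrete, the following Python sketch picks between candidate templates that match a record's domain tags and fills the winner with the record's values. It is a rough illustration, not the claimed system: the feature set is only loosely modeled on the list in claim 8, the linear scoring is an assumed form of the ranking model, and the weights are hand-set hypothetical values, whereas the claims describe weights and features that are generated automatically.

```python
import re

TAG = re.compile(r"\[([A-Z]+)\]")

# Hypothetical, hand-set weights for illustration only; the claims describe
# model weights that are generated automatically from ranked training data.
WEIGHTS = {"position": 0.0, "num_tags": 1.0, "length": -0.1}


def tags_of(template):
    """Set of domain tag names that appear in a template."""
    return set(TAG.findall(template))


def features(template, position):
    """Toy ranking features: position in text, tag count, template length."""
    return {
        "position": position,
        "num_tags": len(TAG.findall(template)),
        "length": len(template.split()),
    }


def score(template, position, weights=WEIGHTS):
    """Linear score: weighted sum of the feature values (an assumed model form)."""
    return sum(weights[name] * value
               for name, value in features(template, position).items())


def generate(templates, record, position=0.0):
    """Rank the templates whose tags match the record, then fill the best one."""
    candidates = [t for t in templates if tags_of(t) == set(record)]
    if not candidates:
        raise ValueError("no template matches the record's domain tags")
    best = max(candidates, key=lambda t: score(t, position))
    return TAG.sub(lambda m: str(record[m.group(1)]), best)


templates = [
    "[COMPANY] raised its dividend by [PERCENT].",
    "The dividend at [COMPANY] went up by [PERCENT] this quarter.",
    "[COMPANY] reported revenue of [MONEY].",
]
record = {"COMPANY": "Acme", "PERCENT": "5%"}
text = generate(templates, record)
print(text)
```

With these weights the shorter of the two matching templates scores higher, so it is selected over the longer one; a learned model would instead derive the weights from ranked sets of matching templates, as claim 7 describes.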

Assignees

Thomson Reuters Global Resources

Classifications

G06F40/117
Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title
G06F40/56 · Primary
Natural language generation · CPC title
G06F40/30
Semantic analysis · CPC title
G06F17/2785
Physics · mapped topic
G06F17/218
Physics · mapped topic

Patent family

Related publications grouped by family.

Patent family ID: 53775058

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9953031B2 cover?: A method includes receiving a corpus comprising a set of pre-segmented texts. The method further includes creating a plurality of modified pre-segmented texts for the set of pre-segmented texts by extracting a set of semantic terms for each pre-segmented text within the set of pre-segmented texts and applying at least one domain tag for each pre-segmented text within the set of pre-segmented te…