Systems and methods for generating scripts to interact with web sites
US-9071592-B1 · Jun 30, 2015 · US
US9953031B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9953031-B2 |
| Application number | US-201615211340-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 15, 2016 |
| Priority date | Nov 29, 2012 |
| Publication date | Apr 24, 2018 |
| Grant date | Apr 24, 2018 |
Official abstract text for this publication.
A method includes receiving a corpus comprising a set of pre-segmented texts. The method further includes creating a plurality of modified pre-segmented texts for the set of pre-segmented texts by extracting a set of semantic terms for each pre-segmented text within the set of pre-segmented texts and applying at least one domain tag for each pre-segmented text within the set of pre-segmented texts. The method further includes clustering the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the plurality of modified pre-segmented texts.
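The steps in the abstract (extract semantic terms, apply domain tags, cluster the modified texts into conceptual units) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the regex-based `DOMAIN_PATTERNS` lexicon, the tag names, the function names, and above all the grouping rule, which simply puts texts that share the same set of domain tags into one unit as a stand-in for whatever clustering method the full description specifies.

```python
import re
from collections import defaultdict

# Hypothetical domain lexicon: surface pattern -> domain tag (illustrative only).
DOMAIN_PATTERNS = [
    (re.compile(r"\b\d+(?:\.\d+)?%"), "[PERCENT]"),
    (re.compile(r"\$\d+(?:\.\d+)?"), "[MONEY]"),
    (re.compile(r"\b(?:Acme|Globex)\b"), "[COMPANY]"),
]
TAG_RE = re.compile(r"\[[A-Z]+\]")


def apply_domain_tags(segment):
    """Return (modified text, extracted terms) for one pre-segmented text."""
    terms = []
    modified = segment
    for pattern, tag in DOMAIN_PATTERNS:
        terms.extend(pattern.findall(modified))   # extract the matched terms
        modified = pattern.sub(tag, modified)     # replace them with a domain tag
    return modified, terms


def cluster_into_conceptual_units(segments):
    """Group modified texts by their set of domain tags (a naive stand-in
    for real clustering) and assign a conceptual unit identifier to each group."""
    groups = defaultdict(list)
    for segment in segments:
        modified, terms = apply_domain_tags(segment)
        key = tuple(sorted(set(TAG_RE.findall(modified))))
        groups[key].append(
            {"original": segment, "template": modified, "terms": terms}
        )
    return {
        f"CU{i}": members
        for i, (_, members) in enumerate(sorted(groups.items()), start=1)
    }


if __name__ == "__main__":
    corpus = [
        "Acme shares rose 5% on Monday.",
        "Globex shares fell 2.5% on Friday.",
        "Acme reported revenue of $3.20 per share.",
    ]
    for unit_id, members in cluster_into_conceptual_units(corpus).items():
        print(unit_id, [m["template"] for m in members])
```

In this sketch each modified text doubles as a template, and the original text stays attached to it so that a conceptual unit identifier can be traced back to both, which mirrors the wording of claims 2 and 5.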
Opening claim text (preview).
The invention claimed is:

1. A system comprising:
  a. a processor;
  b. a memory coupled to the processor;
  c. a program stored in the memory for execution by the processor, the program configured to:
    i. identify, by a ranking module, a first statistically generated template matching a set of domain tags, the first statistically generated template being stored in a memory;
    ii. identify, by the ranking module, a second statistically generated template matching the set of domain tags, the second statistically generated template being stored in the memory;
    iii. select, by the ranking module, the first statistically generated template instead of the second statistically generated template based on ranking the first statistically generated template higher than the second statistically generated template using an automatically generated statistical ranking model, the automatically generated statistical ranking model being derived from a set of automatically generated model weights and a set of automatically generated ranking features;
    iv. generate, by the ranking module, a set of natural language text by inserting a set of information associated with a record into the first statistically generated template, the set of natural language text being stored in memory; and
    v. provide, by a delivery module, the set of natural language text.

2. The system of claim 1 wherein the program is further configured to, prior to the identifying by the ranking module:
  i. receive, by a template module, from a content database a corpus comprising a set of pre-segmented texts;
  ii. create, by the template module, a plurality of modified pre-segmented texts for the set of pre-segmented texts by:
    1. extraction, by the template module, of a set of semantic terms for each pre-segmented text within the set of pre-segmented texts; and
    2. application, by the template module, of at least one domain tag for each pre-segmented text within the set of pre-segmented texts;
  iii. cluster, by the template module, the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the set of pre-segmented texts, and wherein a conceptual unit identifier is assigned to each modified pre-segmented text and pre-segmented text in the plurality of modified pre-segmented texts and set of pre-segmented texts respectively; and
  iv. store the plurality of modified pre-segmented texts and the set of semantic terms in the content database.

3. The system of claim 2 wherein the statistical ranking model is adapted to select a sequence of templates that best fit input data, the statistical ranking model being learned by application of historical corpus data for a given domain including conceptual-unit creation and collecting statistics.

4. A professional services resource system for processing documents and delivering hybrid statistical/template based natural language generation services, the system comprising:
  a. a processor module comprising one or more processors;
  b. a memory coupled to the processor module;
  c. a content database comprising a set of documents, wherein each document comprises a set of pre-segmented texts;
  d. a natural language text generation module executable by the processor module and configured to:
    i. identify, by a ranking module, a first statistically generated template matching a set of domain tags, the first statistically generated template being stored in a memory;
    ii. identify, by the ranking module, a second statistically generated template matching the set of domain tags, the second statistically generated template being stored in the memory;
    iii. select, by the ranking module, the first statistically generated template instead of the second statistically generated template based on ranking the first statistically generated template higher than the second statistically generated template using an automatically generated statistical ranking model, the automatically generated statistical ranking model being derived from a set of automatically generated model weights and a set of automatically generated ranking features;
    iv. generate, by the ranking module, a set of natural language text by inserting a set of information associated with a record into the first statistically generated template, the set of natural language text being stored in memory; and
    v. provide, by a delivery module, the set of natural language text.

5. The system of claim 4 wherein the natural language text generation module is further configured to, prior to the identifying by the ranking module:
  i. receive, by a template module, from the content database a corpus comprising a set of pre-segmented texts;
  ii. create, by the template module, a plurality of modified pre-segmented texts by:
    1. extraction, by the template module, of a set of semantic terms for each pre-segmented text within the set of pre-segmented texts; and
    2. application, by the template module, of at least one domain tag for each pre-segmented text within the set of pre-segmented texts;
  iii. cluster, by the template module, the plurality of modified pre-segmented texts into one or more conceptual units, wherein each of the one or more conceptual units is associated with one or more templates, wherein each of the one or more templates corresponds to one of the set of pre-segmented texts, and wherein a conceptual unit identifier is assigned to each modified pre-segmented text and pre-segmented text in the plurality of modified pre-segmented texts and set of pre-segmented texts respectively; and
  iv. store the plurality of modified pre-segmented texts and the set of semantic terms in the content database.

6. The system of claim 5 wherein the domain tags include domain specific tags and domain general tags.

7. The system of claim 4 wherein the natural language text generation module is further configured to, prior to the identifying by the ranking module:
  i. determine a gold template;
  ii. identify a set of matching templates from one or more templates within a given conceptual unit, the set of matching templates associated with the gold template;
  iii. rank the set of matching templates and associating a ranked set of matching templates with the gold template; and
  iv. determine the set of automatically generated model weights based on one or more ranked sets of matching templates and set of automatically generated ranking features; and
  v. store the plurality of modified pre-segmented texts, the set of automatically generated model weights, and the set of semantic terms in a content database.

8. The system of claim 7 wherein the set of automatically generated ranking features includes at least one of:
  a. a position of each pre-segmented text;
  b. a type and a number of a set of content;
  c. an n-gram calculation;
  d. a template length; and
  e. an overlap calculation between a current template and the gold template.

9. The system of claim 7 further comprising a training module adapted to calculate the set of automatically generated ranking features for each matching template in the ranked set of matching templates.

10. The system of claim 7 wherein the natural language text generation module determines a plurality of gold templates and a plurality of ranked sets of matching templates, and wherein a set of ranking features is calculated for each ranked set of matching templates.

11. The system of claim 10 wherein the set of ranking features comprises one or more of: (1) position in text as a proportion of the total text; (2) type and number of domain tags; (3) n-grams; (4) template
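The selection and generation steps recited in claims 1 and 4 (find templates matching a set of domain tags, rank them with a weighted feature model, fill the winner from a record) might look roughly like the sketch below. It assumes a simple linear score over two features named in claim 8 (template length and number of tags). The `WEIGHTS` values are hand-picked placeholders, whereas the claims describe weights generated automatically by training; all function names and the bracketed tag syntax are likewise hypothetical.

```python
import re

TAG_RE = re.compile(r"\[([A-Z]+)\]")

# Placeholder weights (assumption): the claims describe automatically
# generated model weights; these values are chosen only for illustration.
WEIGHTS = {"length": -0.1, "num_tags": 1.0}


def ranking_features(template):
    """Compute two illustrative ranking features for a template."""
    return {
        "length": len(template.split()),            # template length in tokens
        "num_tags": len(TAG_RE.findall(template)),  # number of domain tags
    }


def score(template, weights=WEIGHTS):
    """Linear ranking score: weighted sum of the feature values."""
    feats = ranking_features(template)
    return sum(weights[name] * value for name, value in feats.items())


def matches(template, domain_tags):
    """A template matches when its tags are exactly the requested domain tags."""
    return set(TAG_RE.findall(template)) == set(domain_tags)


def generate(templates, record):
    """Select the highest-ranked matching template and insert the record's values."""
    candidates = [t for t in templates if matches(t, record.keys())]
    if not candidates:
        return None
    best = max(candidates, key=score)
    return TAG_RE.sub(lambda m: str(record[m.group(1)]), best)


if __name__ == "__main__":
    templates = [
        "[COMPANY] shares rose [PERCENT] on heavy trading volume today.",
        "[COMPANY] gained [PERCENT].",
        "[COMPANY] reported revenue of [MONEY].",
    ]
    print(generate(templates, {"COMPANY": "Acme", "PERCENT": "5%"}))
```

With these placeholder weights the shorter of the two matching templates ranks higher, so it is selected instead of the longer one; a trained model could order them differently.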
Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title
Natural language generation · CPC title
Semantic analysis · CPC title
Physics · mapped topic