Natural language generation
US-11847424-B1 · Dec 19, 2023 · US
US12411857B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411857-B2 |
| Application number | US-202318164992-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2023 |
| Priority date | Feb 6, 2023 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
Embodiments of the present invention provide an approach for exploring interesting data patterns in structured tables through recommending aggregate questions in a conversational data exploration. Specifically, interesting features and operators are selected that are used to frame aggregate questions based on user intent and the data. The aggregate questions are ranked based on user persona and interestingness of the questions. The approach dynamically adapts and improves the recommendation of interesting and relevant aggregate questions for the user based on user feedback iteratively.
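The loop described in the abstract (frame aggregate questions from columns and operators, rank them, then re-rank after user feedback) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the question template, the function names `generate_questions` and `rank_questions`, the `feedback_weight` parameter, and the scoring rule (Shannon entropy of the column plus a fixed bonus for columns the user already picked) are hypothetical and are not taken from the patent text.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def generate_questions(table_title, columns, operators):
    """Frame one templated aggregate question per (column, operator) pair."""
    return [
        {"column": col, "operator": op,
         "text": f"What is the {op} {col} in {table_title}?"}
        for col in columns
        for op in operators
    ]


def rank_questions(questions, data, selected, feedback_weight=1.0):
    """Hypothetical scoring rule: column entropy plus a bonus when the user
    previously selected a question about the same column."""
    liked_columns = {q["column"] for q in selected}

    def score(q):
        bonus = feedback_weight if q["column"] in liked_columns else 0.0
        return shannon_entropy(data[q["column"]]) + bonus

    return sorted(questions, key=score, reverse=True)


# Toy table: "quantity" is more varied (higher entropy) than "price".
data = {
    "price": [10, 10, 10, 20],
    "quantity": [1, 2, 3, 4],
}
questions = generate_questions("orders", list(data), ["average", "maximum"])

# First pass: no feedback yet, so ranking depends on entropy alone.
first_pass = rank_questions(questions, data, selected=[])

# The user picks a question about "price"; the next pass boosts that column.
picked = [q for q in questions if q["column"] == "price"][:1]
second_pass = rank_questions(questions, data, selected=picked, feedback_weight=2.0)
```

In this toy run the first pass leads with questions about `quantity`, and the second pass leads with `price` once the feedback bonus outweighs the entropy gap. A real system would presumably also apply a presentation threshold and a persona-based weighting, neither of which is modeled here.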
Opening claim text (preview).
The invention claimed is:

1. A method for retrieving data in a conversational data exploration, comprising:
(a) calculating, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
(b) generating, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
(c) generating, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
(d) receiving a set of user selected questions from the set of natural language questions;
(e) calculating, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
(f) ranking, by the processor, the set of natural language questions based on each ranking score;
(g) presenting to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
(h) receiving another set of user selected questions from the set of relevant questions; and
(i) repeating (e) through (g) at least once.

2. The method of claim 1, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

3. The method of claim 1, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

4. The method of claim 1, further comprising receiving, by the processor, a set of column headers from a user related to the dataset.

5. The method of claim 4, further comprising presenting, by the processor, a subset of questions from the set of relevant questions based on the set of column headers.

6. The method of claim 1, wherein the list of operators includes at least one of average, minimum, maximum, more than, less than, above, below, top K percent, fraction, total, majority, minority, missing, outlier, after, before, and within.

7. A computing system for retrieving data in a conversational data exploration, comprising: a processor; a memory device coupled to the processor; and a computer readable storage device coupled to the processor, wherein the storage device contains program code executable by the processor via the memory device to implement a method, the method comprising:
(a) calculating, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
(b) generating, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
(c) generating, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
(d) receiving a set of user selected questions from the set of natural language questions;
(e) calculating, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
(f) ranking, by the processor, the set of natural language questions based on each ranking score;
(g) presenting to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
(h) receiving another set of user selected questions from the set of relevant questions; and
(i) repeating (e) through (g) at least once.

8. The computing system of claim 7, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

9. The computing system of claim 7, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

10. The computing system of claim 7, further comprising receiving, by the processor, a set of column headers from a user related to the dataset.

11. The computing system of claim 10, further comprising presenting, by the processor, a subset of questions from the set of relevant questions based on the set of column headers.

12. The computing system of claim 7, wherein the list of operators includes at least one of average, minimum, maximum, more than, less than, above, below, top K percent, fraction, total, majority, minority, missing, outlier, after, before, and within.

13. A computer program product for retrieving data in a conversational data exploration, the computer program product comprising a computer readable storage device, and program instructions stored on the computer readable storage device, to:
(a) calculate, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
(b) generate, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
(c) generate, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
(d) receive a set of user selected questions from the set of natural language questions;
(e) calculate, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
(f) rank, by the processor, the set of natural language questions based on each ranking score;
(g) present to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
(h) receive another set of user selected questions from the set of relevant questions; and
(i) repeat (e) through (g) at least once.

14. The computer program product of claim 13, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

15. The computer program product of claim 13, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

16. The computer program product of claim 13, further comprising program instructions stored on the computer readable storage device to receive, by the processor, a set of column headers from a user related to the dataset.

17. The computer program product of claim 16, further comprising program instructions stored on the computer readable
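
Step (b) of the independent claims names two quantities, Fisher scores per column and entropy values per data slice, without giving formulas in the text shown here. The following Python sketch uses the common feature-selection definition of the Fisher score (between-class scatter divided by within-class scatter) and Shannon entropy. That choice of formulas, the need for a class label column, the threshold value, and the names `fisher_score` and `slice_entropies` are all assumptions for illustration, not the claimed method.

```python
import math
from collections import Counter, defaultdict


def fisher_score(values, labels):
    """Between-class scatter over within-class scatter for one numeric column.

    Assumes a class label per row; the claims do not say what the classes are.
    """
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    overall_mean = sum(values) / len(values)
    between = sum(
        len(g) * (sum(g) / len(g) - overall_mean) ** 2 for g in groups.values()
    )
    within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values()
    )
    return between / within if within > 0 else float("inf")


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def slice_entropies(slice_keys, values):
    """Group `values` by slice key and compute the entropy within each slice."""
    slices = defaultdict(list)
    for k, v in zip(slice_keys, values):
        slices[k].append(v)
    return {k: entropy(vs) for k, vs in slices.items()}


# Toy data: "amount" separates the two classes well, "noise" does not.
columns = {"amount": [1, 2, 9, 10], "noise": [1, 9, 2, 10]}
labels = ["low", "low", "high", "high"]
regions = ["north", "south", "north", "south"]

THRESHOLD = 1.0  # hypothetical cut-off
scores = {name: fisher_score(col, labels) for name, col in columns.items()}
selected = [name for name, s in scores.items() if s > THRESHOLD]

# Entropy of the label within each region slice.
per_slice = slice_entropies(regions, labels)
```

Here only `amount` clears the threshold, and each region slice holds one `low` and one `high` row, so both slices have an entropy of 1 bit. How the claimed method maps such scores onto a list of operators is not specified in the text above, so that step is left out.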
Column-oriented storage; Management thereof · CPC title
using ranking · CPC title