Recommending aggregate questions in a conversational data exploration

US12411857B2 · US · B2

Patent metadata
  • Publication number: US-12411857-B2
  • Application number: US-202318164992-A
  • Country: US
  • Kind code: B2
  • Filing date: Feb 6, 2023
  • Priority date: Feb 6, 2023
  • Publication date: Sep 9, 2025
  • Grant date: Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide an approach for exploring interesting data patterns in structured tables through recommending aggregate questions in a conversational data exploration. Specifically, interesting features and operators are selected that are used to frame aggregate questions based on user intent and the data. The aggregate questions are ranked based on user persona and interestingness of the questions. The approach dynamically adapts and improves the recommendation of interesting and relevant aggregate questions for the user based on user feedback iteratively.
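The ranking-and-feedback loop summarized in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the patented scoring method, which this page does not spell out: the hypothetical `Question` class, the weights `alpha`/`beta`/`gamma`, treating "interestingness" as one minus the normalized Shannon entropy of a question's answer distribution, the per-column persona weights, and the Jaccard-overlap feedback term are all invented choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A candidate aggregate question (hypothetical structure)."""
    text: str
    columns: frozenset    # columns the question aggregates over
    operator: str         # e.g. "average" or "top K percent"
    answer_counts: tuple  # assumed: row counts of the answer across data slices


def normalized_entropy(counts):
    """Shannon entropy of a count vector, scaled to [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    h = sum((c / total) * math.log(total / c) for c in counts if c > 0)
    return h / math.log(len(counts))


def jaccard(a, b):
    """Overlap between two column sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranking_score(q, selected, persona_weights, alpha=0.5, beta=0.3, gamma=0.2):
    # Interestingness (assumed): a concentrated answer, i.e. low entropy, scores higher.
    interest = 1.0 - normalized_entropy(q.answer_counts)
    # Persona relevance: mean importance of the question's columns for this user.
    persona = sum(persona_weights.get(c, 0.0) for c in q.columns) / max(len(q.columns), 1)
    # Feedback: closest match to any question the user has already selected.
    feedback = max(
        (0.5 * jaccard(q.columns, s.columns) + (0.5 if q.operator == s.operator else 0.0)
         for s in selected),
        default=0.0,
    )
    return alpha * interest + beta * persona + gamma * feedback


def recommend(questions, selected, persona_weights, threshold=0.4):
    """Rank all questions and keep those scoring at or above the threshold."""
    scored = [(ranking_score(q, selected, persona_weights), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on score only
    return [q for score, q in scored if score >= threshold]


if __name__ == "__main__":
    q1 = Question("What is the average salary by department?",
                  frozenset({"salary", "department"}), "average", (90, 5, 5))
    q2 = Question("Which region has the most missing values?",
                  frozenset({"region"}), "missing", (25, 25, 25, 25))
    q3 = Question("What is the average salary across all employees?",
                  frozenset({"salary"}), "average", (50, 30, 20))
    persona = {"salary": 1.0, "department": 0.8, "region": 0.2}  # hypothetical analyst
    round1 = recommend([q1, q2, q3], selected=[], persona_weights=persona)
    print([q.text for q in round1])  # only q1 clears the threshold
    round2 = recommend([q1, q2, q3], selected=round1, persona_weights=persona)
    print([q.text for q in round2])  # q3 now clears it too, thanks to feedback
```

Each round, the questions the user picks are passed back as `selected`, so later rankings favor questions that share columns or operators with earlier picks. Fixed weights keep the sketch readable; a practical system would more likely tune or learn them.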

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for retrieving data in a conversational data exploration, comprising:
  (a) calculating, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
  (b) generating, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
  (c) generating, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
  (d) receiving a set of user selected questions from the set of natural language questions;
  (e) calculating, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
  (f) ranking, by the processor, the set of natural language questions based on each ranking score;
  (g) presenting to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
  (h) receiving another set of user selected questions from the set of relevant questions; and
  (i) repeating (e) through (g) at least once.

2. The method of claim 1, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

3. The method of claim 1, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

4. The method of claim 1, further comprising receiving, by the processor, a set of column headers from a user related to the dataset.

5. The method of claim 4, further comprising presenting, by the processor, a subset of questions from the set of relevant questions based on the set of column headers.

6. The method of claim 1, wherein the list of operators includes at least one of average, minimum, maximum, more than, less than, above, below, top K percent, fraction, total, majority, minority, missing, outlier, after, before, and within.

7. A computing system for retrieving data in a conversational data exploration, comprising: a processor; a memory device coupled to the processor; and a computer readable storage device coupled to the processor, wherein the storage device contains program code executable by the processor via the memory device to implement a method, the method comprising:
  (a) calculating, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
  (b) generating, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
  (c) generating, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
  (d) receiving a set of user selected questions from the set of natural language questions;
  (e) calculating, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
  (f) ranking, by the processor, the set of natural language questions based on each ranking score;
  (g) presenting to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
  (h) receiving another set of user selected questions from the set of relevant questions; and
  (i) repeating (e) through (g) at least once.

8. The computing system of claim 7, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

9. The computing system of claim 7, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

10. The computing system of claim 7, further comprising receiving, by the processor, a set of column headers from a user related to the dataset.

11. The computing system of claim 10, further comprising presenting, by the processor, a subset of questions from the set of relevant questions based on the set of column headers.

12. The computing system of claim 7, wherein the list of operators includes at least one of average, minimum, maximum, more than, less than, above, below, top K percent, fraction, total, majority, minority, missing, outlier, after, before, and within.

13. A computer program product for retrieving data in a conversational data exploration, the computer program product comprising a computer readable storage device, and program instructions stored on the computer readable storage device, to:
  (a) calculate, by a processor, an importance score for each column within a dataset; selecting, by the processor, a set of the columns based on the importance score of each column;
  (b) generate, by the processor, a list of operators, wherein generating the list of operators includes: determining Fisher scores for each column, and designating columns having a Fisher score above a threshold as part of a subset of data, for the subset of data, calculating entropy values for a plurality of data slices of the subset of data, and generating, for each column within the set of columns, the list of operators based on the Fisher scores and the calculated entropy values;
  (c) generate, by the processor, a set of natural language questions based on the set of columns, the list of operators, and metadata related to the dataset;
  (d) receive a set of user selected questions from the set of natural language questions;
  (e) calculate, by the processor, a ranking score for each question within the set of natural language questions using an entropy-based scoring method and based on the set of user selected questions;
  (f) rank, by the processor, the set of natural language questions based on each ranking score;
  (g) present to the user, by the processor, a set of relevant questions from the set of natural language questions based on a predefined threshold;
  (h) receive another set of user selected questions from the set of relevant questions; and
  (i) repeat (e) through (g) at least once.

14. The computer program product of claim 13, wherein the importance score of each column is calculated based on a role, responsibility, or intent of a user.

15. The computer program product of claim 13, wherein the metadata includes a set of column headers, a description, and a title related to the dataset.

16. The computer program product of claim 13, further comprising program instructions stored on the computer readable storage device to receive, by the processor, a set of column headers from a user related to the dataset.

17. The computer program product of claim 16, further comprising program instructions stored on the computer readable
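Step (b) of claim 1 (Fisher scores, a threshold, then entropy over data slices) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the Fisher score is computed against a hypothetical categorical `label` column, a "data slice" is taken to mean the rows sharing one label value, numeric values are placed in equal-width bins before computing entropy, only columns in the high-Fisher subset receive operators, and the rule mapping entropy to operators (drawn from the operator list in claim 6) is an invented heuristic. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher score: between-class scatter divided by within-class scatter."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    overall = mean(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    if within == 0:
        return math.inf if between > 0 else 0.0
    return between / within


def shannon_entropy(items):
    """Entropy in bits of the empirical distribution of items."""
    n = len(items)
    return sum((c / n) * math.log2(n / c) for c in Counter(items).values())


def slice_entropies(rows, column, slice_by, bins=3):
    """Bin a numeric column into equal-width bins, then compute entropy per slice."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant column
    slices = defaultdict(list)
    for r in rows:
        bin_index = min(int((r[column] - lo) / width), bins - 1)
        slices[r[slice_by]].append(bin_index)
    return {s: shannon_entropy(b) for s, b in slices.items()}


def build_operator_lists(rows, columns, label, fisher_threshold=1.0, bins=3):
    """Map each high-Fisher column to a list of candidate aggregate operators."""
    labels = [r[label] for r in rows]
    max_entropy = math.log2(bins)
    operators = {}
    for col in columns:
        if fisher_score([r[col] for r in rows], labels) <= fisher_threshold:
            continue  # not part of the high-Fisher subset of data
        entropies = slice_entropies(rows, col, label, bins)
        ops = ["average", "minimum", "maximum"]  # baseline aggregates (assumed)
        if min(entropies.values()) < 0.5 * max_entropy:
            ops += ["majority", "top K percent"]  # some slice is concentrated
        else:
            ops += ["more than", "less than"]  # values are spread in every slice
        operators[col] = ops
    return operators


if __name__ == "__main__":
    rows = [
        {"dept": "sales", "salary": 50, "age": 30},
        {"dept": "sales", "salary": 52, "age": 45},
        {"dept": "sales", "salary": 51, "age": 28},
        {"dept": "eng", "salary": 90, "age": 33},
        {"dept": "eng", "salary": 92, "age": 41},
        {"dept": "eng", "salary": 91, "age": 29},
    ]
    print(build_operator_lists(rows, ["salary", "age"], label="dept"))
    # {'salary': ['average', 'minimum', 'maximum', 'majority', 'top K percent']}
```

In this toy table `salary` separates the two departments sharply (Fisher score 600), while `age` has identical department means (Fisher score 0) and is dropped. Fisher scoring here acts as a supervised filter, so the choice of `label` column matters as much as the threshold.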

Assignees

IBM

Classifications

  • Column-oriented storage; Management thereof · CPC title

  • using ranking · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12411857B2 cover?
Embodiments of the present invention provide an approach for exploring interesting data patterns in structured tables through recommending aggregate questions in a conversational data exploration. Specifically, interesting features and operators are selected that are used to frame aggregate questions based on user intent and the data. The aggregate questions are ranked based on user persona and in…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/24578. Mapped technology areas include Physics.
When was this patent published?
Published Sep 9, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).