Automatic formulation of data science problem statements

US11763084B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11763084-B2
Application number: US-202016989882-A
Country: US
Kind code: B2
Filing date: Aug 10, 2020
Priority date: Aug 10, 2020
Publication date: Sep 19, 2023
Grant date: Sep 19, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method comprises receiving a new data set; identifying at least one prior data set of a plurality of prior data sets that matches the new data set; generating a natural language data science problem statement for the new data set based on information associated with the at least one prior data set that matches the new data set; outputting the generated natural language data science problem statement for user verification; and in response to receiving user input verifying the generated natural language data science problem statement, generating one or more AutoAI configuration settings for the new data set based on one or more AutoAI configuration settings associated with the at least one prior data set that matches the new data set.
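The flow in the abstract (match, formulate, verify, then derive settings) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the `PriorDataSet` schema, the Jaccard similarity over column labels, the 0.5 threshold, the template wording, and every function name are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class PriorDataSet:
    """A previously processed data set and what is known about it (hypothetical schema)."""
    name: str
    labels: list          # column labels
    target: str           # column that was predicted
    task: str             # e.g. "classification" or "regression"
    autoai_settings: dict = field(default_factory=dict)


def label_similarity(a, b):
    """Jaccard similarity of case-insensitive column labels (an assumed metric)."""
    sa, sb = {x.lower() for x in a}, {x.lower() for x in b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def closest_prior(new_labels, priors, threshold=0.5):
    """Return the best-matching prior data set, or None if none clears the threshold."""
    best = max(priors, key=lambda p: label_similarity(new_labels, p.labels), default=None)
    if best is None or label_similarity(new_labels, best.labels) < threshold:
        return None
    return best


def problem_statement(prior):
    """Template-based question grounded in the matched prior data set."""
    return f"Do you want to predict '{prior.target}' ({prior.task}) from the other columns?"


def formulate(new_labels, priors, verify):
    """Match, ask the user to verify the question, and only then derive settings."""
    prior = closest_prior(new_labels, priors)
    if prior is None:
        return None
    statement = problem_statement(prior)
    if not verify(statement):  # the user rejected the question
        return None
    # Copy the settings so the stored prior record is not mutated later.
    return statement, dict(prior.autoai_settings)


# Example data (invented for illustration only).
priors = [
    PriorDataSet("churn_2019", ["customer_id", "tenure", "plan", "churned"],
                 "churned", "classification", {"metric": "f1", "holdout": 0.2}),
    PriorDataSet("house_prices", ["sqft", "bedrooms", "zip", "price"],
                 "price", "regression", {"metric": "rmse", "holdout": 0.1}),
]
new_labels = ["Customer_ID", "tenure", "plan", "churned", "region"]
result = formulate(new_labels, priors, verify=lambda statement: True)
```

The `verify` callback stands in for the user-verification step: settings are produced only after it returns true, which mirrors the ordering in the abstract. A real system would presumably replace the template with a learned generator and the lambda with an interactive prompt.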

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a new data set; identifying at least one prior data set of a plurality of prior data sets that matches the new data set; generating a natural language data science problem statement for the new data set based on information associated with the at least one prior data set that matches the new data set, wherein the natural language data science problem statement poses a question that is grounded in the information associated with the at least one prior data set that matches the new data set; outputting the natural language data science problem statement to obtain user verification that the question posed by the natural language data science problem statement is applicable to the new data set; receiving user input that verifies the natural language data science problem statement; and generating one or more Automated Artificial Intelligence (AutoAI) configuration settings for the new data set based on one or more AutoAI configuration settings associated with the at least one prior data set that matches the new data set.

2. The method of claim 1, wherein identifying the at least one prior data set is based on comparison of labels in the at least one prior data set and labels in the new data set.

3. The method of claim 1, wherein identifying the at least one prior data set is based on comparison of data values in the at least one prior data set and data values in the new data set.

4. The method of claim 1, wherein identifying the at least one prior data set comprises identifying a single prior data set with a closest match to the new data set.

5. The method of claim 1, wherein identifying the at least one prior data set comprises identifying a plurality of prior data sets that match the new data set; wherein generating the natural language data science problem statement for the new data set comprises generating a plurality of natural language data science problem statements, each of the plurality of natural language data science problem statements corresponding to one of the plurality of prior data sets that match the new data set; and wherein outputting the natural language data science problem statement for user verification comprises outputting the plurality of natural language data science problem statements for user selection of one of the plurality of natural language data science problem statements.

6. The method of claim 1, further comprising receiving user feedback regarding the natural language data science problem statement; and updating a machine learning algorithm used in generating the natural language data science problem statement based on the user feedback.

7. The method of claim 1, wherein generating the natural language data science problem statement comprises generating the natural language data science problem statement based on labels, data values and the one or more AutoAI configuration settings for the at least one prior data set that matches the new data set.

8. A system comprising: an interface; a memory; and a processor communicatively coupled to the interface and to the memory, wherein the processor is configured to: receive a new data set via the interface; identify at least one prior data set of a plurality of prior data sets that matches the new data set; generate a natural language data science problem statement for the new data set based on information associated with the at least one prior data set that matches the new data set, wherein the natural language data science problem statement poses a question that is grounded in the information associated with the at least one prior data set that matches the new data set; output the natural language data science problem statement via the interface to obtain user verification that the question posed by the natural language data science problem statement is applicable to the new data set; receive user input verifying the natural language data science problem statement; and generate one or more Automated Artificial Intelligence (AutoAI) configuration settings for the new data set based on one or more AutoAI configuration settings associated with the at least one prior data set that matches the new data set.

9. The system of claim 8, wherein the processor is configured to identify the at least one prior data set based on comparison of labels in the at least one prior data set and labels in the new data set.

10. The system of claim 8, wherein the processor is configured to identify the at least one prior data set based on comparison of data values in the at least one prior data set and data values in the new data set.

11. The system of claim 8, wherein the processor is configured to identify a single prior data set with a closest match to the new data set.

12. The system of claim 8, wherein the processor is configured to identify a plurality of prior data sets that match the new data set; wherein the processor is configured to generate a plurality of natural language data science problem statements, each of the plurality of natural language data science problem statements corresponding to one of the plurality of prior data sets that match the new data set; and wherein the processor is configured to output the plurality of natural language data science problem statements for user selection of one of the plurality of natural language data science problem statements.

13. The system of claim 8, wherein the processor is further configured to: receive user feedback regarding the natural language data science problem statement; and update a machine learning algorithm used in generating the natural language data science problem statement based on the user feedback.

14. The system of claim 8, wherein the processor is configured to generate the natural language data science problem statement based on labels, data values and the one or more AutoAI configuration settings for the at least one prior data set that matches the new data set.

15. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed by a processor, causes the processor to: receive a new data set; identify at least one prior data set of a plurality of prior data sets that matches the new data set; generate a natural language data science problem statement for the new data set based on information associated with the at least one prior data set that matches the new data set, wherein the natural language data science problem statement poses a question that is grounded in the information associated with the at least one prior data set that matches the new data set; output the natural language data science problem statement to obtain user verification that the question posed by the natural language data science problem statement is applicable to the new data set; receive user input verifying the natural language data science problem statement; and generate one or more Automated Artificial Intelligence (AutoAI) configuration settings for the new data set based on one or more AutoAI configuration settings associated with the at least one prior data set that matches the new data set.

16. The computer program product of claim 15, wherein the computer readable program is further configured to cause the processor to identify the at least one prior data set based on comparison of labels in the at least one prior data set and labels in the new data set.

17. The computer program product of claim 15, wherein the computer readable program is further conf
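Claims 3 and 5 describe matching on data values and returning several matching prior data sets for the user to choose from. The sketch below shows one way that could look. It is an assumption-laden illustration rather than the claimed method: the column profiles (mean and standard deviation for numeric columns, distinct values for the rest), the scoring formula, the threshold, and all function names are hypothetical.

```python
import statistics


def column_profile(values):
    """Summarise a column: numeric by mean/stdev, anything else by its distinct values."""
    nums = [v for v in values if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if nums and len(nums) == len(values):
        return ("numeric", statistics.fmean(nums), statistics.pstdev(nums))
    return ("categorical", frozenset(str(v) for v in values))


def profile_similarity(p, q):
    """Score two column profiles in [0, 1]; profiles of different kinds score 0."""
    if p[0] != q[0]:
        return 0.0
    if p[0] == "categorical":
        union = p[1] | q[1]
        return len(p[1] & q[1]) / len(union) if union else 0.0
    # Numeric: distance between means, scaled by the larger of the two spreads.
    spread = max(p[2], q[2], 1e-9)
    return 1.0 / (1.0 + abs(p[1] - q[1]) / spread)


def value_similarity(new_cols, prior_cols):
    """Average, over the new columns, of each column's best match among prior columns."""
    if not new_cols or not prior_cols:
        return 0.0
    prior_profiles = [column_profile(v) for v in prior_cols.values()]
    scores = []
    for values in new_cols.values():
        p = column_profile(values)
        scores.append(max(profile_similarity(p, q) for q in prior_profiles))
    return sum(scores) / len(scores)


def ranked_matches(new_cols, priors, threshold=0.5, top_k=3):
    """Return up to top_k (name, score) pairs at or above the threshold, best first."""
    scored = [(name, value_similarity(new_cols, cols)) for name, cols in priors.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# Example data (invented for illustration only).
new_cols = {"age": [30, 40, 50], "plan": ["basic", "pro", "basic"]}
prior_sets = {
    "churn": {"customer_age": [31, 41, 51], "tier": ["basic", "pro"]},
    "weather": {"temp_c": [-5.0, 0.0, 5.0], "sky": ["clear", "rain"]},
}
candidates = ranked_matches(new_cols, prior_sets)
```

Because the comparison ignores column labels entirely, it can pair `age` with `customer_age` even though the names differ. Each entry in `candidates` would then get its own problem statement, and the user would pick one, as in claim 5; taking only the first entry corresponds to the single closest match of claim 4.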

Assignees

  • IBM

Inventors

Classifications

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • G06F40/289 · Primary

    Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Natural language generation · CPC title

  • Recognition of textual entities · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11763084B2 cover?
A method comprises receiving a new data set; identifying at least one prior data set of a plurality of prior data sets that matches the new data set; generating a natural language data science problem statement for the new data set based on information associated with the at least one prior data set that matches the new data set; outputting the generated natural language data science problem st…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 19, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).