Automatic discovery of analysis scripts for a dataset

US10229171B2 · US · B2

Patent metadata
Publication number: US-10229171-B2
Application number: US-201614992851-A
Country: US
Kind code: B2
Filing date: Jan 11, 2016
Priority date: Jan 11, 2016
Publication date: Mar 12, 2019
Grant date: Mar 12, 2019

Abstract

Official abstract text for this publication.

A method of automatic discovery of analysis scripts for a dataset, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, at a script searching tool, an input dataset; searching, in a script repository, a plurality of datasets having analysis scripts associated therewith; the searching comprising finding, based on a feature of the input dataset, one or more datasets of the plurality of datasets having the feature; identifying, based on the one or more datasets of the plurality of datasets having the feature, one or more associated analysis scripts; and returning, via a user interface, a result listing of the one or more associated analysis scripts. Other aspects are described and claimed.
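
The steps in the abstract (receive a dataset, find repository datasets sharing a feature, list their scripts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `RepoEntry` record, the `extract_features` and `find_scripts` helpers, the `type:`/`col:` feature encoding, and the sample repository are hypothetical and are not taken from the patent, which does not prescribe a data model or matching rule.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class RepoEntry:
    """Hypothetical repository record: a dataset's features and its scripts."""
    name: str
    features: set[str]
    scripts: list[str] = field(default_factory=list)


def extract_features(file_name: str, csv_text: str) -> set[str]:
    """Derive simple features: the file extension and the header column names."""
    features = set()
    if "." in file_name:
        features.add("type:" + file_name.rsplit(".", 1)[1].lower())
    header = next(csv.reader(io.StringIO(csv_text)), [])
    features.update("col:" + col.strip().lower() for col in header if col.strip())
    return features


def find_scripts(input_features: set[str], repository: list[RepoEntry]) -> list[str]:
    """Return scripts of every repository dataset sharing at least one feature."""
    results = []
    for entry in repository:
        if input_features & entry.features:
            for script in entry.scripts:
                if script not in results:  # avoid duplicate listings
                    results.append(script)
    return results


# Hypothetical repository contents, for illustration only.
repository = [
    RepoEntry("sales_2015.csv", {"type:csv", "col:region", "col:revenue"}, ["plot_revenue.py"]),
    RepoEntry("genes.tsv", {"type:tsv", "col:gene_id"}, ["cluster_genes.R"]),
]

features = extract_features("sales_2016.csv", "Region,Revenue,Quarter\nEU,10,Q1\n")
listing = find_scripts(features, repository)
print(listing)
```

Matching on any shared feature is the loosest reading of "having the feature"; a stricter variant could require a minimum overlap before a dataset counts as a match.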

First claim

Opening claim text (preview).

What is claimed is:

1. A method of automatic discovery of analysis scripts for a dataset, the method comprising: utilizing at least one processor to execute computer code that performs the steps of: receiving, at a script searching tool, an input dataset; extracting at least one feature from the input dataset; searching, in a script repository, a plurality of datasets having analysis scripts associated therewith, wherein each analysis script comprises program code that analyzes a dataset in a predetermined manner per the program code of the analysis script; said searching comprising finding, using the at least one feature of the input dataset to search the plurality of datasets, one or more datasets of the plurality of datasets having the at least one feature; identifying, based on the one or more datasets of the plurality of datasets having the feature, one or more associated analysis scripts, wherein each of the one or more associated analysis scripts comprise at least one analysis script corresponding to at least one of the one or more datasets having the feature; returning, via a user interface, a result listing of the one or more associated analysis scripts; and analyzing the input dataset using an analysis script selected by a user from the one or more associated analysis scripts within the result listing.

2. The method of claim 1, comprising prioritizing results of the result listing.

3. The method of claim 2, wherein the prioritizing comprises promoting an analysis script included in the result listing based on a factor selected from the group consisting of a user input, a user profile and a sample dataset.

4. The method of claim 3, wherein the user profile comprises information selected from the group consisting of: a user search history and one or more datasets stored in a user client repository.

5. The method of claim 1, wherein the extracted feature comprises a feature selected from the group consisting of a dataset column name, a dataset file name, a dataset file header, a dataset file structure, and a dataset type.

6. The method of claim 1, wherein the feature of the input dataset comprises a relational feature.

7. The method of claim 6, wherein the relational feature is selected from the group consisting of a file name pattern, a parallel dataset, and a forked repository.

8. The method of claim 1, wherein the searching comprises employing a trained model to search based on the feature of the input dataset.

9. The method of claim 8, comprising: receiving user feedback in response to the result listing; and updating the trained model based on the user feedback.

10. An apparatus for automatic discovery of analysis scripts for a dataset, the apparatus comprising: at least one processor; and a computer readable storage medium having computer readable program code embodied therewith and executable by the at least one processor to: receive, at a script searching tool, an input dataset; extract at least one feature from the input dataset; search, in a script repository, a plurality of datasets having analysis scripts associated therewith, wherein each analysis script comprises program code that analyzes a dataset in a predetermined manner per the program code of the analysis script; said searching comprising finding, using the at least one feature of the input dataset to search the plurality of datasets, one or more datasets of the plurality of datasets having the at least one feature; identify, based on the one or more datasets of the plurality of datasets having the feature, one or more associated analysis scripts, wherein each of the one or more associated analysis scripts comprise at least one analysis script corresponding to at least one of the one or more datasets having the feature; return, via a user interface, a result listing of the one or more associated analysis scripts; and analyzes the input dataset using an analysis script selected by a user from the one or more associated analysis scripts within the result listing.

11. A computer program product for automatic discovery of analysis scripts for a dataset, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith and executable by at least one processor to: receive, at a script searching tool, an input dataset; extract at least one feature from the input dataset; search, in a script repository, a plurality of datasets having analysis scripts associated therewith, wherein each analysis script comprises program code that analyzes a dataset in a predetermined manner per the program code of the analysis script; said searching comprising finding, using the at least one feature of the input dataset to search the plurality of datasets, one or more datasets of the plurality of datasets having the at least one feature; identify, based on the one or more datasets of the plurality of datasets having the feature, one or more associated analysis scripts, wherein each of the one or more associated analysis scripts comprise at least one analysis script corresponding to at least one of the one or more datasets having the feature; return, via a user interface, a result listing of the one or more associated analysis scripts; and analyzes the input dataset using an analysis script selected by a user from the one or more associated analysis scripts within the result listing.

12. The computer program product of claim 11, further comprising prioritizing results of the result listing.

13. The computer program product of claim 12, wherein the prioritizing comprises promoting an analysis script included in the result listing based on a factor selected from the group consisting of a user input, a user profile and a sample dataset.

14. The computer program product of claim 13, wherein the user profile comprises information selected from the group consisting of: a user search history and one or more datasets stored in a user client repository.

15. The computer program product of claim 11, wherein the extracted feature comprises a feature selected from the group consisting of a dataset column name, a dataset file name, a dataset file header, a dataset file structure, and a dataset type.

16. The computer program product of claim 11, wherein the searching comprises employing a trained model to search based on the feature of the input dataset.

17. The computer program product of claim 12, comprising: receiving user feedback in response to the result listing; and updating the trained model based on the user feedback.

18. A method of automatic discovery of analysis scripts for a dataset, the method comprising: utilizing at least one processor to execute computer code that performs the steps of: receiving, at a script searching tool, an input dataset; extracting at least one feature from the input dataset, said extracting comprising use of one or more features identified from a user specific dataset repository; searching, in a script repository, a plurality of datasets having analysis scripts associated therewith, wherein each analysis script comprises program code that analyzes a dataset in a predetermined manner per the program code of the analysis script; said searching comprising finding, using the extracted feature of the input dataset to search the plurality of datasets, one or more datasets of the plurality of datasets having the extracted feature; identifying, based on the one or more datasets of the plurality of datasets having the extracted feature, one or more associated analysis s
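
The prioritizing and feedback steps in the dependent claims (promoting scripts from a user profile, updating a model from user feedback) might look roughly like the sketch below. It is only an illustration under stated assumptions: per-feature weights stand in for the "trained model", and `rank_scripts`, `apply_feedback`, the bonus of 1.0, the step of 0.5, and all sample data are hypothetical rather than drawn from the patent.

```python
from collections import defaultdict


def rank_scripts(candidates, input_features, weights, search_history=()):
    """Order candidate scripts by weighted feature overlap, promoting past picks."""
    def score(item):
        script, dataset_features = item
        overlap = input_features & dataset_features
        base = sum(weights[f] for f in overlap)
        # Assumed promotion rule: scripts in the user's search history get a bonus.
        bonus = 1.0 if script in search_history else 0.0
        return base + bonus

    ranked = sorted(candidates.items(), key=score, reverse=True)
    return [script for script, _ in ranked]


def apply_feedback(weights, shared_features, accepted, step=0.5):
    """Raise or lower the weight of each feature that led to a rated result."""
    for f in shared_features:
        weights[f] = max(0.0, weights[f] + (step if accepted else -step))


# Every feature starts with weight 1.0 (a stand-in for an untrained model).
weights = defaultdict(lambda: 1.0)

# Hypothetical candidates: script name -> features of its associated dataset.
candidates = {
    "plot_revenue.py": {"col:region", "col:revenue"},
    "summarize.py": {"col:region"},
    "forecast.py": {"col:revenue", "col:quarter"},
}
input_features = {"col:region", "col:revenue", "col:quarter"}

# A script from the user's search history is promoted to the top.
before = rank_scripts(candidates, input_features, weights, search_history=["forecast.py"])

# The user accepts plot_revenue.py, so its matching features gain weight.
apply_feedback(weights, input_features & candidates["plot_revenue.py"], accepted=True)
after = rank_scripts(candidates, input_features, weights)

print(before)
print(after)
```

A real system would likely replace the additive weights with a learned ranking model; the point here is only that feedback on a result changes how later searches are ordered.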

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Program loading or initiating (bootstrapping G06F9/4401; security arrangements for program loading or initiating G06F21/57) · CPC title

  • Physics · mapped topic

  • Software maintenance or management · CPC title

  • Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10229171B2 cover?
A method of automatic discovery of analysis scripts for a dataset, the method including: utilizing at least one processor to execute computer code that performs the steps of: receiving, at a script searching tool, an input dataset; searching, in a script repository, a plurality of datasets having analysis scripts associated therewith; the searching comprising finding, based on a feature of the …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F17/30554. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 12, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).