Pre-processing for data-driven model creation

US11687786B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11687786-B2
Application number  US-202017002142-A
Country             US
Kind code           B2
Filing date         Aug 25, 2020
Priority date       Apr 28, 2017
Publication date    Jun 27, 2023
Grant date          Jun 27, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving input that identifies one or more data sources and determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine. The method also includes generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources. The method further includes, based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.
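
The flow in the abstract can be pictured with a small sketch. This is an illustration under stated assumptions, not the patented implementation: the function names (`infer_problem_type`, `min_max_scale`, `build_input_data_set`), the ten-class threshold, and the choice of min-max scaling as the "rule" are all hypothetical, and the AMB engine itself is not modeled.

```python
from typing import Callable, Dict, List, Sequence

Rule = Callable[[Sequence[float]], List[float]]


def infer_problem_type(target: Sequence[object], max_classes: int = 10) -> str:
    """Hypothetical heuristic: non-numeric or low-cardinality targets are
    treated as classification, everything else as regression."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in target
    )
    if not numeric or len(set(target)) <= max_classes:
        return "classification"
    return "regression"


def min_max_scale(values: Sequence[float]) -> List[float]:
    """One possible rule: rescale a column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def build_input_data_set(
    columns: Dict[str, List[float]], rules: Dict[str, Rule]
) -> Dict[str, List[float]]:
    """Apply the rule registered for each column; copy columns without a rule."""
    return {name: rules.get(name, list)(vals) for name, vals in columns.items()}


# Assumed toy data source: one feature column and one column to be modeled.
source = {"temperature": [10.0, 20.0, 40.0], "failed": [0, 1, 0]}

problem_type = infer_problem_type(source["failed"])
data_set = build_input_data_set(
    {"temperature": source["temperature"]},
    {"temperature": min_max_scale},
)
# A real system would now hand data_set and problem_type to the AMB engine.
```

The scaled column contains values (such as one third) that never appear in the source column, which matches the claim language about second data values that are not present in the first data values.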

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, at a processor of a computing device, input that identifies one or more data sources; determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine; generating an input data set for the AMB engine including extracting first data values from the one or more data sources and modifying the first data values to generate second data values based on application of one or more rules, wherein the second data values include at least one value that is not present in the first data values; and based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.
2. The method of claim 1, wherein the input further comprises an indication of a constraint for generation of the neural network by the AMB engine.
3. The method of claim 2, wherein modifying the first data values to generate the second data values includes scaling the first data values.
4. The method of claim 1, further comprising generating a data profile including information regarding data fields of the one or more data sources, wherein the input data set is generated based on the data profile.
5. The method of claim 4, wherein generating the input data set further comprises omitting, from the input data set, one or more columns of data of the one or more data sources based on the data profile.
6. The method of claim 4, wherein the data profile of a particular column indicates a type of data stored in the particular column.
7. The method of claim 4, wherein the data profile of a particular column indicates a statistical metric descriptive of data stored in the particular column.
8. A computer system comprising: an automated model building (AMB) pre-processor configured to: receive input that identifies one or more data sources; determine a machine learning problem type of a plurality of machine learning problem types supported by an AMB engine; generate an input data set for the AMB engine including extracting first data values from the one or more data sources and modifying the first data values to generate second data values based on application of one or more rules, wherein the second data values include at least one value that is not present in the first data values; and based on the input data set and the machine learning problem type, initiate execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.
9. The computer system of claim 8, wherein the AMB pre-processor comprises a data source analyzer configured to determine a combined data source based on the one or more data sources, wherein the combined data source is used to generate the input data set.
10. The computer system of claim 8, wherein the AMB pre-processor comprises a data profiler configured to determine a data profile including information regarding data fields of the one or more data sources, wherein the input data set is generated based on the data profile.
11. The computer system of claim 8, wherein the AMB pre-processor is further configured to determine, based on the machine learning problem type, an error function to be used by the AMB engine to evaluate one or more neural networks while generating the neural network configured to model at least the portion of the input data set.
12. The computer system of claim 8, wherein the AMB pre-processor is further configured to determine data sampling criteria used to generate the input data set to compensate for data imbalances in the one or more data sources.
13. The computer system of claim 8, wherein the AMB engine comprises a first device configured to execute a genetic algorithm.
14. The computer system of claim 13, wherein the AMB engine comprises a second device configured to execute an optimizer.
15. The computer system of claim 8, wherein the input further identifies one or more data fields of the input data set to be modeled by the AMB engine, and wherein the AMB pre-processor is configured to determine the machine learning problem type based at least in part on the one or more data fields of the input data set to be modeled.
16. The computer system of claim 8, wherein the input further identifies one or more parameters to control operation of the AMB engine, and wherein the AMB pre-processor is configured to initiate execution of the AMB engine based on the one or more parameters.
17. The computer system of claim 16, wherein the one or more parameters indicate one or more termination criteria for operation of the AMB engine, topology constraints for neural networks generated by the AMB engine, or both.
18. A computer-readable storage device storing instructions that, when executed, cause a computer to perform operations comprising: receiving input that identifies one or more data sources; determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine; generating an input data set for the AMB engine including extracting first data values from the one or more data sources and modifying the first data values to generate second data values based on application of one or more rules, wherein the second data values include at least one value that is not present in the first data values; and based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.
19. The computer-readable storage device of claim 18, wherein the operations include generating output to a display device, the output including one or more graphical user interfaces (GUIs) configured to receive the input.
20. The computer-readable storage device of claim 19, wherein one or more GUIs are configured to receive additional input indicating the machine learning problem type, one or more termination criteria for operation of the AMB engine, topology constraints for neural networks generated by the AMB engine, or a combination thereof.
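
Claims 4 to 7, 11, and 12 refer to a data profile, column omission, an error function chosen by problem type, and sampling criteria for imbalanced data. The sketch below shows one way such steps might look. Everything specific in it is an assumption made for illustration: the profile fields (`type`, `distinct`, `mean`), the rule of dropping constant columns, the error-function names in `ERROR_FUNCTIONS`, and the cycling oversampler are not taken from the patent.

```python
import statistics
from collections import Counter
from typing import Dict, List


def profile_columns(table: Dict[str, list]) -> Dict[str, dict]:
    """Record a data type and simple statistics for each column."""
    profile = {}
    for name, values in table.items():
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        entry = {
            "type": "numeric" if numeric else "categorical",
            "distinct": len(set(values)),
        }
        if numeric:
            entry["mean"] = statistics.fmean(values)  # example statistical metric
        profile[name] = entry
    return profile


def drop_uninformative(table: Dict[str, list], profile: Dict[str, dict]) -> Dict[str, list]:
    """Assumed omission rule: a column with a single distinct value is dropped."""
    return {n: v for n, v in table.items() if profile[n]["distinct"] > 1}


# Hypothetical mapping from problem type to an error function name.
ERROR_FUNCTIONS = {
    "classification": "cross_entropy",
    "regression": "mean_squared_error",
}


def oversample_minority(rows: List[dict], label_key: str) -> List[dict]:
    """Repeat rows of under-represented labels until every label count
    matches the largest one (deterministic cycling, no randomness)."""
    counts = Counter(r[label_key] for r in rows)
    target = max(counts.values())
    out = list(rows)
    for label, n in counts.items():
        members = [r for r in rows if r[label_key] == label]
        out.extend(members[i % n] for i in range(target - n))
    return out


table = {"site": ["A", "A", "A"], "pressure": [1.0, 2.0, 3.0], "failed": [0, 0, 1]}
profile = profile_columns(table)
kept = drop_uninformative(table, profile)

rows = [{"pressure": p, "failed": f} for p, f in zip(kept["pressure"], kept["failed"])]
balanced = oversample_minority(rows, "failed")
error_function = ERROR_FUNCTIONS["classification"]
```

In this toy run the constant `site` column is omitted and the single failing row is duplicated so that both labels appear equally often. A production pre-processor would likely use richer profiles and sampling strategies.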

Assignees

Sparkcognition Inc

Inventors

Classifications

  • Reinforcement learning

  • Supervised learning

  • Feedforward networks

  • Modifying the architecture, e.g. adding, deleting or silencing nodes or connections

  • Quantised networks; Sparse networks; Compressed networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11687786B2 cover?
A method includes receiving input that identifies one or more data sources and determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine. The method also includes generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources. Th…
Who is the assignee on this patent?
Sparkcognition Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 27, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).