Pre-processing for data-driven model creation

US10963790B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10963790-B2
Application number	US-201715582496-A
Country	US
Kind code	B2
Filing date	Apr 28, 2017
Priority date	Apr 28, 2017
Publication date	Mar 30, 2021
Grant date	Mar 30, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes receiving input that identifies one or more data sources and determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine. The method also includes generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources. The method further includes, based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.
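The "application of one or more rules" in the abstract is spelled out in claim 1 below as four column-drop conditions. A minimal sketch of how such rules might be applied follows. It is an illustration, not the patented implementation: the threshold values, the names `should_drop` and `filter_columns`, and the dict-of-lists table layout are all hypothetical, and restricting the uniqueness rule to non-numeric columns is an assumption made here so that continuous measurements are not discarded.

```python
from collections import Counter
from statistics import pstdev

# Hypothetical thresholds: the claims name them but give no values.
UNIQUE_ROW_FRACTION = 0.95  # "first threshold percentage of rows"
MISSING_FRACTION = 0.50     # "second threshold percentage"
MAX_CATEGORIES = 50         # "threshold number of unique values"


def should_drop(values):
    """Return a reason string if the column matches a drop rule, else None."""
    n = len(values)
    present = [v for v in values if v is not None]
    # Rule: at least MISSING_FRACTION of the rows are missing.
    if not present or (n - len(present)) / n >= MISSING_FRACTION:
        return "missing values"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    # Rule: a numeric column that never varies carries no signal.
    if numeric and pstdev(present) == 0:
        return "zero standard deviation"
    if not numeric:
        counts = Counter(present)
        singles = sum(1 for c in counts.values() if c == 1)
        # Rule (one reading of the claim): values that occur only once,
        # such as row IDs, fill at least UNIQUE_ROW_FRACTION of the rows.
        if singles / n >= UNIQUE_ROW_FRACTION:
            return "mostly unique values"
        # Rule: categorical column with too many distinct categories.
        if len(counts) > MAX_CATEGORIES:
            return "too many categories"
    return None


def filter_columns(table):
    """Split a {column name: list of values} table into kept and dropped."""
    kept, dropped = {}, {}
    for name, values in table.items():
        reason = should_drop(values)
        if reason is None:
            kept[name] = values
        else:
            dropped[name] = reason
    return kept, dropped


table = {
    "constant": [1, 1, 1, 1],
    "row_id": ["a", "b", "c", "d"],
    "sparse": [None, None, 3.0, None],
    "temp": [20.5, 21.0, 19.8, 20.5],
    "color": ["red", "blue", "red", "blue"],
}
kept, dropped = filter_columns(table)
print(sorted(kept))  # columns that survive the rules
print(dropped)       # dropped columns with the rule that matched
```

In this toy table only `temp` and `color` survive. The claims also allow "any combination" of the conditions; the sketch simply reports the first rule that matches.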

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, at a processor of a computing device, input that identifies one or more data sources; determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine; generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources, wherein the one or more rules indicate that a column is to be dropped responsive to determining that: the column has zero standard deviation; the column includes a unique value in at least a first threshold percentage of rows; the column has at least a second threshold percentage of missing or corrupted values; the column represents categorical data and includes more than a threshold number of unique values; or any combination thereof; and based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.

2. The method of claim 1, wherein determining the machine learning problem type comprises at least one of: determining a classification problem type responsive to receiving second input to predict a categorical column; determining a regression problem type responsive to receiving third input to predict a numerical column; or determining a reinforcement learning problem type responsive to receiving fourth input indicating at least of a state data structure, an action data structure, a reward function, or an interaction function.

3. A computer system comprising: an automated model building (AMB) pre-processor configured to: receive input that identifies one or more data sources; determine, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an AMB engine; generate an input data set of the AMB engine based on application of one or more rules to the one or more data sources, wherein the one or more rules indicate that a column is to be dropped responsive to determining that: the column has zero standard deviation; the column includes a unique value in at least a first threshold percentage of rows; the column has at least a second threshold percentage of missing or corrupted values; the column represents categorical data and includes more than a threshold number of unique values; or any combination thereof; and based on the input data set and the machine learning problem type, initiate execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.

4. The computer system of claim 3, wherein the AMB pre-processor comprises a data source analyzer configured to determine a combined data source based on the one or more data sources.

5. The computer system of claim 4, wherein the data source analyzer is configured to determine whether a particular data source includes column headers.

6. The computer system of claim 3, wherein the AMB pre-processor comprises a data profiler configured to determine at least one of a data profile, an input profile, or a target profile.

7. The computer system of claim 6, wherein the data profiler is further configured to perform at least one of a data cleaning operation or a data scaling operation.

8. The computer system of claim 7, wherein performing the data cleaning operation includes performing an imputation operation to determine at least one missing data value of at least one data source.

9. The computer system of claim 3, wherein the one or more rules indicate that a column including fewer than a threshold number of unique values corresponds to categorical data.

10. The computer system of claim 3, wherein the AMB pre-processor is further configured to determine an error function based on the machine learning problem type, determine data sampling criteria used to generate the input data set of the AMB engine, or both.

11. The computer system of claim 3, wherein the AMB engine comprises a first device configured to execute a genetic algorithm.

12. The computer system of claim 11, wherein the AMB engine comprises a second device configured to execute an optimizer.

13. The computer system of claim 3, further comprising an output interface configured to send one or more graphical user interfaces (GUIs) to a display device.

14. The computer system of claim 13, wherein the one or more GUIs are configured to receive the input identifying the one or more data sources, second input indicating the machine learning problem type, third input indicating a training time threshold, or any combination thereof.

15. The computer system of claim 13, wherein the one or more GUIs are configured to receive fourth input identifying a target column to be predicted by the neural network.

16. The computer system of claim 13, wherein the one or more GUIs are configured to receive fifth input identifying a failure prediction lead time, sixth input indicating at least one failure, or both.

17. The computer system of claim 13, wherein the one or more GUIs are configured to receive seventh input indicating a reinforcement learning data structure, eight input indicating a reinforcement learning reward function, or both.

18. A computer-readable storage device storing instructions that, when executed, cause a computer to perform operations comprising: receiving input that identifies one or more data sources; determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine; generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources, wherein the one or more rules indicate that a column is to be dropped responsive to determining that: the column has zero standard deviation; the column includes a unique value in at least a first threshold percentage of rows; the column has at least a second threshold percentage of missing or corrupted values; the column represents categorical data and includes more than a threshold number of unique values; or any combination thereof; and based on the input data set and the machine learning problem type, initiating execution of the AMB engine to generate a neural network configured to model at least a portion of the input data set.

19. The computer-readable storage device of claim 18, wherein: the operations include replacing a categorical column with a plurality of input columns in accordance with a one-hot encoding scheme; a classification output of the neural network is based on a softmax of a plurality of output nodes of the neural network; or both.
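Claim 19 names two standard techniques: one-hot encoding on the input side and softmax on the output side. The sketch below shows what each does in plain Python. The function names, the sorted category order, and the example values are assumptions for illustration; the patent text does not prescribe any of them.

```python
import math


def one_hot_encode(column):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted(set(column))  # sorted order is an arbitrary choice
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


def softmax(logits):
    """Turn raw output-node values into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, categories):
    """Pick the category whose output node has the highest probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


encoded = one_hot_encode(["red", "blue", "red"])
print(encoded)  # one new input column per category

label, probs = classify([0.2, 3.1], ["blue", "red"])
print(label, probs)
```

Each category becomes its own input column, so the network never sees an artificial ordering among categories, and the softmax output can be read as one probability per class.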

Assignees

Sparkcognition Inc

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/092
Reinforcement learning · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0499
Feedforward networks · CPC title
G06N3/082
modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 63917397

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10963790B2 cover?: A method includes receiving input that identifies one or more data sources and determining, based on the input, a machine learning problem type of a plurality of machine learning problem types supported by an automated model building (AMB) engine. The method also includes generating an input data set of the AMB engine based on application of one or more rules to the one or more data sources. Th…
Who is the assignee on this patent?: Sparkcognition Inc
What technology area does this patent fall under?: Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 30, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).