Automated data processing and machine learning model generation

US11640563B2 · US · B2

Patent metadata
Publication number: US-11640563-B2
Application number: US-202016827292-A
Country: US
Kind code: B2
Filing date: Mar 23, 2020
Priority date: Aug 30, 2019
Publication date: May 2, 2023
Grant date: May 2, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may obtain first data relating to a machine learning model. The device may pre-process the first data to alter the first data to generate second data. The device may process the second data to select a set of features from the second data. The device may analyze the set of features to evaluate a plurality of types of machine learning models with respect to the set of features. The device may select a particular type of machine learning model for the set of features based on analyzing the set of features to evaluate the plurality of types of machine learning models. The device may tune a set of parameters of the particular type of machine learning model to train the machine learning model. The device may receive third data for prediction. The device may provide a prediction using the particular type of machine learning model.
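The evaluate, select, tune, and predict steps in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the two candidate model types (k-nearest neighbor and nearest centroid), the holdout-accuracy score, the parameter grid, the toy data, and every name below (`MODEL_TYPES`, `select_and_tune`, and so on) are assumptions chosen only to make the flow concrete.

```python
import math
from collections import Counter

def knn_predict(train, x, k=1):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def centroid_predict(train, x):
    """Predict the label whose class centroid is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

# Hypothetical registry: model type -> (predict function, parameter grid).
MODEL_TYPES = {
    "knn": (knn_predict, [{"k": 1}, {"k": 3}, {"k": 5}]),
    "nearest_centroid": (centroid_predict, [{}]),
}

def accuracy(predict, params, train, holdout):
    """Fraction of holdout rows that the model labels correctly."""
    hits = sum(predict(train, x, **params) == y for x, y in holdout)
    return hits / len(holdout)

def select_and_tune(train, holdout):
    # Evaluate every model type using its first (default) parameter set.
    type_scores = {
        name: accuracy(fn, grid[0], train, holdout)
        for name, (fn, grid) in MODEL_TYPES.items()
    }
    # Select the best-scoring type, then tune its parameters over the grid.
    best_type = max(type_scores, key=type_scores.get)
    fn, grid = MODEL_TYPES[best_type]
    best_params = max(grid, key=lambda p: accuracy(fn, p, train, holdout))
    return best_type, best_params, lambda x: fn(train, x, **best_params)

# Toy "second data": (features, label) rows after pre-processing.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
]
holdout = [((0.5, 0.5), "a"), ((5.5, 5.5), "b")]

best_type, best_params, predict = select_and_tune(train, holdout)
prediction = predict((0.2, 0.1))  # "third data" submitted for prediction
print(best_type, best_params, prediction)
```

A real system would likely score candidates with cross-validation rather than a single holdout set; the single split here only keeps the sketch short.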

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining, by a device, first data relating to a machine learning model; pre-processing, by the device, the first data to alter the first data to generate second data, wherein pre-processing the first data comprises: generating a clustering index value based on generating a set of clusters, wherein the clustering index value is associated with data purity and risk, identifying outlier data, of the first data, based on generating the clustering index value, and altering the first data to remove or alter the outlier data to generate the second data; processing, by the device, the second data to select a set of features from the second data; analyzing, by the device, the set of features to evaluate a plurality of types of machine learning models with respect to the set of features; selecting, by the device, a particular type of machine learning model, of the plurality of types of machine learning models, for the set of features based on analyzing the set of features to evaluate the plurality of types of machine learning models; tuning, by the device, a set of parameters of the particular type of machine learning model to train the machine learning model; providing, by the device, access to the particular type of machine learning model via an interface; receiving, by the device and as input via the interface, third data for prediction using the particular type of machine learning model; and providing, by the device and as output via the interface, a prediction using the particular type of machine learning model based on receiving the third data.

2. The method of claim 1, wherein processing the second data comprises: identifying a plurality of features of the second data; and performing a feature reduction procedure to identify the set of features from the plurality of features of the second data.

3. The method of claim 1, wherein analyzing the set of features to evaluate a plurality of types of machine learning models with respect to the set of features comprises: classifying the plurality of types of machine learning models based on a type of problem associated with the first data.

4. The method of claim 1, wherein analyzing the set of features to evaluate a plurality of types of machine learning models with respect to the set of features comprises: automatically optimizing hyper parameters of the plurality of types of machine learning models to attempt to optimize the plurality of types of machine learning models.

5. The method of claim 1, wherein analyzing the set of features to evaluate a plurality of types of machine learning models with respect to the set of features comprises: providing, via a user interface, a visualization of model performance of the plurality of types of machine learning models; and receiving, via the user interface, a selection of the particular type of machine learning model based on providing the visualization of the model performance of the plurality of types of machine learning models.

6. The method of claim 1, wherein tuning the set of parameters of the particular type of machine learning model comprises: optimizing hyper parameters of the particular type of machine learning model; and retraining the machine learning model based on optimizing the hyper parameters of the machine learning model.

7. The method of claim 1, wherein providing access to the particular type of machine learning model via the interface comprises: deploying the particular type of machine learning model as a webservice for a plurality of client instances; and notifying a plurality of external applications based on deploying the particular type of machine learning model.

8. A device, comprising: one or more memories; and one or more processors, communicatively coupled to the one or more memories, to: receive input data from a data source; pre-process and filter the input data to generate intermediate data based on receiving the input data, wherein the one or more processors, to pre-process the input data, are configured to: generate a clustering index value based on generating a set of clusters, wherein the clustering index value is associated with data purity and risk, identify outlier data, of the input data, based on generating the clustering index value, and alter the input data to remove or alter the outlier data to generate the intermediate data; label one or more missing labels in the intermediate data to generate output data based on generating the intermediate data; select, for the output data, a machine learning model, of a plurality of types of machine learning models, to apply to the output data; tune a set of hyper-parameters for the machine learning model based on selecting the machine learning model; establish a model pipeline for the machine learning model based on tuning the set of hyper-parameters; receive prediction data based on establishing the model pipeline; perform a prediction using the machine learning model and using the prediction data; and provide the prediction for display via a user interface based on performing the prediction.

9. The device of claim 8, wherein the input data is associated with a plurality of languages; and wherein the one or more processors are further to: identify the plurality of languages; and train, for the machine learning model, a plurality of sub-models for the plurality of languages.

10. The device of claim 8, wherein the machine learning model is a natural language processing model or a computer vision model.

11. The device of claim 8, wherein the prediction data is a first subset of a text entry and the prediction is a second subset of the text entry.

12. The device of claim 8, wherein the one or more processors are further to: perform a cluster analysis on the input data, wherein the cluster analysis is associated with a data purity or a data risk associated with the input data; provide, via the user interface, a result of the cluster analysis; and determine a modification to the input data based on providing the result of the cluster analysis via the user interface.

13. The device of claim 12, wherein the cluster analysis identifies a plurality of predicted sources of error in the input data.

14. The device of claim 8, wherein the one or more processors, when pre-processing and filtering the input data, are configured to perform at least one of: a space-trimming procedure, a case lowering procedure, a stop words removal procedure, a Boolean logic filtering procedure, or a regular expression pattern matching procedure.

15. The device of claim 8, wherein the plurality of types of machine learning models includes at least one of: a random forest classifier model, a k-nearest neighbor model, a decision tree model, a multilayer perceptron model, a stochastic gradient descent classifier model, a logistic regression model, a linear support vector classifier model, a naïve Bayes model, a ridge regression model, a convolutional neural network model, or an Inception v3 model.

16. A non-transitory computer-readable medium storing instructions, the instructions comprising: one or more instructions that, when executed by one or more processors, cause the one or more processors to: obtain first data relating to a natural language processing or image processing task; pre-process and filter the first data to generate second data, wherein the one or more instructions, that cause the one or more processors to pre-process the first data, cause the one or more processors to: genera
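Claim 14 names five text pre-processing and filtering procedures without specifying how they are carried out. The sketch below shows one plausible reading of each, using only the Python standard library. The stop-word list, the `TICKET_ID` pattern, the filter rule, and the sample records are all hypothetical and are not taken from the patent.

```python
import re

# Illustrative stop-word list (assumption; a real list would be much longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}

# Hypothetical pattern: identifiers such as "ops-42" in lowercased text.
TICKET_ID = re.compile(r"\b[a-z]+-\d+\b")

def preprocess(text):
    """Apply space trimming, case lowering, regex matching, stop-word removal."""
    text = " ".join(text.split())      # space trimming: strip ends, collapse runs
    text = text.lower()                # case lowering
    ids = TICKET_ID.findall(text)      # regular expression pattern matching
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop words removal
    return {"tokens": tokens, "ids": ids}

def keep(record):
    """Boolean logic filter: enough tokens AND (mentions an error OR has an id)."""
    tokens, ids = record["tokens"], record["ids"]
    return len(tokens) >= 2 and ("error" in tokens or bool(ids))

raw = [
    "  The server   IS down see OPS-42  ",
    "a the of",
    "Login page error",
]
cleaned = [preprocess(text) for text in raw]
kept = [record for record in cleaned if keep(record)]

for record in kept:
    print(record)
```

In this sketch the second record is dropped because every token is a stop word, so it fails the token-count condition of the filter.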

Assignees

Accenture Global Solutions Ltd

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11640563B2 cover?
A device may obtain first data relating to a machine learning model. The device may pre-process the first data to alter the first data to generate second data. The device may process the second data to select a set of features from the second data. The device may analyze the set of features to evaluate a plurality of types of machine learning models with respect to the set of features. The devi…
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 2, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).