US11423333B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11423333-B2 |
| Application number | US-202016829055-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 25, 2020 |
| Priority date | Mar 25, 2020 |
| Publication date | Aug 23, 2022 |
| Grant date | Aug 23, 2022 |
Official abstract text for this publication.
Mechanisms are provided for optimizing an automated machine learning (AutoML) operation to configure parameters of a machine learning model. AutoML logic is configured based on an initial default value and initial range for sampling of a parameter of the machine learning (ML) model and an initial AutoML process is executed on the ML model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic. For each domain, a cross-dataset default value and cross-dataset value range are derived from results of the execution of the initial AutoML process. For each domain, an entry is stored in a data structure, the entry storing the derived cross-dataset default value and cross-dataset value range for the domain. The AutoML logic performs a subsequent AutoML process on a new dataset based on one or more entries of the data structure.
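The per-domain derivation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it borrows the weighted mean and the "plus or minus standard deviations" bounds from claims 8 and 9, and it assumes a population standard deviation and a multiplier of one. The names `DomainEntry`, `derive_entry`, `num_std`, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class DomainEntry:
    """Hypothetical per-domain record: a default value and a sampling range."""
    default: float
    lower: float
    upper: float


def derive_entry(learned_values, weights, num_std=1.0):
    """Derive a cross-dataset default and range for one domain.

    learned_values: parameter values found by the initial AutoML runs,
                    one per dataset in which the domain appears.
    weights: relative representation of the domain in each dataset.
    num_std: how many standard deviations wide each side of the range is
             (an assumption; the claims say "one or more").
    """
    total = sum(weights)
    # Weighted mean of the learned values gives the default.
    default = sum(v * w for v, w in zip(learned_values, weights)) / total
    # Population standard deviation of the learned values sets the spread.
    spread = num_std * pstdev(learned_values)
    return DomainEntry(default, default - spread, default + spread)


# Illustrative numbers only: a parameter learned on three datasets.
table = {
    "finance": derive_entry([0.10, 0.20, 0.30], weights=[0.5, 0.25, 0.25]),
}
```

With these made-up inputs the stored default is 0.175, and the range is centred on it. The `table` dictionary stands in for the data structure that holds one entry per domain.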
Opening claim text (preview).
What is claimed is:

1. A method for optimizing performance of an automated machine learning (AutoML) operation to configure parameters of a machine learning model, the method comprising: configuring AutoML logic based on an initial default value and initial range for parameter sampling of a parameter of the machine learning model; executing an initial AutoML process on the machine learning model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic; generating, for each domain in the plurality of domains, a derived cross-dataset default value and derived cross-dataset value range derived from results of the execution of the initial AutoML process; storing, for each domain in the plurality of domains, an entry of a data structure comprising the derived cross-dataset default value and cross-dataset value range for the domain; and performing, by the AutoML logic, a subsequent AutoML process on a new dataset based on one or more entries of the data structure.
2. The method of claim 1, further comprising: receiving a plurality of labeled training datasets, wherein, for each labeled training dataset, labels of the training dataset indicate, for associated portions of data in the labeled training dataset, a corresponding domain; and performing machine learning training of a domain classifier machine learning model based on the labeled training datasets to generate a trained domain classifier, wherein the trained domain classifier performs the domain classification operation on data elements of the new dataset.
3. The method of claim 2, wherein one or more of the labeled training datasets comprises a mixed domain labeled training dataset having at least two portions of data associated with at least two different domains.
4. The method of claim 1, wherein performing the subsequent AutoML process on the new dataset comprises: performing a domain classification operation on data elements of the new dataset to identify which domains in the plurality of domains are represented in the new dataset; retrieving, from the data structure, entries corresponding to the domains represented in the new dataset; and configuring the AutoML logic based on the retrieved entries corresponding to the domains represented in the new dataset.
5. The method of claim 1, wherein configuring the AutoML logic based on the retrieved entries corresponding on the domains represented in the new dataset comprises: determining a cross-domain default value for the parameter based on a first statistical function of the cross-dataset default values for the domains represented in the new dataset; determining a cross-domain value range for the parameter based on a second statistical function of the cross-dataset value ranges for the domains represented in the new dataset; and configuring parameter sampling logic of the AutoML logic with the cross-domain default value and cross-domain value range.
6. The method of claim 5, wherein the first statistical function is a weighted mean of the cross-dataset default values for the domains represented in the new dataset, wherein the weights in the first statistical function are determined based on a relative representation of the domain in the new dataset, wherein the second statistical function is a weighted mean of the cross-dataset value ranges, wherein the weights in the second statistical function are determined based on a relative representation of the domain in the new dataset.
7. The method of claim 4, wherein configuring the AutoML logic based on the retrieved entries comprises: performing a cross-domain analysis of the derived cross-dataset default value and derived cross-dataset value range for the domains represented in the new dataset, to generate updated parameter sampling configuration data; and updating a configuration of the AutoML logic to utilize the updated parameter sampling configuration data to perform the subsequent AutoML process.
8. The method of claim 1, wherein generating the derived cross-dataset default value and derived cross-dataset value range comprises, for each domain in the plurality of domains: determining the derived cross-dataset default value as a first statistical function of a plurality of learned values for the parameter, for the domain, generated by the AutoML logic during the initial AutoML process; and determining the derived cross-dataset value range comprises determining a lower bound as a second statistical function of the derived cross-dataset default value, and determining an upper bound as a third statistical function of the derived cross-dataset default value.
9. The method of claim 8, wherein the first statistical function is a weighted mean of the plurality of learned values, wherein the weights in the weighted mean are determined based on a relative representation of the domain in a corresponding dataset, wherein the second statistical function is a difference of the derived cross-dataset default value and one or more standard deviations of the plurality of learned values, and wherein the third statistical function is a sum of the derived cross-dataset default value and one or more standard deviations of the plurality of learned values.
10. The method of claim 1, wherein the parameter is a hyperparameter of the machine learning model.
11. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: configure automated machine learning (AutoML) logic of the computing device based on an initial default value and initial range for parameter sampling of a parameter of a machine learning model; execute an initial AutoML process on the machine learning model based on a plurality of datasets comprising a plurality of domains of data elements, utilizing the initially configured AutoML logic; generate, for each domain in the plurality of domains, a derived cross-dataset default value and derived cross-dataset value range derived from results of the execution of the initial AutoML process; store, for each domain in the plurality of domains, an entry of a data structure comprising the derived cross-dataset default value and cross-dataset value range for the domain; and perform, by the AutoML logic, a subsequent AutoML process on a new dataset based on one or more entries of the data structure.
12. The computer program product of claim 11, wherein the computer readable program further causes the computing device to: receive a plurality of labeled training datasets, wherein, for each labeled training dataset, labels of the training dataset indicate, for associated portions of data in the labeled training dataset, a corresponding domain; and perform machine learning training of a domain classifier machine learning model based on the labeled training datasets to generate a trained domain classifier, wherein the trained domain classifier performs the domain classification operation on data elements of the new dataset.
13. The computer program product of claim 12, wherein one or more of the labeled training datasets comprises a mixed domain labeled training dataset having at least two portions of data associated with at least two different domains.
14. The computer program product of claim 11, wherein the computer readable program further causes the computing device to perform the subsequent AutoML process on the new dataset at least by: performing a domain classification operation on data elements of the new dataset to iden
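
The cross-domain step recited in claims 4 to 6 (classify the elements of the new dataset, retrieve the matching entries, blend them by weighted mean) might look like the sketch below. It is one reading of the claim language rather than the patentee's code: the domain classifier is assumed to exist already and is represented only by its output labels, and the "weighted mean of the value ranges" is interpreted here as a weighted mean of the lower and upper bounds. The function name `combine_entries` and all numbers are hypothetical.

```python
from collections import Counter


def combine_entries(domain_labels, table):
    """Blend per-domain (default, lower, upper) entries for a new dataset.

    domain_labels: one predicted domain per data element of the new dataset
                   (the output of a domain classifier, assumed to exist).
    table: dict mapping domain name -> (default, lower, upper).
    """
    # Count only the domains that have a stored entry.
    counts = Counter(label for label in domain_labels if label in table)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no known domain is represented in the new dataset")
    # Weight each domain by its share of the new dataset.
    blended = [0.0, 0.0, 0.0]
    for domain, count in counts.items():
        weight = count / total
        for i, value in enumerate(table[domain]):
            blended[i] += weight * value
    return tuple(blended)


# Illustrative entries and labels only.
table = {"finance": (0.20, 0.10, 0.30), "medical": (0.40, 0.30, 0.50)}
labels = ["finance", "finance", "finance", "medical"]
default, lower, upper = combine_entries(labels, table)
```

Here three quarters of the elements are labeled `finance`, so the blended default lands at about 0.25 with a range of roughly 0.15 to 0.35. Those values would then seed the parameter sampling of the subsequent AutoML run.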
for solving equations {, e.g. nonlinear equations, general mathematical optimization problems (optimization specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Machine learning · CPC title
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Backpropagation, e.g. using gradient descent · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title