What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 17, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Hyper-parameter management

US12093814B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12093814-B2
Application number	US-201916544969-A
Country	US
Kind code	B2
Filing date	Aug 20, 2019
Priority date	Aug 20, 2019
Publication date	Sep 17, 2024
Grant date	Sep 17, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, system, and computer program product for hyper-parameter determination. In a method, a network architecture of a learning model may be determined, and the learning model may be configured for performing a computing task based on machine learning. A metric value record associated with a group of hyper-parameters may be obtained during hyper-parameter determination for the learning model. An estimation of a metric value may be obtained based on the network architecture, and the metric value record and an association relationship representing an association between network architectures and metric values for the network architectures. The group of hyper-parameters may be selected in response to the estimation of the metric value meeting a predefined criterion. With these embodiments, a group of hyper-parameters may be selected, and further the learning model may be trained based on the selected group of hyper-parameters.
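The selection flow in the abstract has three steps: estimate a metric value from the network architecture plus a partial metric value record, compare the estimates, and keep the hyper-parameter group whose estimate meets the criterion. It can be illustrated with a minimal Python sketch. Several choices here are assumptions and are not taken from the patent. The "association relationship" is replaced by a hypothetical k-nearest-neighbour regressor over past (architecture vector, partial record, final metric) triples. The "predefined criterion" is taken to be "highest estimate wins". The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical past observation: (architecture vector, partial metric record,
# final metric value). All vectors and records must have matching lengths.
Example = Tuple[List[float], List[float], float]


def _features(arch_vec: Sequence[float], record: Sequence[float]) -> List[float]:
    """Concatenate the architecture vector with the partial metric record."""
    return list(arch_vec) + list(record)


def estimate_metric(
    examples: List[Example],
    arch_vec: Sequence[float],
    record: Sequence[float],
    k: int = 3,
) -> float:
    """k-nearest-neighbour stand-in for the learned association relationship.

    Returns the mean final metric of the k past (architecture, record) pairs
    closest to the query under Euclidean distance.
    """
    if not examples:
        raise ValueError("need at least one past example")
    query = _features(arch_vec, record)
    scored = sorted(
        (math.dist(query, _features(a, r)), final) for a, r, final in examples
    )
    nearest = scored[:k]
    return sum(final for _, final in nearest) / len(nearest)


def select_hyperparameters(
    examples: List[Example],
    arch_vec: Sequence[float],
    partial_records: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the group with the highest estimate, plus all estimates.

    "Highest estimate" is one assumed form of the predefined criterion:
    a comparison among the estimations for every candidate group.
    """
    estimates = {
        name: estimate_metric(examples, arch_vec, rec)
        for name, rec in partial_records.items()
    }
    best = max(estimates, key=estimates.get)
    return best, estimates


if __name__ == "__main__":
    past = [
        ([1.0, 0.0], [0.50, 0.60], 0.80),
        ([1.0, 0.0], [0.30, 0.35], 0.50),
        ([0.0, 1.0], [0.55, 0.65], 0.85),
        ([0.0, 1.0], [0.20, 0.25], 0.40),
    ]
    best, estimates = select_hyperparameters(
        past, [1.0, 0.0], {"lr=0.1": [0.52, 0.61], "lr=0.001": [0.28, 0.33]}
    )
    print(best, estimates)
```

In this toy run, the group whose early metric record resembles a previously strong run of the same architecture receives the higher estimate and is selected. A real predictor would more likely be a trained regression model than a nearest-neighbour lookup. The lookup is used here only because it needs no training code.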

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: determining, by one or more processors, a network architecture of a learning model, the learning model being configured for performing a computing task based on machine learning, wherein the network architecture is represented as a vector comprising data corresponding to each layer in the learning model, wherein a data structure associated with each layer in the machine learning model comprises a type of each node from among a plurality of nodes and a connection relationship associated with each node, wherein the type corresponds to a neural network cell or an activation function, and wherein the computing task includes at least one of image classification, face recognition, and text processing performed by the learning model, and wherein the network architecture is determined based on the connection relationship and the plurality of nodes by building a directed acyclic graph represented by a matrix and a value at a location in the matrix corresponding to whether two nodes from among the plurality of nodes are connected; obtaining, by the one or more processors, a metric value record associated with a group of hyper-parameters during hyper-parameter determination for the learning model; obtaining, by the one or more processors, an estimation of a metric value based on the network architecture, the metric value record and an association relationship representing an association between network architectures and metric values for the network architecture; and selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting a predefined criterion, wherein the predefined criterion corresponds to a comparison among multiple estimations associated with multiple groups of hyper-parameters; and training, by the one or more processors, the association relationship based on determining a number of iterations corresponding to a convergence associated with the selected group of hyper-parameters and obtaining metric value records for more iterations and less iterations than the number of iterations corresponding to the convergence, wherein the method further comprises: with respect to a sample learning model in a plurality of sample learning models, determining, by the one or more processors, a sample network architecture of the sample learning model, the plurality of sample learning models being configured for performing a plurality of sample tasks based on the machine learning, respectively; obtaining, by the one or more processors, a plurality of metric value records during a plurality of experiments for the hyper-parameter determination; and training, by the one or more processors, the association relationship based on the sample network architecture and the plurality of metric value records, such that the trained association relationship represents an association between the sample network architecture and the plurality of metric value records, wherein the plurality of metric value records corresponds to various percentages of the iterations relative to the convergence such that the association relationship has knowledge of various time points during hyper-parameters determination.

2. The method of claim 1, wherein the determining, by the one or more processors, the network architecture of the learning model comprises: extracting, by the one or more processors, the connection relationship among the plurality of nodes comprised in the learning model; and determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of nodes.

3. The method of claim 2, wherein the determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of nodes comprises: determining, by the one or more processors, a plurality of layers formed by the plurality of nodes; and determining, by the one or more processors, the network architecture based on the connection relationship and the plurality of layers.

4. The method of claim 1, further comprising: obtaining, by the one or more processors, a further metric value record associated with a further group of hyper-parameters during the hyper-parameter determination for the learning model; obtaining, by the one or more processors, a further estimation of a metric value based on the network architecture, the further metric value record and the association relationship; and wherein the selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value meeting the predefined criterion comprises: selecting, by the one or more processors, the group of hyper-parameters in response to the estimation of the metric value being closer to the convergence during the hyper-parameter determination than the further estimation.

5. The method of claim 4, wherein the estimation of the metric value comprises an extreme value among a plurality of metric values associated with a plurality of group of hyper-parameters during the hyper-parameter determination.

6. The method of claim 1, wherein the obtaining, by the one or more processors, the plurality of metric value records comprises: obtaining, by the one or more processors, one of the plurality of the metric value records based on metric values associated with a progress of the hyper-parameter determination.

7. The method of claim 6, wherein the obtaining, by the one or more processors, one of the plurality of the metric value records comprises: determining, by the one or more processors, the convergence during the hyper-parameter determination; and obtaining, by the one or more processors, a metric value record based on the determined convergence.

8. The method of claim 1, further comprising: obtaining, by the one or more processors, a group of sample data for training the learning model; and training, by the one or more processors, the learning model based on the group of sample data and the selected group of hyper-parameters.

9. The method of claim 8, further comprising: obtaining, by the one or more processors, an object that is to be processed by the computing task; and processing, by the one or more processors, the object based on the trained learning model.

10. A computer-implemented system, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage media, and program instructions stored on at least one of the one or more computer-readable tangible storage media for execution by at least one of the one or more processors via at least one of the one or more computer-readable memories, wherein the computer system is capable of performing a method comprising: determining a network architecture of a learning model, the learning model being configured for performing a computing task based on machine learning, wherein the network architecture is represented as a vector comprising data corresponding to each layer in the learning model, wherein a data structure associated with each layer in the machine learning model comprises a type of each node from among a plurality of nodes and a connection relationship associated with each node, wherein the type corresponds to a neural network cell or an activation function, wherein the computing task includes at least one of image classification, face recognition, and text processing performed by the learning model, and wherein the network architecture is determined based on the connection relationship and the plurality of nodes by building a directed acyclic graph represented by a matrix and a value at a location in the matrix correspon…
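Claims 1 to 3 describe how the architecture is encoded. Nodes and their connection relationship become a directed acyclic graph stored as a matrix. The nodes are grouped into layers, and per-layer data (node types and connections) is flattened into a vector. The Python sketch below shows one possible reading under stated assumptions. The four-entry node-type vocabulary is hypothetical. So is assigning a node's layer by its longest path from a source node, and so are the per-layer features (a count of each node type plus the layer's outgoing-edge count). None of these are the patent's own encoding.

```python
from typing import List, Sequence, Tuple

# Hypothetical vocabulary: neural-network cells ("conv", "dense") and
# activation functions ("relu", "sigmoid").
NODE_TYPES = ["conv", "dense", "relu", "sigmoid"]


def adjacency_matrix(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """matrix[i][j] == 1 iff node i is connected to node j (edge i -> j)."""
    m = [[0] * n for _ in range(n)]
    for src, dst in edges:
        m[src][dst] = 1
    return m


def layers_of(matrix: List[List[int]]) -> List[List[int]]:
    """Group nodes into layers by longest path from a source (Kahn's algorithm).

    Raises ValueError if the connection relationship is not acyclic.
    """
    n = len(matrix)
    indegree = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    depth = [0] * n
    ready = [j for j in range(n) if indegree[j] == 0]
    visited = 0
    while ready:
        i = ready.pop()
        visited += 1
        for j in range(n):
            if matrix[i][j]:
                # All predecessors of i are already processed, so depth[i] is final.
                depth[j] = max(depth[j], depth[i] + 1)
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    if visited != n:
        raise ValueError("connection relationship contains a cycle")
    layers: List[List[int]] = [[] for _ in range(max(depth) + 1)]
    for node, d in enumerate(depth):
        layers[d].append(node)
    return layers


def architecture_vector(
    node_types: List[str], edges: Sequence[Tuple[int, int]]
) -> List[int]:
    """Flatten per-layer data into one vector.

    For each layer: one count per entry of NODE_TYPES, then the number of
    outgoing connections from nodes in that layer.
    """
    matrix = adjacency_matrix(len(node_types), edges)
    vector: List[int] = []
    for layer in layers_of(matrix):
        counts = [sum(1 for v in layer if node_types[v] == t) for t in NODE_TYPES]
        out_edges = sum(sum(matrix[v]) for v in layer)
        vector.extend(counts + [out_edges])
    return vector


if __name__ == "__main__":
    # Two conv nodes feed a relu, which feeds a dense node.
    print(architecture_vector(["conv", "conv", "relu", "dense"], [(0, 2), (1, 2), (2, 3)]))
```

Architectures with different depths produce vectors of different lengths. A predictor that compares architectures would therefore need padding or some other fixed-size encoding. The sketch leaves that choice open.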

Assignees

IBM

Classifications

G06N3/0985
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06V10/82
using neural networks · CPC title
G06V10/776
Validation; Performance evaluation · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 74645843

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12093814B2 cover?: A method, system, and computer program product for hyper-parameter determination. In a method, a network architecture of a learning model may be determined, and the learning model may be configured for performing a computing task based on machine learning. A metric value record associated with a group of hyper-parameters may be obtained during hyper-parameter determination for the learning mode…
Who is the assignee on this patent?: IBM