Method and apparatus for searching for neural network ensemble model, and electronic device

US2024311651A1 · US · A1

Patent metadata
Publication number: US-2024311651-A1
Application number: US-202418668637-A
Country: US
Kind code: A1
Filing date: May 20, 2024
Priority date: Nov 22, 2021
Publication date: Sep 19, 2024
Grant date: not listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method for searching for a neural network architecture ensemble model. The method includes: obtaining a dataset, where the dataset includes a sample and an annotation in a classification task; performing search by using a distributional neural network architecture search algorithm, including: determining a hyperparameter of a neural network architecture distribution; sampling a valid neural network architecture from the architecture distribution defined by the hyperparameter; training and evaluating the neural network architecture on the dataset, to obtain a performance indicator; determining, based on the performance indicator, neural network architecture distributions that share the hyperparameter, to obtain a candidate pool of base learners; and determining a surrogate model; and predicting test performance of the base learner in the candidate pool by using the surrogate model, and determining that k diverse base learners that meet a task scenario requirement form an ensemble model.
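The sampling step named in the abstract (drawing an architecture from a distribution defined by a hyperparameter) can be illustrated with a minimal Python sketch. It assumes the hyperparameter is a probability vector over operators, as the claims describe for the ANASOD encoding. The operator names, the fixed number of operator slots, and the function name sample_cell are hypothetical and do not come from the patent. Connecting the operators under a specific search space is not modeled here.

```python
import random

# Hypothetical operator set; the patent text on this page does not list operators.
OPERATORS = ["conv3x3", "conv1x1", "skip", "avg_pool"]


def sample_cell(encoding, num_slots, rng):
    """Draw one cell from an operator distribution.

    encoding:  probabilities over OPERATORS (non-negative, summing to 1).
    num_slots: number of operator slots in the cell (assumed fixed).
    rng:       a random.Random instance, so draws are reproducible.
    Returns a list with one operator name per slot.
    """
    if len(encoding) != len(OPERATORS):
        raise ValueError("encoding must have one probability per operator")
    if any(p < 0 for p in encoding) or abs(sum(encoding) - 1.0) > 1e-9:
        raise ValueError("encoding must be a probability vector")
    # Drawing each slot independently gives multinomially distributed
    # operator counts, so one encoding can produce many different cells.
    return rng.choices(OPERATORS, weights=encoding, k=num_slots)


if __name__ == "__main__":
    rng = random.Random(0)
    encoding = [0.5, 0.3, 0.1, 0.1]
    for _ in range(3):
        print(sample_cell(encoding, num_slots=6, rng=rng))
```

Repeated calls with the same encoding generally return different cells, which is the one-to-many relationship between an encoding and the cells it describes.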

First claim

Opening claim text (preview).

1. A method for searching for a neural network architecture ensemble model, wherein the method comprises: obtaining a dataset, wherein the dataset comprises a sample and an annotation in a classification task; performing search by using a distributional neural network architecture search algorithm, comprising: determining a hyperparameter of a neural network architecture distribution; sampling a neural network architecture from the architecture distribution defined by the hyperparameter; training and evaluating the neural network architecture, based on the sample and the annotation in the classification task, to obtain a performance indicator; determining, based on the performance indicator, predicted neural network architecture distributions that share the hyperparameter; to obtain a candidate pool of base learners, wherein a base learner is a neural network architecture that meets an architecture distribution requirement, and the neural network architecture is formed by repeatedly stacking neural network architecture cells; and determining a surrogate model, wherein the surrogate model is used to predict test performance of an unevaluated neural network architecture; and predicting test performance of a base learner in the candidate pool by using the surrogate model, and determining that k base learners that meet a requirement of the classification task form an ensemble model, wherein a size of the ensemble model is k.

2. The method of claim 1, wherein the performing search by using a distributional neural network architecture search algorithm further comprises: performing distributional neural network architecture search by using an approximate neural network architecture search via operation distribution (ANASOD) algorithm.

3. The method of claim 1, wherein the determining a hyperparameter of a neural network architecture distribution comprises: determining that the hyperparameter of the neural network architecture distribution is an ANASOD encoding, wherein the ANASOD encoding is a vector indicating probability distributions of operators in a neural network architecture cell, and there is a one-to-many mapping between an ANASOD encoding and the neural network architecture cell.

4. The method of claim 1, wherein the determining a hyperparameter of a neural network architecture distribution comprises: optimizing the hyperparameter of the neural network architecture distribution by using a search policy, wherein the search policy is Bayesian optimization, and the search policy is used to sample, in a next iteration, a neural network cell whose performance indicator better meets a requirement than that of a current neural network architecture cell.

5. The method of claim 3, wherein the sampling a neural network architecture from the architecture distribution defined by the hyperparameter comprises: determining a specific quantity of operators in constituent cells of the neural network architecture based on an operator probability distribution defined by the ANASOD encoding; and connecting different operators based on a specified search space to obtain a valid neural network architecture.

6. The method of claim 1, wherein the training and evaluating the neural network architecture to obtain a performance indicator comprises: training the neural network architecture on a training dataset; and evaluating the neural network architecture on a validation dataset to obtain the performance indicator, wherein both training set data and validation set data belong to the dataset.

7. The method of claim 1, wherein the performing search by using a distributional neural network architecture search (distributional NAS) algorithm further comprises: determining a search policy for the neural network architecture distribution based on the performance indicator and the hyperparameter of the predicted neural network architecture distribution.

8. The method of claim 1, wherein the performing search by using a distributional neural network architecture search (distributional NAS) algorithm further comprises: determining a predicted performance value of a hyperparameter of another unknown distribution, comprising a mean value and a variance, based on a hyperparameter and a performance indicator of each found neural network architecture distribution; and determining a performance prediction policy for the neural network architecture distribution based on the mean value and the variance, wherein the performance prediction policy is used to predict the performance indicator of the neural network architecture distribution.

9. The method of claim 1, wherein the determining, based on the performance indicator, neural network architecture distributions that share the hyperparameter; to obtain a candidate pool of base learners comprises: determining a search policy for the neural network architecture distribution based on the performance indicator and the hyperparameter; determining a performance prediction policy for the neural network architecture distribution based on the performance indicator and a neural network architecture cell; and searching, according to the search policy and the performance prediction policy, the neural network architecture distributions that share the hyperparameter, to determine the candidate pool of the base learners.

10. The method of claim 1, wherein the determining, based on the performance indicator, neural network architecture distributions that share the hyperparameter; to obtain a candidate pool of base learners comprises: outputting, based on a plurality of neural network architectures in a historical search and corresponding performance indicators, a plurality of neural network architectures that share the hyperparameter; determining, based on the plurality of neural network architectures that share the hyperparameter, a neural network architecture distribution that meets a requirement; and generating a plurality of neural network architecture cells based on the neural network architecture distribution that meets the requirement, to obtain a generation distribution/the candidate pool of the base learners.

11. The method of claim 1, wherein the determining a surrogate model comprises: obtaining the surrogate model through training on the dataset based on the neural network architecture cells and the performance indicator.

12. The method of claim 1, wherein the predicting test performance of the base learner in the candidate pool by using the surrogate model, and determining that k base learners that meet a task scenario requirement form an ensemble model comprises: predicting test performance of a plurality of base learners in the candidate pool by using the surrogate model; performing local search based on a prediction result, and determining q estimated vertex architectures, wherein an estimated vertex architecture is a neural network architecture whose performance indicator predicted by the surrogate model on a validation set is higher than that of an adjacent architecture; and combining k architectures whose performance indicators meet the requirement in the q estimated vertex architectures to obtain the ensemble model.

13. The method of claim 12, wherein the combining k architectures whose performance indicators meet the requirement in the q estimated vertex architectures comprises: sorting performance indicators of the q estimated vertex architectures in descending order, and combining k architectures whose performance indicators rank top.

14. The method of claim 12, wherein the combining k architectures whose performance indicators meet the re
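The selection procedure in claims 12 and 13 (local search over surrogate predictions, then keeping the k best "estimated vertex architectures") can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: architectures are modeled as tuples of operator names, two architectures count as adjacent when they differ in exactly one slot, the local search is plain hill climbing, and toy_surrogate is a hypothetical stand-in for a trained surrogate model.

```python
from collections import Counter

# Hypothetical operator set and per-operator bonus used only by the toy surrogate.
OPERATORS = ("conv3x3", "conv1x1", "skip")
BONUS = {"conv3x3": 3, "conv1x1": 2, "skip": 1}


def toy_surrogate(arch):
    """Stand-in for a trained surrogate: rewards homogeneous cells."""
    counts = Counter(arch)
    return 10 * sum(c * c for c in counts.values()) + sum(BONUS[op] for op in arch)


def neighbors(arch):
    """Yield architectures that differ from `arch` in exactly one slot."""
    for i, op in enumerate(arch):
        for other in OPERATORS:
            if other != op:
                yield arch[:i] + (other,) + arch[i + 1:]


def local_search(start, predict):
    """Hill-climb until no adjacent architecture has a higher prediction."""
    current = start
    while True:
        best = max(neighbors(current), key=predict)
        if predict(best) <= predict(current):
            return current  # an estimated vertex architecture
        current = best


def select_ensemble(pool, predict, k):
    """Return up to k estimated vertex architectures, best prediction first."""
    vertices = {local_search(arch, predict) for arch in pool}
    return sorted(vertices, key=predict, reverse=True)[:k]


if __name__ == "__main__":
    pool = [
        ("conv3x3", "conv1x1", "skip"),
        ("skip", "skip", "conv1x1"),
        ("conv1x1", "conv1x1", "skip"),
    ]
    for arch in select_ensemble(pool, toy_surrogate, k=2):
        print(arch, toy_surrogate(arch))
```

Starting the search from several pool members is what yields more than one vertex. If every start converges to the same architecture, the function returns fewer than k members, and a real system would need a fallback for that case.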

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • G06N3/0985 · Primary

    Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • modifying the architecture, e.g. adding, deleting or silencing nodes or connections · CPC title

  • G06N3/045 · Primary

    Combinations of networks · CPC title

  • G06N3/04 · Primary

    Architecture, e.g. interconnection topology · CPC title

  • Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024311651A1 cover?
Disclosed is a method for searching for a neural network architecture ensemble model. The method includes: obtaining a dataset, where the dataset includes a sample and an annotation in a classification task; performing search by using a distributional neural network architecture search algorithm, including: determining a hyperparameter of a neural network architecture distribution; sampling a v…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/0985. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 19, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).