Automated fine-tuning of a pre-trained neural network for transfer learning

US12437190B2 · US · B2

Patent metadata
Publication number: US-12437190-B2
Application number: US-201916704804-A
Country: US
Kind code: B2
Filing date: Dec 5, 2019
Priority date: Dec 5, 2019
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In an embodiment, a method for fine-tuning a pre-trained neural network for transfer learning, the method comprising obtaining a first target feature vector from a first layer of a pre-trained neural network responsive to a first target data element of a target dataset passing therethrough, obtaining a first source feature vector associated with the first layer of the pre-trained neural network, calculating a first divergence value for the first layer of the pre-trained neural network based at least in part on the first target feature vector and the first source feature vector, and setting a learning rate for the first layer of the pre-trained neural network based at least in part on the first divergence value.
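The abstract does not specify how the feature vectors are pipelined or which divergence measure is used. Below is a minimal sketch under stated assumptions, not the patented implementation. Layers are plain Python callables, and each layer's input is the previous layer's output, as claim 1 recites for the second layer. Source features come from passing source data through the same layers, as in claim 10. Per-layer features are averaged over data elements and then normalized, following claims 5 to 9. KL divergence is assumed as the divergence measure because the text shown here does not name one. All function names and the toy network are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]  # hypothetical stand-in for one network layer


def layer_features(layers: Sequence[Layer], x: Vector) -> List[Vector]:
    """Pass one data element through the layers in order, recording each
    layer's output; layer k receives the output of layer k-1 as its input."""
    features = []
    for layer in layers:
        x = layer(x)
        features.append(x)
    return features


def average(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def normalize(v: Vector, eps: float = 1e-12) -> Vector:
    """Assumed normalization: clip negatives to zero, add eps so no entry is
    zero, then scale so the entries sum to 1 (a discrete distribution)."""
    clipped = [max(x, 0.0) + eps for x in v]
    total = sum(clipped)
    return [x / total for x in clipped]


def kl_divergence(p: Vector, q: Vector) -> float:
    """KL(p || q); an assumed divergence measure, not named in the text."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def per_layer_divergences(layers: Sequence[Layer],
                          target_data: Sequence[Vector],
                          source_data: Sequence[Vector]) -> List[float]:
    """One divergence per layer between the normalized average target
    features and the normalized average source features."""
    target_feats = [layer_features(layers, x) for x in target_data]
    source_feats = [layer_features(layers, x) for x in source_data]
    divergences = []
    for k in range(len(layers)):
        t = normalize(average([f[k] for f in target_feats]))
        s = normalize(average([f[k] for f in source_feats]))
        divergences.append(kl_divergence(t, s))
    return divergences


# Toy two-layer network with ReLU activations (illustrative only).
def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


layers = [
    lambda v: relu([v[0] + v[1], v[0] - v[1]]),
    lambda v: relu([2.0 * v[0], v[1] + 1.0]),
]
source_data = [[1.0, 0.0], [0.0, 1.0]]
target_data = [[3.0, 1.0], [2.0, 2.0]]
print(per_layer_divergences(layers, target_data, source_data))
```

Passing the same data as both target and source yields zero divergence for every layer, which is a quick sanity check on the averaging, normalization, and KL steps. In this toy example the second layer diverges more than the first.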

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for re-training a pre-trained neural network comprising:
    fetching a pre-trained neural network from a library of pre-trained neural networks, wherein the pre-trained neural network has been pre-trained on a source dataset;
    adjusting a learning rate for each layer of a set of layers of the pre-trained neural network, wherein the pre-trained neural network comprises at least a first layer and a second layer, wherein adjusting the learning rate for each layer comprises:
        setting, by a processor, a learning rate range for the pre-trained neural network, wherein the learning rate range comprises a largest learning rate;
        obtaining, by the processor, a first target feature vector from the first layer of the pre-trained neural network responsive to a target dataset passing through the first layer;
        obtaining, by the processor, a first source feature vector associated with the first layer of the pre-trained neural network responsive to the source dataset passing through the first layer;
        obtaining, by the processor, a second target feature vector from the second layer of the pre-trained neural network responsive to a second target dataset passing through the second layer, wherein the second target dataset corresponds to an output dataset of the target dataset having passed through the first layer;
        obtaining, by the processor, a second source feature vector from the second layer of the pre-trained neural network responsive to a second source dataset passing through the second layer, wherein the second source dataset corresponds to an output dataset of the target dataset having passed through the first layer;
        obtaining, by the processor, a first divergence between the first target feature vector and the first source feature vector;
        obtaining, by the processor, a second divergence between the second target feature vector and the second source feature vector;
        obtaining, by the processor, a highest divergence and a non-highest divergence by comparing the first divergence to the second divergence;
        identifying, by the processor, one of at least the first layer and the second layer corresponding to the highest divergence as a highest divergence layer;
        setting, by the processor, a first learning rate for the highest divergence layer, wherein the first learning rate for the highest divergence layer is the largest learning rate of the learning rate range;
        identifying, by the processor, one of at least the first layer and the second layer corresponding to the non-highest divergence as a non-highest divergence layer;
        setting, by the processor, a second learning rate for the non-highest divergence layer, wherein the second learning rate for the non-highest divergence layer is proportional to the highest learning rate with a ratio of the non-highest divergence over the highest divergence;
    re-training, by the processor, the pre-trained neural network for the target dataset, wherein the retraining comprises:
        adjusting, by the processor, a first set of weights of the highest divergence layer according to the first learning rate upon inputting of the target dataset; and
        adjusting, by the processor, a second set of weights of the non-highest divergence layer according to the second learning rate upon inputting of the target dataset.

2. The computer implemented method of claim 1, further comprising calculating, by the processor, a first target average feature vector based at least in part on the first and second target feature vectors.

3. The computer implemented method of claim 1, further comprising: calculating, by the processor, a first target average feature vector based at least in part on the first and second target feature vectors; and calculating, by the processor, a first source average feature vector based at least in part on the first and second source feature vectors.

4. The computer implemented method of claim 1, further comprising: obtaining, by the processor, a third target feature vector from the second layer of the pre-trained neural network, responsive to a first second-layer target dataset passing through the second layer, wherein the first second layer target dataset corresponds to an output dataset of the first layer subsequent to the first target dataset having passed through the first layer; and obtaining, by the processor, a fourth target feature vector from the second layer of the pre-trained neural network responsive to a second second-layer target dataset passing through the second layer; wherein the second second-layer target dataset corresponds to an output dataset of the first layer subsequent to the second target dataset having passed through the first layer.

5. The computer implemented method of claim 4, further comprising: calculating, by the processor, a first target average feature vector based at least in part on the first and second target feature vectors; calculating, by the processor, a first source average feature vector based at least in part on the first and second source feature vectors; and calculating, by the processor, a second target average feature vector based at least in part on the third and fourth target feature vectors.

6. The computer implemented method of claim 4, further comprising: obtaining, by the processor, a third source feature vector associated with the second layer of the pre-trained neural network; and obtaining, by the processor, a fourth source feature vector associated with the second layer of the pre-trained neural network.

7. The computer implemented method of claim 6, further comprising: calculating, by the processor, a first target average feature vector based at least in part on the first and second target feature vectors; calculating, by the processor, a first source average feature vector based at least in part on the first and second source feature vectors; calculating, by the processor, a second target average feature vector based at least in part on the third and fourth target feature vectors; and calculating, by the processor, a second source average feature vector based at least in part on the third and fourth source feature vectors.

8. The computer implemented method of claim 7, further comprising: calculating, by the processor, first and second normalized target average feature vectors; and calculating, by the processor, first and second normalized source average feature vectors.

9. The computer implemented method of claim 8, wherein the calculating of the first divergence value comprises calculating a divergence between the first normalized target average feature vector and the first normalized source average feature vector.

10. The computer implemented method of claim 9, wherein the obtaining of the first, second, third, and fourth source feature vector includes obtaining the first and second source feature vectors from the first layer of the pre-trained neural network responsive to respective first and second source data elements of a source dataset passing through the first layer, and includes obtaining the third and fourth source feature vectors from the second layer of the pre-trained neural network responsive to the respective first and second source data elements of the source dataset passing through the second layer.

11. The computer implemented method of claim 9, wherein the obtaining of the first, second, third, and fourth source feature vectors includes obtaining the first, second, third, and fourth source feature vectors from memory.

12. A computer implemented method for re-training a pre-trained neural network comprising: fetching a pre-trained neural network from a library of pre-trained neural networks, wherein th…
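Claim 1 states the allocation rule directly. The highest-divergence layer receives the largest learning rate of the range, and the other layer receives that rate multiplied by the ratio of its divergence to the highest divergence. Below is a minimal sketch of that rule, generalized from two layers to any number of layers (an assumption), followed by a plain SGD step. The SGD step stands in for the re-training optimizer, which the claim does not specify. The names `layer_learning_rates` and `sgd_step` are hypothetical.

```python
from typing import List, Sequence


def layer_learning_rates(divergences: Sequence[float], largest_lr: float) -> List[float]:
    """Per-layer rates following claim 1: the highest-divergence layer gets
    the largest rate of the range; every other layer gets
    largest_lr * (its divergence / highest divergence)."""
    if not divergences:
        raise ValueError("need at least one divergence value")
    highest = max(divergences)
    if highest <= 0.0:
        # The ratio is undefined when no layer diverges; this sketch refuses.
        raise ValueError("highest divergence must be positive")
    return [largest_lr * d / highest for d in divergences]


def sgd_step(weights: List[List[float]], grads: Sequence[Sequence[float]],
             rates: Sequence[float]) -> None:
    """Plain SGD update (assumed optimizer): each layer's weights move, in
    place, by that layer's own learning rate times its gradient."""
    for w, g, lr in zip(weights, grads, rates):
        for i, gi in enumerate(g):
            w[i] -= lr * gi


# Example: three layers; the largest rate in the range is 0.01.
rates = layer_learning_rates([0.5, 2.0, 1.0], largest_lr=0.01)
print(rates)  # the second layer (highest divergence) gets 0.01
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
grads = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
sgd_step(weights, grads, rates)
print(weights)
```

In a framework such as PyTorch, the same per-layer rates could be passed to an optimizer as separate parameter groups, each with its own `lr` value.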

Assignees

IBM

Inventors

Classifications (CPC titles)

  • using kernel methods, e.g. support vector machines [SVM]

  • for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)}

  • Transfer learning

  • Convolutional networks [CNN, ConvNet]

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

Frequently asked questions

What does patent US12437190B2 cover?
In an embodiment, a method for fine-tuning a pre-trained neural network for transfer learning, the method comprising obtaining a first target feature vector from a first layer of a pre-trained neural network responsive to a first target data element of a target dataset passing therethrough, obtaining a first source feature vector associated with the first layer of the pre-trained neural network…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published Oct 7, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).