Training transfer-focused models for deep learning

US11853877B2 · US · B2

Patent metadata
Publication number: US-11853877-B2
Application number: US-201916373149-A
Country: US
Kind code: B2
Filing date: Apr 2, 2019
Priority date: Apr 2, 2019
Publication date: Dec 26, 2023
Grant date: Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Whether to train a new neural network model can be determined based on similarity estimates between a sample data set and a plurality of source data sets associated with a plurality of prior-trained neural network models. A cluster among the plurality of prior-trained neural network models can be determined. A set of training data based on the cluster can be determined. The new neural network model can be trained based on the set of training data.
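The similarity-estimate and "whether to train" steps can be illustrated with a minimal sketch. It follows the hidden-layer formulation in claim 1 (each prior-trained model's hidden layer is run on both the sample data set and that model's source data set, and a distance is computed between the results), but several details are assumptions and not taken from the patent text: the hidden layer is a hypothetical random dense+ReLU layer (`make_hidden_layer`), the distance is the Euclidean distance between mean activations, and the decision rule requires every distance to exceed the threshold. All function names are illustrative only.

```python
import math
import random


def make_hidden_layer(in_dim, out_dim, seed):
    """Hypothetical stand-in for one hidden layer of a prior-trained model:
    a fixed dense layer followed by ReLU. Weights are random here only so the
    sketch runs; a real model would use its trained weights."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def layer(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return layer


def mean_activation(layer, data):
    """Average the hidden-layer outputs over a data set, one value per unit."""
    outputs = [layer(x) for x in data]
    return [sum(unit) / len(outputs) for unit in zip(*outputs)]


def distance_estimate(layer, sample_set, source_set):
    """Assumed metric: Euclidean distance between the mean hidden-layer
    activations produced by the sample set and by the source set."""
    return math.dist(mean_activation(layer, sample_set), mean_activation(layer, source_set))


def should_train_new_model(distances, threshold):
    """Assumed rule: train a new model only when every source set lies farther
    than the threshold, i.e. no existing source data covers the sample data."""
    return all(d > threshold for d in distances)


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [[rng.gauss(5.0, 1.0) for _ in range(4)] for _ in range(50)]
    # Each hypothetical prior-trained model: (its hidden layer, its source data set).
    library = {
        "model_a": (make_hidden_layer(4, 8, seed=1),
                    [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]),
        "model_b": (make_hidden_layer(4, 8, seed=2),
                    [[rng.gauss(4.5, 1.0) for _ in range(4)] for _ in range(50)]),
    }
    distances = {name: distance_estimate(layer, sample, source)
                 for name, (layer, source) in library.items()}
    print(distances)
    print("train new model:", should_train_new_model(distances.values(), threshold=1.0))
```

Mean-activation distance is only one simple choice; a kernel-based distribution distance over the same activations would fit the claim language equally well.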

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving a similarity estimate between a sample data set and a source data set, wherein the sample data set is associated with a target machine learning task, wherein the source data set is associated with a prior-trained neural network model and was used as a training data set used in training the prior-trained neural network model, wherein a plurality of similarity estimates is received corresponding to a plurality of source data sets associated with a plurality of prior-trained neural network models, the similarity estimate determined based on outputs of a hidden layer of the prior-trained neural network model generated using the sample data set and outputs of the hidden layer of the prior-trained neural network model generated using the source data set, each of the plurality of similarity estimates determined based on a distance between the sample data set and a corresponding one of the plurality of source data sets; determining, at least based on the similarity estimates being above a predefined distance threshold, to train a new neural network model, the similarity estimates being above the predefined distance indicating a gap in areas covered by the source data sets used in training the plurality of prior-trained neural network models; responsive to determining to train the new neural network model, creating a cluster among the plurality of prior-trained neural network models by running at least the plurality of prior-trained neural network models using the sample data set, clustering the prior-trained neural network models into different clusters using activations of a hidden layer of the prior-trained neural network model generated using the sample data set, and selecting the cluster closest to the sample data set; determining a set of training data based on the cluster, wherein source data sets used in training a plurality of prior-trained neural network models in the cluster are combined for use as at least part of the set of training data; and training the new neural network model based on the set of training data.

2. The method of claim 1, wherein the new neural network model is trained as a base model for transfer learning.

3. The method of claim 1, wherein the creating a cluster comprises creating the cluster based on feature vectors of hidden layers produced by passing, in forward propagation, the sample data set through the plurality of prior-trained neural network models.

4. The method of claim 1, wherein the plurality of prior-trained neural network models are stored as a library of pre-existing models.

5. The method of claim 1, wherein the new neural network model is trained beginning with a random initial set of parameters.

6. The method of claim 1, wherein the new neural network model is trained starting with a set of parameters computed in at least one of the plurality of prior-trained neural network models.

7. The method of claim 1, wherein the new neural network model is trained using at least one hyperparameter used in at least one of the plurality of prior-trained neural network models.

8. A system comprising: a hardware processor; a memory device coupled with the hardware processor; the hardware processor operable to at least: receive a similarity estimate between a sample data set and a source data set, wherein the sample data set is associated with a target machine learning task, wherein the source data set is associated with a prior-trained neural network model and was used as a training data set used in training the prior-trained neural network model, wherein a plurality of similarity estimates is received corresponding to a plurality of source data sets associated with a plurality of prior-trained neural network models, the similarity estimate determined based on outputs of a hidden layer of the prior-trained neural network model generated using the sample data set and outputs of the hidden layer of the prior-trained neural network model generated using the source data set, each of the plurality of similarity estimates determined based on a distance between the sample data set and a corresponding one of the plurality of source data sets; determine, at least based on the similarity estimates being above a predefined distance threshold, to train a new neural network model, the similarity estimates being above the predefined distance indicating a gap in areas covered by the source data sets used in training the plurality of prior-trained neural network models; responsive to determining to train the new neural network model, create a cluster among the plurality of prior-trained neural network models by at least running the plurality of prior-trained neural network models using the sample data set, clustering the prior-trained neural network models into different clusters using activations of a hidden layer of the prior-trained neural network model generated using the sample data set, and selecting the cluster closest to the sample data set; determine a set of training data based on the cluster, wherein source data sets used in training a plurality of prior-trained neural network models in the cluster are combined for use as at least part of the set of training data; and train the new neural network model based on the set of training data.

9. The system of claim 8, wherein the new neural network model is trained as a base model for transfer learning.

10. The system of claim 8, wherein the hardware processor creates the cluster based on feature vectors of hidden layers produced by passing, in forward propagation, the sample data through the plurality of prior-trained neural network models.

11. The system of claim 8, wherein the plurality of prior-trained neural network models are stored as a library of pre-existing models.

12. The system of claim 8, wherein the new neural network model is trained beginning with a random initial set of parameters.

13. The system of claim 8, wherein the new neural network model is trained starting with a set of parameters computed in at least one of the plurality of prior-trained neural network models.

14. The system of claim 8, wherein the new neural network model is trained using at least one hyperparameter used in at least one of the plurality of prior-trained neural network models.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: receive a similarity estimate between a sample data set and a source data set, wherein the sample data set is associated with a target machine learning task, wherein the source data set is associated with a prior-trained neural network model and was used as a training data set used in training the prior-trained neural network model, wherein a plurality of similarity estimates is received corresponding to a plurality of source data sets associated with a plurality of prior-trained neural network models, the similarity estimate determined based on outputs of a hidden layer of the prior-trained neural network model generated using the sample data set and outputs of the hidden layer of the prior-trained neural network model generated using the source data set, each of the plurality of similarity estimates determined based on a distance between the sample data set and a corresponding one of the plurality of source data sets; determine, at least based on the similarity estimates being above a predefined distance threshold, to train a new neural network model, the similarity estimates being above the predefined distance indicatin…
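The clustering and training-data steps of claims 1 and 3 can be sketched under several stated assumptions. Each prior-trained model is represented by a feature vector obtained by forward-propagating the sample data set through it and pooling a hidden layer to a fixed length (the vectors are given directly here). The clustering algorithm is plain k-means, which is a swapped-in choice since the claims do not name one, and "the cluster closest to the sample data set" is taken to mean the cluster with the lowest mean distance estimate. The helper names `kmeans` and `select_training_data` are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means; centroids are seeded with the first k points so the
    result is deterministic. Returns one cluster label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def select_training_data(models, k):
    """models maps a model name to (feature_vector, distance_estimate, source_data).

    feature_vector: hidden-layer activations obtained by forward-propagating the
    sample data set through that model (assumed already pooled to a fixed length).
    Returns the names in the chosen cluster and their combined source data."""
    names = sorted(models)
    labels = kmeans([models[n][0] for n in names], k)
    clusters = {}
    for name, label in zip(names, labels):
        clusters.setdefault(label, []).append(name)

    # Assumed notion of "closest cluster": lowest mean distance estimate.
    def mean_distance(members):
        return sum(models[n][1] for n in members) / len(members)

    best = min(clusters.values(), key=mean_distance)
    # Combine the source data sets of every model in the chosen cluster.
    training_data = [example for n in best for example in models[n][2]]
    return best, training_data


if __name__ == "__main__":
    models = {
        "a": ([0.0, 0.0], 1.0, ["a1", "a2"]),
        "b": ([0.1, 0.0], 1.2, ["b1"]),
        "c": ([10.0, 10.0], 5.0, ["c1"]),
        "d": ([10.0, 9.9], 6.0, ["d1", "d2"]),
    }
    print(select_training_data(models, k=2))  # (['a', 'b'], ['a1', 'a2', 'b1'])
```

The combined list would then serve as at least part of the training set for the new model, which per claims 5 to 7 may start from random parameters or reuse parameters or hyperparameters from models in the library.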

Assignees

IBM

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Transfer learning · CPC title

  • Supervised learning · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11853877B2 cover?
Whether to train a new neural network model can be determined based on similarity estimates between a sample data set and a plurality of source data sets associated with a plurality of prior-trained neural network models. A cluster among the plurality of prior-trained neural network models can be determined. A set of training data based on the cluster can be determined. The new neural network m…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC classification).