Training multi-task neural network while minimizing catastrophic forgetting

US12387107B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12387107-B2
Application number  US-202418431680-A
Country             US
Kind code           B2
Filing date         Feb 2, 2024
Priority date       May 16, 2023
Publication date    Aug 12, 2025
Grant date          Aug 12, 2025

Abstract

Official abstract text for this publication.

Techniques are described herein for a method of determining a similarity of each neuron in a layer of neurons of a neural network model to each other neuron in the layer of neurons. The method further includes determining a redundant set of neurons and a non-redundant set of neurons based on the similarity of each neuron in the layer. The method further includes fine tuning the set of non-redundant neurons using a first set of training data. The method further includes training the set of redundant neurons using a second set of training data.
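The partitioning step in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the cosine similarity measure, the 0.95 threshold, the greedy keep-one-representative rule, and the names `cosine_similarity` and `split_neurons` are all assumptions chosen for illustration. The abstract only states that neurons are compared to each other and divided into redundant and non-redundant sets.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length weight vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero neuron as dissimilar to everything
    return dot / (norm_u * norm_v)


def split_neurons(weights, threshold=0.95):
    """Split neuron indices into (non_redundant, redundant).

    weights: one weight vector per neuron in the layer.
    A neuron is marked redundant when it is at least `threshold`-similar
    to a neuron already kept as non-redundant (hypothetical greedy rule).
    """
    non_redundant = []
    redundant = []
    for i, w in enumerate(weights):
        if any(cosine_similarity(w, weights[j]) >= threshold for j in non_redundant):
            redundant.append(i)
        else:
            non_redundant.append(i)
    return non_redundant, redundant


# Neurons 0 and 1 point the same way; neurons 2 and 3 nearly do.
layer = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.1, 3.0]]
non_redundant, redundant = split_neurons(layer)
print(non_redundant, redundant)
```

Under these assumptions the non-redundant neurons keep the behavior learned for the first task, while the redundant ones become capacity that can be trained on the second set of training data.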

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining, for a layer of a neural network model trained to perform a first task, a similarity value between neurons in the layer by comparing each neuron's weight vector in the layer to each other neuron's weight vector in the layer, wherein the neural network model comprises a structure including an arrangement of one or more neurons in one or more layers; grouping the neurons in the layer into a first subset of neurons or a second subset of neurons based on their similarity values, wherein the first subset of neurons comprises a first subnetwork of the neural network model trained to perform the first task; and training the neural network model to perform a second task, wherein training the neural network model to perform the second task includes training a second subnetwork of the neural network model comprising the second subset of neurons, wherein a gradient is backpropagated to each neuron in the second subset of neurons and wherein the neural network model trained to perform the second task comprises the structure including the arrangement of the one or more neurons in the one or more layers.

2. The method of claim 1, wherein clustering the neurons in the layer into the first subset of neurons or the second subset of neurons based on their similarity values further comprises: clustering neurons into the second subset of neurons responsive to determining that the similarity value of two or more neurons in the layer satisfy a threshold similarity score; and clustering neurons into the first subset of neurons responsive to determining that the similarity value does not satisfy the threshold similarity score.

3. The method of claim 2, wherein clustering the neurons in the layer into the first subset of neurons further comprises: selecting a neuron from the second subset of neurons for inclusion in the first subset of neurons.

4. The method of claim 1, further comprising: determining a second similarity value between neurons in the second subset of neurons and neurons in the second subset of neurons by comparing each neuron's weight vector in the second subset of neurons in the layer to each other neuron's weight vector in the second subset of neurons in the layer; and clustering each neuron of the second subset of neurons into a third subset of neurons or a fourth subset of neurons based on the second similarity value.

5. The method of claim 4, further comprising: training the neural network model to perform a third task using the third subset of neurons, wherein a second gradient is backpropagated to each neuron in the third subset of neurons and wherein the neural network model trained to perform the third task comprises the structure including the arrangement of the one or more neurons in the one or more layers.

6. The method of claim 4, wherein clustering each of the second subset of neurons into a third subset of neurons or a fourth subset of neurons based on the second similarity value further comprises: clustering neurons into the third subset of neurons responsive to determining that the second similarity value of two or more neurons in the second subset of neurons satisfy a threshold similarity score; and clustering neurons into the first subset of neurons responsive to determining that the second similarity value does not satisfy the threshold similarity score.

7. The method of claim 1, further comprising: fine tuning the first subset of neurons using a gradient of a neuron of the first subset of neurons determined using a first set of training data, wherein the first set of training data is used to train the neural network model to perform the first task.

8. A non-transitory computer-readable medium storing executable instructions, which when executed by a processing device, cause the processing device to perform operations comprising: determining, for a layer of a neural network model trained to perform a first task, a similarity value between neurons in the layer by comparing each neuron's weight vector in the layer to each other neuron's weight vector in the layer, wherein the neural network model comprises a structure including an arrangement of one or more neurons in one or more layers; grouping the neurons in the layer into a first subset of neurons or a second subset of neurons based on their similarity values, wherein the first subset of neurons comprises a first subnetwork of the neural network model trained to perform the first task; and training the neural network model to perform a second task, wherein training the neural network model to perform the second task includes training a second subnetwork of the neural network model comprising the second subset of neurons, wherein a gradient is backpropagated to each neuron in the second subset of neurons and wherein the neural network model trained to perform the second task comprises the structure including the arrangement of the one or more neurons in the one or more layers.

9. The non-transitory computer-readable medium of claim 8, wherein clustering the neurons in the layer into the first subset of neurons or the second subset of neurons based on their similarity values further comprises instructions that cause the processing device to perform operations comprising: clustering neurons into the second subset of neurons responsive to determining that the similarity value of two or more neurons in the layer satisfy a threshold similarity score; and clustering neurons into the first subset of neurons responsive to determining that the similarity value does not satisfy the threshold similarity score.

10. The non-transitory computer-readable medium of claim 9, wherein clustering the neurons in the layer into the first subset of neurons further comprises instructions that cause the processing device to perform operations comprising: selecting a neuron from the second subset of neurons for inclusion in the first subset of neurons.

11. The non-transitory computer-readable medium of claim 8, storing instructions that further cause the processing device to perform operations comprising: determining a second similarity value between neurons in the second subset of neurons and neurons in the second subset of neurons by comparing each neuron's weight vector in the second subset of neurons in the layer to each other neuron's weight vector in the second subset of neurons in the layer; and clustering each neuron of the second subset of neurons into a third subset of neurons or a fourth subset of neurons based on the second similarity value.

12. The non-transitory computer-readable medium of claim 11, storing instructions that further cause the processing device to perform operations comprising: training the neural network model to perform a third task using the third subset of neurons, wherein a second gradient is backpropagated to each neuron in the third subset of neurons and wherein the neural network model trained to perform the third task comprises the structure including the arrangement of the one or more neurons in the one or more layers.

13. The non-transitory computer-readable medium of claim 11, wherein clustering each of the second subset of neurons into a third subset of neurons or a fourth subset of neurons based on the second similarity value further comprises instructions that cause the processing device to perform operations comprising: clustering neurons into the third subset of neurons responsive to determining that the second similarity value of two or more neurons in the second subset of neurons satisfy a threshold similarity score; and clustering neurons into the first subset of neurons responsive…
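The second-task training step in claim 1 can be sketched as a masked gradient update on a single linear layer. This is an illustration under stated assumptions, not the claimed method itself: the squared-error loss, plain SGD, the learning rate, and the helper names `layer_gradients` and `masked_sgd_step` are hypothetical. The sketch also assumes the first subset is left untouched while the second task is trained, which the claim does not state explicitly.

```python
def layer_gradients(weights, x, target):
    """Gradient of 0.5 * sum((W x - target)^2) w.r.t. each neuron's weights."""
    outputs = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    errors = [y - t for y, t in zip(outputs, target)]
    return [[e * xi for xi in x] for e in errors]


def masked_sgd_step(weights, grads, trainable, lr=0.1):
    """Apply one SGD step only to neurons whose index is in `trainable`.

    Every neuron stays in place, so the layer's structure (number and
    arrangement of neurons) is the same before and after the update.
    """
    trainable = set(trainable)
    updated = []
    for i, (row, grad_row) in enumerate(zip(weights, grads)):
        if i in trainable:
            updated.append([w - lr * g for w, g in zip(row, grad_row)])
        else:
            updated.append(list(row))  # first-subset neuron: weights unchanged
    return updated


# Neuron 0 belongs to the first subset (task 1); neuron 1 to the second subset.
weights = [[1.0, 0.0], [0.5, 0.5]]
grads = layer_gradients(weights, x=[1.0, 2.0], target=[0.0, 0.0])
new_weights = masked_sgd_step(weights, grads, trainable=[1])
print(new_weights)
```

Masking the update rather than removing neurons keeps the model's shape identical across tasks, which mirrors the claim language that the model trained for the second task "comprises the structure including the arrangement of the one or more neurons in the one or more layers".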

Assignees

Salesforce Inc

Inventors

Classifications

  • Transfer learning · CPC title

  • G06N3/0985 · Primary

    Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12387107B2 cover?
Techniques are described herein for a method of determining a similarity of each neuron in a layer of neurons of a neural network model to each other neuron in the layer of neurons. The method further includes determining a redundant set of neurons and a non-redundant set of neurons based on the similarity of each neuron in the layer. The method further includes fine tuning the set of non-redun…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/0985. Mapped technology areas include Physics.
When was this patent published?
Published Aug 12, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).