Systems and methods for finetuning with learned hidden representations of parameter changes

US12511494B2 · US · B2

Patent metadata
Publication number: US-12511494-B2
Application number: US-202218064095-A
Country: US
Kind code: B2
Filing date: Dec 9, 2022
Priority date: Jul 12, 2022
Publication date: Dec 30, 2025
Grant date: Dec 30, 2025

Abstract

Official abstract text for this publication.

Embodiments described herein provide a parameter-efficient finetuning mechanism, referred to as “factor-tuning,” which first learns a compact representation of parameter changes with existing datasets on multiple domains, and then fine-tunes a small number of parameters (automatically extracted from the learned representation) on a new downstream task. In this way, the representation learned in the first step is shared across domains and transferred to new downstream tasks.
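
One way to picture the "compact representation of parameter changes" is the form used in claim 2 on this page: a weighted sum of components shared across domains, weighted by a domain-dependent factor. The NumPy sketch below is illustrative only. The sizes, the random values, and the names `components`, `factors`, and `parameter_change` are assumptions made for the example and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
num_domains, num_components = 2, 3
rows, cols = 4, 5  # shape of one weight matrix in the model

# Shared components: one matrix per component, reused by every domain.
components = rng.normal(size=(num_components, rows, cols))

# Domain-dependent factors: one small weight vector per domain.
factors = rng.normal(size=(num_domains, num_components))


def parameter_change(domain: int) -> np.ndarray:
    """Return the weighted sum of the shared components for one domain."""
    # Contract the factor vector (K,) with the component stack (K, rows, cols).
    return np.tensordot(factors[domain], components, axes=1)


# The adapted weights are the frozen pre-trained weights plus the change.
pretrained = rng.normal(size=(rows, cols))
adapted = pretrained + parameter_change(0)
```

In this sketch each domain is described by only `num_components` numbers, while the component matrices are shared, which is the sense in which the representation can be reused for a new task.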

First claim

Opening claim text (preview).

What is claimed is:

1. A method of parameter-efficient training of a pre-trained language model having a plurality of parameters, the method comprising: receiving a first training dataset on a first domain and a second training dataset on a second domain for finetuning the pre-trained language model having at least a feedforward layer and a query-and-key layer; training the pre-trained language model based on the first training dataset, wherein the plurality of parameters contain a first set of parameter changes during the training; continuing training the pre-trained language model based on the second training dataset, wherein the plurality of parameters contain a second set of parameter changes during the continued training; determining, by a first factor module inserted at the query-and-key layer and a second factor module inserted at the feedforward layer of the pre-trained language model, from the training and the continued training based on the first training dataset and the second training dataset, a first domain-dependent factor associated with the first set of parameter changes and a second domain-dependent factor associated with the second set of parameters changes, respectively; receiving a third training dataset on a third domain for finetuning the pre-trained language model; determining, by a first tunable module inserted at the query-and-key layer and a second tunable module inserted at the feedforward layer of the pre-trained language model, a third set of parameters to be changed based at least in part on the first domain-dependent factor, and the second domain-dependent factor, and the third domain; and fine-tuning the pre-trained language model based on the third training third dataset, while only updating the third set of parameters but fixing remaining parameters of the plurality of parameters.

2. The method of claim 1, further comprising: determining, from the training based on the first training dataset and the second training dataset, one or more sparse components that are shared across the first domain and the second domain, wherein parameters of the pre-trained language model are updated as a weighted sum of the one or more sparse components weighted by the first domain-dependent factor or the second domain-dependent factor.

3. The method of claim 2, wherein the training the pre-trained language model includes: computing a classification loss based on training inputs from the first training dataset and the second training dataset; computing a regularization term based on the one or more sparse components; computing a training loss based on a weighted sum of the classification loss and the regularization term; and updating the weighted sum of the one or more sparse components based on the computed training loss via backpropagation.

4. The method of claim 3, wherein the updating the weighted sum of the one or more sparse components is performed by a first factor module inserted at a query and key layer and a second factor module inserted at a feedforward layer in the pretrained language model, wherein the first factor module or the second factor module updates a subset of the parameters that correspond to the weighted sum of the one or more sparse components.

5. The method of claim 3, further comprising: determining, from the updating, the first set of parameter changes corresponding to a first training input from the first training dataset; determining, from the updating, the second set of parameter changes corresponding to a second training input from the second training dataset; and determining the first domain-dependent factor, the second domain-dependent factor and the one or more sparse components based on the first set of parameter changes and the second set of parameter changes.

6. The method of claim 2, wherein the determining the third set of parameters to be changed further includes: computing a fine-tuning loss using training inputs from the third training dataset; updating the third set of parameters for the pretrained language model based on the fine-tuning loss, wherein the third set of parameters includes a tunable third domain-dependent factor and one or more tunable components, and wherein each tunable component includes a subset of tunable parameters from a corresponding sparse component from the determined one or more sparse components.

7. The method of claim 6, wherein the third set of parameters are computed based on the tunable third domain-dependent factor, and the one or more tunable components applied with a set of sparsity masks, wherein during the updating, the one or more sparse components and the set of sparsity masks are fixed.

8. The method of claim 7, wherein each of the set of sparsity masks is computed as an indicator function selecting values from a corresponding sparse component that are greater than a pre-defined threshold.

9. The method of claim 8, further comprising: receiving an adjustment to the pre-defined threshold; and controlling a total number of parameters in the third set of parameters by applying an adjusted threshold.

10. The method of claim 6, wherein the fine-tuning the pre-trained language model is performed by a first tunable module inserted at a query and key layer and a second tunable module inserted at a feedforward layer in the pretrained language model, wherein the first tunable module or the second tunable module updates the third set of parameters by tuning the tunable third domain-dependent factor and the one or more tunable components.

11. A system for parameter-efficient training of a pre-trained language model having a plurality of parameters, the system comprising: a communication interface receiving a first training dataset on a first domain and a second training dataset on a second domain for finetuning the pre-trained language model having at least a feedforward layer and a query-and-key layer; a memory storing the pre-trained language model and a plurality of processor-executable instructions; and one or more processors executing the plurality of processor-executable instructions to perform operations including: training the pre-trained language model based on the first training dataset, wherein the plurality of parameters contain a first set of parameter changes during the training; continuing training the pre-trained language model based on the second training dataset, wherein the plurality of parameters contain a second set of parameter changes during the continued training; determining, by a first factor module inserted at the query-and-key layer and a second factor module inserted at the feedforward layer of the pre-trained language model, from the training and the continued training based on the first training dataset and the second training dataset, a first domain-dependent factor associated with the first set of parameter changes and a second domain-dependent factor associated with the second set of parameters changes, respectively; receiving a third training dataset on a third domain for finetuning the pre-trained language model; determining, by a first tunable module inserted at the query-and-key layer and a second tunable module inserted at the feedforward layer of the pre-trained language model, a third set of parameters to be changed based at least in part on the first domain-dependent factor, and the second domain-dependent factor, and the third domain; and fine-tuning the pre-trained language model based on the third training third dataset, while only updating the third set of parameters but fixing remaining parameters of the plurality of parameters.

12. The system of claim 11,

Assignees

Salesforce Inc

Inventors

Classifications

  • Backpropagation, e.g. using gradient descent

  • Quantised networks; Sparse networks; Compressed networks

  • Auto-encoder networks; Encoder-decoder networks

  • Semantic analysis

  • G06F40/40 (Primary): Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12511494B2 cover?
Embodiments described herein provide a parameter-efficient finetuning mechanism, referred to as “factor-tuning,” which first learns a compact representation of parameter changes with existing datasets on multiple domains, and then fine-tunes a small number of parameters (automatically extracted from the learned representation) on a new downstream task. In this way, the representation learned in…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 30, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).