Alternating Positioning of Primary Text
US-2024419887-A1 · Dec 19, 2024 · US
US12511494B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12511494-B2 |
| Application number | US-202218064095-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 9, 2022 |
| Priority date | Jul 12, 2022 |
| Publication date | Dec 30, 2025 |
| Grant date | Dec 30, 2025 |
Official abstract text for this publication.
Embodiments described herein provide a parameter-efficient finetuning mechanism, referred to as “factor-tuning,” which first learns a compact representation of parameter changes with existing datasets on multiple domains, and then fine-tunes a small number of parameters (automatically extracted from the learned representation) on a new downstream task. In this way, the representation learned in the first step is shared across domains and transferred to new downstream tasks.
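As a rough illustration of the shared representation mentioned in the abstract, the sketch below follows the wording of claim 2, in which a parameter update is a weighted sum of shared sparse components weighted by a domain-dependent factor. The function name `weighted_sum`, the 2×2 component values, and the factor values are hypothetical and chosen only for illustration; the publication text on this page does not give an implementation.

```python
from typing import List

Matrix = List[List[float]]


def weighted_sum(components: List[Matrix], factor: List[float]) -> Matrix:
    """Return sum_k factor[k] * components[k] for one domain.

    `components` are shared across domains; `factor` is specific to one domain.
    Both names are illustrative assumptions, not terms defined by the patent.
    """
    if len(components) != len(factor):
        raise ValueError("need exactly one factor entry per component")
    rows, cols = len(components[0]), len(components[0][0])
    delta = [[0.0] * cols for _ in range(rows)]
    for weight, comp in zip(factor, components):
        for i in range(rows):
            for j in range(cols):
                delta[i][j] += weight * comp[i][j]
    return delta


# Two shared sparse components (hypothetical values).
components = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]

# One factor vector per domain (hypothetical values).
factor_domain_1 = [0.5, 1.0]
factor_domain_2 = [2.0, 0.0]

delta_1 = weighted_sum(components, factor_domain_1)
delta_2 = weighted_sum(components, factor_domain_2)
```

In this toy setup both domains reuse the same two components and differ only in their factor vectors, which is the sense in which the learned representation is shared across domains.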
Opening claim text (preview).
What is claimed is:
1. A method of parameter-efficient training of a pre-trained language model having a plurality of parameters, the method comprising: receiving a first training dataset on a first domain and a second training dataset on a second domain for finetuning the pre-trained language model having at least a feedforward layer and a query-and-key layer; training the pre-trained language model based on the first training dataset, wherein the plurality of parameters contain a first set of parameter changes during the training; continuing training the pre-trained language model based on the second training dataset, wherein the plurality of parameters contain a second set of parameter changes during the continued training; determining, by a first factor module inserted at the query-and-key layer and a second factor module inserted at the feedforward layer of the pre-trained language model, from the training and the continued training based on the first training dataset and the second training dataset, a first domain-dependent factor associated with the first set of parameter changes and a second domain-dependent factor associated with the second set of parameters changes, respectively; receiving a third training dataset on a third domain for finetuning the pre-trained language model; determining, by a first tunable module inserted at the query-and-key layer and a second tunable module inserted at the feedforward layer of the pre-trained language model, a third set of parameters to be changed based at least in part on the first domain-dependent factor, and the second domain-dependent factor, and the third domain; and fine-tuning the pre-trained language model based on the third training third dataset, while only updating the third set of parameters but fixing remaining parameters of the plurality of parameters.
2. The method of claim 1, further comprising: determining, from the training based on the first training dataset and the second training dataset, one or more sparse components that are shared across the first domain and the second domain, wherein parameters of the pre-trained language model are updated as a weighted sum of the one or more sparse components weighted by the first domain-dependent factor or the second domain-dependent factor.
3. The method of claim 2, wherein the training the pre-trained language model includes: computing a classification loss based on training inputs from the first training dataset and the second training dataset; computing a regularization term based on the one or more sparse components; computing a training loss based on a weighted sum of the classification loss and the regularization term; and updating the weighted sum of the one or more sparse components based on the computed training loss via backpropagation.
4. The method of claim 3, wherein the updating the weighted sum of the one or more sparse components is performed by a first factor module inserted at a query and key layer and a second factor module inserted at a feedforward layer in the pretrained language model, wherein the first factor module or the second factor module updates a subset of the parameters that correspond to the weighted sum of the one or more sparse components.
5. The method of claim 3, further comprising: determining, from the updating, the first set of parameter changes corresponding to a first training input from the first training dataset; determining, from the updating, the second set of parameter changes corresponding to a second training input from the second training dataset; and determining the first domain-dependent factor, the second domain-dependent factor and the one or more sparse components based on the first set of parameter changes and the second set of parameter changes.
6. The method of claim 2, wherein the determining the third set of parameters to be changed further includes: computing a fine-tuning loss using training inputs from the third training dataset; updating the third set of parameters for the pretrained language model based on the fine-tuning loss, wherein the third set of parameters includes a tunable third domain-dependent factor and one or more tunable components, and wherein each tunable component includes a subset of tunable parameters from a corresponding sparse component from the determined one or more sparse components.
7. The method of claim 6, wherein the third set of parameters are computed based on the tunable third domain-dependent factor, and the one or more tunable components applied with a set of sparsity masks, wherein during the updating, the one or more sparse components and the set of sparsity masks are fixed.
8. The method of claim 7, wherein each of the set of sparsity masks is computed as an indicator function selecting values from a corresponding sparse component that are greater than a pre-defined threshold.
9. The method of claim 8, further comprising: receiving an adjustment to the pre-defined threshold; and controlling a total number of parameters in the third set of parameters by applying an adjusted threshold.
10. The method of claim 6, wherein the fine-tuning the pre-trained language model is performed by a first tunable module inserted at a query and key layer and a second tunable module inserted at a feedforward layer in the pretrained language model, wherein the first tunable module or the second tunable module updates the third set of parameters by tuning the tunable third domain-dependent factor and the one or more tunable components.
11. A system for parameter-efficient training of a pre-trained language model having a plurality of parameters, the system comprising: a communication interface receiving a first training dataset on a first domain and a second training dataset on a second domain for finetuning the pre-trained language model having at least a feedforward layer and a query-and-key layer; a memory storing the pre-trained language model and a plurality of processor-executable instructions; and one or more processors executing the plurality of processor-executable instructions to perform operations including: training the pre-trained language model based on the first training dataset, wherein the plurality of parameters contain a first set of parameter changes during the training; continuing training the pre-trained language model based on the second training dataset, wherein the plurality of parameters contain a second set of parameter changes during the continued training; determining, by a first factor module inserted at the query-and-key layer and a second factor module inserted at the feedforward layer of the pre-trained language model, from the training and the continued training based on the first training dataset and the second training dataset, a first domain-dependent factor associated with the first set of parameter changes and a second domain-dependent factor associated with the second set of parameters changes, respectively; receiving a third training dataset on a third domain for finetuning the pre-trained language model; determining, by a first tunable module inserted at the query-and-key layer and a second tunable module inserted at the feedforward layer of the pre-trained language model, a third set of parameters to be changed based at least in part on the first domain-dependent factor, and the second domain-dependent factor, and the third domain; and fine-tuning the pre-trained language model based on the third training third dataset, while only updating the third set of parameters but fixing remaining parameters of the plurality of parameters.
12. The system of claim 11,
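Claims 7 to 9 describe sparsity masks computed as an indicator function over a sparse component, with a threshold that controls how many parameters remain tunable. The following is a minimal sketch of that idea under stated assumptions: the function names `sparsity_mask` and `count_tunable`, the component values, and the two thresholds are hypothetical, and the comparison is applied to raw values because claim 8 says only "greater than a pre-defined threshold" (an implementation might instead compare magnitudes).

```python
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def sparsity_mask(component: Matrix, threshold: float) -> Mask:
    """Indicator mask: 1 where a component value exceeds the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in component]


def count_tunable(masks: List[Mask]) -> int:
    """Total number of entries selected as tunable across all masks."""
    return sum(sum(sum(row) for row in mask) for mask in masks)


# A single learned sparse component (hypothetical values).
component = [
    [0.9, 0.05],
    [0.3, 0.0],
]

# A lower threshold keeps more entries tunable; a higher one keeps fewer.
mask_low = sparsity_mask(component, threshold=0.1)
mask_high = sparsity_mask(component, threshold=0.5)

tunable_low = count_tunable([mask_low])
tunable_high = count_tunable([mask_high])
```

Raising the threshold from 0.1 to 0.5 in this toy example shrinks the selected set from two entries to one, which mirrors how claim 9 ties an adjusted threshold to the total number of parameters in the third set.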
Backpropagation, e.g. using gradient descent · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Semantic analysis · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title