Systems and methods for ensembling soft prompts in few-shot fine-tuning of language models

US12524613B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12524613-B2
Application number: US-202318160967-A
Country: US
Kind code: B2
Filing date: Jan 27, 2023
Priority date: Aug 19, 2022
Publication date: Jan 13, 2026
Grant date: Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide a mechanism that ensembles trainable soft prompts to transfer knowledge from source tasks under few-shot learning settings. Specifically, given a source task input from a source task training dataset, a set of soft prompts may be trained using a frozen pre-trained language model (PLM) on the large-scale source task training dataset. The set of soft prompts is then prepended to a target task input, based on which the frozen PLM generates a set of logits, one per soft prompt, for predicting the classification of the target task input. An attention module is used to generate input-logit attention scores, which are used to compute a weighted linear combination of the logits. This weighted linear combination serves as the final logits for predicting the final classification of the target task input.
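
The weighted combination described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the toy logits, and the choice to softmax-normalize the attention scores are all assumptions made here for the example. In a real system the per-prompt logits would come from the frozen PLM and the scores from the attention module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(per_prompt_logits, raw_scores):
    """Combine per-prompt logits into one final logit vector.

    per_prompt_logits: K logit vectors, one per soft prompt, each of length C.
    raw_scores: K unnormalized attention scores, one per soft prompt.
    """
    # Assumption: scores are normalized with a softmax before weighting.
    weights = softmax(raw_scores)
    num_classes = len(per_prompt_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, per_prompt_logits))
        for c in range(num_classes)
    ]


# Hypothetical toy values: 3 source-task soft prompts, 2 target classes.
per_prompt_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
raw_scores = [2.0, 0.0, 0.0]  # the first prompt is judged most relevant

final_logits = ensemble_logits(per_prompt_logits, raw_scores)
probs = softmax(final_logits)  # softmax decoder over the final logits
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

With equal scores the combination reduces to a plain average of the per-prompt logits; unequal scores shift the final prediction toward the prompts the attention module favors.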

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a classification framework comprising one or more pre-trained language models and an attention module via soft prompt tuning in few-shot settings, the method comprising: receiving, via a data interface, a target training sample including a first input and a target output corresponding to a target task; generating one or more training input sequences by prepending the first input with one or more soft prompts that are trained according to a source training task, respectively; generating, by the one or more pre-trained language models, one or more logits from the generated one or more training input sequences combining the first input and the one or more soft prompts, respectively; generating, by the attention module, one or more attention scores in response to a second input combining the representation of the first input and the generated one or more logits; computing a final logit based on the generated one or more logits weighted by the one or more attention scores; normalizing, by a softmax decoder, the final logit to generate a predicted output; computing a cross-entropy loss by comparing the predicted output and the target output; updating the attention module based on the cross-entropy loss via backpropagation while keeping the pre-trained language model and the one or more soft prompts frozen; and using the classification framework comprising the one or more pre-trained language models, the updated attention module and the softmax decoder to generate a classification output corresponding to the target task.

2. The method of claim 1, wherein the soft prompts are trained using a source training dataset that is on a different domain or task from the target training sample.

3. The method of claim 2, wherein the soft prompts are trained by: receiving, via a data interface, a source input and a corresponding source output from the source training dataset; generating a final input by prepending a randomly initialized task-specific soft prompt to the source input; generating, by the pre-trained language model, an output logit from the final input; generating, by a softmax decoder, a predicted source output from the output logit; computing a loss by comparing the predicted source output and the corresponding source output; and updating the randomly initialized task-specific soft prompt based on the loss via backpropagation while keeping the pre-trained language model frozen.

4. The method of claim 1, wherein the one or more attention scores are generated by: generating, by a max pooling operation, the representation of the first input; projecting the representation of the first input into a projected representation of the first input in a representational space; projecting the generated one or more logits into projected one or more logits in the representational space; and computing the one or more attention scores based on the projected representation of the first input and the projected one or more logits.

5. The method of claim 4, further comprising: applying normalization to the projected representation of the first input; and applying normalization to the projected one or more logits.

6. The method of claim 4, further comprising: generating, by a down projection layer, a reduced representation from the representation of the first input; generating, by a non-linear activation function, a non-linear reduced representation from the reduced representation; and generating, by an up projection layer, the projected representation of the first input from the non-linear reduced representation.

7. The method of claim 1, wherein the attention module has a smaller set of parameters than the pre-trained language model.

8. A system for training a classification framework comprising one or more pre-trained language models and an attention module via soft prompt tuning in few-shot settings, the system comprising: a communication interface that receives a plurality of training samples; a memory containing machine readable medium storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code to cause the one or more processors to: receive, via a data interface, a target training sample including a first input and a target output corresponding to a target task; generate one or more training input sequences by prepending the first input with one or more soft prompts that are trained according to a source training task, respectively; generate, by the one or more pre-trained language models, one or more logits from the generated one or more training input sequences combining the first input and the one or more soft prompts, respectively; generate, by the attention module, one or more attention scores in response to a second input combining the representation of the first input and the generated one or more logits corresponding to the one or more training input sequences combining the input and the one or more soft prompts, respectively; compute a final logit based on the generated one or more logits weighted by the one or more attention scores; normalize, by a softmax decoder, the final logit to generate a predicted output; compute a cross-entropy loss by comparing the predicted output and the target output; update the attention module based on the cross-entropy loss via backpropagation while keeping the pre-trained language model and the one or more soft prompts frozen; and use the classification framework comprising the one or more pre-trained language models, the updated attention module and the softmax decoder to generate a classification output corresponding to the target task.

9. The system of claim 8, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to train the soft prompts using a source training dataset that is on a different domain or task from the target training sample.

10. The system of claim 9, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to train the soft prompts comprising: receive, via a data interface, a source input and a corresponding source output from the source training dataset; generating a final input by prepending a randomly initialized task-specific soft prompt to the source input; generate, by the pre-trained language model, an output logit from the final input; generate, by a softmax decoder, a predicted source output from the output logit; compute a loss by comparing the generated output and the corresponding source output; and update the randomly initialized task-specific soft prompt based on the loss via backpropagation while keeping the pre-trained language model frozen.

11. The system of claim 8, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to generate attention scores comprising: generate, by a max pooling operation, the representation of the first input; project the representation of the first input into a projected representation of the first input in a representational space; project the generated one or more logits into projected one or more logits in the representational space; and compute the one or more attention scores based on the projected representation of the first input and the projected one or more logits.

12. The system of claim 11, further comprising: apply normalization to the projected representation of the first input; and apply normalization to the projected one or more lo
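
The attention-score steps recited in claims 4 to 6 (max pooling, down projection, non-linearity, up projection, normalization, and projection of the logits into a shared space) can be illustrated with a forward-pass sketch in standard-library Python. Everything specific here is an assumption for illustration: the parameter names (`w_down`, `w_up`, `w_logit`), the dimensions, the ReLU activation, the layer-norm style normalization, the dot-product score, and the softmax over scores are not stated in the claim text. Random toy values stand in for PLM token representations and logits, and the backpropagation update of claim 1 is omitted.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def layer_norm(vec, eps=1e-5):
    """Zero-mean, unit-variance normalization without learned scale or shift."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(token_embeddings):
    """Element-wise max over the token axis (claim 4)."""
    return [max(column) for column in zip(*token_embeddings)]


def attention_scores(token_embeddings, prompt_logits, params):
    """Forward pass of a hypothetical attention module; returns K weights."""
    pooled = max_pool(token_embeddings)
    reduced = matvec(params["w_down"], pooled)        # down projection (claim 6)
    activated = [max(0.0, r) for r in reduced]        # ReLU: an assumption
    query = layer_norm(matvec(params["w_up"], activated))  # up projection + norm
    scores = []
    for logits in prompt_logits:
        key = layer_norm(matvec(params["w_logit"], logits))  # projected logits
        scores.append(sum(q * k for q, k in zip(query, key)))  # dot product: an assumption
    return softmax(scores)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: embedding dim 4, bottleneck 2, shared space 3, 2 classes.
rng = random.Random(0)
d_model, d_small, d_shared, num_classes = 4, 2, 3, 2
params = {
    "w_down": random_matrix(d_small, d_model, rng),
    "w_up": random_matrix(d_shared, d_small, rng),
    "w_logit": random_matrix(d_shared, num_classes, rng),
}

# Toy stand-ins for the PLM's token representations and per-prompt logits.
token_embeddings = [[0.1, -0.3, 0.5, 0.2], [0.4, 0.0, -0.1, 0.9], [-0.2, 0.7, 0.3, 0.1]]
prompt_logits = [[2.0, 0.0], [0.0, 2.0], [1.5, 0.5]]
target = 0  # index of the target class

weights = attention_scores(token_embeddings, prompt_logits, params)
final_logits = [
    sum(w * logits[c] for w, logits in zip(weights, prompt_logits))
    for c in range(num_classes)
]
probs = softmax(final_logits)        # softmax decoder
loss = -math.log(probs[target])      # cross-entropy for a single sample
```

In the training procedure of claim 1, only the attention module's parameters (here the three toy matrices) would be updated from this loss, while the PLM and the soft prompts stay frozen. That is consistent with claim 7, under which the attention module has fewer parameters than the PLM.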

Assignees

Inventors

Classifications

  • Integrating or interfacing systems involving database management systems · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

  • using statistical methods · CPC title

  • G06F40/284 (Primary)

    Lexical analysis, e.g. tokenisation or collocates · CPC title

  • G06F40/35 (Primary)

    Discourse or dialogue representation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524613B2 cover?
Embodiments described herein provide a mechanism that ensembles trainable soft prompts to transfer knowledge from source tasks under few-shot learning settings. Specifically, given a source task input from a source task training dataset, a set of soft prompts may be trained using a frozen pre-trained language model (PLM) on the large-scale source task training dataset. The set of soft prompts is then prepended to a target…
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 13, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).