Custom models for source code generation via prefix-tuning
US-2024220215-A1 · Jul 4, 2024 · US
US12524613B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12524613-B2 |
| Application number | US-202318160967-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 27, 2023 |
| Priority date | Aug 19, 2022 |
| Publication date | Jan 13, 2026 |
| Grant date | Jan 13, 2026 |
Official abstract text for this publication.
Embodiments described herein provide a mechanism that ensembles trainable soft prompts to transfer knowledge from source tasks under few-shot learning settings. Specifically, given a source task input from a source task training dataset, a set of soft prompts may be trained using a frozen pre-trained language model (PLM) on the large-scale source task training dataset. Each soft prompt in the set is then prepended to a target task input, and from each resulting sequence the frozen PLM generates logits for predicting the classification of the target task input. An attention module generates input-logit attention scores, which are used to compute a weighted linear combination of the logits. This weighted linear combination serves as the final logits for predicting the classification of the target task input.
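
The ensembling step in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `softmax` and `combine_logits`, the example logits, and the fixed attention scores are all hypothetical, and the frozen PLM is replaced by hard-coded per-prompt logit vectors.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def combine_logits(prompt_logits, attention_scores):
    """Weighted linear combination of per-prompt logit vectors.

    prompt_logits: one logit vector (list of floats) per soft prompt.
    attention_scores: one weight per soft prompt.
    """
    num_classes = len(prompt_logits[0])
    return [
        sum(a * logits[c] for a, logits in zip(attention_scores, prompt_logits))
        for c in range(num_classes)
    ]

# Assumption: stand-in logits, as if a frozen PLM had been run once per
# source-task soft prompt prepended to the same target input.
prompt_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
# Assumption: fixed scores; in the abstract these come from an attention module.
attention_scores = [0.5, 0.25, 0.25]

final_logits = combine_logits(prompt_logits, attention_scores)
probs = softmax(final_logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(final_logits, predicted_class)
```

With these example values, the first prompt's logits receive the largest weight, so the combined logits favor class 0.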
Opening claim text (preview).
What is claimed is:
1. A method of training a classification framework comprising one or more pre-trained language models and an attention module via soft prompt tuning in few-shot settings, the method comprising: receiving, via a data interface, a target training sample including a first input and a target output corresponding to a target task; generating one or more training input sequences by prepending the first input with one or more soft prompts that are trained according to a source training task, respectively; generating, by the one or more pre-trained language models, one or more logits from the generated one or more training input sequences combining the first input and the one or more soft prompts, respectively; generating, by the attention module, one or more attention scores in response to a second input combining the representation of the first input and the generated one or more logits; computing a final logit based on the generated one or more logits weighted by the one or more attention scores; normalizing, by a softmax decoder, the final logit to generate a predicted output; computing a cross-entropy loss by comparing the predicted output and the target output; updating the attention module based on the cross-entropy loss via backpropagation while keeping the pre-trained language model and the one or more soft prompts frozen; and using the classification framework comprising the one or more pre-trained language models, the updated attention module and the softmax decoder to generate a classification output corresponding to the target task.
2. The method of claim 1, wherein the soft prompts are trained using a source training dataset that is on a different domain or task from the target training sample.
3. The method of claim 2, wherein the soft prompts are trained by: receiving, via a data interface, a source input and a corresponding source output from the source training dataset; generating a final input by prepending a randomly initialized task-specific soft prompt to the source input; generating, by the pre-trained language model, an output logit from the final input; generating, by a softmax decoder, a predicted source output from the output logit; computing a loss by comparing the predicted source output and the corresponding source output; and updating the randomly initialized task-specific soft prompt based on the loss via backpropagation while keeping the pre-trained language model frozen.
4. The method of claim 1, wherein the one or more attention scores are generated by: generating, by a max pooling operation, the representation of the first input; projecting the representation of the first input into a projected representation of the first input in a representational space; projecting the generated one or more logits into projected one or more logits in the representational space; and computing the one or more attention scores based on the projected representation of the first input and the projected one or more logits.
5. The method of claim 4, further comprising: applying normalization to the projected representation of the first input; and applying normalization to the projected one or more logits.
6. The method of claim 4, further comprising: generating, by a down projection layer, a reduced representation from the representation of the first input; generating, by a non-linear activation function, a non-linear reduced representation from the reduced representation; and generating, by an up projection layer, the projected representation of the first input from the non-linear reduced representation.
7. The method of claim 1, wherein the attention module has a smaller set of parameters than the pre-trained language model.
8. A system for training a classification framework comprising one or more pre-trained language models and an attention module via soft prompt tuning in few-shot settings, the system comprising: a communication interface that receives a plurality of training samples; a memory containing machine readable medium storing machine executable code; and one or more processors coupled to the memory and configurable to execute the machine executable code to cause the one or more processors to: receive, via a data interface, a target training sample including a first input and a target output corresponding to a target task; generate one or more training input sequences by prepending the first input with one or more soft prompts that are trained according to a source training task, respectively; generate, by the one or more pre-trained language models, one or more logits from the generated one or more training input sequences combining the first input and the one or more soft prompts, respectively; generate, by the attention module, one or more attention scores in response to a second input combining the representation of the first input and the generated one or more logits corresponding to the one or more training input sequences combining the input and the one or more soft prompts, respectively; compute a final logit based on the generated one or more logits weighted by the one or more attention scores; normalize, by a softmax decoder, the final logit to generate a predicted output; compute a cross-entropy loss by comparing the predicted output and the target output; update the attention module based on the cross-entropy loss via backpropagation while keeping the pre-trained language model and the one or more soft prompts frozen; and use the classification framework comprising the one or more pre-trained language models, the updated attention module and the softmax decoder to generate a classification output corresponding to the target task.
9. The system of claim 8, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to train the soft prompts using a source training dataset that is on a different domain or task from the target training sample.
10. The system of claim 9, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to train the soft prompts comprising: receive, via a data interface, a source input and a corresponding source output from the source training dataset; generating a final input by prepending a randomly initialized task-specific soft prompt to the source input; generate, by the pre-trained language model, an output logit from the final input; generate, by a softmax decoder, a predicted source output from the output logit; compute a loss by comparing the generated output and the corresponding source output; and update the randomly initialized task-specific soft prompt based on the loss via backpropagation while keeping the pre-trained language model frozen.
11. The system of claim 8, wherein the one or more processors are configurable to execute the machine executable code to cause the one or more processors to generate attention scores comprising: generate, by a max pooling operation, the representation of the first input; project the representation of the first input into a projected representation of the first input in a representational space; project the generated one or more logits into projected one or more logits in the representational space; and compute the one or more attention scores based on the projected representation of the first input and the projected one or more logits.
12. The system of claim 11, further comprising: apply normalization to the projected representation of the first input; and apply normalization to the projected one or more lo
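
The attention-score computation recited in claims 4 to 6, followed by the weighted final logit and cross-entropy loss of claim 1, might look roughly like the Python sketch below. Several details are assumptions not stated in the claims: ReLU as the non-linear activation, layer normalization as the normalization, a dot product followed by softmax as the scoring rule, the tiny dimensions, the random parameter values, and every function and variable name. The sketch only runs a forward pass; the backpropagation update of the attention module is omitted.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vec):
    # Matrix-vector product; matrix is a list of rows
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def layer_norm(vec, eps=1e-5):
    # Assumption: the claimed "normalization" is taken to be layer normalization
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]

def max_pool(token_embeddings):
    # Element-wise max over the token axis: one vector per input
    return [max(column) for column in zip(*token_embeddings)]

def attention_scores(token_embeddings, prompt_logits, params):
    pooled = max_pool(token_embeddings)               # claim 4: max pooling
    reduced = matvec(params["down"], pooled)          # claim 6: down projection
    activated = [max(0.0, v) for v in reduced]        # claim 6: ReLU (assumed)
    query = layer_norm(matvec(params["up"], activated))  # claims 5 and 6
    raw = []
    for logits in prompt_logits:
        key = layer_norm(matvec(params["logit_proj"], logits))  # claims 4 and 5
        raw.append(sum(q * k for q, k in zip(query, key)))      # dot product (assumed)
    return softmax(raw)

def cross_entropy(final_logits, target_index):
    return -math.log(softmax(final_logits)[target_index])

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
params = {  # hypothetical attention-module parameters
    "down": random_matrix(2, 4, rng),        # embedding dim 4 -> bottleneck 2
    "up": random_matrix(3, 2, rng),          # bottleneck 2 -> shared space 3
    "logit_proj": random_matrix(3, 2, rng),  # 2 class logits -> shared space 3
}
token_embeddings = [[0.1, -0.3, 0.5, 0.2], [0.4, 0.0, -0.1, 0.7]]  # 2 tokens
prompt_logits = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # one vector per soft prompt

scores = attention_scores(token_embeddings, prompt_logits, params)
final_logits = [
    sum(a * logits[c] for a, logits in zip(scores, prompt_logits))
    for c in range(2)
]
loss = cross_entropy(final_logits, target_index=0)
print(scores, final_logits, loss)
```

In a full training setup, only `params` would receive gradient updates from `loss`, consistent with the claims' requirement that the pre-trained language model and the soft prompts stay frozen.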
Integrating or interfacing systems involving database management systems · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
using statistical methods · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Discourse or dialogue representation · CPC title