Systems and methods for cross-lingual transfer learning

US12596890B2 · US · B2

Patent metadata
Publication number: US-12596890-B2
Application number: US-202318309330-A
Country: US
Kind code: B2
Filing date: Apr 28, 2023
Priority date: Mar 30, 2023
Publication date: Apr 7, 2026
Grant date: Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein provide a method of training a language model by tuning a prompt. The method comprises masking tokens of first and second conversational texts which have the same semantic meaning but in different languages (e.g., a translation). The masked texts are input to a language model with a prepended soft prompt. The language model generates respective predicted outputs. A loss objective is computed including a masked language model loss. The prompt is updated based on the computed loss objective via backpropagation while keeping the language model frozen.
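
The training loop in the abstract can be illustrated with a minimal pure-Python sketch. The loop masks tokens in a text and its translation, prepends a trainable soft prompt, computes a masked language model loss, and updates only the prompt. This is not the patented implementation. The frozen "language model" here is a hypothetical bag-of-embeddings toy: it takes the mean of the prompt vectors and token embeddings, then applies a fixed output projection. The vocabulary size, mask id, prompt length, masking ratio, learning rate, and token ids are all assumptions chosen for illustration. The embedding table `E` and projection `W` are read but never written, which mirrors keeping the language model frozen while gradients flow only into the prompt.

```python
import math
import random

VOCAB = 12      # hypothetical toy vocabulary size
DIM = 4         # hypothetical embedding width
MASK_ID = 0     # hypothetical id reserved for the [MASK] token
PROMPT_LEN = 2  # hypothetical number of soft-prompt vectors

_init = random.Random(0)
# Frozen toy "language model": embedding table E and output projection W.
E = [[_init.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W = [[_init.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def mask_tokens(tokens, ratio, gen):
    """Replace a random subset of tokens with MASK_ID; return (masked, positions)."""
    k = max(1, int(len(tokens) * ratio))
    positions = sorted(gen.sample(range(len(tokens)), k))
    masked = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    return masked, positions


def encode(prompt, masked):
    """Toy frozen encoder: mean of the prompt vectors and the token embeddings."""
    rows = prompt + [E[t] for t in masked]
    return [sum(r[j] for r in rows) / len(rows) for j in range(DIM)]


def mlm_loss_and_grad(prompt, tokens, masked, positions):
    """Summed cross-entropy at the masked positions and its gradient per prompt vector."""
    h = encode(prompt, masked)
    logits = [sum(w[j] * h[j] for j in range(DIM)) for w in W]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss, grad_h = 0.0, [0.0] * DIM
    for p in positions:
        target = tokens[p]  # original token hidden behind the mask
        loss -= math.log(probs[target])
        for v in range(VOCAB):
            coeff = probs[v] - (1.0 if v == target else 0.0)
            for j in range(DIM):
                grad_h[j] += coeff * W[v][j]
    # h averages n rows, so each prompt vector receives grad_h / n.
    n = len(prompt) + len(masked)
    return loss, [[g / n for g in grad_h] for _ in prompt]


def train_prompt(pairs, steps=100, lr=0.1, ratio=0.3, seed=1):
    """Tune only the soft prompt on (text, translation) pairs; E and W stay frozen."""
    gen = random.Random(seed)
    # Static masking: each text is masked once, so the loss curve is deterministic.
    batch = []
    for first, second in pairs:
        for tokens in (first, second):
            masked, positions = mask_tokens(tokens, ratio, gen)
            batch.append((tokens, masked, positions))
    prompt = [[0.0] * DIM for _ in range(PROMPT_LEN)]
    history = []
    for _ in range(steps):
        step_loss = 0.0
        grads = [[0.0] * DIM for _ in range(PROMPT_LEN)]
        for tokens, masked, positions in batch:  # first + second MLM losses
            loss, g = mlm_loss_and_grad(prompt, tokens, masked, positions)
            step_loss += loss
            for k in range(PROMPT_LEN):
                for j in range(DIM):
                    grads[k][j] += g[k][j]
        for k in range(PROMPT_LEN):  # gradient step on the prompt only
            for j in range(DIM):
                prompt[k][j] -= lr * grads[k][j]
        history.append(step_loss)
    return prompt, history


# Hypothetical token ids for two texts and their translations.
pairs = [([3, 5, 7, 2, 9, 4], [6, 8, 1, 10, 11, 2]),
         ([4, 4, 7, 3, 5, 9], [8, 10, 1, 6, 11, 6])]
snapshot = ([row[:] for row in E], [row[:] for row in W])
prompt, history = train_prompt(pairs)
print(f"summed MLM loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Because the toy encoder is a mean, the gradient reaching each prompt vector is the output-layer gradient divided by the number of pooled rows. A real transformer would instead obtain that gradient through automatic differentiation.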

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a neural network based language model to conduct a cross-language conversation by tuning a vector input, the method comprising:
    receiving, via a data interface, a first conversational text in a first natural language, a second conversational text in a second natural language different from the first natural language, and a third conversational text, wherein the first conversational text contains a first set of tokens, the second conversational text has a same semantic meaning as the first conversational text and contains a second set of tokens, and the third conversational text is semantically unaligned with the first conversational text and contains a third set of tokens;
    masking a subset of the first set of tokens, a subset of the second set of tokens, and a subset of the third set of tokens, thereby resulting in a masked first set of tokens, a masked second set of tokens, and a masked third set of tokens respectively;
    generating, by the neural network based language model implemented on one or more hardware processors: a first predicted output based on a first input of the masked first set of tokens prepended with the vector input, a second predicted output, based on a second input of the masked second set of tokens prepended with the vector input, and a third predicted output based on a third input of the masked third set of tokens prepended with the vector input;
    computing a loss objective including at least: a first masked language model (MLM) loss based on a comparison of the first predicted output and the first conversational text using the first input, a second MLM loss based on a comparison of the second predicted output and the second conversational text using the second input, and a comparison of representations of the first input and the third input as a negative pair;
    updating the vector input based on the computed loss objective via backpropagation while keeping the neural network based language model frozen; and
    implementing the neural network based language model to generate a response using an input combining the updated vector input and a conversational input in the first natural language or the second natural language.

2. The method of claim 1, wherein the loss objective further comprises a contrastive loss computed based on representations of the first input and the second input as a positive pair.

3. The method of claim 1, wherein the first conversational text includes an utterance and a response corresponding to the utterance that are randomly selected from a source conversation.

4. The method of claim 1, wherein the computed loss objective is a first computed loss objective according to a first training task, and the method further comprises: updating the vector input based on a second computed loss objective computed according to a second training task different from the first training task after updating the vector input based on the first computed loss objective while keep the language model frozen.

5. The method of claim 1, further comprising: receiving, via the data interface, a testing text; and generating, by the language model, a testing output based on a testing input of the testing text prepended with the updated vector input.

6. The method of claim 1, wherein the receiving the second conversational text comprises translating, via a translation model, the first conversational text from the first natural language to the second natural language.

7. A system for training a neural network based language model to conduct a cross-language conversation by tuning a vector input, the system comprising:
    a memory that stores the neural network based language model and a plurality of processor executable instructions;
    a communication interface that receives a first conversational text in a first natural language, a second conversational text in a second natural language different from the first natural language, and a third conversational text, wherein the first conversational text contains a first set of tokens, the second conversational text has a same semantic meaning as the first conversational text and contains a second set of tokens, and the third conversational text is semantically unaligned with the first conversational text and contains a third set of tokens; and
    one or more hardware processors that read and execute the plurality of processor executable instructions from the memory to perform operations comprising:
        masking a subset of the first set of tokens, a subset of the second set of tokens, and a subset of the third set of tokens, thereby resulting in a masked first set of tokens, a masked second set of tokens, and a masked third set of tokens respectively;
        generating, by the neural network based language model implemented on one or more hardware processors: a first predicted output based on a first input of the masked first set of tokens prepended with the vector input, a second predicted output, based on a second input of the masked second set of tokens prepended with the vector input, and a third predicted output based on a third input of the masked third set of tokens prepended with the vector input;
        computing a loss objective including at least: a first masked language model (MLM) loss based on a comparison of the first predicted output and the first conversational text using the first input, a second MLM loss based on a comparison of the second predicted output and the second conversational text using the second input, and a comparison of representations of the first input and the third input as a negative pair;
        updating the vector input based on the computed loss objective via backpropagation while keeping the neural network based language model frozen; and
        implementing the neural network based language model to generate a response using an input combining the updated vector input and a conversational input in the first natural language or the second natural language.

8. The system of claim 7, wherein the loss objective further comprises a contrastive loss computed based on representations of the first input and the second input as a positive pair.

9. The system of claim 7, wherein the first conversational text includes an utterance and a response corresponding to the utterance that are randomly selected from a source conversation.

10. The system of claim 7, wherein the computed loss objective is a first computed loss objective according to a first training task, and the operations further comprise: updating the vector input based on a second computed loss objective computed according to a second training task different from the first training task after updating the vector input based on the first computed loss objective while keep the language model frozen.

11. The system of claim 7, the operations further comprising: receiving, via the communication interface, a testing text; and generating, by the language model, a testing output based on a testing input of the testing text prepended with the updated vector input.

12. The system of claim 7, wherein the receiving the second conversational text comprises translating, via a translation model, the first conversational text from the first natural language to the second natural language.

13. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising:
    receiving, via a data interface, a first conversational text in a first natural language, a second conversational te…
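
Claim 1 also compares the representations of the first and third inputs as a negative pair. Claim 2 adds a contrastive loss that treats the first and second inputs as a positive pair. The claims do not give a functional form. The sketch below therefore assumes an InfoNCE-style loss with cosine similarity and a temperature of 0.1. The three vectors are hypothetical pooled representations, not outputs of a real model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two representation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log of the softmax weight on the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum - logits[0]


# Hypothetical pooled representations of the three prompted inputs.
first = [1.0, 0.0, 0.0]   # masked first text (language A) with prompt
second = [0.9, 0.1, 0.0]  # masked translation (language B) with prompt
third = [0.0, 1.0, 0.0]   # masked, semantically unaligned text with prompt

aligned = contrastive_loss(first, second, [third])
misaligned = contrastive_loss(first, third, [second])
print(f"aligned: {aligned:.5f}  misaligned: {misaligned:.3f}")
```

A low loss means the prompted representation of a text sits closer to its translation than to the unaligned text. One plausible combination, not stated in the claims, is to sum this term with the two MLM losses before the prompt update.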

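Claims 4 and 10 describe a second stage of training. After the first stage finishes, the same vector input is updated again under a different training task's loss, with the language model still frozen. A minimal sketch of that ordering follows. The frozen model is a hypothetical fixed 2×2 linear map `M`. The two tasks are hypothetical squared-error objectives that stand in for whatever tasks an implementation would actually use.

```python
# Frozen toy "model": a fixed linear map from the prompt to an output (assumption).
M = [[1.0, 0.5], [0.0, 1.0]]


def forward(prompt):
    """Apply the frozen map M to the prompt vector."""
    return [sum(M[i][j] * prompt[j] for j in range(2)) for i in range(2)]


def task_loss_and_grad(prompt, target):
    """Squared error ||M p - target||^2 and its gradient 2 M^T (M p - target)."""
    resid = [o - t for o, t in zip(forward(prompt), target)]
    loss = sum(r * r for r in resid)
    grad = [2 * sum(M[i][j] * resid[i] for i in range(2)) for j in range(2)]
    return loss, grad


def tune_in_stages(prompt, targets, lr=0.1, steps=100):
    """Tune one prompt on each task in turn; each stage starts where the last stopped."""
    checkpoints = []
    for target in targets:
        for _ in range(steps):
            _, grad = task_loss_and_grad(prompt, target)
            prompt = [p - lr * g for p, g in zip(prompt, grad)]  # only the prompt moves
        checkpoints.append(list(prompt))
    return prompt, checkpoints


task_a = [1.0, 2.0]   # hypothetical target for the first training task
task_b = [-1.0, 0.5]  # hypothetical target for the second training task
final, (after_a, after_b) = tune_in_stages([0.0, 0.0], [task_a, task_b])
print([round(x, 3) for x in after_a], [round(x, 3) for x in final])
```

The second stage starts from the prompt produced by the first stage rather than from a fresh initialization. `M` is read but never written.
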
Assignees

Salesforce Inc

Classifications

  • Lexical analysis, e.g. tokenisation or collocates

  • G06F40/30 (primary): Semantic analysis

  • Translation evaluation

  • Discourse or dialogue representation

  • G06F40/47 (primary): Machine-assisted translation, e.g. using translation memory

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12596890B2 cover?
Embodiments described herein provide a method of training a language model by tuning a prompt. The method comprises masking tokens of first and second conversational texts which have the same semantic meaning but in different languages (e.g., a translation). The masked texts are input to a language model with a prepended soft prompt. The language model generates respective predicted outputs. A …
Who is the assignee on this patent?
Salesforce Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Published April 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page: citations within our corpus, or other publications that share the same primary CPC code.