Model training method and apparatus, speech-to-speech translation method and apparatus, and medium
US-2025061888-A1 · Feb 20, 2025 · US
US12596890B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12596890-B2 |
| Application number | US-202318309330-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 28, 2023 |
| Priority date | Mar 30, 2023 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
Embodiments described herein provide a method of training a language model by tuning a prompt. The method comprises masking tokens of first and second conversational texts which have the same semantic meaning but in different languages (e.g., a translation). The masked texts are input to a language model with a prepended soft prompt. The language model generates respective predicted outputs. A loss objective is computed including a masked language model loss. The prompt is updated based on the computed loss objective via backpropagation while keeping the language model frozen.
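The training loop summarized in the abstract can be illustrated with a deliberately tiny, dependency-free sketch. Everything in it is an assumption made for illustration and not taken from the patent: the seven-word vocabulary, the mean-pooling stand-in for the language model (`FROZEN_EMB`, `predict`), the helper names, and the hyperparameters. A hand-derived gradient stands in for backpropagation so the block runs without a deep learning library. What it shows is the general shape of the idea: tokens are masked, a trainable prompt is prepended, a masked-token loss is computed, and only the prompt is updated while the model parameters stay untouched.

```python
import math
import random

# Hypothetical toy setup: a 7-word vocabulary and 4-dimensional embeddings.
VOCAB = ["[MASK]", "hello", "hola", "world", "mundo", "thanks", "gracias"]
MASK_ID = 0
DIM = 4
_rng = random.Random(0)

# Frozen stand-in for the language model: one embedding row per vocabulary entry.
# Nothing in this block ever writes to FROZEN_EMB.
FROZEN_EMB = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in VOCAB]


def mask_tokens(token_ids, positions):
    """Return a copy of token_ids with the chosen positions replaced by [MASK]."""
    return [MASK_ID if i in positions else t for i, t in enumerate(token_ids)]


def predict(prompt, masked_ids):
    """Prepend the prompt rows, mean-pool, and return a softmax over the vocabulary."""
    rows = prompt + [FROZEN_EMB[t] for t in masked_ids]
    hidden = [sum(col) / len(rows) for col in zip(*rows)]
    logits = [sum(e * h for e, h in zip(row, hidden)) for row in FROZEN_EMB]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [x / total for x in exps], len(rows)


def mlm_loss_and_prompt_grad(prompt, token_ids, positions):
    """Cross-entropy on the masked positions and its gradient for each prompt row."""
    probs, n_rows = predict(prompt, mask_tokens(token_ids, positions))
    loss = 0.0
    grad_hidden = [0.0] * DIM
    for pos in positions:
        target = token_ids[pos]
        loss -= math.log(probs[target])
        # Gradient of softmax cross-entropy with respect to the pooled vector.
        for v, row in enumerate(FROZEN_EMB):
            coeff = probs[v] - (1.0 if v == target else 0.0)
            for d in range(DIM):
                grad_hidden[d] += coeff * row[d]
    # Each prompt row enters the mean with weight 1 / n_rows.
    grad_row = [g / n_rows for g in grad_hidden]
    return loss, [list(grad_row) for _ in prompt]


def tune_prompt(examples, steps=200, lr=0.1, prompt_len=2):
    """Gradient descent on the prompt only; returns the prompt and per-step loss."""
    prompt = [[0.0] * DIM for _ in range(prompt_len)]
    history = []
    for _ in range(steps):
        total = 0.0
        grads = [[0.0] * DIM for _ in prompt]
        for token_ids, positions in examples:
            loss, g = mlm_loss_and_prompt_grad(prompt, token_ids, positions)
            total += loss
            for j in range(prompt_len):
                for d in range(DIM):
                    grads[j][d] += g[j][d]
        for j in range(prompt_len):
            for d in range(DIM):
                prompt[j][d] -= lr * grads[j][d]
        history.append(total)
    return prompt, history


# "hello world" and its translation "hola mundo", each with the last token masked.
PARALLEL = [([1, 3], {1}), ([2, 4], {1})]
```

Calling `tune_prompt(PARALLEL)` returns the tuned prompt rows and the summed loss recorded at each step. In a realistic setting the frozen component would be a pretrained transformer and the gradient would come from an autodiff framework; the toy model here only makes the "update the prompt, leave the model alone" split easy to see.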
Opening claim text (preview).
What is claimed is:

1. A method of training a neural network based language model to conduct a cross-language conversation by tuning a vector input, the method comprising:
receiving, via a data interface, a first conversational text in a first natural language, a second conversational text in a second natural language different from the first natural language, and a third conversational text, wherein the first conversational text contains a first set of tokens, the second conversational text has a same semantic meaning as the first conversational text and contains a second set of tokens, and the third conversational text is semantically unaligned with the first conversational text and contains a third set of tokens;
masking a subset of the first set of tokens, a subset of the second set of tokens, and a subset of the third set of tokens, thereby resulting in a masked first set of tokens, a masked second set of tokens, and a masked third set of tokens respectively;
generating, by the neural network based language model implemented on one or more hardware processors: a first predicted output based on a first input of the masked first set of tokens prepended with the vector input, a second predicted output, based on a second input of the masked second set of tokens prepended with the vector input, and a third predicted output based on a third input of the masked third set of tokens prepended with the vector input;
computing a loss objective including at least: a first masked language model (MLM) loss based on a comparison of the first predicted output and the first conversational text using the first input, a second MLM loss based on a comparison of the second predicted output and the second conversational text using the second input, and a comparison of representations of the first input and the third input as a negative pair;
updating the vector input based on the computed loss objective via backpropagation while keeping the neural network based language model frozen; and
implementing the neural network based language model to generate a response using an input combining the updated vector input and a conversational input in the first natural language or the second natural language.

2. The method of claim 1, wherein the loss objective further comprises a contrastive loss computed based on representations of the first input and the second input as a positive pair.

3. The method of claim 1, wherein the first conversational text includes an utterance and a response corresponding to the utterance that are randomly selected from a source conversation.

4. The method of claim 1, wherein the computed loss objective is a first computed loss objective according to a first training task, and the method further comprises: updating the vector input based on a second computed loss objective computed according to a second training task different from the first training task after updating the vector input based on the first computed loss objective while keep the language model frozen.

5. The method of claim 1, further comprising: receiving, via the data interface, a testing text; and generating, by the language model, a testing output based on a testing input of the testing text prepended with the updated vector input.

6. The method of claim 1, wherein the receiving the second conversational text comprises translating, via a translation model, the first conversational text from the first natural language to the second natural language.

7. A system for training a neural network based language model to conduct a cross-language conversation by tuning a vector input, the system comprising:
a memory that stores the neural network based language model and a plurality of processor executable instructions;
a communication interface that receives a first conversational text in a first natural language, a second conversational text in a second natural language different from the first natural language, and a third conversational text, wherein the first conversational text contains a first set of tokens, the second conversational text has a same semantic meaning as the first conversational text and contains a second set of tokens, and the third conversational text is semantically unaligned with the first conversational text and contains a third set of tokens; and
one or more hardware processors that read and execute the plurality of processor executable instructions from the memory to perform operations comprising:
masking a subset of the first set of tokens, a subset of the second set of tokens, and a subset of the third set of tokens, thereby resulting in a masked first set of tokens, a masked second set of tokens, and a masked third set of tokens respectively;
generating, by the neural network based language model implemented on one or more hardware processors: a first predicted output based on a first input of the masked first set of tokens prepended with the vector input, a second predicted output, based on a second input of the masked second set of tokens prepended with the vector input, and a third predicted output based on a third input of the masked third set of tokens prepended with the vector input;
computing a loss objective including at least: a first masked language model (MLM) loss based on a comparison of the first predicted output and the first conversational text using the first input, a second MLM loss based on a comparison of the second predicted output and the second conversational text using the second input, and a comparison of representations of the first input and the third input as a negative pair;
updating the vector input based on the computed loss objective via backpropagation while keeping the neural network based language model frozen; and
implementing the neural network based language model to generate a response using an input combining the updated vector input and a conversational input in the first natural language or the second natural language.

8. The system of claim 7, wherein the loss objective further comprises a contrastive loss computed based on representations of the first input and the second input as a positive pair.

9. The system of claim 7, wherein the first conversational text includes an utterance and a response corresponding to the utterance that are randomly selected from a source conversation.

10. The system of claim 7, wherein the computed loss objective is a first computed loss objective according to a first training task, and the operations further comprise: updating the vector input based on a second computed loss objective computed according to a second training task different from the first training task after updating the vector input based on the first computed loss objective while keep the language model frozen.

11. The system of claim 7, the operations further comprising: receiving, via the communication interface, a testing text; and generating, by the language model, a testing output based on a testing input of the testing text prepended with the updated vector input.

12. The system of claim 7, wherein the receiving the second conversational text comprises translating, via a translation model, the first conversational text from the first natural language to the second natural language.

13. A non-transitory machine-readable medium comprising a plurality of machine-executable instructions which, when executed by one or more processors, are adapted to cause the one or more processors to perform operations comprising: receiving, via a data interface, a first conversational text in a first natural language, a second conversational te
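
The claims refer to comparing representations as a positive pair (the first and second inputs, which share a meaning) and as a negative pair (the first and third inputs, which are semantically unaligned), but the preview above does not give a formula. The following is a minimal sketch of one common way such a comparison could be scored, assuming an InfoNCE-style loss over cosine similarities. The function names, the temperature value, and the example vectors are hypothetical and are not drawn from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form, not the claimed formula).

    The loss is small when the anchor is closer to the positive
    representation than to any of the negative representations.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-d representations: a text, its translation, and an unrelated text.
first = [1.0, 0.0]
second = [0.9, 0.1]   # same meaning, other language (positive pair with first)
third = [0.0, 1.0]    # semantically unaligned (negative pair with first)

aligned = contrastive_loss(first, second, [third])
swapped = contrastive_loss(first, third, [second])
```

Here `aligned` comes out much smaller than `swapped`, which is the behavior a training objective of this kind rewards: translations are pulled together in representation space and unaligned texts are pushed apart. In a prompt-tuning setup such a term would typically be added to the masked language model losses before the prompt is updated.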