Identifying prompts used for training of inference models
US-2024273300-A1 · Aug 15, 2024 · US
US12204851B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12204851-B2 |
| Application number | US-202217864636-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 14, 2022 |
| Priority date | Aug 13, 2021 |
| Publication date | Jan 21, 2025 |
| Grant date | Jan 21, 2025 |
Abstract
A method for generating a pre-trained language model, includes: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.
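The abstract does not say how sample files are parsed or which typography categories are used. The following is a minimal sketch that assumes a hypothetical Markdown-like input convention and a hypothetical three-label set (`TITLE`, `LIST_ITEM`, `PARAGRAPH`). It shows one way to split a file into text information (the sentences) and typography structure information, with a typography label at both sentence and character level. Character-level labels are the targets that the typography-prediction task in claim 5 would need.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical typography label set; the patent does not enumerate one.
TITLE, LIST_ITEM, PARAGRAPH = "TITLE", "LIST_ITEM", "PARAGRAPH"


@dataclass
class ParsedSample:
    sentences: List[str]        # text information, one entry per line
    sentence_labels: List[str]  # typography label of each sentence
    char_labels: List[str]      # typography label of every character in `text`

    @property
    def text(self) -> str:
        # Concatenated text; aligned one-to-one with char_labels.
        return "".join(self.sentences)


def parse_sample_file(raw: str) -> ParsedSample:
    """Split a sample file into text and typography structure information.

    Assumed (illustrative) convention: '# ' starts a title, '- ' starts a
    list item, and any other non-empty line is a paragraph.
    """
    sentences: List[str] = []
    sentence_labels: List[str] = []
    char_labels: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no text
        if line.startswith("# "):
            label, body = TITLE, line[2:]
        elif line.startswith("- "):
            label, body = LIST_ITEM, line[2:]
        else:
            label, body = PARAGRAPH, line
        sentences.append(body)
        sentence_labels.append(label)
        # Every character inherits the typography label of its line.
        char_labels.extend([label] * len(body))
    return ParsedSample(sentences, sentence_labels, char_labels)
```

For example, `parse_sample_file("# Report\nSales rose.\n- Q1 up\n").sentence_labels` gives `['TITLE', 'PARAGRAPH', 'LIST_ITEM']`, and `char_labels` has one entry per character of the 22-character `text`. A real system would parse formats such as PDF or HTML instead. This sketch only illustrates the alignment between text and layout labels.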
Claims (preview)
What is claimed is:

1. A method for generating a pre-trained language model, comprising: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model, wherein the plurality of task models comprise a first prediction model, a masked language model and a second prediction model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.

2. The method of claim 1, wherein obtaining the trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information, comprises: inputting the typography structure information and the text information into each of the first prediction model, the masked language model and the second prediction model; generating a first loss value through processing, by the first prediction model, the typography structure information and the text information; generating a second loss value through processing, by the masked language model, the typography structure information and the text information; generating a third loss value through processing, by the second prediction model, the typography structure information and the text information; generating a target loss value according to the first loss value, the second loss value and the third loss value; and obtaining the trained pre-trained language model by training the pre-trained language model according to the target loss value.

3. The method of claim 2, wherein generating the first loss value through processing, by the first prediction model, the typography structure information and the text information, comprises: generating disarranged text information through disarranging sentences in the text information by the first prediction model according to a first preset ratio; determining the sentences in the text information as first labels; and generating the first loss value by performing next sentence prediction on sentences in the disarranged text information according to the first labels and the typography structure information.

4. The method of claim 2, wherein generating the second loss value through processing, by the masked language model, the typography structure information and the text information, comprises: obtaining a sentence in the text information by the masked language model; disrupting characters in the sentence according to a second preset ratio; determining the disrupted characters in the sentence as second labels; and generating the second loss value by predicting the disrupted characters in the sentence according to the second labels and the typography structure information.

5. The method of claim 2, wherein generating the third loss value through processing, by the second prediction model, the typography structure information and the text information, comprises: determining the typography structure information as third labels by the second prediction model; and generating the third loss value by performing typography structure prediction on each character in the text information according to the third labels.

6. The method of claim 1, wherein generating the target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information, comprises: generating fine-tuned typography structure information and fine-tuned text information by calibrating the typography structure information and the text information; and generating the target pre-trained language model by fine-tuning the trained pre-trained language model according to the fine-tuned typography structure information and the fine-tuned text information.

7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein, the memory is configured to store instructions executable by the at least one processor, when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model, wherein the plurality of task models comprise a first prediction model, a masked language model and a second prediction model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.

8. The device of claim 7, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: inputting the typography structure information and the text information into each of the first prediction model, the masked language model and the second prediction model; generating a first loss value through processing, by the first prediction model, the typography structure information and the text information; generating a second loss value through processing, by the masked language model, the typography structure information and the text information; generating a third loss value through processing, by the second prediction model, the typography structure information and the text information; generating a target loss value according to the first loss value, the second loss value and the third loss value; and obtaining the trained pre-trained language model by training the pre-trained language model according to the target loss value.

9. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: generating disarranged text information through disarranging sentences in the text information by the first prediction model according to a first preset ratio; determining the sentences in the text information as first labels; and generating the first loss value by performing next sentence prediction on sentences in the disarranged text information according to the first labels and the typography structure information.

10. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: obtaining a sentence in the text information by the masked language model; disrupting characters in the sentence according to a second preset ratio; determining the disrupted characters in the sentence as second labels; and generating the second loss value by predicting the disrupted characters in the sentence according to the second labels and the typography structure information.

11. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: determining the typography structure information as third labels by the second prediction model; and generating the third loss value by performing typography structure prediction on each character in
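Claims 2 to 5 describe how the three task models derive their training targets from the same parsed input and how their losses are combined. The following is a minimal, model-free sketch of those steps under stated assumptions. `disarrange_sentences`, `mask_characters`, `cross_entropy`, `target_loss` and the `[MASK]` token are hypothetical names, not from the patent. The two ratios correspond to the claims' "first preset ratio" and "second preset ratio". Because the claims only say the target loss is generated "according to" the three losses, a weighted sum is assumed here. Predictions are shown as toy probability tables in place of real model outputs.

```python
import math
import random
from typing import Dict, List, Mapping, Sequence, Tuple

MASK = "[MASK]"  # hypothetical mask token


def disarrange_sentences(sentences: Sequence[str], ratio: float,
                         rng: random.Random) -> Tuple[List[str], List[int]]:
    """Claim 3 sketch: permute about `ratio` of the sentence positions, then
    derive next-sentence-prediction labels from the original order."""
    n = len(sentences)
    order = list(range(n))
    k = min(n, round(ratio * n))
    picked = sorted(rng.sample(range(n), k))
    shuffled = picked[:]
    rng.shuffle(shuffled)
    for pos, src in zip(picked, shuffled):
        order[pos] = src  # order stays a permutation of range(n)
    disarranged = [sentences[i] for i in order]
    # Label 1 if the next sentence really followed this one originally.
    nsp_labels = [int(order[i + 1] == order[i] + 1) for i in range(n - 1)]
    return disarranged, nsp_labels


def mask_characters(sentence: str, ratio: float,
                    rng: random.Random) -> Tuple[List[str], Dict[int, str]]:
    """Claim 4 sketch: replace about `ratio` of the characters with MASK and
    keep the replaced characters as labels (position -> original char)."""
    chars = list(sentence)
    k = min(len(chars), max(1, round(ratio * len(chars)))) if chars else 0
    positions = rng.sample(range(len(chars)), k)
    labels = {p: chars[p] for p in positions}
    for p in positions:
        chars[p] = MASK
    return chars, labels


def cross_entropy(predicted: Sequence[Mapping], gold: Sequence) -> float:
    """Mean negative log-likelihood of gold labels. Usable for all three
    heads, e.g. claim 5's per-character typography prediction."""
    eps = 1e-12  # avoid log(0) for labels the prediction omits
    total = sum(math.log(max(p.get(g, 0.0), eps)) for p, g in zip(predicted, gold))
    return -total / len(gold)


def target_loss(l1: float, l2: float, l3: float,
                weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Claim 2 sketch: combine the three task losses (weighted sum assumed)."""
    return weights[0] * l1 + weights[1] * l2 + weights[2] * l3
```

For example, with `rng = random.Random(0)`, `disarrange_sentences(["a", "b", "c", "d"], 0.5, rng)` returns the same four sentences in a partly shuffled order, plus three 0/1 next-sentence labels. For a six-character title predicted as `{"TITLE": 0.9, "PARAGRAPH": 0.1}` at every position, `cross_entropy` gives the typography loss -ln(0.9), about 0.105. In a real implementation each loss would come from a trained head on a shared encoder. The weights of the sum would be tuning choices that the claims leave open.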
CPC classification titles: Supervised learning; Weakly supervised learning, e.g. semi-supervised or self-supervised learning; Transfer learning; Learning methods; Semantic analysis.