Method for generating pre-trained language model, electronic device and storage medium

US12204851B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12204851-B2
Application number: US-202217864636-A
Country: US
Kind code: B2
Filing date: Jul 14, 2022
Priority date: Aug 13, 2021
Publication date: Jan 21, 2025
Grant date: Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for generating a pre-trained language model includes: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.
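As a rough illustration of the parsing step named in the abstract, the sketch below splits a toy sample file into text information and typography structure information. The line-prefix file format, the label names (`title`, `paragraph`) and the function `parse_sample_file` are assumptions made for this example only; the abstract does not say how sample files are encoded or which typography categories are used.

```python
from typing import List, Tuple


def parse_sample_file(raw: str) -> List[Tuple[str, str]]:
    """Split a toy sample file into (text, typography_label) segments.

    Assumed toy format (not from the patent): a line starting with "# "
    is a title; any other non-empty line is a paragraph.
    """
    segments: List[Tuple[str, str]] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no text information
        if line.startswith("# "):
            segments.append((line[2:], "title"))
        else:
            segments.append((line, "paragraph"))
    return segments


segments = parse_sample_file("# Report\n\nAll good.")
for text, label in segments:
    print(label, "->", text)
```

Keeping each text segment paired with its label means later training steps can consume the text and the typography structure together, which is what the joint training step in the abstract requires.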

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a pre-trained language model, comprising: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model, wherein the plurality of task models comprise a first prediction model, a masked language model and a second prediction model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.

2. The method of claim 1, wherein obtaining the trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information, comprises: inputting the typography structure information and the text information into each of the first prediction model, the masked language model and the second prediction model; generating a first loss value through processing, by the first prediction model, the typography structure information and the text information; generating a second loss value through processing, by the masked language model, the typography structure information and the text information; generating a third loss value through processing, by the second prediction model, the typography structure information and the text information; generating a target loss value according to the first loss value, the second loss value and the third loss value; and obtaining the trained pre-trained language model by training the pre-trained language model according to the target loss value.

3. The method of claim 2, wherein generating the first loss value through processing, by the first prediction model, the typography structure information and the text information, comprises: generating disarranged text information through disarranging sentences in the text information by the first prediction model according to a first preset ratio; determining the sentences in the text information as first labels; and generating the first loss value by performing next sentence prediction on sentences in the disarranged text information according to the first labels and the typography structure information.

4. The method of claim 2, wherein generating the second loss value through processing, by the masked language model, the typography structure information and the text information, comprises: obtaining a sentence in the text information by the masked language model; disrupting characters in the sentence according to a second preset ratio; determining the disrupted characters in the sentence as second labels; and generating the second loss value by predicting the disrupted characters in the sentence according to the second labels and the typography structure information.

5. The method of claim 2, wherein generating the third loss value through processing, by the second prediction model, the typography structure information and the text information, comprises: determining the typography structure information as third labels by the second prediction model; and generating the third loss value by performing typography structure prediction on each character in the text information according to the third labels.

6. The method of claim 1, wherein generating the target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information, comprises: generating fine-tuned typography structure information and fine-tuned text information by calibrating the typography structure information and the text information; and generating the target pre-trained language model by fine-tuning the trained pre-trained language model according to the fine-tuned typography structure information and the fine-tuned text information.

7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein, the memory is configured to store instructions executable by the at least one processor, when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model, wherein the plurality of task models comprise a first prediction model, a masked language model and a second prediction model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task models according to the typography structure information and the text information; and generating a target pre-trained language model by fine-tuning the trained pre-trained language model according to the typography structure information and the text information.

8. The device of claim 7, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: inputting the typography structure information and the text information into each of the first prediction model, the masked language model and the second prediction model; generating a first loss value through processing, by the first prediction model, the typography structure information and the text information; generating a second loss value through processing, by the masked language model, the typography structure information and the text information; generating a third loss value through processing, by the second prediction model, the typography structure information and the text information; generating a target loss value according to the first loss value, the second loss value and the third loss value; and obtaining the trained pre-trained language model by training the pre-trained language model according to the target loss value.

9. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: generating disarranged text information through disarranging sentences in the text information by the first prediction model according to a first preset ratio; determining the sentences in the text information as first labels; and generating the first loss value by performing next sentence prediction on sentences in the disarranged text information according to the first labels and the typography structure information.

10. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: obtaining a sentence in the text information by the masked language model; disrupting characters in the sentence according to a second preset ratio; determining the disrupted characters in the sentence as second labels; and generating the second loss value by predicting the disrupted characters in the sentence according to the second labels and the typography structure information.

11. The device of claim 8, wherein when the instructions are executed by the at least one processor, the at least one processor is enabled to perform: determining the typography structure information as third labels by the second prediction model; and generating the third loss value by performing typography structure prediction on each character in
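
The training inputs described in claims 2 to 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `[MASK]` placeholder, the choice of masking as the way to "disrupt" characters, and the weighted sum used for the target loss are all hypothetical. The claims only state that the target loss value is generated "according to" the three loss values and do not give the ratios or the combination rule.

```python
import random
from typing import Dict, List, Sequence, Tuple

MASK = "[MASK]"  # assumed placeholder token; the claims do not name one


def disarrange_sentences(
    sentences: Sequence[str], ratio: float, rng: random.Random
) -> Tuple[List[str], List[str]]:
    """Claim 3 style input: permute a `ratio` fraction of sentence positions.

    Returns (disarranged, first_labels); the labels are the original sentences.
    """
    n = len(sentences)
    positions = rng.sample(range(n), int(n * ratio))
    targets = positions[:]
    rng.shuffle(targets)
    out = list(sentences)
    for src, dst in zip(positions, targets):
        out[dst] = sentences[src]  # move each selected sentence to a new slot
    return out, list(sentences)


def disrupt_characters(
    sentence: str, ratio: float, rng: random.Random
) -> Tuple[List[str], Dict[int, str]]:
    """Claim 4 style input: mask a `ratio` fraction of characters.

    Returns (tokens, second_labels), where second_labels maps each masked
    position to the original character the model should predict.
    """
    chars = list(sentence)
    k = max(1, int(len(chars) * ratio)) if chars else 0
    positions = sorted(rng.sample(range(len(chars)), k))
    labels = {i: chars[i] for i in positions}
    for i in positions:
        chars[i] = MASK
    return chars, labels


def typography_labels(segments: Sequence[Tuple[str, str]]) -> List[str]:
    """Claim 5 style third labels: one typography label per character."""
    return [label for text, label in segments for _ in text]


def target_loss(
    first: float, second: float, third: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Assumed combination rule: a weighted sum of the three loss values."""
    return weights[0] * first + weights[1] * second + weights[2] * third


rng = random.Random(0)
sentences = ["A.", "B.", "C.", "D."]
disarranged, first_labels = disarrange_sentences(sentences, 0.5, rng)
masked, second_labels = disrupt_characters("hello world", 0.2, rng)
third_labels = typography_labels([("Hi", "title"), ("ok.", "paragraph")])
print(disarranged, masked, third_labels, target_loss(0.7, 1.2, 0.4))
```

In a real training loop the three loss values would come from the first prediction model, the masked language model and the second prediction model sharing one pre-trained language model, so that a single backward pass on the target loss updates the shared parameters.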

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Supervised learning · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

  • Transfer learning · CPC title

  • Learning methods · CPC title

  • Semantic analysis · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12204851B2 cover?
A method for generating a pre-trained language model includes: obtaining sample files; obtaining typography structure information and text information of the sample files by parsing the sample files; obtaining a plurality of task models of a pre-trained language model; obtaining a trained pre-trained language model by jointly training the pre-trained language model and the plurality of task mo…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/211. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).