Method for generating corpus data based on large models

US2026017542A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2026017542-A1
Application number  US-202519327702-A
Country             US
Kind code           A1
Filing date         Sep 12, 2025
Priority date       Jul 29, 2025
Publication date    Jan 15, 2026
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for generating corpus data based on at least one large model is provided, which relates to the field of artificial intelligence technologies, and in particular to the fields of deep learning, large models, and intelligent question answering. The method includes: performing a content generation task by using the at least one large model based on a predetermined requirement condition to obtain a corpus content, where the content generation task includes a plurality of target tasks having dependency relationships, and the plurality of target tasks represent a reasoning process of the at least one large model for a corpus content to be generated; and determining target corpus data based on the corpus content and a reasoning process information related to the plurality of target tasks.
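The abstract describes target tasks linked by dependency relationships, with the reasoning process recorded next to the generated content. The following is a minimal Python sketch of that idea, not the patented implementation. `TargetTask`, `stub_model`, and `generate_corpus_data` are hypothetical names, the model call is replaced by a deterministic stub, and the sketch assumes that the last task in dependency order yields the corpus content.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class TargetTask:
    """One step of the reasoning process (hypothetical structure)."""
    name: str
    depends_on: List[str] = field(default_factory=list)


def stub_model(task_name: str, requirement: str, upstream: Dict[str, str]) -> str:
    """Stand-in for a large-model call; returns a deterministic string."""
    inputs = ", ".join(sorted(upstream)) or "none"
    return f"{task_name}({requirement}; inputs: {inputs})"


def generate_corpus_data(
    requirement: str,
    tasks: List[TargetTask],
    model: Callable[[str, str, Dict[str, str]], str] = stub_model,
) -> dict:
    """Run tasks in dependency order and keep a trace of each step."""
    # Map each task to the set of tasks it depends on.
    graph = {t.name: set(t.depends_on) for t in tasks}
    order = list(TopologicalSorter(graph).static_order())

    results: Dict[str, str] = {}
    trace = []
    for name in order:
        # Each task only sees the outputs of the tasks it depends on.
        upstream = {dep: results[dep] for dep in graph[name]}
        results[name] = model(name, requirement, upstream)
        trace.append(
            {"task": name, "depends_on": sorted(graph[name]), "output": results[name]}
        )

    # Assumption: the final task in dependency order produces the corpus content.
    return {
        "requirement": requirement,
        "corpus_content": results[order[-1]],
        "reasoning_process": trace,
    }


if __name__ == "__main__":
    tasks = [
        TargetTask("analyze"),
        TargetTask("retrieve", ["analyze"]),
        TargetTask("draft", ["analyze", "retrieve"]),
    ]
    record = generate_corpus_data("explain tides", tasks)
    for step in record["reasoning_process"]:
        print(step["task"], "<-", step["depends_on"])
```

Storing the trace together with the content mirrors the final step of the abstract, in which target corpus data is determined from both the corpus content and the reasoning process information.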

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating corpus data based on at least one large model, comprising: performing a content generation task by using the at least one large model based on a predetermined requirement condition to obtain a corpus content, wherein the content generation task comprises a plurality of target tasks having dependency relationships, and the plurality of target tasks represent a reasoning process of the at least one large model for a corpus content to be generated; and determining target corpus data based on the corpus content and a reasoning process information related to the plurality of target tasks.

2. The method of claim 1, further comprising: displaying a reasoning process topology related to the reasoning process information, wherein the reasoning process topology comprises node elements representing the plurality of target tasks and edge elements representing the dependency relationships; and in response to an editing operation on the reasoning process topology, updating at least one of the plurality of target tasks and the dependency relationships to obtain the reasoning process information.

3. The method of claim 1, wherein the plurality of target tasks comprise a tool invocation task; the at least one large model is configured to invoke a target tool to perform a specified task by performing the tool invocation task, to obtain an intermediate result for generating the corpus content; and the reasoning process information comprises a task description information describing an execution process of the tool invocation task.

4. The method of claim 1, wherein the performing a content generation task by using the at least one large model based on a predetermined requirement condition to obtain a corpus content comprises: performing the content generation task by using a plurality of large models based on the predetermined requirement condition to obtain a plurality of candidate corpus contents; and determining the corpus content from the plurality of candidate corpus contents in response to a target operation on the plurality of candidate corpus contents.

5. The method of claim 4, wherein the determining the corpus content from the plurality of candidate corpus contents in response to a target operation on the plurality of candidate corpus contents comprises: determining a target sub-content from the plurality of candidate corpus contents in response to a target operation on a sub-content of the plurality of candidate corpus contents; and performing a semantic fusion on a plurality of target sub-contents to obtain the corpus content.

6. The method of claim 4, further comprising: performing a content quality detection on at least one candidate corpus content to obtain a content quality score, wherein the content quality score corresponding to the at least one candidate corpus content is displayed on an interactive interface, and a target object is allowed to perform the target operation on the at least one candidate corpus content according to the displayed content quality score.

7. The method of claim 6, wherein the target corpus data comprises the corpus content, the reasoning process information, an operation information related to the target operation, and the content quality score.

8. The method of claim 1, wherein the corpus content is determined by a plurality of large models conducting a dialogue based on the predetermined requirement condition, and the method further comprises: displaying a preset instruction element related to a preset prompt instruction; and in response to a trigger operation on the preset instruction element, prompting, according to a dialogue prompt information in the preset prompt instruction, at least one large model to perform a dialogue content generation task according to the preset prompt instruction, wherein the large model is configured to conduct a dialogue with other large models by performing the dialogue content generation task.

9. The method of claim 2, wherein the plurality of target tasks comprise a tool invocation task; the at least one large model is configured to invoke a target tool to perform a specified task by performing the tool invocation task, to obtain an intermediate result for generating the corpus content; and the reasoning process information comprises a task description information describing an execution process of the tool invocation task.

10. The method of claim 5, further comprising: performing a content quality detection on at least one candidate corpus content to obtain a content quality score, wherein the content quality score corresponding to the at least one candidate corpus content is displayed on an interactive interface, and a target object is allowed to perform the target operation on the at least one candidate corpus content according to the displayed content quality score.

11. An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are configured to, when executed by the at least one processor, cause the at least one processor to: perform a content generation task by using the at least one large model based on a predetermined requirement condition to obtain a corpus content, wherein the content generation task comprises a plurality of target tasks having dependency relationships, and the plurality of target tasks represent a reasoning process of the at least one large model for a corpus content to be generated; and determine target corpus data based on the corpus content and a reasoning process information related to the plurality of target tasks.

12. The electronic device of claim 11, wherein the at least one processor is further configured to: display a reasoning process topology related to the reasoning process information, wherein the reasoning process topology comprises node elements representing the plurality of target tasks and edge elements representing the dependency relationships; and in response to an editing operation on the reasoning process topology, update at least one of the plurality of target tasks and the dependency relationships to obtain the reasoning process information.

13. The electronic device of claim 11, wherein the plurality of target tasks comprise a tool invocation task; the at least one large model is configured to invoke a target tool to perform a specified task by performing the tool invocation task, to obtain an intermediate result for generating the corpus content; and the reasoning process information comprises a task description information describing an execution process of the tool invocation task.

14. The electronic device of claim 11, wherein the at least one processor is further configured to: perform the content generation task by using a plurality of large models based on the predetermined requirement condition to obtain a plurality of candidate corpus contents; and determine the corpus content from the plurality of candidate corpus contents in response to a target operation on the plurality of candidate corpus contents.

15. The electronic device of claim 14, wherein the at least one processor is further configured to: determine a target sub-content from the plurality of candidate corpus contents in response to a target operation on a sub-content of the plurality of candidate corpus contents; and perform a semantic fusion on a plurality of target sub-contents to obtain the corpus content.

16. Th
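Claims 4, 6, and 7 describe several large models producing candidate corpus contents, a quality score shown for each candidate, a selection operation, and a stored record that combines all of these. The Python sketch below illustrates that flow under stated assumptions: `Candidate`, `toy_quality_score`, `collect_candidates`, and `build_target_corpus_data` are hypothetical names, the models are plain callables, and the scoring function is a placeholder metric (share of distinct words) that does not come from the patent. The semantic fusion of claim 5 is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Candidate:
    """One candidate corpus content plus its displayed score (hypothetical)."""
    model_name: str
    content: str
    quality_score: float


def toy_quality_score(content: str) -> float:
    """Placeholder metric: share of distinct words. Not taken from the patent."""
    words = content.split()
    return round(len(set(words)) / len(words), 2) if words else 0.0


def collect_candidates(
    requirement: str, models: Dict[str, Callable[[str], str]]
) -> List[Candidate]:
    """Ask every model for a candidate and attach a quality score to each."""
    candidates = []
    for name, generate in models.items():
        content = generate(requirement)
        candidates.append(Candidate(name, content, toy_quality_score(content)))
    return candidates


def build_target_corpus_data(
    candidates: List[Candidate], selected_index: int, reasoning_process: list
) -> dict:
    """Combine the selected content with the fields listed in claim 7."""
    chosen = candidates[selected_index]
    return {
        "corpus_content": chosen.content,
        "reasoning_process": reasoning_process,
        "operation": {"type": "select", "candidate": chosen.model_name},
        "quality_score": chosen.quality_score,
    }


if __name__ == "__main__":
    # Two stand-in "models"; real ones would be large-model API calls.
    models = {
        "model_a": lambda req: f"{req} {req}",
        "model_b": lambda req: f"answer to {req}",
    }
    candidates = collect_candidates("tides", models)
    for c in candidates:
        print(c.model_name, c.quality_score)

    # Simulate the reviewer picking the highest-scoring candidate.
    best = max(range(len(candidates)), key=lambda i: candidates[i].quality_score)
    print(build_target_corpus_data(candidates, best, reasoning_process=[]))
```

In the claims the selection is made by a person looking at the displayed scores; the `max` call above only simulates that choice so the example runs unattended.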

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06N5/04 (Primary)

    Inference or reasoning models · CPC title

  • Clustering; Classification · CPC title

  • Summarisation for human users · CPC title

  • G06N5/041 (Primary)

    Abduction · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026017542A1 cover?
A method for generating corpus data based on at least one large model is provided, which relates to the field of artificial intelligence technologies, and in particular to the fields of deep learning, large models, and intelligent question answering. The method includes: performing a content generation task by using the at least one large model based on a predetermined requirement condition to o…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N5/04. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 15, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).