Model training method, image editing method, apparatus, device, medium, and product

US2026093978A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2026093978-A1
Application number  US-202519314144-A
Country             US
Kind code           A1
Filing date         Aug 29, 2025
Priority date       Sep 27, 2024
Publication date    Apr 2, 2026
Grant date          (none listed)

Abstract

Official abstract text for this publication.

The present application discloses a model training method, an image editing method, an apparatus, a device, a medium, and a product. The method first acquires an original image, an editing description text corresponding to the original image, an edited image corresponding to the original image with respect to the editing description text, and evaluation information of the edited image, where the evaluation information describes the state of the edited image with respect to at least one evaluation item. The original image, the editing description text, and the evaluation information are then processed by an image editing model to obtain an image editing result corresponding to the original image. Finally, the image editing model is updated according to the difference between the image editing result and the edited image.
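The training flow in the abstract can be sketched in a few lines of plain Python. This is only a toy illustration: the names `EvaluationInfo`, `TrainingSample`, `ToyEditingModel`, and `train_step` are hypothetical, the "image" is a flat list of numbers, the editing description text is reduced to a single number, and a two-parameter linear model with a mean squared error stands in for the image editing model and the "difference". None of this is taken from the patent's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationInfo:
    # Hypothetical container: a score per evaluation item, plus optional defect texts.
    scores: dict
    defects: dict = field(default_factory=dict)

    def mean_score(self):
        return sum(self.scores.values()) / len(self.scores)


@dataclass
class TrainingSample:
    original: list      # toy "image": a flat list of pixel values
    text_code: float    # toy stand-in for the encoded editing description text
    edited: list        # target edited image, same length as `original`
    evaluation: EvaluationInfo


class ToyEditingModel:
    # Toy linear stand-in for the image editing model; NOT a de-noising network.
    def __init__(self):
        self.w_text = 0.0
        self.w_eval = 0.0

    def forward(self, s):
        # The result depends on all three inputs: image, text, evaluation info.
        shift = self.w_text * s.text_code + self.w_eval * s.evaluation.mean_score()
        return [p + shift for p in s.original]


def train_step(model, s, lr=0.1):
    # 1) Process the inputs to obtain an image editing result.
    result = model.forward(s)
    # 2) Measure the difference between the result and the edited image (MSE here).
    diffs = [r - t for r, t in zip(result, s.edited)]
    loss = sum(d * d for d in diffs) / len(diffs)
    # 3) Update the model along the negative gradient of that difference.
    grad_shift = 2.0 * sum(diffs) / len(diffs)
    model.w_text -= lr * grad_shift * s.text_code
    model.w_eval -= lr * grad_shift * s.evaluation.mean_score()
    return loss


# Made-up sample values, for illustration only.
sample = TrainingSample(
    original=[0.1, 0.2, 0.3],
    text_code=1.0,
    edited=[0.6, 0.7, 0.8],
    evaluation=EvaluationInfo(
        scores={"text_following": 0.9, "image_preserving": 0.8, "image_quality": 0.7},
        defects={"image_quality": "slight blur"},
    ),
)
model = ToyEditingModel()
losses = [train_step(model, sample) for _ in range(50)]
```

The point of the sketch is the data flow: the evaluation information is an input to the model alongside the original image and the text, while the edited image serves only as the training target.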

First claim

Opening claim text (preview).

I/We claim:

1. A model training method, comprising: acquiring an original image, an editing description text corresponding to the original image, an edited image corresponding to the original image with respect to the editing description text, and evaluation information of the edited image, wherein the evaluation information is configured to describe a state of the edited image with respect to at least one evaluation item; processing the original image, the editing description text and the evaluation information by using an image editing model to obtain an image editing result corresponding to the original image; and updating the image editing model according to a difference between the image editing result and the edited image.

2. The method of claim 1, wherein the at least one evaluation item comprises one or more of: a text following evaluation item, an image preserving evaluation item, and an image quality evaluation item; the text following evaluation item is configured to describe a matching state presented between the edited image and the edit description text with respect to first content, wherein the first content is determined in accordance with at least one edit instruction described by the edit description text; the image preserving evaluation item is configured to describe a matching state presented between the edited image and the original image with respect to second content, wherein the second content is determined based on content in the original image other than the first content; and the image quality evaluation item is configured to describe a quality change of the edited image relative to the original image.

3. The method of claim 1, wherein the evaluation information comprises a score of each of the evaluation items, and/or the evaluation information comprises a defect description text of some or all of the evaluation items; for one of the evaluation items, the score of the evaluation item is configured to characterize a level achieved by the edited image with respect to the one evaluation item, and the defect description text of the one evaluation item is configured to describe the defect of the edited image with respect to the evaluation item.

4. The method of claim 3, wherein the at least one evaluation item comprises one or more of: a text following evaluation item, an image preserving evaluation item, or an image quality evaluation item; a score of the text following evaluation item is configured to describe a similarity degree between a change in content of the edited image relative to the original image and a change in content described by the edited description text; a score of the image preserving evaluation item is configured to describe a similarity degree between content retained by the edited image relative to the original image and content in the original image other than the edited content specified by the editing description text; a score of the image quality evaluation term is configured to describe a quality change of the edited image relative to the original image; a defect description text of the text following evaluation item is configured to describe at least one following error of the edited image relative to the edited description text; a defect description text of the image preserving evaluation item is configured to describe at least one preserving error of the edited image relative to the original image; and a defect description text of the image quality evaluation item is configured to describe at least one quality defect of the edited image relative to the original image.

5. The method of claim 1, wherein the evaluation information is introduced as condition information into a de-noising network in the image editing model.

6. The method of claim 5, wherein for one of the cross-attention layers in the de-noising network, input data of a data fusion layer corresponding to the one cross-attention layer is determined based on the evaluation information and output data of the one cross-attention layer, and the input data of the one cross-attention layer comprises an encoding result of the editing description text.

7. The method of claim 6, wherein the input data of the data fusion layer is determined based on a feature vector of the evaluation information; the evaluation information comprises at least one type of data, and the feature vector of the evaluation information is determined according to a vectorization result of each type of data and a vectorization result of each type.

8. The method of claim 6, wherein the data fusion layer is configured to perform residual error calculation or sum value calculation according to the output data of the cross-attention layer and a mapping result of a feature vector of the evaluation information in a first feature space, and the first feature space is a feature space to which the output data of the cross-attention layer belongs.

9. The method of claim 1, wherein input data of a de-noising network in the image editing model comprises first data and second data, the first data is input to the de-noising network as condition information, input data of a first network layer in the de-noising network comprises the second data, the first data is different from the second data; the second data is determined based on the original image and the evaluation information.

10. The method of claim 9, wherein a determination process of the second data comprises: performing concatenating processing on the image feature of the original image and the noise addition result of the image feature of the edited image to obtain a concatenating result; performing convolution processing on the concatenating result to obtain a convolution result; and performing at least one cross-attention processing according to the convolution result and a mapping result of a feature vector of the evaluation information in a second feature space to obtain the second data, wherein the second feature space is a feature space to which the convolution result belongs.

11. The method of claim 11, wherein the image editing model comprises a first module, a second module, a third module, a linear layer corresponding to the third module, a de-noising network, a linear layer corresponding to the de-noising network, and a decoding module; the first module is configured to obtain a concatenating result between an image feature of the original image and a noise addition result of an image feature of the edited image; the second module is configured to acquire a feature vector of the evaluation information; the linear layer corresponding to the third module and the linear layer corresponding to the de-noising network are configured to process the feature vector of the evaluation information; the third module is configured to perform at least one cross-attention processing according to the concatenating result and the output data of the linear layer corresponding to the third module; the de-noising network is configured to perform de-noising processing according to the output data of the third module, the encoding result of the editing description text, and the output data of the linear layer corresponding to the de-noising network; and the decoding module is configured to perform decoding processing on the output data of the de-noising network to obtain the image editing result.

12. The method of claim 11, wherein updating the image editing model comprises: updating a part of modules in the image editing model, wherein the part of modules comprises some or all network layers in the second module, the third module, the linear layer corresponding to the third module, the de-noi
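Claims 5 to 8 describe a data fusion layer that combines the output of a cross-attention layer with the evaluation information after it has been mapped into the same feature space. A minimal sketch of the sum-style variant follows, in plain Python. The function names `linear` and `fuse_sum`, the use of a single linear map, and the choice to add the mapped vector to every token are assumptions made for illustration; the claim text shown here does not specify these details, nor how the residual variant differs.

```python
def linear(vec, weight, bias):
    # Map a d_in-dimensional vector into a d_out-dimensional space.
    # `weight` has d_out rows of length d_in; `bias` has length d_out.
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def fuse_sum(attn_out, eval_vec, weight, bias):
    # Sum-style fusion (assumed form): project the evaluation feature vector
    # into the feature space of the cross-attention output, then add it to
    # each token's feature vector.
    mapped = linear(eval_vec, weight, bias)
    return [[a + m for a, m in zip(token, mapped)] for token in attn_out]


# Made-up numbers: two tokens with 2-dim features, a 3-dim evaluation vector.
attn_out = [[1.0, 2.0], [3.0, 4.0]]
eval_vec = [1.0, 0.0, 2.0]
weight = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 0.5]]
bias = [0.0, 1.0]
fused = fuse_sum(attn_out, eval_vec, weight, bias)
```

The linear map is what places the evaluation vector in the "first feature space" of claim 8, so that the addition is between vectors of the same dimension.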

Assignees

  • Beijing Zitiao Network Technology Co Ltd

  • Lemon Inc

Inventors

Classifications

  • Architecture, e.g. interconnection topology · CPC title

  • Creating or editing images; Combining images with text · CPC title

  • G06N3/08 · Primary

    Learning methods · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026093978A1 cover?
The present application discloses a model training method, an image editing method, an apparatus, a device, a medium, and a product. The method includes: first acquiring an original image, an editing description text corresponding to the original image, an edited image corresponding to the original image with respect to the editing description text, and evaluation information of the edited imag…
Who is the assignee on this patent?
Beijing Zitiao Network Technology Co Ltd, Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 2, 2026 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).