Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US2026093978A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026093978-A1 |
| Application number | US-202519314144-A |
| Country | US |
| Kind code | A1 |
| Filing date | Aug 29, 2025 |
| Priority date | Sep 27, 2024 |
| Publication date | Apr 2, 2026 |
| Grant date | — |
Official abstract text for this publication.
The present application discloses a model training method, an image editing method, an apparatus, a device, a medium, and a product. The method first acquires an original image, an editing description text corresponding to the original image, an edited image corresponding to the original image with respect to the editing description text, and evaluation information of the edited image, where the evaluation information describes the state of the edited image with respect to at least one evaluation item. The original image, the editing description text, and the evaluation information are then processed by an image editing model to obtain an image editing result corresponding to the original image. The image editing model is updated according to the difference between the image editing result and the edited image.
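The training step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `EvaluationInfo`, `TrainingSample`, `ToyEditingModel`, and `train_step` are hypothetical names, images are flat lists of floats, the score scale is assumed to be 0 to 1, and the toy model (one learnable scalar, conditioned only on the mean evaluation score) stands in for the real image editing model. The difference between result and edited image is assumed here to be a mean squared error.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationInfo:
    # One score per evaluation item; item names follow claim 2, the 0-1 scale is assumed.
    scores: dict[str, float]
    # Optional defect description text per evaluation item (claim 3).
    defect_texts: dict[str, str] = field(default_factory=dict)


@dataclass
class TrainingSample:
    original_image: list[float]
    editing_text: str
    edited_image: list[float]
    evaluation: EvaluationInfo


class ToyEditingModel:
    """Hypothetical stand-in: result = original + w * mean(scores).

    It ignores the editing text; a real model would encode it.
    """

    def __init__(self, w: float = 0.0):
        self.w = w

    def condition(self, sample: TrainingSample) -> float:
        scores = sample.evaluation.scores
        return sum(scores.values()) / len(scores)

    def forward(self, sample: TrainingSample) -> list[float]:
        c = self.condition(sample)
        return [p + self.w * c for p in sample.original_image]


def mse(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def train_step(model: ToyEditingModel, sample: TrainingSample, lr: float = 0.1) -> float:
    """One gradient step on the difference between the editing result and the edited image."""
    result = model.forward(sample)
    loss = mse(result, sample.edited_image)
    c = model.condition(sample)
    # d(loss)/dw for result_i = original_i + w * c
    grad = sum(2 * (r - t) * c for r, t in zip(result, sample.edited_image)) / len(result)
    model.w -= lr * grad
    return loss  # loss measured before the update


sample = TrainingSample(
    original_image=[0.2, 0.4, 0.6],
    editing_text="make the image brighter",
    edited_image=[0.5, 0.7, 0.9],
    evaluation=EvaluationInfo(
        scores={"text_following": 0.9, "image_preserving": 0.8, "image_quality": 0.7},
        defect_texts={"image_quality": "slight blur"},
    ),
)
model = ToyEditingModel()
for step in range(3):
    print(step, train_step(model, sample))
```

Repeated calls to `train_step` on the same sample should report a shrinking loss, which is the only behavior this sketch is meant to demonstrate.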
Opening claim text (preview).
I/We claim:
1. A model training method, comprising: acquiring an original image, an editing description text corresponding to the original image, an edited image corresponding to the original image with respect to the editing description text, and evaluation information of the edited image, wherein the evaluation information is configured to describe a state of the edited image with respect to at least one evaluation item; processing the original image, the editing description text and the evaluation information by using an image editing model to obtain an image editing result corresponding to the original image; and updating the image editing model according to a difference between the image editing result and the edited image.
2. The method of claim 1, wherein the at least one evaluation item comprises one or more of: a text following evaluation item, an image preserving evaluation item, and an image quality evaluation item; the text following evaluation item is configured to describe a matching state presented between the edited image and the edit description text with respect to first content, wherein the first content is determined in accordance with at least one edit instruction described by the edit description text; the image preserving evaluation item is configured to describe a matching state presented between the edited image and the original image with respect to second content, wherein the second content is determined based on content in the original image other than the first content; and the image quality evaluation item is configured to describe a quality change of the edited image relative to the original image.
3. The method of claim 1, wherein the evaluation information comprises a score of each of the evaluation items, and/or the evaluation information comprises a defect description text of some or all of the evaluation items; for one of the evaluation items, the score of the evaluation item is configured to characterize a level achieved by the edited image with respect to the one evaluation item, and the defect description text of the one evaluation item is configured to describe the defect of the edited image with respect to the evaluation item.
4. The method of claim 3, wherein the at least one evaluation item comprises one or more of: a text following evaluation item, an image preserving evaluation item, or an image quality evaluation item; a score of the text following evaluation item is configured to describe a similarity degree between a change in content of the edited image relative to the original image and a change in content described by the edited description text; a score of the image preserving evaluation item is configured to describe a similarity degree between content retained by the edited image relative to the original image and content in the original image other than the edited content specified by the editing description text; a score of the image quality evaluation term is configured to describe a quality change of the edited image relative to the original image; a defect description text of the text following evaluation item is configured to describe at least one following error of the edited image relative to the edited description text; a defect description text of the image preserving evaluation item is configured to describe at least one preserving error of the edited image relative to the original image; and a defect description text of the image quality evaluation item is configured to describe at least one quality defect of the edited image relative to the original image.
5. The method of claim 1, wherein the evaluation information is introduced as condition information into a de-noising network in the image editing model.
6. The method of claim 5, wherein for one of the cross-attention layers in the de-noising network, input data of a data fusion layer corresponding to the one cross-attention layer is determined based on the evaluation information and output data of the one cross-attention layer, and the input data of the one cross-attention layer comprises an encoding result of the editing description text.
7. The method of claim 6, wherein the input data of the data fusion layer is determined based on a feature vector of the evaluation information; the evaluation information comprises at least one type of data, and the feature vector of the evaluation information is determined according to a vectorization result of each type of data and a vectorization result of each type.
8. The method of claim 6, wherein the data fusion layer is configured to perform residual error calculation or sum value calculation according to the output data of the cross-attention layer and a mapping result of a feature vector of the evaluation information in a first feature space, and the first feature space is a feature space to which the output data of the cross-attention layer belongs.
9. The method of claim 1, wherein input data of a de-noising network in the image editing model comprises first data and second data, the first data is input to the de-noising network as condition information, input data of a first network layer in the de-noising network comprises the second data, the first data is different from the second data; the second data is determined based on the original image and the evaluation information.
10. The method of claim 9, wherein a determination process of the second data comprises: performing concatenating processing on the image feature of the original image and the noise addition result of the image feature of the edited image to obtain a concatenating result; performing convolution processing on the concatenating result to obtain a convolution result; and performing at least one cross-attention processing according to the convolution result and a mapping result of a feature vector of the evaluation information in a second feature space to obtain the second data, wherein the second feature space is a feature space to which the convolution result belongs.
11. The method of claim 1, wherein the image editing model comprises a first module, a second module, a third module, a linear layer corresponding to the third module, a de-noising network, a linear layer corresponding to the de-noising network, and a decoding module; the first module is configured to obtain a concatenating result between an image feature of the original image and a noise addition result of an image feature of the edited image; the second module is configured to acquire a feature vector of the evaluation information; the linear layer corresponding to the third module and the linear layer corresponding to the de-noising network are configured to process the feature vector of the evaluation information; the third module is configured to perform at least one cross-attention processing according to the concatenating result and the output data of the linear layer corresponding to the third module; the de-noising network is configured to perform de-noising processing according to the output data of the third module, the encoding result of the editing description text, and the output data of the linear layer corresponding to the de-noising network; and the decoding module is configured to perform decoding processing on the output data of the de-noising network to obtain the image editing result.
12. The method of claim 11, wherein updating the image editing model comprises: updating a part of modules in the image editing model, wherein the part of modules comprises some or all network layers in the second module, the third module, the linear layer corresponding to the third module, the de-noi
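The data fusion layer of claim 8 can be illustrated with a short sketch of its "sum value calculation" variant. It assumes that the mapping into the first feature space is a single linear layer and that the mapped evaluation vector is added to every token of the cross-attention output; neither detail is stated in the claim text shown here. The function names `linear` and `fuse`, and all weights and values, are hypothetical.

```python
def linear(vec, weight, bias):
    """Map vec (length d_in) into a d_out-dimensional space; weight has shape d_out x d_in."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]


def fuse(attn_out, eval_vec, weight, bias):
    """Sum fusion: add the mapped evaluation feature vector to each cross-attention token.

    attn_out is a list of tokens, each a list of d_out floats (the "first feature space").
    """
    mapped = linear(eval_vec, weight, bias)
    if any(len(token) != len(mapped) for token in attn_out):
        raise ValueError("mapped evaluation vector must match the token dimension")
    return [[a + m for a, m in zip(token, mapped)] for token in attn_out]


# Two tokens of dimension 2 from a cross-attention layer (illustrative values).
attn_out = [[1.0, 2.0], [3.0, 4.0]]
# Evaluation feature vector of dimension 3 (illustrative values).
eval_vec = [0.5, 1.0, 0.0]
# Linear layer that maps dimension 3 to dimension 2.
weight = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
bias = [0.0, 0.5]

fused = fuse(attn_out, eval_vec, weight, bias)
print(fused)  # [[1.5, 4.5], [3.5, 6.5]]
```

Because the linear layer projects the evaluation vector into the same space as the attention output, the addition is well defined regardless of the evaluation vector's original dimension.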