US12586259B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586259-B2 |
| Application number | US-202418426763-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2024 |
| Priority date | Mar 20, 2023 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
A method, apparatus, non-transitory computer readable medium, and system for image generation include obtaining a text embedding of a text prompt and an image embedding of an image prompt. Some embodiments map the text embedding into a joint embedding space to obtain a joint text embedding and map the image embedding into the joint embedding space to obtain a joint image embedding. Some embodiments generate a synthetic image based on the joint text embedding and the joint image embedding.
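The mapping step in the abstract can be illustrated with a minimal sketch. It assumes small two-layer perceptrons as the mapping networks and uses random weights in place of trained parameters; the dimensions, the helper names (`make_mlp`, `mlp`, `text_mapper`, `image_mapper`), and the token counts are all hypothetical and are not taken from the patent. The sketch stops before the image generation model, which is not reproduced here.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weights and a zero bias; stands in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(layer, x):
    """Apply one affine layer to a single vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Two-layer perceptron: in_dim -> hidden_dim -> out_dim."""
    return [make_linear(in_dim, hidden_dim, rng), make_linear(hidden_dim, out_dim, rng)]

def mlp(layers, x):
    return linear(layers[1], relu(linear(layers[0], x)))

# Hypothetical sizes, chosen only for illustration.
TEXT_DIM, IMAGE_DIM, JOINT_DIM = 8, 4, 8
rng = random.Random(0)

text_mapper = make_mlp(TEXT_DIM, 16, JOINT_DIM, rng)    # text space  -> joint space
image_mapper = make_mlp(IMAGE_DIM, 16, JOINT_DIM, rng)  # image space -> joint space

# Stand-ins for encoder outputs: three text tokens and one image token.
text_embedding = [[rng.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in range(3)]
image_embedding = [rng.gauss(0, 1) for _ in range(IMAGE_DIM)]

# Each modality passes through its own network and lands in the same space.
joint_text_embedding = [mlp(text_mapper, token) for token in text_embedding]
joint_image_embedding = mlp(image_mapper, image_embedding)
```

In this sketch both outputs share `JOINT_DIM`, so a downstream generator could condition on them together even though the two inputs started with different dimensionalities.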
Opening claim text (preview).
What is claimed is:

1. A method for image generation, comprising: obtaining a text embedding of a text prompt in a text embedding space and an image embedding of an image prompt in an image embedding space; mapping, using a text mapping network, the text embedding from the text embedding space into a joint embedding space to obtain a joint text embedding; mapping, using an image mapping network, the image embedding from the image embedding space into the joint embedding space to obtain a joint image embedding; and generating, using an image generation model, a synthetic image based on the joint text embedding and the joint image embedding.
2. The method of claim 1, further comprising: generating, using a generative adversarial network, a high-resolution version of the synthetic image.
3. The method of claim 1, wherein obtaining the text embedding and the image embedding comprises: encoding the text prompt with a text encoder to obtain the text embedding; and encoding the image prompt with an image encoder to obtain the image embedding.
4. The method of claim 1, further comprising: concatenating the joint text embedding and the joint image embedding to obtain a combined embedding.
5. The method of claim 4, wherein: the text embedding comprises n text tokens, where n is greater than one, the image embedding comprises a single image token, and the combined embedding comprises n+1 combined tokens.
6. The method of claim 5, wherein: each of the n text tokens has a dimensionality greater than the single image token.
7. The method of claim 5, wherein: each of the n+1 combined tokens has a same dimensionality as the n text tokens.
8. The method of claim 1, further comprising: learning a default text embedding for a null text prompt.
9. The method of claim 1, further comprising: learning a default image embedding for a null image prompt.
10. A system for image generation, comprising: one or more processors; one or more memory components coupled with the one or more processors; a text mapping network comprising text mapping parameters, the text mapping network trained to map a text embedding from a text embedding space into a joint embedding space to obtain a joint text embedding; an image mapping network comprising image mapping parameters, the image mapping network trained to map an image embedding from an image embedding space into the joint embedding space to obtain a joint image embedding; and an image generation model comprising image generation parameters, the image generation model trained to generate a synthetic image based on the joint text embedding and the joint image embedding.
11. The system of claim 10, the system further comprising: a generative adversarial network (GAN) comprising GAN parameters, the GAN trained to generate a high-resolution version of the synthetic image.
12. The system of claim 10, wherein: the text mapping network comprises a multi-layer perceptron (MLP) architecture.
13. The system of claim 10, wherein: the image mapping network comprises a multi-layer perceptron (MLP) architecture.
14. The system of claim 10, the system further comprising: a text encoder comprising text encoding parameters, the text encoder trained to encode a text prompt to obtain the text embedding.
15. The system of claim 14, wherein: the text encoder is configured to learn a default text embedding for a null text prompt.
16. The system of claim 10, the system further comprising: an image encoder comprising image encoding parameters, the image encoder trained to encode an image prompt to obtain the image embedding.
17. The system of claim 16, wherein: the image encoder is configured to learn a default image embedding for a null image prompt.
18. A non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to: obtain a text embedding of a text prompt in a text embedding space and an image embedding of an image prompt in an image embedding space; map, using a text mapping network, the text embedding from the text embedding space into a joint embedding space to obtain a joint text embedding; map, using an image mapping network, the image embedding from the image embedding space into the joint embedding space to obtain a joint image embedding; and generate, using an image generation model, a synthetic image based on the joint text embedding and the joint image embedding.
19. The non-transitory computer readable medium of claim 18, wherein the instructions further cause the processor to: generate a high-resolution version of the synthetic image using a generative adversarial network (GAN).
20. The non-transitory computer readable medium of claim 18, wherein the instructions further cause the processor to: concatenate the joint text embedding and the joint image embedding to obtain a combined embedding.
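The token arithmetic in the concatenation claims (n text tokens plus a single image token giving n+1 combined tokens of equal dimensionality) can be sketched as follows. This is an illustrative sketch only: the function `combine`, the constant `JOINT_DIM`, and the zero-valued defaults are hypothetical. In particular, the claims describe default embeddings that are learned for null prompts; here fixed placeholders stand in for them, and storing them after the mapping step rather than before it is a simplifying assumption.

```python
from typing import List, Optional

Token = List[float]

JOINT_DIM = 8        # hypothetical joint-space dimensionality
DEFAULT_N_TOKENS = 3 # hypothetical token count for the null text prompt

# Placeholders for learned defaults (assumption: a trained system would
# optimise these values; zeros are used here only to keep the sketch runnable).
DEFAULT_JOINT_TEXT: List[Token] = [[0.0] * JOINT_DIM for _ in range(DEFAULT_N_TOKENS)]
DEFAULT_JOINT_IMAGE: Token = [0.0] * JOINT_DIM

def combine(joint_text: Optional[List[Token]], joint_image: Optional[Token]) -> List[Token]:
    """Concatenate n joint text tokens and one joint image token into n+1 tokens.

    A missing (None) prompt falls back to the corresponding default embedding.
    """
    text_tokens = DEFAULT_JOINT_TEXT if joint_text is None else joint_text
    image_token = DEFAULT_JOINT_IMAGE if joint_image is None else joint_image
    # All combined tokens must share one dimensionality.
    if any(len(token) != len(image_token) for token in text_tokens):
        raise ValueError("text and image tokens must have the same dimensionality")
    return [list(token) for token in text_tokens] + [list(image_token)]

# Usage: four text tokens and one image token yield five combined tokens.
text_tokens = [[1.0] * JOINT_DIM for _ in range(4)]
image_token = [2.0] * JOINT_DIM
combined = combine(text_tokens, image_token)

# Usage: an image-only request falls back to the default text tokens.
image_only = combine(None, image_token)
```

Appending the image token at the end of the sequence is one possible ordering; the claims do not state where the image token sits in the combined embedding.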
Lexical analysis, e.g. tokenisation or collocates · CPC title
Training; Learning · CPC title
Artificial neural networks [ANN] · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Two-dimensional [2D] image generation · CPC title