Face identity preservation for image-to-image models using stable diffusion generative model
US-12423777-B2 · Sep 23, 2025 · US
US12586199B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586199-B2 |
| Application number | US-202318310414-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 1, 2023 |
| Priority date | Nov 1, 2022 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
An open-vocabulary diffusion-based panoptic segmentation system is not limited to performing segmentation using only object categories seen during training; it can also successfully segment object categories that are not seen during training and are seen only during testing and inference. In contrast with conventional techniques, a text-conditioned diffusion (generative) model is used to perform the segmentation. The text-conditioned diffusion model is pre-trained to generate images from text captions, including computing internal representations that provide spatially well-differentiated object features. The internal representations computed within the diffusion model comprise object masks and a semantic visual representation of the object. The semantic visual representation may be extracted from the diffusion model and used in conjunction with a text representation of a category label to classify the object. Objects are classified by associating the text representations of category labels with the object masks and their semantic visual representations to produce panoptic segmentation data.
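The classification step in the abstract, which matches text representations of category labels against per-mask visual representations, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the mask embeddings and label text embeddings already live in one shared vector space, and the cosine-similarity-plus-softmax scoring, the temperature of 0.07, and every helper name (`classify_masks`, `cosine`, `softmax`) are hypothetical choices.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def softmax(scores, temperature=0.07):
    # Temperature-scaled softmax (temperature value is an assumption).
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_masks(mask_embeddings, label_embeddings):
    """Assign each mask the label whose text embedding is most similar.

    mask_embeddings: list of vectors, one per predicted object mask.
    label_embeddings: dict mapping category label -> text embedding
    (assumed to be in the same space as the mask embeddings).
    Returns one (label, probability) pair per mask.
    """
    labels = list(label_embeddings)
    results = []
    for emb in mask_embeddings:
        probs = softmax([cosine(emb, label_embeddings[lab]) for lab in labels])
        best = max(range(len(labels)), key=lambda i: probs[i])
        results.append((labels[best], probs[best]))
    return results


if __name__ == "__main__":
    masks = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.95]]
    labels = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
    print(classify_masks(masks, labels))
    labels["zebra"] = [0.0, 0.0, 1.0]  # category introduced only at inference
    print(classify_masks(masks, labels))
```

Because categories enter only as text embeddings, adding a label such as "zebra" at inference time changes the candidate set without retraining any parameters, which is the open-vocabulary behavior the abstract describes.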
Opening claim text (preview).
What is claimed is:

1. A method of generating segmentation data, comprising: processing an input image and corresponding metadata representing a description of the input image by a diffusion model that has been trained to synthesize an image based on the description; extracting an internal feature representation of the input image defined by features computed by at least one intermediate layer during at least one processing iteration of the diffusion model; and computing the segmentation data for the input image using the internal feature representation.
2. The method of claim 1, wherein the segmentation data comprises object masks for one or more objects depicted in the input image and object category labels corresponding to the description to the object masks that are mapped to the object masks.
3. The method of claim 1, further comprising generating panoptic segmentation data for the input image based on the segmentation data and text embeddings corresponding to a caption associated with the description or object category labels corresponding to the description.
4. The method of claim 3, further comprising: extracting the object category labels from the caption; and processing the object category labels by a text encoder to produce the text embeddings.
5. The method of claim 4, wherein a mask generator applies parameters to the internal feature representation to compute the segmentation data comprising object masks and mask embeddings.
6. The method of claim 5, wherein during training of the parameters, the object category labels comprise a training set of object category labels and during inference when the parameters are unchanged at least one new object category label that is not included in the set is encoded in the text embeddings.
7. The method of claim 1, wherein the metadata comprises an encoded text caption.
8. The method of claim 7, further comprising processing the input image by an implicit captioner to generate the encoded text caption.
9. The method of claim 8, wherein an image encoder processes the input image to generate image features and a multilayer perceptron projects the image features to generate the encoded text caption.
10. The method of claim 9, wherein the segmentation data comprises object masks and mask embeddings and further comprising: processing the image features by a mask pooling unit to produce additional mask embeddings; and combining the text embeddings corresponding to object category labels, the mask embeddings, and the additional mask embeddings to generate panoptic segmentation data for the input image.
11. The method of claim 10, wherein the object category labels include at least one object category label that was not used to train the mask pooling unit and the multilayer perceptron.
12. The method of claim 1, wherein at least one of the steps of processing, extracting, or computing is performed on a server or in a data center and the segmentation data is streamed to a user device.
13. The method of claim 1, wherein at least one of the steps of processing, extracting, or computing is performed within a cloud computing environment.
14. The method of claim 1, wherein at least one of the steps of processing, extracting, or computing is for training, testing, or certifying a neural network employed in a machine, robot, or autonomous vehicle.
15. The method of claim 1, wherein at least one of the steps of processing, extracting, or computing is performed on a virtual machine comprising a portion of a graphics processing unit.
16. A system, comprising: a processor configured to execute a diffusion model to generate segmentation data by: processing an input image and corresponding metadata representing a description of the input image by a diffusion model that has been trained to synthesize an image based on the description; extracting an internal feature representation of the input image defined by features computed by at least one intermediate layer during at least one processing iteration of the diffusion model; and computing the segmentation data for the input image using the internal feature representation.
17. The system of claim 16, wherein the segmentation data comprises object masks for one or more objects depicted in the input image and object category labels corresponding to the description to the object masks that are mapped to the object masks.
18. The system of claim 16, further comprising generating panoptic segmentation data for the input image based on the segmentation data and text embeddings corresponding to a caption associated with the description or object category labels corresponding to the description.
19. A non-transitory computer-readable media storing computer instructions that, when executed by one or more processors, cause the one or more processors to generate segmentation data by performing the steps of: processing an input image and corresponding metadata representing a description of the input image by a diffusion model that has been trained to synthesize an image based on the description; extracting an internal feature representation of the input image defined by features computed by at least one intermediate layer during at least one processing iteration of the diffusion model; and computing the segmentation data for the input image using the internal feature representation.
20. The non-transitory computer-readable media of claim 19, further comprising generating panoptic segmentation data for the input image based on the segmentation data and text embeddings corresponding to a caption associated with the description or object category labels corresponding to the description.
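Claim 10 recites pooling image-encoder features inside each mask to obtain additional mask embeddings, then combining those with the mask generator's embeddings and the category-label text embeddings. The sketch below illustrates that flow under stated assumptions: both kinds of mask embedding are assumed to be already projected into the text-embedding space, and the weighted geometric mean used to combine the two class-probability vectors is one plausible rule that the claim does not specify. All function names and the `alpha` and temperature parameters are hypothetical.

```python
import math


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def mask_pool(features, mask):
    """Average the per-pixel feature vectors that fall inside a binary mask.

    features: H x W grid of feature vectors (list of rows of vectors).
    mask: H x W grid of 0/1 values for one object.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    return [t / count for t in total] if count else total


def class_probs(embedding, label_embeddings, temperature=0.07):
    # Softmax over cosine similarities to each label's text embedding.
    labels = list(label_embeddings)
    scores = [cosine(embedding, label_embeddings[lab]) / temperature for lab in labels]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return labels, [e / z for e in exps]


def fuse(p_mask, p_pooled, alpha=0.5):
    # Assumed combination rule: weighted geometric mean, renormalized.
    fused = [(a ** (1 - alpha)) * (b ** alpha) for a, b in zip(p_mask, p_pooled)]
    s = sum(fused)
    return [f / s for f in fused]


def label_mask(mask_embedding, pooled_embedding, label_embeddings, alpha=0.5):
    # Classify one mask using both embeddings, then pick the best label.
    labels, p_mask = class_probs(mask_embedding, label_embeddings)
    _, p_pooled = class_probs(pooled_embedding, label_embeddings)
    fused = fuse(p_mask, p_pooled, alpha)
    best = max(range(len(labels)), key=lambda i: fused[i])
    return labels[best], fused[best]
```

Setting `alpha` to 0 reduces the result to the mask-generator prediction alone, and setting it to 1 uses only the pooled image-encoder features, so the parameter makes the trade-off between the two sources explicit.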
Extraction of image or video features · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Recognition assisted with metadata · CPC title
Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion · CPC title