Unsupervised style and color cues for transformer-based image generation

US12277630B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12277630-B2
Application number: US-202217662560-A
Country: US
Kind code: B2
Filing date: May 9, 2022
Priority date: May 9, 2022
Publication date: Apr 15, 2025
Grant date: Apr 15, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for image processing are configured. Embodiments of the present disclosure identify target style attributes and target structure attributes for a composite image; generate a matrix of composite feature tokens based on the target style attributes and the target structure attributes, wherein subsequent feature tokens of the matrix of composite feature tokens are sequentially generated based on previous feature tokens of the matrix of composite feature tokens according to a linear ordering of the matrix of composite feature tokens; and generate the composite image based on the matrix of composite feature tokens, wherein the composite image includes the target style attributes and the target structure attributes.
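The sequential generation described in the abstract can be illustrated with a small sketch. It assumes a row-major (left-to-right, top-to-bottom) linear ordering, which is only one possible ordering, and it replaces the trained transformer with a hypothetical deterministic stand-in, `toy_next_token`. All function names, the codebook size, and the toy inputs are illustrative assumptions and do not come from the patent.

```python
def flatten_row_major(matrix):
    """Linear ordering (assumed here): read the matrix row by row."""
    return [token for row in matrix for token in row]


def unflatten_row_major(sequence, rows, cols):
    """Inverse of flatten_row_major: rebuild a rows x cols matrix."""
    return [sequence[r * cols:(r + 1) * cols] for r in range(rows)]


def toy_next_token(style, structure_seq, generated, codebook_size):
    """Hypothetical stand-in for a trained transformer.

    It only demonstrates the dependency pattern: the next token is a
    function of the style features, the structure token at the same
    position, and the previously generated token.
    """
    position = len(generated)
    previous = generated[-1] if generated else 0
    return (sum(style) + structure_seq[position] + previous) % codebook_size


def generate_composite_tokens(style, structure_matrix, codebook_size=16):
    """Generate composite tokens one at a time in the linear ordering."""
    rows, cols = len(structure_matrix), len(structure_matrix[0])
    structure_seq = flatten_row_major(structure_matrix)
    generated = []
    for _ in range(rows * cols):
        # Each new token sees only tokens that precede it in the ordering.
        generated.append(
            toy_next_token(style, structure_seq, generated, codebook_size)
        )
    return unflatten_row_major(generated, rows, cols)


style = [3, 1]                      # toy style features
structure = [[0, 1, 2], [3, 4, 5]]  # toy 2 x 3 matrix of structure tokens
composite = generate_composite_tokens(style, structure)
```

In a real system the resulting matrix of token indices would be passed to a decoder to produce the image; that step is omitted here.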

First claim

Opening claim text (preview).

What is claimed is:

1. A method for image processing, comprising: identifying target style attributes and target structure attributes for a composite image; ordering structure feature tokens of a matrix of structure feature tokens to obtain a sequence of structure feature tokens; combining the sequence of structure feature tokens with target style features to obtain a combined sequence of feature tokens; generating a matrix of composite feature tokens based on the target style attributes, the target structure attributes, and the combined sequence of feature tokens, wherein the matrix of composite feature tokens comprises a two-dimensional arrangement of composite feature tokens having a plurality of rows and a plurality of columns, and wherein subsequent feature tokens of the matrix of composite feature tokens are autoregressively generated based on previous feature tokens of the matrix of composite feature tokens according to a linear ordering of the matrix of composite feature tokens; and generating the composite image based on the matrix of composite feature tokens, wherein the composite image includes the target style attributes and the target structure attributes.

2. The method of claim 1, further comprising: generating the target style features and dispensable structure features based on a style image that includes at least a portion of the target style attributes, wherein the target style features represent the target style attributes.

3. The method of claim 2, further comprising: identifying an additional style image; generating additional target style features based on the additional style image; and combining the target style features and the additional target style features to obtain combined target style features, wherein the matrix of composite feature tokens is generated based on the combined target style features.

4. The method of claim 3, further comprising: identifying a spatial weighting for the style image and the additional style image, wherein the target style features and the additional target style features are combined based on the spatial weighting.

5. The method of claim 1, further comprising: selecting a color palette of a style image, wherein the color palette comprises color distribution information of the style image, and wherein the target style attributes include the color palette.

6. The method of claim 5, further comprising: receiving a grayscale image that includes the target structure attributes; and generating a grayscale image embedding that represents the target structure attributes based on the grayscale image, wherein the matrix of composite feature tokens is generated based on the grayscale image embedding and the color palette.

7. The method of claim 1, further comprising: receiving a text query that specifies at least a portion of the target style attributes; and generating a text embedding based on the text query, wherein the matrix of composite feature tokens is generated based on the text embedding.

8. The method of claim 1, further comprising: receiving a structure image that includes the target structure attributes; generating a sketch image of the structure image based on an edge detection model, wherein the sketch image includes the target structure attributes; and generating the matrix of structure feature tokens based on the sketch image.

9. The method of claim 1, further comprising: identifying a row of the matrix of composite feature tokens; identifying a set of previous feature tokens in the row of the matrix of composite feature tokens; and generating a next feature token in the row of the matrix of composite feature tokens based on the set of previous feature tokens in the row of the matrix of composite feature tokens.

10. The method of claim 9, further comprising: identifying a row of the matrix of structure feature tokens corresponding to the row of the matrix of composite feature tokens, wherein the next feature token is generated based on the row of the matrix of structure feature tokens.

11. The method of claim 9, wherein: the next feature token in the row of the matrix of composite feature tokens is generated independently of a previous row in the matrix of composite feature tokens.

12. The method of claim 1, wherein: each composite feature token of the matrix of composite feature tokens represents a vector from a vector quantized generative adversarial network (VQGAN) codebook.

13. The method of claim 1, wherein: the target style attributes include color information, texture information, lighting information, high frequency information, or any combination thereof.

14. A method for training a machine learning model, comprising: generating style features of an image using a swapping autoencoder (SAE) model; generating a sketch image from the image using an edge detection model; generating a matrix of structure feature tokens based on the sketch image using a sketch encoder; ordering structure feature tokens of the matrix of structure feature tokens to obtain a sequence of structure feature tokens; combining the sequence of structure feature tokens with the style features to obtain a combined sequence of feature tokens; generating, using a transformer model, a matrix of composite feature tokens based on the style features of the image, the matrix of structure feature tokens, and the combined sequence of feature tokens, wherein the matrix of composite feature tokens comprises a two-dimensional arrangement of composite feature tokens having a plurality of rows and a plurality of columns, and wherein subsequent feature tokens of the matrix of composite feature tokens are autoregressively generated based on previous feature tokens of the matrix of composite feature tokens according to a linear ordering of the matrix of composite feature tokens; generating a matrix of supervision tokens for the image using an image encoder; computing a classification loss based on the matrix of composite feature tokens and the matrix of supervision tokens, wherein each supervision token of the matrix of supervision tokens is selected from a pre-determined collection of tokens; and updating parameters of the transformer model based on the classification loss.

15. The method of claim 14, further comprising: training the image encoder using an image training set based on a vector quantized generative adversarial network (VQGAN) training method; and training the sketch encoder using a sketch training set based on a VQGAN training method.

16. The method of claim 14, further comprising: training the SAE model by swapping structure attributes and style attributes of a first training image and a second training image.

17. An apparatus for image processing, comprising: at least one processor; at least one memory including instructions executable by the at least one processor; a swapping autoencoder (SAE) model comprising parameters stored in the at least one memory and configured to generate target style features based on a style image, wherein the target style features represent target style attributes for a composite image; a sketch encoder comprising code stored in the at least one memory and configured to generate a matrix of structure feature tokens based on a sketch image, wherein the matrix of structure feature tokens represents target structure attributes of the sketch image; a transformer model comprising parameters stored in the at least one memory and trained to order structure feature tokens of the matrix of structure feature tokens to obtain a sequence
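Claim 14 recites a classification loss computed between the generated composite tokens and a matrix of supervision tokens drawn from a pre-determined collection. The claim text does not name a specific loss, so the sketch below assumes a mean softmax cross-entropy over per-position logits, with each supervision token given as an index into a small codebook. The function names, the 1 x 2 token matrix, and the codebook size of 3 are all illustrative assumptions.

```python
import math


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one token position (numerically stable)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target_index]


def classification_loss(logit_matrix, supervision_matrix):
    """Mean cross-entropy over every position of the token matrix.

    logit_matrix[r][c] is a list of scores, one per codebook entry
    (assumed model output); supervision_matrix[r][c] is the index of
    the supervision token at that position.
    """
    losses = [
        cross_entropy(logits, target)
        for logit_row, target_row in zip(logit_matrix, supervision_matrix)
        for logits, target in zip(logit_row, target_row)
    ]
    return sum(losses) / len(losses)


# Toy example: a 1 x 2 token matrix with a codebook of 3 entries.
supervision = [[1, 2]]
uniform_logits = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
confident_logits = [[[0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]]

uniform_loss = classification_loss(uniform_logits, supervision)
confident_loss = classification_loss(confident_logits, supervision)
```

With uniform scores the loss equals the log of the codebook size, and it falls as the scores concentrate on the supervision tokens. In training, the gradient of this quantity would drive the parameter update of the transformer model.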

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12277630B2 cover?
Systems and methods for image processing are configured. Embodiments of the present disclosure identify target style attributes and target structure attributes for a composite image; generate a matrix of composite feature tokens based on the target style attributes and the target structure attributes, wherein subsequent feature tokens of the matrix of composite feature tokens are sequentially g…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 15, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).