Text-guided cameo generation

US12430812B2 · US · B2

Patent metadata
Field                Value
Publication number   US-12430812-B2
Application number   US-202217950945-A
Country              US
Kind code            B2
Filing date          Sep 22, 2022
Priority date        Sep 22, 2022
Publication date     Sep 30, 2025
Grant date           Sep 30, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of generating an image for use in a conversation taking place in a messaging application is disclosed. Conversation input text is received from a user of a portable device that includes a display. Model input text is generated from the conversation input text, which is processed with a text-to-image model to generate an image based on the model input text. The coordinates of a face in the image are determined, and the face of the user or another person is added to the image at the location. The final image is displayed on the portable device, and user input is received to transmit the image to a remote recipient.
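The face-placement step in the abstract (find the coordinates of a face in the generated image, then add an existing face at that location) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the patent: the image is a small grid of characters, `generate_image` is a hypothetical stand-in for the text-to-image model, `find_face` is a toy detector that looks for `F` pixels, and `apply_face` resizes by nearest neighbour.

```python
from dataclasses import dataclass

# Toy image type: rows of single-character "pixels" (an assumption for this sketch).
Image = list[list[str]]


@dataclass(frozen=True)
class FaceBox:
    """Bounding box of a face: top-left corner plus size, in pixels."""
    top: int
    left: int
    height: int
    width: int


def generate_image(model_input_text: str) -> Image:
    """Hypothetical stand-in for a text-to-image model.

    Ignores the prompt and returns a fixed 4x6 canvas whose placeholder
    face region is marked with 'F'.
    """
    return [list("......"), list("..FF.."), list("..FF.."), list("......")]


def find_face(image: Image) -> FaceBox:
    """Toy detector: the bounding box of every 'F' pixel in the image."""
    hits = [(r, c) for r, row in enumerate(image)
            for c, px in enumerate(row) if px == "F"]
    if not hits:
        raise ValueError("no face found in generated image")
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return FaceBox(min(rows), min(cols),
                   max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)


def apply_face(image: Image, face: Image, box: FaceBox) -> Image:
    """Return a copy of image with face scaled (nearest neighbour) into box."""
    updated = [row[:] for row in image]  # leave the generated image untouched
    for dr in range(box.height):
        for dc in range(box.width):
            src_r = dr * len(face) // box.height
            src_c = dc * len(face[0]) // box.width
            updated[box.top + dr][box.left + dc] = face[src_r][src_c]
    return updated


generated = generate_image("person celebrating a birthday")
box = find_face(generated)
user_face = [list("AB"), list("CD")]  # stand-in for the existing face representation
cameo = apply_face(generated, user_face, box)
for row in cameo:
    print("".join(row))
```

In a real system the detector and the compositing step would operate on pixel data and likely involve blending or masking; the sketch only shows how the detected coordinates drive where the existing face is placed.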

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of generating an image including an existing representation of a face of a person, for use in a conversation taking place in a messaging application, the method comprising: receiving conversation input text from a user of a portable device that includes a display; generating model input text from the conversation input text; generating an image based on the model input text using a text-to-image model; determining coordinates of a face in the generated image; applying the existing representation of the face of the person to the generated image based on the coordinates of the face in the generated image, to generate an updated image including the existing representation of the face of the person; displaying the updated image on the display of the portable device; receiving user input to transmit the updated image in a message; and transmitting, in response to receiving the user input, the updated image to a remote recipient.

2. The method of claim 1, wherein the generating of the model input text comprises generating additional text using a creative caption function.

3. The method of claim 2, wherein the generating of the model input text further comprises: extracting key phrases from the additional text.

4. The method of claim 1, further comprising: processing the conversation input text with a safety filter to determine suitability of the conversation input text for image generation.

5. The method of claim 1, wherein the text-to-image model has been generated from a large scale image dataset and refined by an existing collection of images for use in a conversation taking place in a messaging application.

6. The method of claim 5, wherein text associated with images in the existing collection of images is expanded using an image-to-text module.

7. The method of claim 5, further comprising: animating the face of the person in the updated image.

8. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform operations for generating an image including an existing representation of a face of a person, for use in a conversation taking place in a messaging application, the operations comprising: receiving conversation input text from a user of a portable device that includes a display; generating model input text from the conversation input text; generating an image based on the model input text using a text-to-image model; determining coordinates of a face in the generated image; applying the existing representation of the face of the person to the generated image based on the coordinates of the face in the generated image, to generate an updated image including the existing representation of the face of the person; displaying the updated image on the display of the portable device; receiving user input to transmit the updated image in a message; and transmitting, in response to receiving the user input, the updated image to a remote recipient.

9. The non-transitory computer-readable storage medium of claim 8, wherein the generating of the model input text comprises generating additional text using a creative caption function.

10. The non-transitory computer-readable storage medium of claim 9, wherein the generating of the model input text further comprises: extracting key phrases from the additional text.

11. The non-transitory computer-readable storage medium of claim 8, wherein the operations further comprise: processing the conversation input text with a safety filter to determine suitability of the conversation input text for image generation.

12. The non-transitory computer-readable storage medium of claim 8, wherein the text-to-image model has been generated from a large scale image dataset and refined by an existing collection of images for use in a conversation taking place in a messaging application.

13. The non-transitory computer-readable storage medium of claim 12, wherein text associated with images in the existing collection of images is expanded using an image-to-text module.

14. The non-transitory computer-readable storage medium of claim 13, wherein text associated with the images in the existing collection of images is filtered for relevance of the text to the images in the existing collection of images, by using an image-text relevance model.

15. A computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform operations for generating an image including an existing representation of a face of a person, for use in a conversation taking place in a messaging application, the operations comprising: receiving conversation input text from a user of a portable device that includes a display; generating model input text from the conversation input text; generating an image based on the model input text using a text-to-image model; determining coordinates of a face in the generated image; applying the existing representation of the face of the person to the generated image based on the coordinates of the face in the generated image, to generate an updated image including the existing representation of the face of the person; displaying the updated image on the display of the portable device; receiving user input to transmit the updated image in a message; and transmitting, in response to receiving the user input, the updated image to a remote recipient.

16. The computing apparatus of claim 15, wherein the generating of the model input text comprises generating additional text using a creative caption function.

17. The computing apparatus of claim 16, wherein the generating of the model input text further comprises: extracting key phrases from the additional text.

18. The computing apparatus of claim 15, wherein the text-to-image model has been generated from a large scale image dataset and refined by an existing collection of images for use in a conversation taking place in a messaging application.

19. The computing apparatus of claim 18, wherein text associated with images in the existing collection of images is expanded using an image-to-text module.

20. The computing apparatus of claim 18, wherein text associated with the images in the existing collection of images is filtered for relevance of the text to the images in the existing collection of images, by using an image-text relevance model.
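As a rough illustration of how the text-side steps in the dependent claims might fit together (safety filter, creative caption function, key phrase extraction), here is a minimal sketch. The patent text shown on this page does not say how any of these are implemented, so every name and rule below is hypothetical: `BLOCKED_TERMS`, `STOPWORDS`, the fixed caption returned by `creative_caption`, and the word-level notion of a "key phrase" are all assumptions.

```python
import re
from typing import Optional

# Hypothetical word lists, chosen only to make the sketch runnable.
BLOCKED_TERMS = {"gore", "weapon"}
STOPWORDS = {"a", "an", "the", "in", "of", "and", "to", "is"}


def is_suitable(conversation_text: str) -> bool:
    """Toy safety filter: reject text containing any blocked term."""
    words = set(re.findall(r"[a-z']+", conversation_text.lower()))
    return words.isdisjoint(BLOCKED_TERMS)


def creative_caption(conversation_text: str) -> str:
    """Hypothetical creative caption function.

    A real system would presumably use a language model; this returns a
    fixed piece of additional text so the example stays deterministic.
    """
    return "a person celebrating in a colorful cartoon scene"


def extract_key_phrases(additional_text: str, limit: int = 6) -> list[str]:
    """Toy extractor: unique non-stopwords, in order of first appearance."""
    phrases: list[str] = []
    for word in re.findall(r"[a-z']+", additional_text.lower()):
        if word not in STOPWORDS and word not in phrases:
            phrases.append(word)
    return phrases[:limit]


def build_model_input_text(conversation_text: str) -> Optional[str]:
    """Return the prompt for the text-to-image model, or None if filtered out."""
    if not is_suitable(conversation_text):
        return None
    additional_text = creative_caption(conversation_text)
    key_phrases = extract_key_phrases(additional_text)
    base = conversation_text.strip().rstrip("!.?")
    return ", ".join([base, *key_phrases])


print(build_model_input_text("Happy birthday to you!"))
print(build_model_input_text("draw a weapon"))
```

Running the filter before any caption generation mirrors the wording of claims 4 and 11, which apply the safety filter to the conversation input text itself rather than to the expanded prompt.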

Assignees

Snap Inc

Inventors

Classifications

  • Detection; Localisation; Normalisation

  • Phrasal analysis, e.g. finite state techniques or chunking

  • Discourse or dialogue representation

  • Multimedia information

  • Real-time or near real-time messaging, e.g. instant messaging [IM]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12430812B2 cover?
A method of generating an image for use in a conversation taking place in a messaging application is disclosed. Conversation input text is received from a user of a portable device that includes a display. Model input text is generated from the conversation input text, which is processed with a text-to-image model to generate an image based on the model input text. The coordinates of a face in …
Who is the assignee on this patent?
Snap Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 30, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).