Context-aware human generation in an image

US11854203B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11854203-B1
Application number	US-202017127399-A
Country	US
Kind code	B1
Filing date	Dec 18, 2020
Priority date	Dec 18, 2020
Publication date	Dec 26, 2023
Grant date	Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a first image depicting a context including one or more persons having one or more respective poses, receiving a second image depicting a target person having an original pose, where the target person is to be inserted into the context depicted in the first image, generating a target segmentation mask specifying a new pose for the target person in the context of the first image based on the first image, generating a third image depicting the target person having the new pose based on the second image and the target segmentation mask, and generating an output image based on the first image and the third image, the output image depicting the one or more persons having the one or more respective poses and the target person having the new pose.
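The abstract describes three stages that feed into one another. The sketch below only illustrates that data flow; it is not the patented implementation. The names `insert_person`, `predict_mask`, `render_person`, and `compose` are hypothetical, images are stood in for by nested lists of floats, and the stub stages in the usage example are toy placeholders for the machine-learning models the claims mention.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical stand-in: 2-D grid of pixel values
Mask = List[List[int]]     # hypothetical stand-in: 1 where the target person goes


def insert_person(
    context_image: Image,
    target_image: Image,
    predict_mask: Callable[[Image], Mask],
    render_person: Callable[[Image, Mask], Image],
    compose: Callable[[Image, Image], Image],
) -> Image:
    """Chain the three stages named in the abstract (illustrative only)."""
    # Stage 1: the new pose is derived from the context (first) image.
    target_mask = predict_mask(context_image)
    # Stage 2: the target person is re-rendered from the second image in that pose.
    third_image = render_person(target_image, target_mask)
    # Stage 3: the output combines the first image and the third image.
    return compose(context_image, third_image)


# Toy stub stages, assumed purely for demonstration.
def stub_mask(_context: Image) -> Mask:
    return [[0, 1], [0, 0]]


def stub_render(target: Image, mask: Mask) -> Image:
    # Paint the target's single pixel value wherever the mask is set.
    return [[target[0][0] if m else 0.0 for m in row] for row in mask]


def stub_compose(context: Image, third: Image) -> Image:
    # Keep the context pixel unless the third image supplies a non-zero one.
    return [[t if t else c for c, t in zip(c_row, t_row)]
            for c_row, t_row in zip(context, third)]


result = insert_person([[0.1, 0.2], [0.3, 0.4]], [[0.9]],
                       stub_mask, stub_render, stub_compose)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular model; each stub could be swapped for a real network without changing the control flow.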

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by a computing device: receiving a first image depicting a context comprising one or more persons having one or more respective poses; receiving a second image depicting a target person having an original pose, wherein the target person is to be inserted into the context depicted in the first image; generating, based on the first image, a target segmentation mask specifying a new pose for the target person in the context of the first image; generating, based on the second image and the target segmentation mask, a third image depicting the target person having the new pose; and generating an output image based on the first image and the third image, the output image depicting the one or more persons having the one or more respective poses and the target person having the new pose.
2. The method of claim 1, wherein generating the target segmentation mask comprises: generating a source segmentation mask specifying the one or more respective poses of the one or more persons using one or more pre-trained machine-learning models; and processing the source segmentation mask with a first machine-learning model.
3. The method of claim 2, wherein a segmentation mask comprises a semantic pose map channel and a face channel.
4. The method of claim 3, wherein the semantic pose map channel comprises n labels corresponding to n segment groups, wherein n segment groups comprise background, hair, face, torso, upper limbs, upper-body wear, lower-body wear, lower limbs, shoes, or any other suitable segment group.
5. The method of claim 3, wherein the face channel is extracted based on convex hulls over detected facial key-points for faces in an image, and wherein the face channel is a binary representation.
6. The method of claim 2, wherein information regarding a bounding box is also provided to the first machine-learning model, wherein the bounding box indicates an area in the first image to which the target person is to be added, and wherein the bounding box is determined by a user.
7. The method of claim 2, wherein the first machine-learning model is trained with a set of training data, wherein each training data comprises a training source image and a training ground truth image.
8. The method of claim 7, wherein the set of training data is prepared by: collecting a plurality of training ground truth images, each training ground truth image comprising two or more persons; and generating, for each training ground truth image, a training source image by removing one of the two or more persons.
9. The method of claim 8, wherein, during a training process of the first machine-learning model, trainable variables of the first machine-learning model are updated based on a comparison of a first target segmentation mask generated by the first machine-learning model based on a training source image and a second target segmentation mask computed from a corresponding training ground truth image.
10. The method of claim 1, wherein generating the third image comprises: segmenting the target person having the original pose in the second image into k segment classes such that each segment class is captured in a sub-image; generating a latent representation by processing the k sub-images with an encoder of a second machine-learning model; and generating the third image by processing the latent representation and the target segmentation mask by a decoder of the second machine-learning model.
11. The method of claim 10, wherein k segment classes comprise hair, face, upper-body wear, lower-body-wear, skin, shoes, or any other suitable segment class.
12. The method of claim 10, wherein the decoder of the second machine-learning model comprises a plurality of up-sample layers with interleaving segmentation mask input layers, and wherein each of the segmentation mask input layers takes the target segmentation mask as an input.
13. The method of claim 12, wherein the interleaving segmentation mask input layers are SPADE blocks.
14. The method of claim 10, wherein the decoder of the second machine-learning model also produces a first blending mask, wherein the first blending mask is a binary representation indicating an area in the output image that is to be filled by the target person in the third image.
15. The method of claim 14, wherein generating the output image comprises compositing the first image multiplied by an inverse of the first blending mask and the third image multiplied by the first blending mask.
16. The method of claim 1, further comprising: generating a first encoding vector corresponding to a face of the target person having an expression in the context of the first image by processing a face crop of the target person from the output image with an encoder of a third machine-learning model; generating a second encoding vector representing face features of the target person by processing the second image with a pre-trained machine-learning model; generating a temporary image comprising a refined face of the target person by processing the first encoding vector and the second encoding vector with a decoder of the third machine-learning model; and blending the generated refined face into the output image.
17. The method of claim 16, wherein the refined face has the face features of the target person in the second image and the expression of the face of the target person in the output image.
18. The method of claim 16, wherein the decoder of the third machine-learning model also produces a second blending mask, wherein the second blending mask represents a blending weight to be applied to the temporary image at each pixel of the output image, and wherein blending the generated refined face into the output image comprises: multiplying an inverse of the second blending mask to the output image; and projecting the temporary image multiplied by the second blending mask to the output image.
19. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive a first image depicting a context comprising one or more persons having one or more respective poses; receive a second image depicting a target person having an original pose, wherein the target person is to be inserted into the context depicted in the first image; generate, based on the first image, a target segmentation mask specifying a new pose for the target person in the context of the first image; generate, based on the second image and the target segmentation mask, a third image depicting the target person having the new pose; and generate an output image based on the first image and the third image, the output image depicting the one or more persons having the one or more respective poses and the target person having the new pose.
20. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive a first image depicting a context comprising one or more persons having one or more respective poses; receive a second image depicting a target person having an original pose, wherein the target person is to be inserted into the context depicted in the first image; generate, based on the first image, a target segmentation mask specifying a new pose for the target person in the context of the first image; generate, based on the second image and the target segmentation mask, a third image de
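Claims 15 and 18 both describe mask-weighted compositing: one image is multiplied by the inverse of a blending mask, the other by the mask itself, and the two are combined. A minimal sketch follows, assuming that "inverse" means `1 - mask` per pixel and that the two products are summed; the claim text does not spell out either detail. The function name `composite` and the nested-list image representation are hypothetical.

```python
from typing import List

Image = List[List[float]]  # hypothetical stand-in: 2-D grid of values in [0, 1]


def composite(base: Image, overlay: Image, mask: Image) -> Image:
    """Per pixel: base * (1 - mask) + overlay * mask.

    Assumes 'inverse of the blending mask' means 1 - mask and that the
    two weighted images are added together.
    """
    return [
        [b * (1.0 - m) + o * m for b, o, m in zip(b_row, o_row, m_row)]
        for b_row, o_row, m_row in zip(base, overlay, mask)
    ]


# Binary mask (as in claim 14): each pixel comes entirely from one image.
hard = composite([[0.2, 0.4]], [[1.0, 1.0]], [[0.0, 1.0]])

# Fractional mask (a per-pixel blending weight, as in claim 18).
soft = composite([[0.0]], [[1.0]], [[0.25]])

print(hard, soft)
```

With a binary mask the result is a hard cut-and-paste of the overlay into the base; with fractional weights the same formula gives a smooth transition at the boundary.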

Assignees

Meta Platforms Inc

Inventors

Classifications

G06T7/11 (Primary) · Region-based segmentation
G06N20/00 · Machine learning
G06T7/00 · Image analysis
G06T7/70 (Primary) · Determining position or orientation of objects or cameras (camera calibration G06T7/80)
G06T2207/20081 · Training; Learning

Patent family

Related publications grouped by family.

View patent family 89384224

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854203B1 cover?: In one embodiment, a method includes receiving a first image depicting a context including one or more persons having one or more respective poses, receiving a second image depicting a target person having an original pose, where the target person is to be inserted into the context depicted in the first image, generating a target segmentation mask specifying a new pose for the target person in …
Who is the assignee on this patent?: Meta Platforms Inc
What technology area does this patent fall under?: Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 26, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Image generation using one or more neural networks

Image resynthesis using forward warping, gap discriminators, and coordinate-based inpainting

Image generation using one or more neural networks

Semantic image synthesis for generating substantially photorealistic images using neural networks

Systems and methods for face reenactment

System and method for generating group photos
