Modifying poses of two-dimensional humans in two-dimensional images by reposing three-dimensional human models representing the two-dimensional humans
US-2024144623-A1 · May 2, 2024 · US
US12100099B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12100099-B2 |
| Application number | US-202217984474-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 10, 2022 |
| Priority date | Oct 20, 2022 |
| Publication date | Sep 24, 2024 |
| Grant date | Sep 24, 2024 |
Official abstract text for this publication.
A method, an electronic device, and a computer program product for generating a three-dimensional scene are provided in embodiments of the present disclosure. The method may include obtaining source image features from a plurality of two-dimensional source images associated with the three-dimensional scene to be generated. The method may further include obtaining editing features from an editing instruction input by a user for the three-dimensional scene, each of the editing features respectively forming a feature pair with each of the source image features. Furthermore, the method may include updating the source image features by maximizing a correlation coefficient of each of the feature pairs, and generating the three-dimensional scene based at least on the updated source image features. Embodiments of the present disclosure can realize arbitrary editing of a three-dimensional scene, thus enhancing the experience of human-computer interaction.
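The abstract's central step, updating each source image feature by maximizing the correlation coefficient of its feature pair, can be illustrated with a small sketch. This is not the patented implementation: it assumes the correlation coefficient is the Pearson coefficient, that features are plain lists of floats paired one-to-one, and that the maximization is done by gradient ascent. The function names (`pearson`, `pearson_grad`, `update_features`) and the `lr` and `steps` parameters are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def pearson_grad(x, y):
    """Gradient of pearson(x, y) with respect to x (y is held fixed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sx = math.sqrt(sum(a * a for a in dx))
    sy = math.sqrt(sum(b * b for b in dy))
    r = sum(a * b for a, b in zip(dx, dy)) / (sx * sy)
    # d r / d x_i = (dy_i / sy - r * dx_i / sx) / sx
    return [(b / sy - r * a / sx) / sx for a, b in zip(dx, dy)]


def update_features(source, editing, lr=0.5, steps=200):
    """Assumed update rule: gradient ascent on the correlation of each pair.

    source[k] and editing[k] are treated as one feature pair. The editing
    features are left unchanged; only copies of the source features move.
    """
    updated = [list(f) for f in source]
    for _ in range(steps):
        for f, e in zip(updated, editing):
            g = pearson_grad(f, e)
            for i in range(len(f)):
                f[i] += lr * g[i]
    return updated


source = [[1.0, 0.0, 2.0, 1.0]]
editing = [[0.0, 1.0, 2.0, 3.0]]
updated = update_features(source, editing)
before = pearson(source[0], editing[0])
after = pearson(updated[0], editing[0])
```

In this toy run `after` is larger than `before`, meaning the updated source feature is more strongly correlated with the editing feature. In a real system the features would come from learned encoders and the update would likely be applied to network parameters rather than to raw vectors.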
Opening claim text (preview).
What is claimed is:
1. A method for generating a three dimensional scene, comprising: obtaining source image features from a plurality of two-dimensional source images associated with a three-dimensional scene to be generated; obtaining editing features from an editing instruction input by a user for the three-dimensional scene, each of the editing features respectively forming a feature pair with each of the source image features; updating the source image features based on a correlation coefficient of each of the feature pairs; and generating the three-dimensional scene based on the updated source image features; wherein updating the source image features further comprises obtaining segmentation features from the editing instruction, and updating the source image features based on distances between the segmentation features and the source image features.
2. The method according to claim 1, wherein the editing instruction comprises at least an editing image, and the method further comprises: determining a segmentation image for the three-dimensional scene based on the editing image, the segmentation image indicating at least boundaries of one or more objects in the editing image; obtaining the segmentation features from the segmentation image; and updating the source image features by minimizing the distances between the segmentation features and the source image features.
3. The method according to claim 1, wherein obtaining the source image features comprises: extracting auxiliary feature vectors from the plurality of two-dimensional source images through a plurality of fully connected layers; determining extended feature vectors based on the auxiliary feature vectors; and obtaining normalized source vectors from the extended feature vectors using an embedded layer as the source image features.
4. The method according to claim 3, wherein the editing instruction comprises an editing image, and obtaining the editing features comprises: extracting image feature vectors from the editing image by an image encoder; and obtaining normalized editing vectors from the image feature vectors using the embedded layer as the editing features.
5. The method according to claim 3, wherein the editing instruction comprises editing text and an editing image, and obtaining the editing features comprises: respectively extracting text feature vectors and image feature vectors from the editing text and the editing image by a text encoder and an image encoder; and obtaining normalized editing vectors from the text feature vectors and the image feature vectors using the embedded layer as the editing features.
6. The method according to claim 1, wherein the plurality of two-dimensional source images indicate a shooting angle associated with the three-dimensional scene and three-dimensional coordinates of at least one object in the three-dimensional scene.
7. A method, comprising: obtaining source image features from a plurality of two-dimensional source images associated with a three-dimensional scene to be generated; obtaining editing features from an editing instruction input by a user for the three-dimensional scene, each of the editing features respectively forming a feature pair with each of the source image features; updating the source image features based on a correlation coefficient of each of the feature pairs; and generating the three-dimensional scene based on the updated source image features; wherein the plurality of two-dimensional source images indicate a shooting angle associated with the three-dimensional scene and three-dimensional coordinates of at least one object in the three-dimensional scene; and wherein generating the three-dimensional scene comprises: determining a probability of occurrence of the at least one object at the three-dimensional coordinates based on the updated source image features and the three-dimensional coordinates; determining a color channel value based on the updated source image features, the probability, and the shooting angle; and generating the three-dimensional scene based on the color channel value.
8. The method according to claim 1, wherein the editing instruction is used for instructing to perform at least one of the following editing operations: object addition; object deletion; and object background editing.
9. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: obtaining source image features from a plurality of two-dimensional source images associated with a three-dimensional scene to be generated; obtaining editing features from an editing instruction input by a user for the three-dimensional scene, each of the editing features respectively forming a feature pair with each of the source image features; updating the source image features based on a correlation coefficient of each of the feature pairs; and generating the three-dimensional scene based on the updated source image features; wherein updating the source image features further comprises obtaining segmentation features from the editing instruction, and updating the source image features based on distances between the segmentation features and the source image features.
10. The electronic device according to claim 9, wherein the editing instruction comprises at least an editing image, and the actions further comprise: determining a segmentation image for the three-dimensional scene based on the editing image, the segmentation image indicating at least boundaries of one or more objects in the editing image; obtaining the segmentation features from the segmentation image; and updating the source image features by minimizing the distances between the segmentation features and the source image features.
11. The electronic device according to claim 9, wherein obtaining the source image features comprises: extracting auxiliary feature vectors from the plurality of two-dimensional source images through a plurality of fully connected layers; determining extended feature vectors based on the auxiliary feature vectors; and obtaining normalized source vectors from the extended feature vectors using an embedded layer as the source image features.
12. The electronic device according to claim 11, wherein the editing instruction comprises an editing image, and obtaining the editing features comprises: extracting image feature vectors from the editing image by an image encoder; and obtaining normalized editing vectors from the image feature vectors using the embedded layer as the editing features.
13. The electronic device according to claim 11, wherein the editing instruction comprises editing text and an editing image, and obtaining the editing features comprises: respectively extracting text feature vectors and image feature vectors from the editing text and the editing image by a text encoder and an image encoder; and obtaining normalized editing vectors from the text feature vectors and the image feature vectors using the embedded layer as the editing features.
14. The electronic device according to claim 9, wherein the plurality of two-dimensional source images indicate a shooting angle associated with the three-dimensional scene and three-dimensional coordinates of at least one object in the three-dimensional scene.
15. The electronic device according to claim 14, wherein generating the three-dimensional scene comprises: determin
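Claims 1, 2, 9, and 10 also recite updating the source image features based on, and in claims 2 and 10 by minimizing, distances between segmentation features and source image features. The following is a minimal sketch of that idea only. It assumes a squared Euclidean distance, a one-to-one pairing between segmentation and source features, and plain gradient descent; none of these choices is stated in the claims, and the names `squared_distance` and `pull_toward_segmentation` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pull_toward_segmentation(source, segmentation, lr=0.1, steps=100):
    """Assumed update rule: gradient descent on the per-pair squared distance.

    source[k] is paired with segmentation[k]. Segmentation features stay
    fixed; copies of the source features are moved toward them.
    """
    updated = [list(f) for f in source]
    for _ in range(steps):
        for f, s in zip(updated, segmentation):
            for i in range(len(f)):
                # d/df_i of (f_i - s_i)^2 is 2 * (f_i - s_i)
                f[i] -= lr * 2.0 * (f[i] - s[i])
    return updated


source = [[1.0, 0.0]]
segmentation = [[0.0, 1.0]]
updated = pull_toward_segmentation(source, segmentation)
dist_before = squared_distance(source[0], segmentation[0])
dist_after = squared_distance(updated[0], segmentation[0])
```

With `lr=0.1` each step shrinks the gap between a pair by a factor of 0.8, so `dist_after` ends up far smaller than `dist_before`. A practical system would presumably combine this distance term with the correlation term in a single training objective, but the claims do not say how the two are weighted.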
Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title
Creating or editing images; Combining images with text · CPC title
Two-dimensional [2D] image generation · CPC title
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
Shape modification · CPC title