Systems and methods for generating dynamic virtual representations of an object or event
US-2024420395-A1 · Dec 19, 2024 · US
US10565758B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10565758-B2 |
| Application number | US-201715622711-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 14, 2017 |
| Priority date | Jun 14, 2017 |
| Publication date | Feb 18, 2020 |
| Grant date | Feb 18, 2020 |
Official abstract text for this publication.
Techniques are disclosed for performing manipulation of facial images using an artificial neural network. A facial rendering and generation network and method learns one or more compact, meaningful manifolds of facial appearance, by disentanglement of a facial image into intrinsic facial properties, and enables facial edits by traversing paths of such manifold(s). The facial rendering and generation network is able to handle a much wider range of manipulations including changes to, for example, viewpoint, lighting, expression, and even higher-level attributes like facial hair and age—aspects that cannot be represented using previous models.
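To make the idea of traversing a path on a learned manifold concrete, the sketch below moves a latent code in small steps along a fixed direction. It is only an illustration: the abstract does not say how paths are chosen, so the straight-line path, the helper name `traverse_latent`, and the example vectors are all assumptions rather than the patented method.

```python
from typing import List


def traverse_latent(z_source: List[float], direction: List[float],
                    steps: int, max_strength: float) -> List[List[float]]:
    """Return latent codes along a straight line starting at z_source.

    Hypothetical helper: the straight-line path is an assumption made for
    illustration, not something stated in the abstract.
    """
    if len(z_source) != len(direction):
        raise ValueError("z_source and direction must have the same length")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    path = []
    for i in range(steps + 1):
        # Edit strength grows linearly from 0 to max_strength.
        alpha = max_strength * i / steps
        path.append([z + alpha * d for z, d in zip(z_source, direction)])
    return path


# Made-up 3-dimensional latent code and edit direction.
path = traverse_latent([0.0, 1.0, 2.0], [1.0, 0.0, -1.0],
                       steps=4, max_strength=2.0)
```

In a setup like the one claimed, each code on such a path would presumably be passed to the rendering portion to produce one edited image, with the first code reproducing the unedited input.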
Opening claim text (preview).
What is claimed is:
1. A neural network architecture for manipulating a facial image, said architecture comprising: a disentanglement portion trained to disentangle at least one physical property captured in said facial image, said disentanglement portion receiving said facial image and outputting a disentangled representation of said facial image based on said at least one physical property; and a rendering portion trained to perform a facial manipulation of said facial image based upon an image formation equation and said at least one physical property, thereby generating a manipulated facial image; wherein said disentanglement portion includes at least one first layer, each of said at least one first layer encoding a respective map, wherein each map performs a transformation of said facial image to a respective first intermediate result, said respective first intermediate result associated with one of said at least one physical property; and wherein said rendering portion includes at least one second layer arranged according to said image formation equation for manipulating said facial image, wherein said rendering portion operates on said at least one first intermediate result to generate said manipulated facial image.
2. The neural network architecture of claim 1, wherein said at least one physical property includes at least one of diffuse albedo, a surface normal, a matte mask, a background, a shape, illumination, and shading.
3. The neural network architecture of claim 1, wherein a respective first intermediate loss function is associated with each of said at least one map.
4. The neural network architecture of claim 3, wherein during a training phase, each respective first intermediate loss function causes an inference of said respective map.
5. The neural network architecture of claim 1, wherein each of said maps further comprises a convolutional encoder stack and at least one convolutional decoder stack, each of said at least one convolutional decoder stack generating one of said respective first intermediate results.
6. The neural network architecture of claim 5, wherein said convolutional encoder stack generates an entangled representation in a latent space.
7. The neural network architecture of claim 6, further comprising a fully connected layer.
8. The neural network architecture of claim 7, wherein said fully connected layer generates said disentangled representation in said latent space from said entangled representation.
9. A computer program product including one or more non-transitory computer readable mediums encoded with instructions that when executed by one or more processors cause operations of a neural network architecture to be carried out so as to generate a manipulated facial image, said neural network architecture including a disentanglement portion and a rendering portion, said disentanglement portion trained to disentangle at least one physical property captured in an input facial image, and said rendering portion trained to perform a facial manipulation of said input facial image based upon an image formation equation and said at least one physical property, said operations responsive to receiving said input facial image at said disentanglement portion of said neural network architecture, said operations comprising: disentangling said at least one physical property captured in said input facial image and outputting a disentangled representation of said input facial image based on said at least one physical property; and receiving said disentangled representation of said input facial image at said rendering portion of said neural network architecture, thereby generating a manipulated facial image; wherein said disentanglement portion includes at least one first layer, each of said at least one first layer encoding a respective map, wherein each map performs a transformation of said input facial image to a respective first intermediate result, said respective first intermediate result associated with one of said at least one physical property; and wherein said rendering portion includes at least one second layer arranged according to the image formation equation for manipulating said input facial image, wherein said rendering portion operates on said at least one first intermediate result to generate said manipulated facial image.
10. The computer program product of claim 9, wherein a respective first intermediate loss function is associated with each of said at least one map, and during a training phase, each respective first intermediate loss function causes an inference of said respective map.
11. The computer program product of claim 9, wherein said at least one physical property includes at least one of diffuse albedo, a surface normal, a matte mask, a background, a shape, a texture, illumination, and shading.
12. A computer program product including one or more non-transitory machine readable mediums encoded with instructions that when executed by one or more processors cause a process to be carried out for generating a manipulated facial image from an input facial image, said process comprising: associating a respective first intermediate loss function with each of a plurality of first intermediate results generated by a first network portion, wherein each of said plurality of first intermediate results corresponds to a respective intrinsic facial property; providing said plurality of first intermediate results to a second network portion, said second network portion arranged according to an image formation equation for rendering a manipulated facial image based upon said image formation equation; performing a training by imposing a plurality of respective first intermediate loss functions upon each of said first intermediate results, to generate a plurality of weights; assigning said generated weights in said first and second network portions; and providing an input facial image to said first network portion, wherein said first network portion performs a disentanglement of a facial image into said intrinsic facial properties and second network portion receives said disentangled facial properties to generate a manipulated facial image.
13. The computer program product of claim 12, said process further comprising: associating a respective second intermediate loss function with each of a plurality of second intermediate results associated with said second network portion, wherein said training further imposes said second intermediate loss function upon each of said respective second intermediate results.
14. The computer program product of claim 12, wherein said associated intrinsic properties are at least one of albedo ($A_e$), normal ($N_e$), matte mask ($M$), and background ($I_{bg}$).
15. The computer program product according to claim 14, said process further comprising generating a pseudo ground-truth ($\hat{N}$) for said normal representation $N_e$, wherein said pseudo ground truth is utilized in one of said first intermediate loss functions according to the relationship: $E_{\text{recon-}N} = \lVert N_e - \hat{N} \rVert^2$.
16. The computer program product of claim 15, wherein $\hat{N}$ is estimated by fitting a rough facial geometry to every image in a training set using a 3D morphable model.
17. The computer program product of claim 14, the process further comprising associating an L1 smoothness intermediate loss function for $A_e$ according to the relationship: $E_{\text{smooth-}A} = \lVert \nabla A_e \rVert$, wherein $\nabla$ is a spatial image gradient operator.
18. The computer program product of claim 12, wherein generating a manipulated facial image further comp
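The two intermediate loss functions written out in claims 15 and 17 can be sketched in a few lines. This is a hedged illustration, not the patented implementation: it reads the norm in claim 15 as a squared L2 norm, treats each map as a single-channel grid of floats (a real normal map would carry a vector per pixel), and approximates the spatial gradient with forward differences. The function names are hypothetical.

```python
from typing import List

# Assumption: a single-channel image stored as rows of floats.
Image = List[List[float]]


def recon_normal_loss(n_e: Image, n_hat: Image) -> float:
    """Claim 15 style loss: squared L2 distance between N_e and N_hat."""
    return sum((a - b) ** 2
               for row_e, row_hat in zip(n_e, n_hat)
               for a, b in zip(row_e, row_hat))


def smooth_albedo_loss(a_e: Image) -> float:
    """Claim 17 style loss: L1 norm of the spatial gradient of A_e.

    Assumption: the gradient is taken as forward differences between
    horizontally and vertically adjacent pixels.
    """
    height, width = len(a_e), len(a_e[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(a_e[y][x + 1] - a_e[y][x])  # horizontal step
            if y + 1 < height:
                total += abs(a_e[y + 1][x] - a_e[y][x])  # vertical step
    return total


# Tiny made-up 2x2 maps, used only to exercise the functions.
n_e = [[1.0, 0.0], [0.0, 1.0]]
n_hat = [[1.0, 1.0], [0.0, 0.0]]
a_e = [[0.0, 1.0], [0.0, 3.0]]

e_recon_n = recon_normal_loss(n_e, n_hat)
e_smooth_a = smooth_albedo_loss(a_e)
```

During training, claim 12 describes imposing such per-property losses on the intermediate results of the first network portion. How the individual losses are weighted against each other is not stated in this excerpt.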
Non-supervised learning, e.g. competitive learning · CPC title
Backpropagation, e.g. using gradient descent · CPC title
Creating or editing images; Combining images with text · CPC title
Face · CPC title
Training; Learning · CPC title