Neural face editing with intrinsic image disentangling

US10565758B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10565758-B2
Application number  US-201715622711-A
Country             US
Kind code           B2
Filing date         Jun 14, 2017
Priority date       Jun 14, 2017
Publication date    Feb 18, 2020
Grant date          Feb 18, 2020

Abstract

Official abstract text for this publication.

Techniques are disclosed for performing manipulation of facial images using an artificial neural network. A facial rendering and generation network and method learns one or more compact, meaningful manifolds of facial appearance, by disentanglement of a facial image into intrinsic facial properties, and enables facial edits by traversing paths of such manifold(s). The facial rendering and generation network is able to handle a much wider range of manipulations including changes to, for example, viewpoint, lighting, expression, and even higher-level attributes like facial hair and age—aspects that cannot be represented using previous models.
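
As a rough illustration of what "traversing paths" of a learned manifold can look like, the sketch below moves a latent code along an attribute direction. Everything in it is an assumption made for illustration: the latent codes are random vectors standing in for encoder outputs, the function names `attribute_direction` and `traverse` are hypothetical, and estimating the direction as a difference of class means is one simple choice, not necessarily the method the patent describes.

```python
import numpy as np

def attribute_direction(z_with, z_without):
    """Difference of mean latent codes, pointing from 'without' toward 'with'.

    Assumption: each argument is an array of shape (num_samples, latent_dim).
    """
    return z_with.mean(axis=0) - z_without.mean(axis=0)

def traverse(z, direction, steps=5, max_alpha=1.0):
    """Return codes z + alpha * direction for evenly spaced alpha in [0, max_alpha]."""
    alphas = np.linspace(0.0, max_alpha, steps)
    return np.stack([z + a * direction for a in alphas])

# Stand-in latent codes (hypothetical data, not real encoder outputs).
rng = np.random.default_rng(0)
z_smiling = rng.normal(loc=1.0, size=(100, 8))
z_neutral = rng.normal(loc=0.0, size=(100, 8))

d = attribute_direction(z_smiling, z_neutral)
path = traverse(z_neutral[0], d, steps=5)  # shape (5, 8): one code per step
```

In a full system each code along `path` would be passed to a decoder and rendering stage to produce an edited image; that part is omitted here.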

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network architecture for manipulating a facial image, said architecture comprising: a disentanglement portion trained to disentangle at least one physical property captured in said facial image, said disentanglement portion receiving said facial image and outputting a disentangled representation of said facial image based on said at least one physical property; and a rendering portion trained to perform a facial manipulation of said facial image based upon an image formation equation and said at least one physical property, thereby generating a manipulated facial image; wherein said disentanglement portion includes at least one first layer, each of said at least one first layer encoding a respective map, wherein each map performs a transformation of said facial image to a respective first intermediate result, said respective first intermediate result associated with one of said at least one physical property; and wherein said rendering portion includes at least one second layer arranged according to said image formation equation for manipulating said facial image, wherein said rendering portion operates on said at least one first intermediate result to generate said manipulated facial image.

2. The neural network architecture of claim 1, wherein said at least one physical property includes at least one of diffuse albedo, a surface normal, a matte mask, a background, a shape, illumination, and shading.

3. The neural network architecture of claim 1, wherein a respective first intermediate loss function is associated with each of said at least one map.

4. The neural network architecture of claim 3, wherein during a training phase, each respective first intermediate loss function causes an inference of said respective map.

5. The neural network architecture of claim 1, wherein each of said maps further comprises a convolutional encoder stack and at least one convolutional decoder stack, each of said at least one convolutional decoder stack generating one of said respective first intermediate results.

6. The neural network architecture of claim 5, wherein said convolutional encoder stack generates an entangled representation in a latent space.

7. The neural network architecture of claim 6, further comprising a fully connected layer.

8. The neural network architecture of claim 7, wherein said fully connected layer generates said disentangled representation in said latent space from said entangled representation.

9. A computer program product including one or more non-transitory computer readable mediums encoded with instructions that when executed by one or more processors cause operations of a neural network architecture to be carried out so as to generate a manipulated facial image, said neural network architecture including a disentanglement portion and a rendering portion, said disentanglement portion trained to disentangle at least one physical property captured in an input facial image, and said rendering portion trained to perform a facial manipulation of said input facial image based upon an image formation equation and said at least one physical property, said operations responsive to receiving said input facial image at said disentanglement portion of said neural network architecture, said operations comprising: disentangling said at least one physical property captured in said input facial image and outputting a disentangled representation of said input facial image based on said at least one physical property; and receiving said disentangled representation of said input facial image at said rendering portion of said neural network architecture, thereby generating a manipulated facial image; wherein said disentanglement portion includes at least one first layer, each of said at least one first layer encoding a respective map, wherein each map performs a transformation of said input facial image to a respective first intermediate result, said respective first intermediate result associated with one of said at least one physical property; and wherein said rendering portion includes at least one second layer arranged according to the image formation equation for manipulating said input facial image, wherein said rendering portion operates on said at least one first intermediate result to generate said manipulated facial image.

10. The computer program product of claim 9, wherein a respective first intermediate loss function is associated with each of said at least one map, and during a training phase, each respective first intermediate loss function causes an inference of said respective map.

11. The computer program product of claim 9, wherein said at least one physical property includes at least one of diffuse albedo, a surface normal, a matte mask, a background, a shape, a texture, illumination, and shading.

12. A computer program product including one or more non-transitory machine readable mediums encoded with instructions that when executed by one or more processors cause a process to be carried out for generating a manipulated facial image from an input facial image, said process comprising: associating a respective first intermediate loss function with each of a plurality of first intermediate results generated by a first network portion, wherein each of said plurality of first intermediate results corresponds to a respective intrinsic facial property; providing said plurality of first intermediate results to a second network portion, said second network portion arranged according to an image formation equation for rendering a manipulated facial image based upon said image formation equation; performing a training by imposing a plurality of respective first intermediate loss functions upon each of said first intermediate results, to generate a plurality of weights; assigning said generated weights in said first and second network portions; and providing an input facial image to said first network portion, wherein said first network portion performs a disentanglement of a facial image into said intrinsic facial properties and second network portion receives said disentangled facial properties to generate a manipulated facial image.

13. The computer program product of claim 12, said process further comprising: associating a respective second intermediate loss function with each of a plurality of second intermediate results associated with said second network portion, wherein said training further imposes said second intermediate loss function upon each of said respective second intermediate results.

14. The computer program product of claim 12, wherein said associated intrinsic properties are at least one of albedo ($A_e$), normal ($N_e$), matte mask ($M$), and background ($I_{bg}$).

15. The computer program product according to claim 14, said process further comprising generating a pseudo ground-truth ($\hat{N}$) for said normal representation $N_e$, wherein said pseudo ground truth is utilized in one of said first intermediate loss functions according to the relationship: $E_{\text{recon-}N} = \lVert N_e - \hat{N} \rVert^2$.

16. The computer program product of claim 15, wherein $\hat{N}$ is estimated by fitting a rough facial geometry to every image in a training set using a 3D morphable model.

17. The computer program product of claim 14, the process further comprising associating an L1 smoothness intermediate loss function for $A_e$ according to the relationship: $E_{\text{smooth-}A} = \lVert \nabla A_e \rVert$, wherein $\nabla$ is a spatial image gradient operator.

18. The computer program product of claim 12, wherein generating a manipulated facial image further comp
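
The two intermediate losses named in claims 15 and 17 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the patented implementation: arrays are laid out as height x width x channels, the squared norm is reduced by a plain sum, and the spatial gradient is approximated with forward differences via `np.diff`. The function names are hypothetical.

```python
import numpy as np

def recon_normal_loss(n_e, n_hat):
    """E_recon-N: squared L2 distance between predicted normals n_e
    and pseudo ground-truth normals n_hat (sum reduction is an assumption)."""
    return float(np.sum((n_e - n_hat) ** 2))

def smooth_albedo_loss(a_e):
    """E_smooth-A: L1 norm of the spatial gradient of the albedo map,
    approximated here with forward differences along rows and columns."""
    dy = np.diff(a_e, axis=0)  # vertical neighbor differences
    dx = np.diff(a_e, axis=1)  # horizontal neighbor differences
    return float(np.abs(dy).sum() + np.abs(dx).sum())

# Toy 4x4 inputs (hypothetical data).
n_e = np.zeros((4, 4, 3))
n_hat = np.zeros((4, 4, 3))
n_hat[..., 2] = 1.0             # pseudo ground truth: normals facing the camera

flat_albedo = np.full((4, 4, 3), 0.5)
step_albedo = np.zeros((4, 4, 3))
step_albedo[:, 2:, :] = 1.0     # one vertical edge between columns 1 and 2

loss_n = recon_normal_loss(n_e, n_hat)
loss_flat = smooth_albedo_loss(flat_albedo)
loss_step = smooth_albedo_loss(step_albedo)
```

A constant albedo map incurs no smoothness penalty, while a map with an edge is penalized in proportion to the edge length and contrast, which is the behavior an L1 gradient penalty is meant to have.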

Assignees

Adobe Inc

Classifications

  • Non-supervised learning, e.g. competitive learning

  • Backpropagation, e.g. using gradient descent

  • G06T11/60 (primary): Creating or editing images; Combining images with text

  • Face

  • Training; Learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10565758B2 cover?
Techniques are disclosed for performing manipulation of facial images using an artificial neural network. A facial rendering and generation network and method learns one or more compact, meaningful manifolds of facial appearance, by disentanglement of a facial image into intrinsic facial properties, and enables facial edits by traversing paths of such manifold(s). The facial rendering and gener…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06T11/60. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 18, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).