Utilizing deep learning for automatic digital image segmentation and stylization

US9773196B2 · US · B2

Patent metadata
Publication number: US-9773196-B2
Application number: US-201615005855-A
Country: US
Kind code: B2
Filing date: Jan 25, 2016
Priority date: Jan 25, 2016
Publication date: Sep 26, 2017
Grant date: Sep 26, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for segregating target individuals represented in a probe digital image from background pixels in the probe digital image. In particular, in one or more embodiments, the disclosed systems and methods train a neural network based on two or more of training position channels, training shape input channels, training color channels, or training object data. Moreover, in one or more embodiments, the disclosed systems and methods utilize the trained neural network to select a target individual in a probe digital image. Specifically, in one or more embodiments, the disclosed systems and methods generate position channels, training shape input channels, and color channels corresponding to the probe digital image, and utilize the generated channels in conjunction with the trained neural network to select the target individual.
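
As an illustration of how the generated channels might be combined before they are passed to a trained neural network, the following minimal Python sketch stacks color, position, and shape input channels into one per-pixel array. The function name `stack_input_channels`, the six-channel layout, and the use of NumPy are assumptions made for this example; the abstract does not say how the channels are arranged.

```python
import numpy as np


def stack_input_channels(color, x_pos, y_pos, shape):
    """Concatenate per-pixel channels into one H x W x 6 float array.

    Assumed layout (not taken from the patent text):
      channels 0-2: color, channel 3: x-position,
      channel 4: y-position, channel 5: shape input.
    """
    color = np.asarray(color, dtype=np.float32)
    if color.ndim != 3 or color.shape[2] != 3:
        raise ValueError("color must have shape H x W x 3")
    height, width = color.shape[:2]

    extras = []
    for name, channel in (("x_pos", x_pos), ("y_pos", y_pos), ("shape", shape)):
        channel = np.asarray(channel, dtype=np.float32)
        if channel.shape != (height, width):
            raise ValueError(f"{name} must have shape {height} x {width}")
        # Add a trailing axis so the channel can be concatenated with color.
        extras.append(channel[..., None])

    return np.concatenate([color] + extras, axis=2)
```

A network that consumes this array would need an input layer sized for six channels; that detail is likewise an assumption of the sketch.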

First claim

Opening claim text (preview).

We claim:

1. In a digital medium environment for editing digital visual media, a method of using deep learning to automatically select individuals portrayed in the digital visual media, the method comprising: training, by at least one processor, a neural network utilizing training input generated from a repository of digital training images; generating, by the at least one processor, with regard to a probe digital image portraying a target individual, a position channel that indicates positions of pixels in the probe digital image relative to the target individual portrayed in the probe digital image by determining a transform between one or more feature points of the target individual and a canonical pose; and identifying, by the at least one processor, a set of pixels representing the target individual in the probe digital image utilizing the trained neural network and the position channel.

2. The method of claim 1, wherein: the training input comprises a plurality of training position channels, a plurality of training shape input channels, and a plurality of training color channels corresponding to the digital training images, wherein each training color channel in the plurality of training color channels reflect colors of pixels in a corresponding digital training image; the method further comprises generating a color channel that reflects colors of pixels in the digital training image; and the method identifies the set of pixels representing the target individual by utilizing the color channel.

3. The method of claim 2, further comprising generating the training input by: identifying a training target individual portrayed in each digital training image; generating a training position channel for each digital training image that indicates positions of pixels in the digital training image relative to the identified training target individual portrayed in the training digital image; and generating a training shape input channel for each digital training image that comprises an estimated shape of the identified target individual portrayed in the digital training image.

4. The method of claim 1, wherein generating the position channel further comprises: generating an x-position channel that indicates horizontal positions of pixels in the probe digital image relative to a face of the target individual portrayed in the probe digital image; and generating a y-position channel that indicates vertical positions of pixels in the probe digital image relative to the face of the target individual portrayed in the probe digital image.

5. The method of claim 1, wherein generating the position channel further comprises: detecting one or more facial feature points corresponding to a face of the target individual portrayed in the probe digital image; estimating the transform between the detected one or more facial feature points and the canonical pose, wherein the canonical pose comprises template facial features; and applying the transform to the canonical pose to generate the position channel.

6. The method of claim 5, wherein the position channel expresses the position of pixels in the canonical pose in a coordinate system that is centered on the face and scaled according to a size of the face.

7. The method of claim 1, further comprising: generating a shape input channel that comprises an estimated shape of the target individual by: generating a mean digital object mask based on target individuals in a plurality of digital images, the mean digital object mask comprising a shape corresponding to the target individuals in the plurality of digital images; and utilizing the mean digital object mask to generate the shape input channel; and identifying the set of pixels representing the target individual in the probe digital image utilizing the trained neural network, the position channel, and the shape input channel.

8. The method of claim 7, wherein generating the mean digital object mask comprises: identifying a set of pixels representing a target individual in a digital image from the plurality of digital images; identifying one or more facial feature points corresponding to the target individual in the digital image; estimating a first transform between the facial feature points and a canonical pose; and applying the first transform to the set of pixels representing the target individual in the digital image.

9. The method of claim 8, wherein utilizing the mean digital object mask to generate the shape input channel comprises: detecting one or more facial feature points corresponding to the target individual portrayed in the probe digital image; estimating a second transform based on the detected one or more facial feature points corresponding to the target individual portrayed in the probe digital image and the canonical pose; and applying the second transform to the mean digital object mask to generate the shape input channel.

10. The method of claim 1, further comprising modifying the probe digital image based on the set of pixels representing the target individual in the probe digital image by applying one or more of: a first image filter to the set of pixels representing the target individual in the probe digital image; or a second image filter to other pixels in the probe digital image.

11. The method of claim 1, wherein identifying, by the at least one processor, the set of pixels representing the target individual in the probe digital image further comprises: upon receiving a request to select the target individual in the probe digital image, automatically identifying the set of pixels representing the target individual in the probe digital image without additional user input.

12. In a digital medium environment for editing digital visual media, a method of using deep learning to automatically select individuals portrayed in the digital visual media, the method comprising: accessing: a trained neural network generated from a repository of digital training images, wherein each of the digital training images portrays a training target individual, and a mean digital object mask reflecting a shape based on each of the training target individuals portrayed in the digital training images; generating, with regard to a probe digital image by at least one processor and utilizing the mean digital object mask, a shape input channel comprising an estimated shape of a target individual based on the mean digital object mask by estimating a transform between one or more facial feature points corresponding to the target individual portrayed in the probe digital image and a canonical pose; and identifying, by the at least one processor, a set of pixels representing the target individual in the probe digital image utilizing the trained neural network and the generated shape input channel.

13. The method of claim 12, wherein utilizing the mean digital object mask to generate the shape input channel comprises: detecting the one or more facial feature points corresponding to the target individual portrayed in the probe digital image; estimating the transform based on the detected one or more facial feature points corresponding to the target individual portrayed in the probe digital image and the canonical pose, wherein the canonical pose comprises template facial features; and applying the transform to the mean digital object mask to generate the shape input channel.

14. The method of claim 12, further comprising generating a position channel, wherein the position channel indicates a position of pixels in the probe digital image relative to a face of the target individual portrayed in the probe dig…
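
The claims describe x- and y-position channels expressed in a face-centered, face-scaled coordinate system, obtained from a transform between detected facial feature points and a canonical pose. The Python sketch below is one hedged reading of that idea, not the patented implementation. It fits a least-squares affine transform (an assumed transform family) from detected points to a canonical template, then evaluates that transform at every pixel. The function names, the three-point template, and the direction of the mapping (image pixels into the canonical frame) are all assumptions made for illustration.

```python
import numpy as np


def estimate_affine(src_pts, dst_pts):
    """Least-squares 2 x 3 affine transform mapping src_pts onto dst_pts.

    Both inputs are N x 2 sequences of (x, y) points with N >= 3.
    """
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    if src.ndim != 2 or src.shape[1] != 2 or src.shape != dst.shape or len(src) < 3:
        raise ValueError("expected two matching N x 2 point sets with N >= 3")
    # Homogeneous source coordinates: each row is (x, y, 1).
    design = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)  # shape 3 x 2
    return params.T  # shape 2 x 3


def position_channels(height, width, transform):
    """Evaluate the transform at every pixel to get x and y position channels."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    x_chan = transform[0, 0] * xs + transform[0, 1] * ys + transform[0, 2]
    y_chan = transform[1, 0] * xs + transform[1, 1] * ys + transform[1, 2]
    return x_chan, y_chan


# Hypothetical detected feature points (left eye, right eye, mouth) in pixels,
# and a hypothetical canonical template centered between the eyes.
detected = [(40, 50), (60, 50), (50, 70)]
canonical = [(-0.5, 0.0), (0.5, 0.0), (0.0, 1.0)]

transform = estimate_affine(detected, canonical)
x_chan, y_chan = position_channels(100, 100, transform)
```

With this template, a pixel midway between the eyes gets coordinates near (0, 0), and the values grow with distance from the face in units of the assumed face scale. A real system would use a facial landmark detector and a richer template; neither is specified here.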

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9773196B2 cover?
Systems and methods are disclosed for segregating target individuals represented in a probe digital image from background pixels in the probe digital image. In particular, in one or more embodiments, the disclosed systems and methods train a neural network based on two or more of training position channels, training shape input channels, training color channels, or training object data. Moreove…
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06K9/66. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 26, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).