Utilizing deep learning for automatic digital image segmentation and stylization

US9978003B2 · US · B2

Patent metadata
Publication number: US-9978003-B2
Application number: US-201715679989-A
Country: US
Kind code: B2
Filing date: Aug 17, 2017
Priority date: Jan 25, 2016
Publication date: May 22, 2018
Grant date: May 22, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are disclosed for segregating target individuals represented in a probe digital image from background pixels in the probe digital image. In particular, in one or more embodiments, the disclosed systems and methods train a neural network based on two or more of training position channels, training shape input channels, training color channels, or training object data. Moreover, in one or more embodiments, the disclosed systems and methods utilize the trained neural network to select a target individual in a probe digital image. Specifically, in one or more embodiments, the disclosed systems and methods generate position channels, training shape input channels, and color channels corresponding the probe digital image, and utilize the generated channels in conjunction with the trained neural network to select the target individual.
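
As an illustration of the channels the abstract describes, the following minimal sketch stacks color, position, and shape channels into a single network input. It is not the patent's implementation: `build_input`, `position_channels`, `face_center`, and `face_scale` are hypothetical names, and the simple centre-and-scale normalisation stands in for the transform that the claims estimate from facial feature points.

```python
import numpy as np

def position_channels(height, width, face_center, face_scale):
    """Return x- and y-position channels measured relative to a face.

    face_center (cx, cy) and face_scale are assumed inputs; they replace
    the feature-point transform described in the claims.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    cx, cy = face_center
    x_chan = (xs - cx) / face_scale  # horizontal offset from the face
    y_chan = (ys - cy) / face_scale  # vertical offset from the face
    return x_chan, y_chan

def build_input(image_rgb, shape_prior, face_center, face_scale):
    """Stack 3 color + 2 position + 1 shape channel into an (H, W, 6) array."""
    h, w, _ = image_rgb.shape
    color = image_rgb.astype(np.float32) / 255.0
    x_chan, y_chan = position_channels(h, w, face_center, face_scale)
    return np.dstack([color, x_chan, y_chan, shape_prior.astype(np.float32)])

if __name__ == "__main__":
    image = np.zeros((4, 6, 3), dtype=np.uint8)   # placeholder image
    prior = np.zeros((4, 6))                      # placeholder shape channel
    stacked = build_input(image, prior, face_center=(3.0, 2.0), face_scale=2.0)
    print(stacked.shape)  # (4, 6, 6)
```

Expressing positions relative to the face rather than the image corner is what lets the same learned weights apply to portraits framed differently.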

First claim

Opening claim text (preview).

We claim:

1. In a digital medium environment, a method of training at least one neural network to automatically select individuals portrayed in digital visual media, the method comprising: identifying a training target individual portrayed in a digital training image; generating a training shape input channel for the digital training image that comprises an estimated shape of the training target individual portrayed in the digital training image; generating a training color channel reflecting colors of pixels in the digital training image; and utilizing the training shape input channel and the training color channel to train a neural network to predict pixels portrayed in probe digital images that reflect target individuals.

2. The method of claim 1, wherein utilizing the training shape input channel and the training color channel to train the neural network comprises: generating, utilizing the neural network and based on the training shape input channel and the training color channel, an output mask corresponding to the training target individual; and comparing the output mask to a training ground truth mask corresponding to the training target individual; and training the neural network based on the comparison between the output mask and the training ground truth mask.

3. The method of claim 2, wherein: the neural network comprises a fully convolutional neural network; and comparing the output mask to the training ground truth mask comprises applying a loss function that measurers error between the output mask and the ground truth mask.

4. The method of claim 1, wherein generating the training shape input channel comprises: generating a mean digital object mask based on target individuals in a plurality of digital images, the mean digital object mask comprising a shape corresponding to the target individuals in the plurality of digital images; and utilizing the mean digital object mask to generate the shape input channel.

5. The method of claim 4, wherein generating the mean digital object mask comprises: identifying a set of pixels representing a target individual in a digital image from the plurality of digital images; identifying one or more facial feature points corresponding to the target individual in the digital image; estimating a first transform between the facial feature points and a canonical pose; and applying the first transform to the set of pixels representing the target individual in the digital image.

6. The method of claim 4, wherein utilizing the mean digital object mask to generate the training shape input channel comprises: detecting one or more facial feature points corresponding to the training target individual portrayed in the digital training image; estimating a second transform based on the detected one or more facial feature points corresponding to the training target individual portrayed in the digital training image and the canonical pose; and applying the second transform to the mean digital object mask to generate the shape input channel.

7. The method of claim 1, further comprising: generating a training position channel for the digital training image that indicates positions of pixels in the digital training image relative to the identified training target individual portrayed in the digital training image; and utilizing the training position channel to train the neural network.

8. The method of claim 7, wherein generating the training position channel further comprises: detecting one or more facial feature points corresponding to a face of the training target individual portrayed in the digital training image; estimating a transform between the detected one or more facial feature points and a canonical pose; and applying the transform to the canonical pose to generate the training position channel.

9. The method of claim 7, wherein generating the training position channel further comprises: generating an x-position channel that indicates horizontal positions of pixels in the digital training image relative to the training target individual portrayed in the digital training image; and generating a y-position channel that indicates vertical positions of pixels in the digital training image relative to the training target individual portrayed in the digital training image.

10. A system for training at least one neural network to automatically select individuals portrayed in digital visual media, comprising: at least one processor; and at least one non-transitory computer readable storage medium storing instructions thereon, that, when executed by the at least one processor, cause the system to: generate a training position channel for a digital training image portraying a training target individual, the training positing channel indicating positions of pixels in the digital training image relative to the training target individual portrayed in the digital training image; generate a training color channel reflecting colors of pixels in the digital training image; and utilize the training position channel and the training color channel to train a neural network to predict pixels portrayed in probe digital images that reflect target individuals.

11. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to train the neural network by: generating, utilizing the neural network and based on the training position channel and the training color channel, an output mask corresponding to the training target individual; and comparing the output mask to a training ground truth mask corresponding to the training target individual; and training the neural network based on the comparison between the output mask and the training ground truth mask.

12. The system of claim 11, wherein: the neural network comprises a fully convolutional neural network; and comparing the output mask to the training ground truth mask comprises applying a loss function that measurers error between the output mask and the ground truth mask.

13. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to generate the training position channel by: detecting one or more facial feature points corresponding to a face of the training target individual portrayed in the digital training image; estimating a transform between the detected one or more facial feature points and a canonical pose; and applying the transform to the canonical pose to generate the training position channel.

14. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to generate the training position channel by: generating an x-position channel that indicates horizontal positions of pixels in the digital training image relative to the training target individual portrayed in the digital training image; and generating a y-position channel that indicates vertical positions of pixels in the digital training image relative to the training target individual portrayed in the digital training image.

15. The system of claim 10, further comprising instructions that, when executed by the at least one processor, cause the system to generate a training shape input channel for each digital training image that comprises an estimated shape of the identified training target individual portrayed in the digital training image.

16. The system of claim 15, wherein generating the training shape input channel comprises: detecting one or more facial feature points corresponding to the
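
The training step recited in claims 2, 3, 11, and 12 (generate an output mask, compare it to a ground truth mask with a loss function, and update the network from that comparison) can be sketched as follows. This is a toy under stated assumptions, not the claimed network: a per-pixel logistic model, equivalent to a single 1x1 convolution, stands in for the fully convolutional neural network, binary cross-entropy is one possible choice of loss, and `predict_mask`, `mask_loss`, and `train_step` are hypothetical names.

```python
import numpy as np

def predict_mask(inputs, weights, bias):
    """Per-pixel logistic model; behaves like one 1x1 convolution."""
    logits = inputs @ weights + bias            # (H, W, C) @ (C,) -> (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

def mask_loss(output_mask, truth_mask, eps=1e-7):
    """Mean binary cross-entropy between output and ground truth masks."""
    p = np.clip(output_mask, eps, 1.0 - eps)
    per_pixel = truth_mask * np.log(p) + (1.0 - truth_mask) * np.log(1.0 - p)
    return float(-np.mean(per_pixel))

def train_step(inputs, truth_mask, weights, bias, lr=0.5):
    """One gradient-descent update driven by the mask comparison."""
    output_mask = predict_mask(inputs, weights, bias)
    error = output_mask - truth_mask            # gradient of the loss w.r.t. logits (unscaled)
    grad_w = np.tensordot(error, inputs, axes=([0, 1], [0, 1])) / error.size
    grad_b = error.mean()
    loss = mask_loss(output_mask, truth_mask)   # loss before this update
    return weights - lr * grad_w, bias - lr * grad_b, loss

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.random((8, 8, 3))                       # synthetic input channels
    truth = (inputs[..., 0] > 0.5).astype(float)         # synthetic ground truth mask
    weights, bias = np.zeros(3), 0.0
    for step in range(200):
        weights, bias, loss = train_step(inputs, truth, weights, bias)
        if step % 50 == 0:
            print(step, round(loss, 4))
```

In a real system the input would carry the color, position, and shape channels described in the claims, and the model would be a multi-layer convolutional network trained by backpropagation; the comparison-then-update structure is the same.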

Assignees

Adobe Systems Inc

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9978003B2 cover?
Systems and methods are disclosed for segregating target individuals represented in a probe digital image from background pixels in the probe digital image. In particular, in one or more embodiments, the disclosed systems and methods train a neural network based on two or more of training position channels, training shape input channels, training color channels, or training object data.
Who is the assignee on this patent?
Adobe Systems Inc
What technology area does this patent fall under?
Primary CPC classification G06K9/66. Mapped technology areas include Physics.
When was this patent published?
Publication date May 22, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).