Augmented reality self-portraits
US-2019082118-A1 · Mar 14, 2019 · US
US10839577B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10839577-B2 |
| Application number | US-201816177408-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 31, 2018 |
| Priority date | Sep 8, 2017 |
| Publication date | Nov 17, 2020 |
| Grant date | Nov 17, 2020 |
Official abstract text for this publication.
Systems, methods, apparatuses and non-transitory, computer-readable storage mediums are disclosed for generating AR self-portraits or “AR selfies.” In an embodiment, a method comprises: capturing, by a first camera of a mobile device, image data, the image data including an image of a subject in a physical, real-world environment; receiving, by a depth sensor of the mobile device, depth data indicating a distance of the subject from the camera in the physical, real-world environment; receiving, by one or more motion sensors of the mobile device, motion data indicating at least an orientation of the first camera in the physical, real-world environment; generating a virtual camera transform based on the motion data, the camera transform for determining an orientation of a virtual camera in a virtual environment; and generating a composite image data, using the image data, a matte and virtual background content selected based on the virtual camera orientation.
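The final compositing step named in the abstract can be illustrated with a minimal sketch. It assumes the matte acts as a per-pixel alpha value in [0, 1] and that the blend is a standard alpha blend; the abstract does not state the blend formula. The function name `composite` and the nested-list image layout are hypothetical, chosen only for illustration.

```python
def composite(image, matte, background):
    """Blend a camera image over virtual background content.

    Assumption (not from the patent text): the matte is a per-pixel alpha
    in [0, 1], where 1 keeps the subject and 0 shows the background.
    Images are lists of rows; each pixel is an (r, g, b) tuple of floats.
    """
    out = []
    for img_row, matte_row, bg_row in zip(image, matte, background):
        out_row = []
        for fg, alpha, bg in zip(img_row, matte_row, bg_row):
            # Standard alpha blend applied to each color channel.
            out_row.append(
                tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))
            )
        out.append(out_row)
    return out


# Usage: a 1x2 image where the first pixel is subject and the second is not.
image = [[(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
background = [[(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]]
matte = [[1.0, 0.0]]
result = composite(image, matte, background)
```

In this sketch the virtual background is passed in as a ready-made image; selecting it from the virtual environment with the camera transform is a separate step and is not shown here.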
Opening claim text (preview).
What is claimed is:
1. A method comprising: capturing, by a first camera of a mobile device, image data, the image data including an image of a subject in a physical, real-world environment; capturing, by a depth sensor of the mobile device, depth data indicating a distance of the subject from the camera in the physical, real-world environment; capturing, by one or more motion sensors of the mobile device, motion data indicating at least an orientation of the first camera in the physical, real-world environment; generating, by one or more processors of the mobile device, a virtual camera transform based on the motion data, the camera transform for determining an orientation of a virtual camera in a virtual environment; generating, by the one or more processors, a matte from the image data and the depth data, wherein generating the matte includes: inputting the image data and the depth data into a neural network; generating, by the neural network, a low-resolution matte using the image data and the depth data; and processing the low-resolution matte to remove artifacts in the low-resolution matte; generating a high-resolution matte from the processed low-resolution matte, where the high-resolution matte has higher resolution than the low-resolution matte; generating, by the one or more processors, a composite image data, using the image data, the high-resolution matte and a virtual background content, the virtual background content selected from the virtual environment using the camera transform; and causing to display, by the one or more processors, the composite image data on a display of the mobile device.
2. The method of claim 1, wherein processing the low-resolution matte to remove artifacts in the low-resolution matte, further comprises: generating an inner matte and an outer matte from at least one of a bounding box including a face of the subject or a histogram of the depth data; generating a hole-filled matte from the inner matte; generating a shoulder/torso matte from the hole-filled inner matte; dilating the inner matte using a first kernel; dilating the outer matte using a second kernel smaller than the first kernel; generating a garbage matte from an intersection of the dilated inner matte and the dilated outer matte; combining the low-resolution matte with the garbage matte to create a face matte; combining the face matte and the shoulder/torso matte into a denoised matte; and generating the high-resolution matte from the denoised matte.
3. The method of claim 2, further comprising: applying a temporal filter to the high-resolution matte to generate a final matte; and generating the composite image data, using the image data, the final matte and the virtual background content.
4. The method of claim 3, wherein applying the temporal filter to the high-resolution matte to generate a final matte further comprises: generating a per-pixel similarity map based on the image data and previous image data; and applying the temporal filter to the high-resolution matte using the similarity map and a previous final matte.
5. The method of claim 4, wherein the temporal filter is a linear weighted average of two frames with weights calculated per-pixel dependent on pixel similarity represented by the per-pixel similarity map.
6. The method of claim 2, wherein generating the high-resolution matte from the processed low-resolution matte, further comprises: generating a luma image from the image data; and upsampling, using a guided filter and the luma image, the denoised matte to the high-resolution matte.
7. The method of claim 2, wherein generating the shoulder/torso matte from the hole-filled inner matte further comprises: dilating the inner matte to generate the hole-filled matte; and eroding the hole-filled matte to generate the shoulder/torso matte.
8. The method of claim 2, wherein the inner matte includes depth data that is less than a depth threshold, the outer matte includes depth data that is less than the depth threshold or is unknown, and the depth threshold is determined by an average depth of a center region of the subject's face detected in the image data and an offset to include the back of the subject's head.
9. The method of claim 1, wherein the neural network is a convolutional neural network for image segmentation.
10. A method comprising: presenting a preview on a display of a mobile device, the preview including sequential frames of preview image data captured by a forward-facing camera of a mobile device positioned in close range of a subject, the sequential frames of preview image data including close range image data of the subject and image data of a background behind the subject in a physical, real world environment; receiving a first user input to apply a virtual environment effect; capturing, by a depth sensor of the mobile device, depth data indicating a distance of the subject from the forward-facing camera in the physical, real-world environment; capturing, by one or more sensors of the mobile device, orientation data indicating at least an orientation of the forward-facing camera in the physical, real-world environment; generating, by one or more processors of the mobile device, a camera transform based on the orientation data, the camera transform describing an orientation of a virtual camera in a virtual environment; generating, by the one or more processors, a matte from the sequential frames of image data and the depth data, wherein generating the matte includes: inputting the image data and the depth data into a neural network; generating, by the neural network, a low-resolution matte using the image data and the depth data; and processing the low-resolution matte to remove artifacts in the low-resolution matte; generating a high-resolution matte from the processed low-resolution matte, where the high-resolution matte has higher resolution than the low-resolution matte; generating, by the one or more processors, composite sequential frames of image data, including the sequential frames of image data, the high-resolution matte and a virtual background content, the virtual background content selected from the virtual environment using the camera transform; and causing display, by the one or more processors, of the composite sequential frames of image data.
11. A system comprising: a display; a camera; a depth sensor; one or more motion sensors; one or more processors; memory coupled to the one or more processors and storing instructions that when executed by the one or more processors, cause the one or more processors to perform operations comprising: capturing, by the camera, image data, the image data including an image of a subject in a physical, real-world environment; capturing, by the depth sensor, depth data indicating a distance of the subject from the camera in the physical, real-world environment; capturing, by the one or more motion sensors, motion data indicating at least an orientation of the camera in the physical, real-world environment; generating a virtual camera transform based on the motion data, the camera transform for determining an orientation of a virtual camera in a virtual environment; generating a matte from the image data and the depth data, wherein generating the matte includes: inputting the image data and the depth data into a neural network; generating, by the neural network, a low-resolution matte using the image data and the depth data; and processing the low-resolution matte to remove artifacts in the low-resolution matte; generating a high-resolution matte from the processed low-resolution matte, where the high-resolution matte has higher re
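Claims 4 and 5 describe the temporal filter as a linear weighted average of two frames, with per-pixel weights driven by a similarity map. The sketch below is one possible reading, not the patented implementation. The similarity metric (one minus the scaled absolute luma difference), the weight mapping, and the names `similarity_map`, `temporal_filter`, `scale`, and `max_prev_weight` are all assumptions introduced for illustration.

```python
def similarity_map(curr_luma, prev_luma, scale=1.0):
    """Per-pixel similarity in [0, 1]; 1.0 means the pixel did not change.

    Assumed metric: 1 - scale * |current - previous|, clamped at 0.
    Inputs are lists of rows of luma values in [0, 1].
    """
    return [
        [max(0.0, 1.0 - scale * abs(c - p)) for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(curr_luma, prev_luma)
    ]


def temporal_filter(matte, prev_final, similarity, max_prev_weight=0.5):
    """Linear weighted average of the current matte and the previous final matte.

    Assumed weighting: the previous frame's weight grows with similarity,
    so static pixels are smoothed and changed pixels follow the new matte.
    """
    out = []
    for m_row, p_row, s_row in zip(matte, prev_final, similarity):
        row = []
        for m, p, s in zip(m_row, p_row, s_row):
            w = max_prev_weight * s  # per-pixel weight of the previous frame
            row.append(w * p + (1.0 - w) * m)
        out.append(row)
    return out


# Usage: the first pixel is unchanged between frames, the second changed fully.
sim = similarity_map([[0.5, 0.0]], [[0.5, 1.0]])
final = temporal_filter([[1.0, 1.0]], [[0.0, 0.0]], sim)
```

With these assumed settings, the unchanged pixel is averaged evenly with the previous final matte, while the changed pixel takes the current matte value outright.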
for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters · CPC title
Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters · CPC title
based on super-resolution, i.e. the output image resolution being higher than the sensor resolution · CPC title
Means for inserting a foreground image in a background image, i.e. inlay, outlay · CPC title
Two-dimensional [2D] image generation · CPC title