US12099327B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12099327-B2 |
| Application number | US-202117360693-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2021 |
| Priority date | Jun 28, 2021 |
| Publication date | Sep 24, 2024 |
| Grant date | Sep 24, 2024 |
Official abstract text for this publication.
A holographic calling system can capture and encode holographic data at a sender-side of a holographic calling pipeline and decode and present the holographic data as a 3D representation of a sender at a receiver-side of the holographic calling pipeline. The holographic calling pipeline can include stages to capture audio, color images, and depth images; densify the depth images to have a depth value for each pixel while generating parts masks and a body model; use the masks to segment the images into parts needed for hologram generation; convert depth images into a 3D mesh; paint the 3D mesh with color data; perform torso disocclusion; perform face reconstruction; and perform audio synchronization. In various implementations, different ones of these stages can be performed sender-side or receiver-side. The holographic calling pipeline also includes sender-side compression, transmission over a communication channel, and receiver-side decompression and hologram output.
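The masking and depth-to-mesh stages named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the mesh is built, so the approach here (pinhole back-projection of each valid depth pixel, then two triangles per fully valid 2x2 pixel block) is an assumption, as are the function names `apply_mask` and `depth_to_mesh`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that a depth of 0 means "no value".

```python
def apply_mask(depth, mask):
    """Keep depth only where the mask is truthy; 0 marks a removed pixel (assumed convention)."""
    return [[z if keep else 0 for z, keep in zip(d_row, m_row)]
            for d_row, m_row in zip(depth, mask)]


def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project valid depth pixels to 3D and triangulate neighbouring pixels.

    depth is a list of rows; fx, fy, cx, cy are hypothetical pinhole intrinsics.
    Returns (vertices, triangles), where triangles index into vertices.
    """
    rows, cols = len(depth), len(depth[0])
    index = {}      # (row, col) -> position of that pixel's vertex
    vertices = []
    for v in range(rows):
        for u in range(cols):
            z = depth[v][u]
            if z > 0:
                index[(v, u)] = len(vertices)
                # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
                vertices.append(((u - cx) * z / fx, (v - cy) * z / fy, z))

    triangles = []
    for v in range(rows - 1):
        for u in range(cols - 1):
            corners = [(v, u), (v, u + 1), (v + 1, u), (v + 1, u + 1)]
            # Only emit triangles when all four corners have depth,
            # so masked-out regions become holes instead of stretched faces.
            if all(c in index for c in corners):
                a, b, c, d = (index[k] for k in corners)
                triangles.append((a, c, b))
                triangles.append((b, c, d))
    return vertices, triangles


# Toy 3x3 depth image (metres) with the top-right and bottom-left pixels masked out.
depth = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0],
         [1.0, 1.0, 1.0]]
mask = [[1, 1, 0],
        [1, 1, 1],
        [0, 1, 1]]
masked = apply_mask(depth, mask)
vertices, triangles = depth_to_mesh(masked, fx=1.0, fy=1.0, cx=1.0, cy=1.0)
```

Leaving masked pixels out of the mesh, rather than interpolating across them, is one simple way to keep background geometry from being stitched to the foreground; a later stage such as the disocclusion step would then be responsible for filling the remaining holes.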
Opening claim text (preview).
We claim:

1. A method for conducting a holographic call using a holographic call pipeline, the method comprising: establishing a communication channel between a sending device and at least one receiving device; capturing, at the sending device, color, depth, and audio data and using the color and depth data to generate one or more color images and one or more depths images; generating one or more masks for the one or more color images and one or more depth images; applying the one or more masks to the one or more color images and one or more depth images to obtain masked portions of the one or more color images and one or more depth images; compressing the masked portions of the one or more color images and one or more depth images; and synchronizing and transmitting, over the communication channel, the one or more color images, one or more depth images, and the audio data; wherein the receiving device, in response to the transmitting: decompresses the compressed portions; converts the portions of the one or more depth images into a 3D mesh; paints the portions of the one or more color images onto the 3D mesh; synchronizes the audio data with the painted 3D mesh; performs torso disocclusion on the 3D mesh; performs facial reconstruction on the 3D mesh; and outputs the painted 3D mesh as a hologram with synchronized audio.

2. The method of claim 1, wherein the communication channel is a real-time communication channel that provides latency guarantees.

3. The method of claim 1, wherein capturing the depth data includes capturing structured light by emitting a pattern of infrared (IR) light, capturing reflections of the IR light, and determining depth data based on how the pattern of IR light is distorted and/or using time-of-flight readings for parts of the pattern of IR light.

4. The method of claim 1, wherein the one or more color images include multiple color images captured from multiple cameras at different resolutions, and wherein the generating one or more masks for the one or more color images and one or more depth images is performed based on the color images captured at the lower resolution.

5. The method of claim 1, wherein the one or more depth images do not have a depth value for each pixel, and wherein the method further comprises: performing a densification procedure on the one or more depth images to assign a depth value to each pixel of the one or more depth images; wherein the densification procedure comprises applying a machine learning model trained, to densify depth image, using synthetic images of people, the synthetic images of people generated with specified depth data for each pixel.

6. The method of claim 1, wherein generating the one or more masks comprises: identifying segments for the one or more color images and/or the one or more depth image, the segments comprising at least foreground and background distinctions; wherein the identifying the segments comprises applying a machine learning model trained, to segment images, using synthetic images of people, the synthetic images of people generated with specified labeled segments.

7. The method of claim 1, wherein the one or more depth images do not have a depth value for each pixel, and wherein the method further comprises: performing a densification procedure on the one or more depth images to assign a depth value to each pixel of the one or more depth images and identify segments for the one or more color images and/or the one or more depth image, the segments comprising at least foreground and background distinctions; wherein the densification procedure comprises applying a machine learning model trained, to densify depth image and segment images, using synthetic images of people, the synthetic images of people generated with specified depth data for each pixel and labeled segments.

8. The method of claim 1, wherein at least part of the compression is performed using an RVL compression algorithm.

9. The method of claim 1, wherein the torso disocclusion on the 3D mesh includes: generating an existing model of a body of a sending user based on one or more previously captured images of the sending user; identifying one or more holes in the 3D mesh corresponding to occlusions between a depth sensor and the sending user; and filling in the one or more holes in the 3D mesh with corresponding portions from the existing model.

10. The method of claim 1, wherein the facial reconstruction on the 3D mesh includes: encoding at least a facial portion, depicting a sending user wearing an XR headset, of at least one of the one or more color images; applying a geometry branch of a machine learning model to the encoded facial portion to produce a predicted geometry of the sending user without the XR headset; applying a texture branch of a machine learning model to the encoded facial portion to produce a predicted texture of the sending user without the XR headset; and skinning the predicted texture onto the predicted geometry.

11. The method of claim 1, wherein the facial reconstruction on the 3D mesh includes: performing a pre-scan of a sending user, while not wearing an XR headset, to generate multiple expression meshes for the sending user; determining a head pose and facial expression for the sending user; selecting multiple of the expression meshes that match the determined facial expression and adjusting the selected multiple expression meshes to conform to the head pose; combining the multiple selected expression meshes; and applying depth and color blending on the combined expression mesh with at least the one or more color images.

12. A computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform a process for conducting a holographic call using a holographic call pipeline, the process comprising: capturing, at a sending device, color, depth, and audio data and using the color and depth data to generate one or more color images and one or more depths images; generating one or more masks for the one or more color images and one or more depth images; applying the one or more masks to the one or more color images and one or more depth images to obtain masked portions of the one or more color images and one or more depth images; compressing the masked portions of the one or more color images and one or more depth images; and transmitting, over a communication channel, the compressed portions and the audio data; wherein a receiving device, in response to the transmitting: decompresses the compressed portions; converts the portions of the one or more depth images into a 3D mesh; paints the portions of the one or more color images onto the 3D mesh; and outputs the painted 3D mesh as a hologram with synchronized audio.

13. The computer-readable storage medium of claim 12, wherein capturing the depth data includes capturing structured light by emitting a pattern of infrared (IR) light, capturing reflections of the IR light, and determining depth data based on how the pattern of IR light is distorted and/or using time-of-flight readings for parts of the pattern of IR light.

14. The computer-readable storage medium of claim 12, wherein the one or more depth images do not have a depth value for each pixel, and wherein the process further comprises: performing a densification procedure on the one or more depth images to assign a depth value to each pixel of the one or more depth images and identify segments for the one or more color images and/or the one or more depth image, the segments comprising at least foreground and background
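
Claim 8 names an RVL compression algorithm without detailing it. The sketch below assumes this refers to the run-length plus variable-length scheme often applied to depth images: runs of zero (missing or masked) pixels are stored as counts, and nonzero pixels are stored as zigzag-coded deltas from the previous nonzero value, packed into 4-bit nibbles with a continuation bit. It is a simplified, unoptimized illustration, not the patented implementation; the names `rvl_encode` and `rvl_decode`, the flat list input, and the use of 0 for "no depth" are assumptions made for the example.

```python
def _put_vle(nibbles, value):
    """Append value as 3-bit groups, low bits first; bit 3 flags 'more groups follow'."""
    while True:
        nibble = value & 0x7
        value >>= 3
        if value:
            nibble |= 0x8
        nibbles.append(nibble)
        if not value:
            break


def _get_vle(nibbles, pos):
    """Read one variable-length value starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        nibble = nibbles[pos]
        pos += 1
        value |= (nibble & 0x7) << shift
        shift += 3
        if not nibble & 0x8:
            return value, pos


def rvl_encode(depth):
    """Compress a flat list of non-negative integer depths (0 = no depth)."""
    nibbles, previous = [], 0
    i, n = 0, len(depth)
    while i < n:
        start = i
        while i < n and depth[i] == 0:
            i += 1
        _put_vle(nibbles, i - start)            # length of the zero run
        start = i
        while i < n and depth[i] != 0:
            i += 1
        _put_vle(nibbles, i - start)            # length of the nonzero run
        for current in depth[start:i]:
            delta = current - previous
            # Zigzag coding maps small signed deltas to small unsigned values.
            _put_vle(nibbles, 2 * delta if delta >= 0 else -2 * delta - 1)
            previous = current
    if len(nibbles) % 2:
        nibbles.append(0)                       # pad to a whole byte
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[::2], nibbles[1::2]))


def rvl_decode(data, num_pixels):
    """Invert rvl_encode; num_pixels must be sent alongside the payload."""
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0xF))
    out, pos, previous = [], 0, 0
    while len(out) < num_pixels:
        zeros, pos = _get_vle(nibbles, pos)
        out.extend([0] * zeros)
        nonzeros, pos = _get_vle(nibbles, pos)
        for _ in range(nonzeros):
            zigzag, pos = _get_vle(nibbles, pos)
            previous += (zigzag >> 1) ^ -(zigzag & 1)   # undo zigzag coding
            out.append(previous)
    return out


# Masked depth row in millimetres: zeros are pixels removed by the mask.
row = [0, 0, 0, 500, 502, 501, 0, 0, 1200, 1198]
packed = rvl_encode(row)
restored = rvl_decode(packed, len(row))
```

Under these assumptions the scheme is lossless, and it benefits directly from the masking step in claim 1: every masked-out region collapses into a single run count, while neighbouring foreground depths tend to differ by small deltas that fit in one or two nibbles.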
Mixed reality (object pose determination, tracking or camera calibration for mixed reality G06T7/00) · CPC title
Augmented reality · CPC title
audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants (echo suppression in two-way loud-speaking telephone systems H04M9/02; sound field processing per se H04S7/30) · CPC title
Multimedia conference systems · CPC title
Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title