US2022156981A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022156981-A1 |
| Application number | US-202117224103-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 6, 2021 |
| Priority date | Nov 17, 2020 |
| Publication date | May 19, 2022 |
| Grant date | — |
Official abstract text for this publication.
In one embodiment, a first device may receive, from a second device, a reference landmark map identifying locations of facial features of a user of the second device depicted in a reference image and a feature map, generated based on the reference image, representing an identity of the user. The first device may receive, from the second device, a current compressed landmark map based on a current image of the user and decompress the current compressed landmark map to generate a current landmark map. The first device may update the feature map based on a motion field generated using the reference landmark map and the current landmark map. The first device may generate scaling factors based on a normalization facial mask of pre-determined facial features of the user. The first device may generate an output image of the user by decoding the updated feature map using the scaling factors.
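The feature-map update described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names `warp_features` and `apply_occlusion`, the `(dy, dx)` offset convention, nearest-neighbour sampling, and border clamping are all assumptions made for illustration. In the claims, the motion field and the occlusion map come from a dense motion machine-learning model, which is not reproduced here; the sketch simply takes both as given arrays.

```python
import numpy as np


def warp_features(features, motion_field):
    """Backward-warp a (C, H, W) feature map with an (H, W, 2) motion field.

    Assumed convention (not from the source): motion_field[y, x] = (dy, dx)
    is the offset from output pixel (y, x) to the location sampled in the
    reference feature map. Sampling is nearest-neighbour, clamped to the border.
    """
    _, h, w = features.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.rint(ys + motion_field[..., 0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + motion_field[..., 1]).astype(int), 0, w - 1)
    # Indexing with two (H, W) integer arrays keeps the channel axis first.
    return features[:, src_y, src_x]


def apply_occlusion(features, occlusion_map):
    """Multiply every channel element-wise by an (H, W) occlusion map."""
    return features * occlusion_map[np.newaxis, :, :]


if __name__ == "__main__":
    feats = np.arange(18, dtype=float).reshape(2, 3, 3)
    shift_right = np.zeros((3, 3, 2))
    shift_right[..., 1] = 1.0          # sample one pixel to the right
    occlusion = np.ones((3, 3))
    occlusion[1, 1] = 0.0              # treat the centre pixel as occluded
    updated = apply_occlusion(warp_features(feats, shift_right), occlusion)
    print(updated.shape)
```

A real system would more likely use bilinear sampling so that the warp is differentiable; nearest-neighbour sampling is used here only to keep the sketch short.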
Opening claim text (preview).
What is claimed is:
1. A method comprising, by a first device: receiving, from a second device: a reference landmark map identifying locations of facial features of a user of the second device depicted in a reference image, and a feature map, generated based on the reference image, representing an identity of the user; receiving, from the second device, a current compressed landmark map based on a current image of the user; decompressing the current compressed landmark map to generate a current landmark map; updating the feature map based on a motion field generated using the reference landmark map and the current landmark map; generating scaling factors based on a normalization facial mask of pre-determined facial features of the user; and generating an output image of the user by decoding the updated feature map using the scaling factors.
2. The method of claim 1, further comprising: receiving, from the second device, a subsequent compressed landmark map based on a subsequent image of the user; decompressing the subsequent compressed landmark map to generate a subsequent landmark map; updating the feature map based on the motion field generated using the reference landmark map and the subsequent landmark map; and generating another output image of the user by decoding the updated feature map using the scaling factors.
3. The method of claim 1, further comprising: using, by applying a dense motion machine-learning model, a downsampled reference image, the reference landmark map, and the current landmark map to generate the motion field and an occlusion map.
4. The method of claim 3, further comprising: updating, using the feature map updated based on the motion field, the feature map based on the occlusion map, wherein the updated feature map is multiplied element-wise with the occlusion map to the updated feature map.
5. The method of claim 3, wherein the occlusion map indicates one or more portions of a face of a user that is occluded in the current frame.
6. The method of claim 1, wherein the reference landmark map indicates a plurality of reference locations of N landmarks for the reference image of the user, and wherein the current landmark map indicates a plurality of current locations of N landmarks for the current image of the user.
7. The method of claim 6, wherein the motion field indicates a change in location from the plurality of reference locations of N landmarks to the plurality of current locations of N landmarks.
8. The method of claim 1, wherein the normalization facial mask is generated based on a set of landmarks.
9. The method of claim 1, wherein decoding the updated feature map comprises: applying a first set of scaling factors to the updated feature map to generate a first layer of the output image; applying a second set of scaling factors to the first layer of the output image to generate a second layer of the output image; and applying a subsequent set of scaling factors to the second layer of the output image to generate the output image.
10. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive, from a second device: a reference landmark map identifying locations of facial features of a user of the second device depicted in a reference image, and a feature map, generated based on the reference image, representing an identity of the user; receive, from the second device, a current compressed landmark map based on a current image of the user; decompress the current compressed landmark map to generate a current landmark map; update the feature map based on a motion field generated using the reference landmark map and the current landmark map; generate scaling factors based on a normalization facial mask of pre-determined facial features of the user; and generate an output image of the user by decoding the updated feature map using the scaling factors.
11. The media of claim 10, wherein the software is further operable when executed to: use, by applying a dense motion machine-learning model, a downsampled reference image, the reference landmark map, and the current landmark map to generate the motion field and an occlusion map.
12. The media of claim 11, wherein the software is further operable when executed to: update, using the feature map updated based on the motion field, the feature map based on the occlusion map, wherein the updated feature map is multiplied element-wise with the occlusion map to the updated feature map.
13. The media of claim 11, wherein the occlusion map indicates one or more portions of a face of a user that is occluded in the current frame.
14. The media of claim 10, wherein the software is further operable when executed to: apply a first set of scaling factors to the updated feature map to generate a first layer of the output image; apply a second set of scaling factors to the first layer of the output image to generate a second layer of the output image; and apply a subsequent set of scaling factors to the second layer of the output image to generate the output image.
15. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive, from a second device: a reference landmark map identifying locations of facial features of a user of the second device depicted in a reference image, and a feature map, generated based on the reference image, representing an identity of the user; receive, from the second device, a current compressed landmark map based on a current image of the user; decompress the current compressed landmark map to generate a current landmark map; update the feature map based on a motion field generated using the reference landmark map and the current landmark map; generate scaling factors based on a normalization facial mask of pre-determined facial features of the user; and generate an output image of the user by decoding the updated feature map using the scaling factors.
16. The system of claim 15, wherein the processors are further operable when executing the instructions to: use, by applying a dense motion machine-learning model, a downsampled reference image, the reference landmark map, and the current landmark map to generate the motion field and an occlusion map.
17. The system of claim 16, wherein the processors are further operable when executing the instructions to: update, using the feature map updated based on the motion field, the feature map based on the occlusion map, wherein the updated feature map is multiplied element-wise with the occlusion map to the updated feature map.
18. The system of claim 16, wherein the occlusion map indicates one or more portions of a face of a user that is occluded in the current frame.
19. The system of claim 15, wherein the reference landmark map indicates a plurality of reference locations of N landmarks for the reference image of the user, and wherein the current landmark map indicates a plurality of current locations of N landmarks for the current image of the user.
20. The system of claim 15, wherein the processors are further operable when executing the instructions to: apply a first set of scaling factors to the updated feature map to generate a first layer of the output image; apply a second set of scaling factors to the first layer of the output image to generate a second layer of the output image; and apply a subsequent set of scaling factors to the second layer of the output image to generate the output image to the first
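The staged decoding recited in claims 9 and 14 (a first, a second, and a subsequent set of scaling factors applied in turn) can be sketched as follows. This is an illustrative sketch only. The helper names, the rule `1 + mask` for deriving scaling factors from the facial mask, the 2x nearest-neighbour upsampling between stages, and the final channel average are all assumptions; the claims do not specify any of them, and a real decoder would contain learned layers that are omitted here.

```python
import numpy as np


def scaling_factors_from_mask(mask, out_h, out_w):
    """Resize an (H, W) facial mask to (out_h, out_w) by nearest neighbour
    and turn it into multiplicative factors.

    The mapping 1 + mask is a hypothetical choice for this sketch.
    """
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return 1.0 + mask[np.ix_(rows, cols)]


def upsample2x(x):
    """Nearest-neighbour 2x upsampling over the last two axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def decode(updated_features, mask, num_stages=3):
    """Apply one set of scaling factors per stage, then upsample.

    Three stages mirror the first, second, and subsequent sets in the claims.
    """
    x = updated_features
    for _ in range(num_stages):
        factors = scaling_factors_from_mask(mask, x.shape[-2], x.shape[-1])
        x = upsample2x(x * factors)    # factors broadcast over channels
    # Stand-in for a learned output layer: average channels into one image.
    return x.mean(axis=0)


if __name__ == "__main__":
    features = np.ones((4, 2, 2))
    face_mask = np.zeros((8, 8))
    face_mask[2:6, 2:6] = 1.0          # hypothetical region of interest
    image = decode(features, face_mask)
    print(image.shape)
```

Deriving every stage's factors from the same mask, resized to the stage's resolution, is one simple way to let the pre-determined facial features influence each decoding layer; other designs are equally compatible with the claim language.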
Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (selecting H04Q) · CPC title
Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title
Model-based coding, e.g. wire frame · CPC title
involving reference images or patches · CPC title
using neural networks · CPC title