Decoding with signaling of segmentation information
US-12506891-B2 · Dec 23, 2025 · US
US2024251103A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024251103-A1 |
| Application number | US-202418413163-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 16, 2024 |
| Priority date | Jan 25, 2023 |
| Publication date | Jul 25, 2024 |
| Grant date | — |
Official abstract text for this publication.
A computer-implemented method of machine-learning. The method includes obtaining a training dataset of 3D models of real-world objects. The method further includes learning, based on the training dataset and on a patch-decomposition of the 3D models of the training dataset, a finite codebook of quantized vectors and a neural network. The neural network comprises a rotation-invariant encoder. The rotation-invariant encoder is configured for rotation-invariant encoding of a patch of a 3D model into a quantized latent vector of the codebook. The neural network further includes a decoder. The decoder is configured for decoding a sequence of quantized latent vectors of the codebook into a 3D model. The sequence corresponds to a patch-decomposition. This constitutes an improved solution for 3D model generation.
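The quantization step mentioned in the abstract, mapping each patch encoding to an entry of a finite codebook, can be illustrated with a minimal sketch. It assumes a nearest-neighbor lookup under squared Euclidean distance and uses a toy two-dimensional codebook; the names `quantize`, `quantize_patches`, and `CODEBOOK` are hypothetical and do not come from the patent. The rotation-invariant encoder and the decoder themselves are not modeled here.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Vector, codebook: Sequence[Vector]) -> Tuple[int, Vector]:
    """Return the index and value of the codebook entry nearest to z."""
    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


def quantize_patches(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map one latent per patch to a sequence of codebook indices."""
    return [quantize(z, codebook)[0] for z in latents]


# Toy values chosen for illustration only (assumption, not patent data).
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
patch_latents = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

sequence = quantize_patches(patch_latents, CODEBOOK)
print(sequence)  # [0, 1, 2]
```

In this sketch a 3D model decomposed into three patches becomes a sequence of three codebook indices, which is the kind of discrete sequence the abstract says the decoder takes as input.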
Opening claim text (preview).
1. A computer-implemented method of machine-learning, the method comprising: obtaining a training dataset of 3D models of real-world objects; and learning, based on the training dataset and on a patch-decomposition of the 3D models of the training dataset, a finite codebook of quantized vectors and a neural network, the neural network including: a rotation-invariant encoder configured for rotation-invariant encoding of a patch of a 3D model into a quantized latent vector of the codebook, and a decoder configured for decoding a sequence of quantized latent vectors of the codebook, the sequence corresponding to a patch-decomposition, into a 3D model.

2. The computer-implemented method of claim 1, wherein the encoder is translation-invariant and rotation-invariant and is configured for translation-invariant and rotation-invariant encoding of a patch of a 3D model of the training dataset into a quantized latent vector of the codebook.

3. The computer-implemented method of claim 1, wherein the decoder includes: a first module configured for taking as input a sequence of quantized latent vectors of the codebook corresponding to a patch-decomposition and inferring patches rotations for reconstructing a 3D model; and a second module configured for taking as input the sequence of quantized latent vectors of the codebook corresponding to the patch-decomposition and inferring patches geometries for reconstructing a 3D model.

4. The computer-implemented method of claim 1, wherein the learning further comprises minimizing a loss, the loss including a reconstruction loss and a commitment loss, the commitment loss rewarding consistency between quantized latent vectors outputted by the encoder and vectors of the codebook.

5. The computer-implemented method of claim 4, wherein the loss is of a type:

$$\mathcal{L}(x; \phi, \psi, \mathcal{D}) = \mathcal{L}_r(o_x, \sigma_x) + \beta \, \mathcal{L}_{VQ}(Z, Z_q)$$

where $\mathcal{L}_r$ is a reconstruction binary cross-entropy loss, $\mathcal{L}_{VQ}$ is a commitment loss, $x$ represents a 3D point, $\psi$ represents a parameter of the decoder, $\beta$ is a weighting parameter, $\sigma_x$ represents a ground truth occupancy for $x$, $o_x$ represents a predicted occupancy for $x$, $\phi$ represents the parameter of the encoder, and where

$$\mathcal{L}_{VQ}(Z, Z_q) = \lVert Z - \mathrm{sg}[Z_q] \rVert_2^2$$

where sg[.] denotes a stop-gradient operation, $Z = \{z_i\}_i$, $Z_q = \{z_i^q\}_i$, where $z_i$ is a non-quantized encoding of patch $X_i$ and where

$$VQ(z_i) = z_i^q = \arg\min_{e \in \mathcal{D}} \lVert z_i - e \rVert$$

where $\mathcal{D} = \{e_k \in \mathbb{R}^D\},\ k = 1 \ldots K$ is the codebook.

6. A computer-implemented method of applying a decoder and a codebook learnable according to a machine-learning including obtaining a training dataset of 3D models of real-world objects and learning, based on the training dataset and on a patch-decomposition of the 3D models of the training dataset, a finite codebook of quantized vectors and a neural network, the neural network including: a rotation-invariant encoder configured for rotation-invariant encoding of a patch of a 3D model into a quantized latent vector of the codebook, and a decoder configured for decoding a sequence of quantized latent vectors of the codebook, the sequence corresponding to a patch-decomposition, into a 3D model, the method comprising: obtaining a sequence of quantized latent vectors of the codebook; and applying the decoder to the sequence.

7. The computer-implemented method of claim 6, wherein obtaining the sequence further comprises: applying a transformer neural network to obtain the sequence, the transformer neural network being configured for, given an input latent vector representing an input 3D model, generating a sequence of quantized latent vectors of the codebook that correspond to a patch-decomposition of the input 3D model.

8. The computer-implemented method of claim 7, wherein the latent vector representing the input 3D model corresponds to an embedding of an image or point cloud representing the input 3D model.

9. The computer-implemented method of claim 8, wherein the image is a single-view image or the point cloud is a partial point cloud.

10. The computer-implemented method of claim 6, wherein obtaining the sequence further comprises applying the encoder to a patch-decomposition of an input 3D model.

11. The computer-implemented method of claim 6, further comprising
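
The loss recited in claim 5, a binary cross-entropy reconstruction term plus a weighted commitment term, can be evaluated numerically with the sketch below. It rests on several assumptions not stated in the claims: the cross-entropy is averaged over sampled 3D points, the default weight `beta=0.25` is an arbitrary illustrative value, and the function names are hypothetical. The stop-gradient operation sg[.] only affects backpropagation, so it does not change the value computed here.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def binary_cross_entropy(predicted: Sequence[float], target: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted and ground-truth occupancies."""
    total = 0.0
    for o, t in zip(predicted, target):
        o = min(max(o, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(o) + (1.0 - t) * math.log(1.0 - o))
    return total / len(predicted)


def commitment_loss(z: Sequence[Vector], z_q: Sequence[Vector]) -> float:
    """Squared L2 distance between encodings Z and their quantized versions Z_q.

    In training, Z_q would sit behind a stop-gradient; the value is the same.
    """
    return sum((a - b) ** 2 for zi, zqi in zip(z, z_q) for a, b in zip(zi, zqi))


def total_loss(predicted: Sequence[float], target: Sequence[float],
               z: Sequence[Vector], z_q: Sequence[Vector],
               beta: float = 0.25) -> float:
    """Reconstruction loss plus beta times the commitment loss."""
    return binary_cross_entropy(predicted, target) + beta * commitment_loss(z, z_q)


# Toy values for illustration only (assumption, not patent data).
predicted_occupancy = [0.5, 0.5]   # decoder outputs for two query points
true_occupancy = [1.0, 0.0]        # ground-truth occupancies for those points
encodings = [(0.0, 0.0)]           # one non-quantized patch encoding
quantized = [(1.0, 0.0)]           # its nearest codebook vector

loss = total_loss(predicted_occupancy, true_occupancy, encodings, quantized)
print(round(loss, 4))  # 0.9431
```

With these toy values the reconstruction term equals ln 2 and the commitment term equals 1, so the total is ln 2 + 0.25.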
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Three-dimensional [3D] modelling for computer graphics · CPC title
Vector quantisation · CPC title