Multi-camera face swapping
US-2024078726-A1 · Mar 7, 2024 · US
US12499678B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499678-B2 |
| Application number | US-202318230414-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 4, 2023 |
| Priority date | Oct 11, 2022 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
A method includes: performing unsupervised pre-training of a model, the model including an encoder and a decoder, the pre-training including: obtaining a first image and a second image under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of the first image into a transformed representation; decoding, by the decoder, the transformed representation into a reconstructed image, where the transforming of the representation of the first image and the decoding of the transformed representation is based on the representation of the first image and the representation of the second image; and adjusting one or more parameters of at least one of the encoder and the decoder based on minimizing a loss; and fine-tuning the model, initialized with a set of task specific encoder parameters, for a geometric vision task.
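Claim 1 spells out how pre-training feeds the downstream task: a task-specific model is built, its encoder is initialized with the pre-trained encoder parameters, and the model is then fine-tuned. A minimal NumPy sketch of that hand-off follows. It assumes, purely for illustration, a single-matrix linear encoder, a freshly initialized linear task head, parameters held in plain dictionaries, and a squared-error fine-tuning loss; `build_task_model` and `fine_tune_step` are hypothetical names, not taken from the patent.

```python
import copy

import numpy as np


def build_task_model(pretrained: dict, head_out_dim: int, rng: np.random.Generator) -> dict:
    """Hypothetical helper: build a task model whose encoder starts from the
    pre-trained encoder parameters. The decoder is not transferred, and the
    linear task head is an illustrative assumption."""
    # Deep copy so fine-tuning never modifies the pre-trained weights.
    encoder = copy.deepcopy(pretrained["encoder"])
    rep_dim = encoder["W"].shape[0]
    head = {
        "W": rng.normal(scale=0.02, size=(head_out_dim, rep_dim)),
        "b": np.zeros(head_out_dim),
    }
    return {"encoder": encoder, "head": head}


def fine_tune_step(model: dict, x: np.ndarray, y: np.ndarray, lr: float = 1e-2) -> float:
    """One gradient step on 0.5 * ||head(encoder(x)) - y||^2.
    Updates both encoder and head; returns the loss before the update."""
    W_e = model["encoder"]["W"]
    W_h, b_h = model["head"]["W"], model["head"]["b"]
    z = W_e @ x                      # representation from the initialized encoder
    err = W_h @ z + b_h - y          # gradient of the loss w.r.t. the prediction
    # Compute every gradient before changing any parameter.
    grad_W_h = np.outer(err, z)
    grad_W_e = np.outer(W_h.T @ err, x)
    model["head"]["W"] = W_h - lr * grad_W_h
    model["head"]["b"] = b_h - lr * err
    model["encoder"]["W"] = W_e - lr * grad_W_e
    return 0.5 * float(err @ err)
```

Copying the encoder rather than sharing its arrays keeps the pre-trained weights intact, so one pre-trained model can seed several downstream tasks.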
Opening claim text (preview).
What is claimed is:

1. A computer-implemented machine learning method of training a task specific machine learning model for a downstream geometric vision task, the method comprising: performing unsupervised pre-training of a machine learning model, the machine learning model comprising an encoder having a set of encoder parameters and a decoder having a set of decoder parameters, wherein the performing of the unsupervised pre-training of the machine learning model includes: obtaining a pair of unannotated images including a first image and a second image, wherein the first and second images depict a same scene and are taken under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of the first image into a transformed representation; decoding, by the decoder, the transformed representation into a reconstructed image, wherein the transforming of the representation of the first image and the decoding of the transformed representation is based on the representation of the first image and the representation of the second image; and adjusting one or more parameters of at least one of the encoder and the decoder based on minimizing a loss; constructing the task specific machine learning model for the downstream geometric vision task based on the pre-trained machine learning model, the task specific machine learning model comprising a task specific encoder having a set of task specific encoder parameters; initializing the set of task specific encoder parameters with the set of encoder parameters of the pre-trained machine learning model; and fine-tuning the task specific machine learning model, initialized with the set of task specific encoder parameters, for the downstream geometric vision task.

2. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the transforming of the representation of the first image includes applying a transformation to the representation of the first image to generate the transformed representation, the transformation being determined based on the representation of the first image and the representation of the second image such that the transformed representation approximates the representation of the second image.

3. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the loss is based on a metric quantifying a difference between the reconstructed image and the second image.

4. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the loss is based on a metric quantifying a difference between the transformed representation and the representation of the second image.

5. The method of claim 2, wherein the representation of the first image is a first set of n vectors $\{x_{1,i}\}_{i=1,\dots,n}$, each $x_{1,i} \in \mathbb{R}^K$, wherein the representation of the second image is a second set of n vectors $\{x_{2,i}\}_{i=1,\dots,n}$, each $x_{2,i} \in \mathbb{R}^K$, wherein the applying of the transformation includes decomposing each vector of the first and second sets of vectors in a D-dimensional equivariant part and a $(K-D)$-dimensional invariant part and applying a $(D \times D)$-dimensional transformation matrix $\Omega$ to the equivariant part of each vector of the first set of vectors, wherein $0 < D \le K$.

6. The method of claim 5, wherein the transformation is a D-dimensional rotation and $\Omega$ is a D-dimensional rotation matrix, and wherein $\Omega$ is set based on aligning the equivariant parts of the vectors of the first set of vectors with the equivariant parts of the respective vectors of the second set of vectors.

7. The method of claim 5, further comprising determining $\Omega$ based on the equation:

$$\Omega = \underset{\hat{\Omega} \in SO(D)}{\arg\min} \sum_{i=1}^{n} \left\| \hat{\Omega}\, x_{1,i}^{\mathrm{equiv}} - x_{2,i}^{\mathrm{equiv}} \right\|^{2}$$

where $x_{1,i}^{\mathrm{equiv}}$ denotes the equivariant part of vector $x_{1,i}$, $x_{2,i}^{\mathrm{equiv}}$ denotes the equivariant part of vector $x_{2,i}$, and $SO(D)$ denotes the D-dimensional rotation group.

8. The method of claim 1, wherein the unsupervised pre-training is a cross-view completion pre-training, and wherein the performing of the cross-view completion pre-training of the machine learning model further comprises: splitting the first image into a first set of non-overlapping patches and splitting the second image into a second set of non-overlapping patches; and masking ones of the patches of the first set of patches, wherein the encoding of the first image into the representation of the first image includes, encoding, by the encoder, each unmasked patch of the first set of patches into a corresponding representation of the respective unmasked patch, thereby generating a first set of patch representations, wherein the encoding the second image into the representation of the second image includes, encoding, by the encoder, each patch of the second set of patches into a corresponding representation of the respective patch, thereby generating a second set of patch representations, wherein the decoding of the transformed representation includes, generating, by the decoder, for each masked patch of the first set of patches, a predicted recons
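Claims 5 to 7 describe an orthogonal Procrustes problem restricted to rotations: split each K-dimensional vector, find the rotation in SO(D) that best maps the first image's equivariant parts onto the second image's, and apply it. A minimal NumPy sketch follows, using the standard SVD-based (Kabsch) solution with a determinant correction so the result is a proper rotation rather than a reflection. Assumptions not stated in the claims: the equivariant part is taken to be the first D coordinates of each vector, the invariant part passes through unchanged, and the claim 4 loss is illustrated as a mean squared distance. All function names are hypothetical.

```python
import numpy as np


def split_parts(X: np.ndarray, D: int):
    """Split each row of X (shape (n, K)) into a D-dim equivariant part and a
    (K - D)-dim invariant part. Assumption: the first D coordinates are the
    equivariant ones."""
    if not 0 < D <= X.shape[1]:
        raise ValueError("need 0 < D <= K")
    return X[:, :D], X[:, D:]


def procrustes_rotation(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Rotation Omega in SO(D) minimizing sum_i ||Omega a_i - b_i||^2,
    where a_i and b_i are the rows of A and B (both shape (n, D))."""
    M = B.T @ A                           # (D, D) matrix sum_i b_i a_i^T
    U, _, Vt = np.linalg.svd(M)
    S = np.eye(A.shape[1])
    S[-1, -1] = np.sign(np.linalg.det(U @ Vt))  # flip one axis if U @ Vt is a reflection
    return U @ S @ Vt


def align_representations(X1: np.ndarray, X2: np.ndarray, D: int):
    """Rotate the equivariant parts of X1 toward those of X2; keep invariant parts."""
    e1, inv1 = split_parts(X1, D)
    e2, _ = split_parts(X2, D)
    Omega = procrustes_rotation(e1, e2)
    # Row-vector form of Omega @ x for every vector x.
    transformed = np.concatenate([e1 @ Omega.T, inv1], axis=1)
    return Omega, transformed


def alignment_loss(transformed: np.ndarray, X2: np.ndarray) -> float:
    """Illustrative claim-4-style loss: mean squared distance per vector."""
    return float(np.mean(np.sum((transformed - X2) ** 2, axis=1)))
```

The SVD is taken of a D×D matrix, so its cost does not depend on the number of vectors n; only forming that matrix is linear in n.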
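The patch-splitting and masking steps of claim 8 can be sketched as below, assuming images stored as (H, W, C) NumPy arrays whose height and width are multiples of the patch size. The patch size of 16, the 0.9 mask ratio, uniform random masking, and the helper names are illustrative assumptions that the claim text does not fix. The decoding step is not sketched because the claim is cut off in this excerpt.

```python
import numpy as np


def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping p x p patches.
    Returns shape (num_patches, p * p * C), patches in row-major order."""
    H, W, C = img.shape
    if H % p or W % p:
        raise ValueError("image height and width must be multiples of p")
    x = img.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)        # (patch_row, patch_col, p, p, C)
    return x.reshape(-1, p * p * C)


def random_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True means the patch is masked (hidden from the encoder)."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


# Illustrative usage with random images standing in for two views of one scene.
rng = np.random.default_rng(0)
img1 = rng.random((32, 48, 3))
img2 = rng.random((32, 48, 3))
p1 = patchify(img1, 16)                   # first view: some patches will be masked
p2 = patchify(img2, 16)                   # second view: all patches are encoded
mask = random_mask(len(p1), 0.9, rng)
visible1 = p1[~mask]                      # encoder input from the first image
targets1 = p1[mask]                       # patches the decoder would have to predict
```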
Human being; Person · CPC title
Training; Learning · CPC title
Non-supervised learning, e.g. competitive learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Validation; Performance evaluation · CPC title