What technology area does this patent fall under?

Primary CPC classification G06V10/82. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 16, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Unsupervised pre-training of geometric vision models

Patent metadata
Field	Value
Publication number	US-12499678-B2
Application number	US-202318230414-A
Country	US
Kind code	B2
Filing date	Aug 4, 2023
Priority date	Oct 11, 2022
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method includes: performing unsupervised pre-training of a model, the model including an encoder and a decoder, including: obtaining a first image and a second image under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of the first image into a transformed representation; decoding, by the decoder, the transformed representation into a reconstructed image, where the transforming of the representation of the first image and the decoding of the transformed representation is based on the representation of the first image and the representation of the second image; and adjusting one or more parameters of at least one of the encoder and the decoder based on minimizing a loss; and fine-tuning the model, initialized with a set of task specific encoder parameters, for a geometric vision task.
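
The hand-off between the two stages (pre-train an encoder and decoder, then reuse the encoder weights to initialize a task-specific model) can be illustrated with a small sketch. This is not the patented implementation: the names `PretrainedModel`, `TaskModel`, and `build_task_model`, the parameter shapes, and the presence of a separate task head are all hypothetical, chosen only to show the initialization step.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PretrainedModel:
    """Hypothetical container for pre-trained encoder/decoder parameters."""
    encoder_params: dict
    decoder_params: dict


@dataclass
class TaskModel:
    """Hypothetical task-specific model: an encoder plus a task head."""
    encoder_params: dict
    head_params: dict


def build_task_model(pretrained, head_params):
    # Copy (rather than alias) the encoder weights so that fine-tuning the
    # task model leaves the pre-trained parameters untouched.
    encoder_copy = {name: w.copy() for name, w in pretrained.encoder_params.items()}
    return TaskModel(encoder_params=encoder_copy, head_params=head_params)


rng = np.random.default_rng(0)
pretrained = PretrainedModel(
    encoder_params={"proj": rng.normal(size=(16, 8))},
    decoder_params={"out": rng.normal(size=(8, 16))},
)

# The task head (assumed here to be a zero-initialized matrix) is new;
# only the encoder is initialized from the pre-trained model.
task_model = build_task_model(pretrained, head_params={"head": np.zeros((8, 1))})

# Stand-in for one fine-tuning update applied to the task encoder only.
task_model.encoder_params["proj"] -= 0.01
```

Copying the weights is a design choice for this sketch: it keeps the pre-trained parameters reusable for several downstream tasks.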

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented machine learning method of training a task specific machine learning model for a downstream geometric vision task, the method comprising: performing unsupervised pre-training of a machine learning model, the machine learning model comprising an encoder having a set of encoder parameters and a decoder having a set of decoder parameters, wherein the performing of the unsupervised pre-training of the machine learning model includes: obtaining a pair of unannotated images including a first image and a second image, wherein the first and second images depict a same scene and are taken under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of the first image into a transformed representation; decoding, by the decoder, the transformed representation into a reconstructed image, wherein the transforming of the representation of the first image and the decoding of the transformed representation is based on the representation of the first image and the representation of the second image; and adjusting one or more parameters of at least one of the encoder and the decoder based on minimizing a loss; constructing the task specific machine learning model for the downstream geometric vision task based on the pre-trained machine learning model, the task specific machine learning model comprising a task specific encoder having a set of task specific encoder parameters; initializing the set of task specific encoder parameters with the set of encoder parameters of the pre-trained machine learning model; and fine-tuning the task specific machine learning model, initialized with the set of task specific encoder parameters, for the downstream geometric vision task.

2. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the transforming of the representation of the first image includes applying a transformation to the representation of the first image to generate the transformed representation, the transformation being determined based on the representation of the first image and the representation of the second image such that the transformed representation approximates the representation of the second image.

3. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the loss is based on a metric quantifying a difference between the reconstructed image and the second image.

4. The method of claim 1, wherein the unsupervised pre-training of the machine learning model is a cross-view alignment pre-training, and wherein the loss is based on a metric quantifying a difference between the transformed representation and the representation of the second image.

5. The method of claim 2, wherein the representation of the first image is a first set of n vectors $\{x_{1,i}\}_{i=1 \ldots n}$, each $x_{1,i} \in \mathbb{R}^{K}$, wherein the representation of the second image is a second set of n vectors $\{x_{2,i}\}_{i=1 \ldots n}$, each $x_{2,i} \in \mathbb{R}^{K}$, wherein the applying of the transformation includes decomposing each vector of the first and second sets of vectors in a D-dimensional equivariant part and a (K−D)-dimensional invariant part and applying a (D×D)-dimensional transformation matrix Ω to the equivariant part of each vector of the first set of vectors, wherein 0<D≤K.

6. The method of claim 5, wherein the transformation is a D-dimensional rotation and Ω is a D-dimensional rotation matrix, and wherein Ω is set based on aligning the equivariant parts of the vectors of the first set of vectors with the equivariant parts of the respective vectors of the second set of vectors.

7. The method of claim 5, further comprising determining Ω based on the equation:

$$\Omega = \underset{\hat{\Omega} \in SO(D)}{\arg\min} \sum_{i=1}^{n} \left\| \hat{\Omega}\, x_{1,i}^{\mathrm{equiv}} - x_{2,i}^{\mathrm{equiv}} \right\|^{2}$$

where $x_{1,i}^{\mathrm{equiv}}$ denotes the equivariant part of vector $x_{1,i}$, $x_{2,i}^{\mathrm{equiv}}$ denotes the equivariant part of vector $x_{2,i}$, and SO(D) denotes the D-dimensional rotation group.

8. The method of claim 1, wherein the unsupervised pre-training is a cross-view completion pre-training, and wherein the performing of the cross-view completion pre-training of the machine learning model further comprises: splitting the first image into a first set of non-overlapping patches and splitting the second image into a second set of non-overlapping patches; and masking ones of the patches of the first set of patches, wherein the encoding of the first image into the representation of the first image includes, encoding, by the encoder, each unmasked patch of the first set of patches into a corresponding representation of the respective unmasked patch, thereby generating a first set of patch representations, wherein the encoding the second image into the representation of the second image includes, encoding, by the encoder, each patch of the second set of patches into a corresponding representation of the respective patch, thereby generating a second set of patch representations, wherein the decoding of the transformed representation includes, generating, by the decoder, for each masked patch of the first set of patches, a predicted recons
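
The minimization over the rotation group in claim 7 has the form of the orthogonal Procrustes problem restricted to rotations, for which a closed-form SVD solution (often called the Kabsch algorithm) is well known. The sketch below uses that standard solution. The claim text shown on this page does not say how Ω is computed, so the SVD approach, the function names `fit_rotation` and `apply_to_equivariant`, the row-per-vector array layout, and the choice of the first D coordinates as the equivariant part are all assumptions made for illustration.

```python
import numpy as np


def fit_rotation(x1_equiv, x2_equiv):
    """Rotation Omega in SO(D) minimizing sum_i ||Omega x1_i - x2_i||^2.

    x1_equiv, x2_equiv: arrays of shape (n, D), one equivariant part per row.
    """
    # Cross-covariance M = sum_i x2_i x1_i^T, shape (D, D).
    m = x2_equiv.T @ x1_equiv
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(Omega) = +1,
    # which keeps the result a rotation rather than a reflection.
    d = np.ones(m.shape[0])
    d[-1] = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag(d) @ vt


def apply_to_equivariant(x, omega):
    """Rotate the first D coordinates of each row; leave the other K - D as-is.

    Assumption: the equivariant part occupies the first D coordinates.
    """
    d = omega.shape[0]
    out = x.copy()
    out[:, :d] = x[:, :d] @ omega.T
    return out


# Usage: recover a known 2-D rotation from noise-free vectors with K = 5, D = 2.
rng = np.random.default_rng(0)
theta = 0.7
true_omega = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta), np.cos(theta)]])
x1 = rng.normal(size=(50, 5))
x2 = apply_to_equivariant(x1, true_omega)

omega = fit_rotation(x1[:, :2], x2[:, :2])
x1_transformed = apply_to_equivariant(x1, omega)
```

With noise-free inputs the fitted matrix matches the rotation used to generate the second set, and the invariant coordinates pass through unchanged.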

Assignees

Naver Corp

Inventors

Classifications

CPC code	CPC title
G06T2207/30196	Human being; Person
G06T2207/20081	Training; Learning
G06N3/088	Non-supervised learning, e.g. competitive learning
G06N3/0455	Auto-encoder networks; Encoder-decoder networks
G06V10/776	Validation; Performance evaluation

Patent family

Related publications grouped by family.

View patent family 84043966

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499678B2 cover?: A method includes: performing unsupervised pre-training of a model, the model including an encoder and a decoder, including: obtaining a first image and a second image under different conditions or from different viewpoints; encoding, by the encoder, the first image into a representation of the first image and the second image into a representation of the second image; transforming the representation of th…
Who is the assignee on this patent?: Naver Corp

Related patents

Multi-camera face swapping

Temporal Action Localization with Mutual Task Guidance

Computer Vision Systems and Methods for Machine Learning Using Image Hallucinations

Computer Vision Systems and Methods for Unsupervised Representation Learning by Sorting Sequences
