What technology area does this patent fall under?

Primary CPC classification G06V10/774. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Aug 26 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Memory-optimized contrastive learning

US12400432B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12400432-B2
Application number	US-202217988655-A
Country	US
Kind code	B2
Filing date	Nov 16, 2022
Priority date	Nov 16, 2021
Publication date	Aug 26, 2025
Grant date	Aug 26, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using memory-optimized contrastive learning to train image encoder and text encoder neural networks.

First claim

Opening claim text (preview).

What is claimed is: 1. A method performed by one or more computers and for training an image encoder neural network having image encoder neural network parameters and configured to process an image to generate an image embedding of the image in an embedding space and a text encoder neural network having text encoder neural network parameters and configured to process a text segment to generate a text embedding of the text segment in the embedding space, the method comprising: obtaining a batch of training pairs, each training pair including an input image and an input text segment; obtaining data partitioning the batch of training pairs into a plurality of chunks of training pairs; for each chunk: performing, on a set of one or more computing devices, a first forward pass through the image encoder neural network in accordance with current values of the image encoder neural network parameters on the input images in the training pairs in the chunk to generate a respective image embedding of each input image; performing, on the set of one or more computing devices, a first forward pass through the text encoder neural network in accordance with current values of the text encoder neural network parameters on the input text segments in the training pairs in the chunk to generate a respective text embedding of each text segment; storing, in memory of the set of one or more computing devices, the respective image embeddings and the respective text embeddings without storing intermediate hidden states generated by performing the first forward passes through the image encoder neural network and the text encoder neural network; for each training pair in the batch and using the respective image embeddings and the respective text embeddings for the plurality of chunks stored in the memory of the set of one or more computing devices, generating a respective similarity between the image embedding of the input image in the training pair and the respective text embeddings of the input text segments in all of the training pairs in the batch; determining, for each training pair in the batch, a respective gradient with respect to the image embedding of the input image in the training pair of a contrastive loss function that is based on the respective similarities; for each chunk: performing, on the one or more computing devices, a second forward pass through the image encoder neural network in accordance with current values of the image encoder neural network parameters on the input images in the training pairs in the chunk to re-generate the intermediate hidden states of the image encoder neural network; performing a backward pass through the image encoder neural network using the respective gradients with respect to the image embeddings of the input images in the training pairs in the chunk and the re-generated intermediate hidden states to generate a respective chunked gradient of the contrastive loss function with respect to each of the image encoder neural network parameters; and updating the current values of the image encoder neural network parameters using the respective chunked gradients for the chunks. 2. The method of claim 1 , further comprising: determining, for each training pair in the batch, a respective gradient with respect to the text embedding of the input text segment in the training pair of the contrastive loss function that is based on the respective similarities; for each chunk: performing, on the one or more computing devices, a second forward pass through the text encoder neural network in accordance with current values of the text encoder neural network parameters on the input text segments in the training pairs in the chunk to re-generate the intermediate hidden states of the text encoder neural network; and performing a backward pass through the text encoder neural network using the respective gradients with respect to the text embeddings for the text segments in the training pairs in the chunk and the re-generated intermediate hidden states of the text encoder neural network to generate a respective chunked gradient of the contrastive loss function with respect to each of the text encoder neural network parameters; and updating the current values of the text encoder neural network parameters using the respective chunked gradients for the chunks. 3. The method of claim 1 , wherein updating the current values of the image encoder neural network parameters using the respective chunked gradients for the chunks comprises: applying an optimizer to the current values of the image encoder neural network parameters using the respective chunked gradients. 4. The method of claim 3 , wherein the optimizer maintains respective estimates of one or more gradient moments and determines an update to the current values of the image encoder neural network parameters using the respective estimates, and wherein applying the optimizer comprises: for each chunk and for each of the gradient moments, updating the respective estimate of the gradient moment using the respective chunked gradient for the chunk; and determining the update using the respective estimates of each of the gradient moments after the respective estimates have been updated using the respective chunked gradients for all of the chunks. 5. The method of claim 4 , further comprising, for each chunk and after updating, for each of the gradient moments, the respective estimate of the gradient moment using the respective chunked gradient for the chunk, discarding the respective chunked gradient for the chunk. 6. The method of claim 1 , further comprising: for each chunk, after performing a backward pass through the image encoder neural network using the respective gradients with respect to the image embeddings of the training pairs in the chunk and the re-generated intermediate hidden states, discarding the re-generated intermediate hidden states prior to performing a backward pass for any subsequent chunks. 7. The method of claim 1 , wherein generating a respective similarity between the image embedding of the input image in the training pair and the respective text embeddings of the input text segments in all of the training pairs in the batch comprises, for each particular training pair in the batch: computing a dot product between the image embedding of the input image in the training pair and the text embedding of the text embedding of the input text segment in the particular training pair. 8. The method of claim 1 , wherein determining, for each training pair in the batch, a respective gradient of a contrastive loss function that is based on the respective similarities with respect to the image embedding of the input image in the training pair comprises: determining a respective gradient of the contrastive loss function with respect to each respective similarity between any two input image-input text segment pairs in the batch; and determining the respective gradients of the contrastive loss function with respect to the image embeddings for the input images in the training pairs in the batch from the respective gradients of the contrastive loss function with respect to the respective similarities and the image embeddings. 9. The method of claim 1 , further comprising: prior to jointly training the image encoder neural network and the text encoder neural network, training an image classification model that includes the image encoder neural network on an image classification task, wherein the current values of the image encoder neural network parameters are determined based on values of the parameters after training the image classification model. 10. The method of claim 1 , further comprising: after training the image encoder neural n

Assignees

Google Llc

Inventors

Classifications

G06T9/002
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06V10/776
Validation; Performance evaluation · CPC title
G06V10/82
using neural networks · CPC title
G06F40/126
Character encoding · CPC title

Patent family

Related publications grouped by family.

View patent family 84981445

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12400432B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using memory-optimized contrastive learning to train image encoder and text encoder neural networks.
Who is the assignee on this patent?: Google Llc
What technology area does this patent fall under?: Primary CPC classification G06V10/774. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Aug 26 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).