What technology area does this patent fall under?

Primary CPC classification G06V10/764. Mapped technology areas include Physics.

When was this patent published?

Publication date: March 12, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page: publications linked by citation in our corpus, or publications that share the same primary CPC classification.

Open-vocabulary object detection in images

US11928854B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11928854-B2
Application number	US-202318144045-A
Country	US
Kind code	B2
Filing date	May 5, 2023
Priority date	May 6, 2022
Publication date	Mar 12, 2024
Grant date	Mar 12, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for object detection. In one aspect, a method comprises: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings; processing each object embedding using a localization subnetwork to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork to generate, for each object embedding, a respective classification score distribution over the set of query embeddings.
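The abstract describes two heads that sit on top of the image encoder's object embeddings: a localization head and a classification head that scores each object against a set of query embeddings. The following is a minimal, dependency-free Python sketch of those two heads, not the patented implementation. Several details are assumptions: `projection` and `box_head` are hypothetical learned weight matrices; the per-object "classification score distribution" is taken to be a softmax over inner products with the query embeddings (independent per-query sigmoid scores would be an equally plausible reading); and boxes are predicted as offsets from the center of the image patch behind each object embedding, with a log-width/log-height parameterization that the patent does not specify.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(object_embeddings, query_embeddings, projection):
    """Return one score distribution over the query embeddings per object.

    Each object embedding is projected into the query-embedding space with the
    hypothetical weight matrix `projection`, then scored against every query
    embedding by an inner product; softmax turns the scores into a distribution.
    """
    distributions = []
    for obj in object_embeddings:
        class_embedding = matvec(projection, obj)
        logits = [dot(class_embedding, q) for q in query_embeddings]
        distributions.append(softmax(logits))
    return distributions

def localize(object_embedding, patch_center, box_head):
    """Predict a box (cx, cy, w, h) as an offset from the patch center.

    `box_head` is a hypothetical 4-row weight matrix; predicting log width and
    log height (so sizes stay positive) is an assumption of this sketch.
    """
    dx, dy, log_w, log_h = matvec(box_head, object_embedding)
    cx, cy = patch_center
    return (cx + dx, cy + dy, math.exp(log_w), math.exp(log_h))

# Toy usage: two object embeddings scored against two query embeddings
# (for example, text embeddings of "cat" and "dog").
objects = [[1.0, 0.0], [0.0, 1.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
scores = classify(objects, queries, identity)
best = [max(range(len(p)), key=lambda k: p[k]) for p in scores]
print(best)  # [0, 1]
box = localize([1.0, 0.0], (0.5, 0.5), [[0.1, 0.0], [0.0, 0.1], [0.0, 0.0], [0.0, 0.0]])
```

Because the queries are just vectors, the set of detectable categories can be changed at inference time by passing different query embeddings, which is what makes the detector "open-vocabulary".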

First claim

Opening claim text (preview).

What is claimed is:
1. A method performed by one or more computers, the method comprising: obtaining: (i) an image, and (ii) a set of one or more query embeddings, wherein each query embedding represents a respective category of object; processing the image and the set of query embeddings using an object detection neural network to generate object detection data for the image, comprising: processing the image using an image encoding subnetwork of the object detection neural network to generate a set of object embeddings, wherein the image encoding subnetwork comprises one or more self-attention neural network layers; processing each object embedding using a localization subnetwork of the object detection neural network to generate localization data defining a corresponding region of the image; and processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork of the object detection neural network to generate, for each object embedding, a respective classification score distribution over the set of query embeddings, wherein the respective classification score distribution for each of the object embeddings defines, for each query embedding, a likelihood that the region of the image corresponding to the object embedding depicts an object that is included in the category represented by the query embedding.
2. The method of claim 1, wherein for one or more of the query embeddings, obtaining the query embedding comprises: obtaining a text sequence that describes a category of object; and processing the text sequence using a text encoding subnetwork of the object detection neural network to generate the query embedding; wherein the image encoding subnetwork and the text encoding subnetwork are pre-trained, wherein the pre-training includes repeatedly performing operations comprising: obtaining: (i) a training image, (ii) a positive text sequence, wherein the positive text sequence characterizes the training image, and (iii) one or more negative text sequences, wherein the negative text sequences do not characterize the training image; generating an embedding of the training image using the image encoding subnetwork, comprising: processing the training image using the image encoding subnetwork to generate a set of object embeddings for the training image; and processing the object embeddings using an embedding neural network to generate the embedding of the training image; generating respective embeddings of the positive text sequence and each of the negative text sequences using the text encoding subnetwork; and jointly training the image encoding subnetwork and the text encoding subnetwork to encourage: (i) greater similarity between the embedding of the training image and the embedding of the positive text sequence, (ii) lesser similarity between the embedding of the training image and the embeddings of the negative text sequences, comprising: jointly training the image encoding subnetwork and the text encoding subnetwork to optimize an objective function that includes a contrastive loss term.
3. The method of claim 2, wherein the embedding neural network is jointly trained along with the image encoding subnetwork and the text encoding subnetwork.
4. The method of claim 2, wherein after the pre-training of the image encoding subnetwork and the text encoding subnetwork, the object detection neural network is trained to optimize an objective function that measures performance of the object detection neural network on a task of object detection in images.
5. The method of claim 4, wherein the objective function that measures performance of the object detection neural network on the task of object detection in images comprises a bipartite matching loss term.
6. The method of claim 2, wherein processing: (i) the set of object embeddings, and (ii) the set of query embeddings, using a classification subnetwork of the object detection neural network to generate, for each object embedding, a respective classification score distribution over the set of query embeddings, comprises: processing each object embedding using one or more neural network layers of the classification neural network to generate a corresponding classification embedding; and generating, for each object embedding, the classification score distribution over the set of query embeddings using: (i) the classification embedding corresponding to the object embedding, and (ii) the query embeddings, comprising: generating a respective measure of similarity between the classification embedding and each query embedding, wherein the measure of similarity between the classification embedding and a query embedding defines a likelihood that the region of the image corresponding to the object embedding depicts an object that is included in the category represented by the query embedding.
7. The method of claim 6, wherein processing each object embedding using one or more neural network layers of the classification neural network to generate a corresponding classification embedding comprises: generating each classification embedding by projecting the corresponding object embedding into a latent space that includes the query embeddings.
8. The method of claim 6, wherein generating the respective measure of similarity between the classification embedding and each query embedding comprises, for each query embedding: computing an inner product between the classification embedding and the query embedding.
9. The method of claim 2, wherein for each object embedding, processing the object embedding using the localization subnetwork to generate localization data defining the corresponding region of the image comprises: processing the object embedding using the localization subnetwork to generate localization data defining a bounding box in the image.
10. The method of claim 1, wherein processing the image using the image encoding subnetwork to generate the set of object embeddings comprises: generating a set of initial object embeddings by an embedding layer of the image encoding subnetwork, wherein each initial object embedding is derived at least in part from a corresponding patch in the image; and processing the set of initial object embeddings by a plurality of neural network layers, including the one or more self-attention neural network layers of the image encoding subnetwork, to generate a set of final object embeddings.
11. The method of claim 10, wherein processing an object embedding using the localization subnetwork to generate localization data defining the corresponding region of the image comprises: generating a set of offset coordinates, wherein the offset coordinates define an offset of the corresponding region of the image from a location of the image patch corresponding to the object embedding.
12. The method of claim 2, wherein the text encoding subnetwork comprises one or more self-attention neural network layers.
13. The method of claim 2, further comprising, for one or more of the object embeddings: determining that the region of the image corresponding to the object embedding depicts an object that is included in the category represented by a query embedding based on the classification score distribution for the object embedding.
14. A system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining: (i) an image, and (ii) a set of one …
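Claim 2 only requires that pre-training optimize "an objective function that includes a contrastive loss term". One common way to realize such a term is an InfoNCE-style cross-entropy that asks the image embedding to pick out its positive text among the negatives. The sketch below illustrates that idea under stated assumptions: the hypothetical `pool_image_embedding` (mean pooling followed by L2 normalization) stands in for the claimed "embedding neural network", similarity is cosine similarity, and `temperature=0.1` is an arbitrary illustrative value. It is a one-directional, single-image version; batch-level, symmetric variants are also common.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pool_image_embedding(object_embeddings):
    """Hypothetical stand-in for the claimed 'embedding neural network':
    mean-pool the object embeddings, then L2-normalize."""
    n = len(object_embeddings)
    dim = len(object_embeddings[0])
    mean = [sum(obj[i] for obj in object_embeddings) / n for i in range(dim)]
    return normalize(mean)

def contrastive_loss(image_embedding, positive_text, negative_texts, temperature=0.1):
    """InfoNCE-style loss for one training image.

    Cross-entropy of picking the positive text among [positive] + negatives,
    using temperature-scaled cosine similarity as the logit. The loss is low
    when the image embedding is closer to the positive than to the negatives.
    """
    texts = [normalize(positive_text)] + [normalize(t) for t in negative_texts]
    logits = [sum(a * b for a, b in zip(image_embedding, t)) / temperature for t in texts]
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[0]  # equals -log(softmax(logits)[0])

image = pool_image_embedding([[1.0, 0.2], [1.0, -0.2]])  # pools to [1.0, 0.0]
aligned = contrastive_loss(image, positive_text=[1.0, 0.0], negative_texts=[[0.0, 1.0]])
misaligned = contrastive_loss(image, positive_text=[0.0, 1.0], negative_texts=[[1.0, 0.0]])
print(aligned < misaligned)  # True
```

Minimizing this loss with respect to both encoders pulls matching image/text pairs together and pushes non-matching pairs apart, which is the behavior claim 2 describes in words.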
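Claim 5 names "a bipartite matching loss term" without defining it. A DETR-style reading, sketched below purely as an illustration, first finds the one-to-one assignment between predictions and ground-truth objects with the lowest total cost, then computes the loss only on the matched pairs. Everything specific here is an assumption: the pairwise cost (negative probability of the target category plus L1 box distance, weighted by a hypothetical `box_weight`), the loss (negative log-probability plus L1), and the exhaustive search over permutations, which replaces the Hungarian algorithm and is only practical for toy sizes. Unmatched predictions are ignored, although such losses usually also penalize them.

```python
import math
from itertools import permutations

def l1(box_a, box_b):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))

def match_cost(pred_probs, pred_box, target_label, target_box, box_weight=1.0):
    """Assumed pairwise cost: cheap when the prediction gives high
    probability to the target's category and its box is close."""
    return -pred_probs[target_label] + box_weight * l1(pred_box, target_box)

def bipartite_match(preds, targets):
    """Exhaustively find the one-to-one assignment with minimum total cost.

    preds: list of (probs, box); targets: list of (label, box), with
    len(targets) <= len(preds). Returns (cost, assignment), where
    assignment[j] is the index of the prediction matched to target j.
    """
    best_cost, best_assignment = math.inf, None
    for assignment in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(*preds[i], *targets[j]) for j, i in enumerate(assignment))
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

def bipartite_matching_loss(preds, targets):
    """Mean of (-log prob of the target category + box L1) over matched pairs."""
    _, assignment = bipartite_match(preds, targets)
    total = 0.0
    for j, i in enumerate(assignment):
        probs, box = preds[i]
        label, target_box = targets[j]
        total += -math.log(probs[label]) + l1(box, target_box)
    return total / len(targets)

preds = [
    ([0.9, 0.1], (0.2, 0.2, 0.1, 0.1)),
    ([0.2, 0.8], (0.7, 0.7, 0.2, 0.2)),
    ([0.5, 0.5], (0.5, 0.5, 0.5, 0.5)),
]
targets = [(1, (0.7, 0.7, 0.2, 0.2)), (0, (0.2, 0.2, 0.1, 0.1))]
cost, assignment = bipartite_match(preds, targets)
print(assignment)  # (1, 0)
loss = bipartite_matching_loss(preds, targets)
```

For realistic numbers of predictions, `scipy.optimize.linear_sum_assignment` solves the same assignment problem in polynomial time given a cost matrix.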

Assignees

Google LLC

Inventors

Classifications

G06T2207/20084: Artificial neural networks [ANN]
G06N3/04: Architecture, e.g. interconnection topology
G06V10/86: using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching
G06V10/776: Validation; Performance evaluation
G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Patent family

Related publications grouped by family.

Patent family ID: 86692681

Related patents

Pretraining framework for neural networks

Systems and methods for image modification and image based content capture and extraction in neural networks

Bilstm-siamese network based classifier for identifying target class of queries and providing responses thereof