What technology area does this patent fall under?

The primary CPC classification is G06T7/50, which falls under CPC section G (Physics).

When was this patent published?

Published May 13, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Three-dimensional location prediction from images

US12299916B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12299916-B2
Application number	US-202117545987-A
Country	US
Kind code	B2
Filing date	Dec 8, 2021
Priority date	Dec 8, 2020
Publication date	May 13, 2025
Grant date	May 13, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting three-dimensional object locations from images. One of the methods includes obtaining a sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the sequence, respective pseudo-lidar features of a respective pseudo-lidar representation of a region in the image that has been determined to depict a first object; generating, for a particular image at a particular time step in the sequence, image patch features of the region in the particular image that has been determined to depict the first object; and generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes a location of the first object in a three-dimensional coordinate system at the particular time step in the sequence.
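The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the function name predict_location, its parameters, and the toy encoders in the usage note are all hypothetical, and concatenation is used as the combining step only because the claims mention it as one option.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def predict_location(
    point_clouds: Sequence[Sequence[Sequence[float]]],
    patch: Sequence[Sequence[float]],
    pseudo_lidar_encoder: Callable[[Sequence[Sequence[float]]], Vector],
    patch_encoder: Callable[[Sequence[Sequence[float]]], Vector],
    decoder: Callable[[Vector], Vector],
) -> Vector:
    """Hypothetical outline of the abstract's data flow.

    point_clouds: one pseudo-lidar point set per time step (object region only).
    patch: pixel intensities of the object region in the particular image.
    The three callables stand in for the neural networks; they are placeholders.
    """
    # One feature vector per image in the sequence.
    per_step = [pseudo_lidar_encoder(pc) for pc in point_clouds]
    # Features of the image patch at the particular time step.
    patch_features = patch_encoder(patch)
    # Combine by concatenation (an assumption; other combinations are possible).
    combined = [x for feats in per_step for x in feats] + list(patch_features)
    # The decoder turns the combined features into the location prediction.
    return decoder(combined)


def mean_point(points: Sequence[Sequence[float]]) -> Vector:
    """Toy encoder: centroid of a 3D point set."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def mean_and_max(patch: Sequence[Sequence[float]]) -> Vector:
    """Toy encoder: mean and maximum intensity of a patch."""
    flat = [v for row in patch for v in row]
    return [sum(flat) / len(flat), max(flat)]
```

For example, calling predict_location with mean_point, mean_and_max, and a decoder that simply returns its input shows the combined feature vector the decoder would receive. In a real system each placeholder would be a trained network.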

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers, the method comprising: obtaining a temporal sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the temporal sequence, respective pseudo-lidar features of a respective pseudo-lidar representation of a region in the image that has been determined to depict a first object by processing the region in the image using a first neural network, wherein the pseudo-lidar features represent one or more pixels within the region in the image as a point in a three-dimensional coordinate system based on an initial depth estimate for the image; generating, for a particular image at a particular time step in the temporal sequence, image patch features of the region in the particular image that has been determined to depict the first object by processing the region in the particular image using a second neural network, wherein the image patch features are generated from intensity values of pixels in the image; and generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes a location of the first object in the three-dimensional coordinate system at the particular time step in the temporal sequence by processing the respective pseudo-lidar features and the image patch features using a third neural network, wherein generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes the first object at the particular time step in the temporal sequence comprises: combining the respective pseudo-lidar features that represent one or more pixels within the region in the image as a point in the three-dimensional coordinate system based on the initial depth estimate for the image and the image patch features to generate combined features; and processing the combined features using the third neural network to generate the prediction.

2. The method of claim 1, wherein the prediction includes an updated depth estimate that estimates a depth of a specified point on the first object at the particular time step in the temporal sequence, wherein the updated depth estimate is a predicted distance from the specified point on the first object to the camera at the particular time step.

3. The method of claim 1, wherein the prediction specifies a three-dimensional region that corresponds to a predicted location of the first object at the particular time step relative to the camera.

4. The method of claim 1, wherein the third neural network is a decoder neural network.

5. The method of claim 4, wherein combining the respective pseudo-lidar features and the image patch features comprises concatenating the respective pseudo-lidar features and the image patch features.

6. The method of claim 1, wherein generating image patch features of the region in the image at the particular time step in the temporal sequence comprises: processing the image using an image feature extraction neural network to generate image features for the image; and selecting, as the image patch features, a subset of the image features that correspond to the region in the image.

7. The method of claim 1, further comprising: generating, for each image in the temporal sequence, an initial depth estimate that assigns a respective estimated depth value to each pixel in the image; and generating, for each image in the temporal sequence, the respective pseudo-lidar representation using the initial depth estimate for the image.

8. The method of claim 7, wherein generating, for each image in the temporal sequence, an initial depth estimate that assigns a respective estimated depth value to each pixel in the image comprises: processing the image using a depth estimation neural network to generate the initial depth estimate for the image.

9. The method of claim 8, wherein generating the pseudo-lidar representation comprises: mapping each pixel that is within the region in the image that has been determined to depict the first object to the three-dimensional coordinate system based on the estimated depth value for the pixel in the initial depth estimate for the image and properties of the camera.

10. The method of claim 9, wherein the properties of the camera include the horizontal and vertical focal lengths of the camera.

11. The method of claim 1, wherein generating respective pseudo-lidar features of each of the pseudo-lidar representations comprises: processing the pseudo-lidar representation using a pseudo-lidar feature extraction neural network to generate the pseudo-lidar features for the pseudo-lidar representation.

12. A method performed by one or more computers, the method comprising: obtaining a temporal sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the temporal sequence, an initial depth estimate that assigns a respective estimated depth value to each pixel in the image; obtaining object tracklet data for a first object that identifies, for each of the images in the temporal sequence, a respective two-dimensional bounding box in the image that has been determined to depict the first object; generating, for each image in the temporal sequence, a respective pseudo-lidar representation of the two-dimensional bounding box in the image from the initial depth estimate for the image; generating respective pseudo-lidar features of each of the pseudo-lidar representations by processing the pseudo-lidar representation using a first neural network, wherein the pseudo-lidar features represent one or more pixels within the region in the image as a point in the three-dimensional coordinate system based on the initial depth estimate for the image; generating image patch features of the two-dimensional bounding box in the last image in the temporal sequence by processing the two-dimensional bounding box in the last image using a second neural network, wherein the image patch features are generated from intensity values of pixels in the image; and generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes a location of the first object in the three-dimensional coordinate system at the last time step in the temporal sequence by processing the respective pseudo-lidar features and the image patch features using a third neural network, wherein generating, from the respective pseudo-lidar features and the image patch features, a prediction that characterizes the first object at the particular time step in the temporal sequence comprises: combining the respective pseudo-lidar features that represent one or more pixels within the region in the image as a point in a three-dimensional coordinate system based on an initial depth estimate for the image and the image patch features to generate combined features; and processing the combined features using the third neural network to generate the prediction.

13. A system comprising one or more computers and one or more storage devices storing instructions then when executed by the one or more computers cause the one or more computers to perform operations comprising: obtaining a temporal sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the temporal sequence, respective pseudo-lidar features of a respective pseudo-lidar representation of a region in the image that has been determined to depict a
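Claims 9 and 10 describe mapping each pixel in the object region to a three-dimensional point using its estimated depth and the camera's focal lengths. A minimal sketch of one common way to do this, the pinhole back-projection, is shown below. The claims do not give a formula, so the pinhole model, the principal point (cx, cy), the function name box_to_pseudo_lidar, and the box layout are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def box_to_pseudo_lidar(
    depth: Sequence[Sequence[float]],
    box: Tuple[int, int, int, int],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Back-project the pixels inside a 2D box to 3D camera coordinates.

    depth[v][u] is the estimated depth of the pixel at row v, column u.
    box is (u_min, v_min, u_max, v_max) with exclusive upper bounds (assumed layout).
    fx, fy are the horizontal and vertical focal lengths in pixels.
    cx, cy is the principal point (an assumption; not mentioned in the claims).
    """
    u_min, v_min, u_max, v_max = box
    points: List[Point3D] = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0:
                # Skip pixels without a usable depth estimate.
                continue
            # Pinhole model: pixel offset from the principal point, scaled by depth.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```

With a constant depth map of 10.0, a box covering four pixels, focal lengths of 100 pixels, and the principal point at (2.0, 2.0), the function returns four points at depth 10.0 whose x and y offsets are either -0.1 or 0.0. The resulting point set is what a pseudo-lidar feature extractor would then consume.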

Assignees

Waymo LLC

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet]
G06N3/09
Supervised learning
G06V10/40
Extraction of image or video features
G06T2207/20084
Artificial neural networks [ANN]
G06T2207/10016
Video; Image sequence

Patent family

Related publications grouped by family.

Patent family 81848144

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12299916B2 cover?: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting three-dimensional object locations from images. One of the methods includes obtaining a sequence of images that comprises, at each of a plurality of time steps, a respective image that was captured by a camera at the time step; generating, for each image in the sequence, respective pse…
Who is the assignee on this patent?: Waymo LLC

Related patents

Radar deep learning

Radar and camera data fusion

Systems and methods for semi-supervised depth estimation according to an arbitrary camera

Supplementing top-down predictions with image features

Radar deep learning

Video visual relation detection methods and systems

High accuracy monocular moving object localization
