Multi-modal 3-D pose estimation

US11967103B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11967103-B2
Application number	US-202117505900-A
Country	US
Kind code	B2
Filing date	Oct 20, 2021
Priority date	Nov 16, 2020
Publication date	Apr 23, 2024
Grant date	Apr 23, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating a 3-D pose of an object of interest from image and point cloud data. In one aspect, a method includes obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud; and processing the fused representation using a pose estimation neural network and in accordance with current values of a plurality of pose estimation network parameters to generate a pose estimation network output that specifies, for each of multiple keypoints, a respective estimated position in the three-dimensional region of the environment.
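The fusion step described in the abstract and in claims 1, 2 and 5 can be illustrated with a small NumPy sketch. It is not the patented implementation: the function names `project_points` and `fuse`, the pinhole camera model, the nearest-pixel lookup, and the assumption that the points are already expressed in the camera frame are all hypothetical choices made for illustration. The per-keypoint score maps are taken as given inputs (in the patent they come from processing the image).

```python
import numpy as np


def project_points(points_xyz, intrinsics):
    """Project 3-D points (assumed to be in the camera frame) to pixels.

    Assumption: a plain pinhole model with a 3x3 intrinsics matrix. A real
    system would also apply a lidar-to-camera transform and lens distortion.
    """
    z = points_xyz[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth
    u = intrinsics[0, 0] * points_xyz[:, 0] / safe_z + intrinsics[0, 2]
    v = intrinsics[1, 1] * points_xyz[:, 1] / safe_z + intrinsics[1, 2]
    return np.stack([u, v], axis=1), in_front


def fuse(points_xyz, heatmaps, intrinsics):
    """Build one row per point: [x, y, z, score_0, ..., score_{K-1}].

    heatmaps has shape (K, H, W): one score map per keypoint.
    Points that fall outside the image (or behind the camera) get zero scores.
    """
    num_keypoints, height, width = heatmaps.shape
    uv, in_front = project_points(points_xyz, intrinsics)
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    valid = in_front & (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)

    scores = np.zeros((points_xyz.shape[0], num_keypoints), dtype=heatmaps.dtype)
    # Look up every keypoint's score at each valid point's pixel: (K, M) -> (M, K).
    scores[valid] = heatmaps[:, rows[valid], cols[valid]].T

    # Concatenate each point's 3-D location with its score vector (cf. claim 5).
    return np.concatenate([points_xyz, scores], axis=1)
```

The resulting array has one row per point and 3 + K columns, and is the kind of input a point-based pose estimation network could consume. Zero-filling points that do not land in the image is a design choice of this sketch, not something the page states.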

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud, comprising: processing the image to generate, for each of a plurality of keypoints, a respective score of each of a plurality of locations in the image, wherein the respective score represents a likelihood that the keypoint is located at the location in the image; determining, for each of a plurality of data points in the point cloud, a corresponding location in the image that corresponds to the data point; generating, for each of the plurality of data points in the point cloud, a respective feature vector, wherein the respective feature vector for the data point includes, for each of the plurality of keypoints, the respective score of the corresponding location in the image that corresponds to the data point; and generating the fused representation from the respective feature vectors; and processing the fused representation using a pose estimation neural network to generate a pose estimation network output that specifies, for each of the plurality of keypoints, a respective estimated position in the three-dimensional region of the environment.

2. The method of claim 1, wherein determining, for each of the plurality of data points in the point cloud, the corresponding location in the image that corresponds to the data point comprises projecting each of the plurality of data points in the point cloud onto an image coordinate frame.

3. The method of claim 1, wherein specifying, for each of the plurality of keypoints, a respective estimated position in the three-dimensional region of the environment comprises generating a three-dimensional location for each of the plurality of keypoints with reference to a point cloud coordinate frame.

4. The method of claim 1, wherein: the environment is an environment in a vicinity of a vehicle; and the respective portions of the three-dimensional region in the environment assigned to the plurality of keypoints collectively define a respective estimated pose of one or more pedestrians in the environment.

5. The method of claim 1, wherein generating the fused representation from the respective feature vectors comprises, for each of a plurality of data points in the point cloud: computing a concatenation of the respective feature vector and a vector specifying the three-dimensional location of the data point.

6. The method of claim 1, further comprising training the pose estimation neural network, the training comprising: obtaining first training data comprising a labeled image associated with a label that specifies a plurality of ground truth keypoint locations in the image; obtaining second training data comprising an unlabeled point cloud; generating, for the unlabeled point cloud and based on the plurality of ground truth keypoint locations in the labeled image, a first pseudo label that specifies a plurality of pseudo keypoint locations in the unlabeled point cloud; and training the pose estimation neural network by using the first pseudo label as target pose estimation network output.

7. The method of claim 6, wherein the training comprises, for the unlabeled point cloud: processing, using the pose estimation neural network, the fused representation to generate a training pose estimation network output that assigns a plurality of predicted keypoints to the unlabeled point cloud; computing a first loss based on a difference between the target pose estimation network output and the training pose estimation network output; determining, based on computing a gradient of the first loss with respect to pose estimation network parameters of the pose estimation neural network, an update to current values of the pose estimation network parameters.

8. The method of claim 7, wherein the first loss is a Huber loss or a mean squared error loss.

9. The method of claim 6, wherein the pose estimation neural network comprises: a first sub network configured to receive the fused representation and to process the fused representation to generate an intermediate representation of the fused representation; and a second sub network configured to receive the intermediate representation and to process the intermediate representation to generate the pose estimation network output.

10. The method of claim 9, further comprising: generating, for the point cloud and based on the plurality of ground truth keypoint locations in the image, a second pseudo label that specifies a pseudo classification for each of the plurality of data points in the point cloud, wherein the pseudo classification specifies whether a corresponding data point should be classified as a keypoint or not; and training the first sub network and a data point classification neural network by using the pseudo classifications as target data point classification network outputs, the data point classification neural network having a plurality of data point classification network parameters and configured receive the intermediate representation generated by the first sub network of the pose estimation neural network and to process the intermediate representation to generate a data point classification network output that specifies a predicted classification for each of the plurality of data points in the point cloud.

11. The method of claim 9, wherein the training comprises: processing, using the data point classification neural network and in accordance with current values of the data point classification network parameters, the intermediate representation to generate the training data point classification network output; computing a second loss based on a difference between the target data point classification network output and the training data point classification network output; determining, based on computing a gradient of the second loss with respect to the data point classification network parameters and backpropagating the gradient of the second loss through the data point classification network parameters into network parameters of the first sub network, an update to the current values of the first sub network parameters.

12. The method of claim 11, further comprising determining an update to the current values of the data point classification network parameters.

13. The method of claim 11, wherein the second loss is a sigmoid loss.

14. The method of claim 6, wherein generating the first pseudo label that specifies a plurality of pseudo keypoint locations in the point cloud comprises: projecting the plurality of data points in the point cloud onto the image coordinate frame; identifying a subset of the plurality of projected data points that are within a predetermined distance of each of one or more of the plurality of ground truth keypoint locations in the image.

15. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud, comprising: processing the image to generate, for each of a plurality of keypoints, a respective score of each of a plurality of locations in the image, wherein the respective score represents a likelihood that the keypoint is located at the location in the image; determining, for each of a plurality of data points i
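The training-related claims (6, 8, 10 and 14) describe deriving pseudo labels for an unlabeled point cloud from 2-D keypoint labels and then training against them with a Huber or mean squared error loss. The sketch below is one possible reading, written in NumPy. The names `pseudo_labels` and `huber_loss`, the pixel radius parameter, and especially the choice to average the 3-D coordinates of the nearby points into a single pseudo keypoint are assumptions for illustration; the claim text shown here only says that a subset of projected points within a predetermined distance is identified.

```python
import numpy as np


def pseudo_labels(points_xyz, points_uv, keypoints_uv, radius_px):
    """Derive pseudo labels for a point cloud from labeled 2-D keypoints.

    points_xyz:   (N, 3) point cloud.
    points_uv:    (N, 2) pixel coordinates of the already-projected points.
    keypoints_uv: (K, 2) ground truth keypoint locations in the image.
    radius_px:    hypothetical "predetermined distance", in pixels.

    Returns (pseudo_xyz, found, point_is_keypoint).
    """
    # Pixel distance between every keypoint and every projected point: (K, N).
    diffs = keypoints_uv[:, None, :] - points_uv[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    near = dists <= radius_px

    found = near.any(axis=1)  # keypoints with at least one nearby point
    pseudo_xyz = np.zeros((keypoints_uv.shape[0], 3))
    for k in np.flatnonzero(found):
        # Assumption: use the centroid of the nearby points as the 3-D pseudo keypoint.
        pseudo_xyz[k] = points_xyz[near[k]].mean(axis=0)

    # Per-point pseudo classification (cf. claim 10): near any keypoint or not.
    point_is_keypoint = near.any(axis=0)
    return pseudo_xyz, found, point_is_keypoint


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss: quadratic for errors up to delta, linear beyond it."""
    err = np.abs(pred - target)
    quad = np.minimum(err, delta)
    return float(np.mean(0.5 * quad**2 + delta * (err - quad)))
```

In a training loop, the `found` mask could be used to exclude keypoints that have no nearby points from the loss; that masking is left out here to keep the sketch short.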

Assignees

Waymo LLC

Inventors

Classifications

G06V10/757
Matching configurations of points or features · CPC title
G06V40/103
Static body considered as a whole, e.g. static pedestrian or occupant recognition · CPC title
G06V10/806
of extracted features · CPC title
G06V10/7753
Incorporation of unlabelled data, e.g. multiple instance learning [MIL] · CPC title
G06T7/73 · Primary
using feature-based methods · CPC title

Patent family

Related publications grouped by family.

View patent family 81586825

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11967103B2 cover?: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating a 3-D pose of an object of interest from image and point cloud data. In one aspect, a method includes obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud; a…
Who is the assignee on this patent?: Waymo LLC
What technology area does this patent fall under?: Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 23, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).