Object re-identification using pose part based models
US-2022343639-A1 · Oct 27, 2022 · US
US11967103B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11967103-B2 |
| Application number | US-202117505900-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 20, 2021 |
| Priority date | Nov 16, 2020 |
| Publication date | Apr 23, 2024 |
| Grant date | Apr 23, 2024 |
Official abstract text for this publication.
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating a 3-D pose of an object of interest from image and point cloud data. In one aspect, a method includes obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud; and processing the fused representation using a pose estimation neural network and in accordance with current values of a plurality of pose estimation network parameters to generate a pose estimation network output that specifies, for each of multiple keypoints, a respective estimated position in the three-dimensional region of the environment.
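The fusion step in the abstract is spelled out in claims 1, 2 and 5: each point is projected into the image, the per-keypoint scores at that pixel are read off, and the scores are concatenated with the point's 3-D location. The following is a minimal sketch of that idea, not the patented implementation. The pinhole camera model (`focal`, `cx`, `cy`), the assumption that the point cloud is already expressed in the camera frame, the choice to drop points that fall outside the image, and the function names `project_point` and `fuse_point_cloud_with_heatmaps` are all illustrative assumptions that do not appear in the patent text.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Heatmap = Sequence[Sequence[float]]  # indexed as heatmap[row][col]


def project_point(point: Point3D, focal: float, cx: float, cy: float) -> Tuple[int, int]:
    """Project a camera-frame point (z pointing forward) to a pixel (col, row).

    Assumption: an ideal pinhole camera; the patent does not fix a camera model.
    """
    x, y, z = point
    col = int(round(focal * x / z + cx))
    row = int(round(focal * y / z + cy))
    return col, row


def fuse_point_cloud_with_heatmaps(
    points: Sequence[Point3D],
    heatmaps: Sequence[Heatmap],
    focal: float,
    cx: float,
    cy: float,
) -> List[List[float]]:
    """Build one fused vector per point: [score_k0, score_k1, ..., x, y, z].

    heatmaps[k][row][col] is the score that keypoint k lies at that pixel.
    """
    height = len(heatmaps[0])
    width = len(heatmaps[0][0])
    fused: List[List[float]] = []
    for point in points:
        if point[2] <= 0:
            continue  # assumption: skip points behind the camera
        col, row = project_point(point, focal, cx, cy)
        if not (0 <= col < width and 0 <= row < height):
            continue  # assumption: skip points that project outside the image
        # Per-keypoint scores at the corresponding image location (claim 1).
        scores = [float(heatmap[row][col]) for heatmap in heatmaps]
        # Concatenate the scores with the 3-D location of the point (claim 5).
        fused.append(scores + [float(c) for c in point])
    return fused
```

With K keypoint heatmaps, each fused vector has K + 3 entries, and the list of vectors is what a pose estimation network would consume in this sketch.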
Opening claim text (preview).
What is claimed is:
1. A method comprising: obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud, comprising: processing the image to generate, for each of a plurality of keypoints, a respective score of each of a plurality of locations in the image, wherein the respective score represents a likelihood that the keypoint is located at the location in the image; determining, for each of a plurality of data points in the point cloud, a corresponding location in the image that corresponds to the data point; generating, for each of the plurality of data points in the point cloud, a respective feature vector, wherein the respective feature vector for the data point includes, for each of the plurality of keypoints, the respective score of the corresponding location in the image that corresponds to the data point; and generating the fused representation from the respective feature vectors; and processing the fused representation using a pose estimation neural network to generate a pose estimation network output that specifies, for each of the plurality of keypoints, a respective estimated position in the three-dimensional region of the environment.
2. The method of claim 1, wherein determining, for each of the plurality of data points in the point cloud, the corresponding location in the image that corresponds to the data point comprises projecting each of the plurality of data points in the point cloud onto an image coordinate frame.
3. The method of claim 1, wherein specifying, for each of the plurality of keypoints, a respective estimated position in the three-dimensional region of the environment comprises generating a three-dimensional location for each of the plurality of keypoints with reference to a point cloud coordinate frame.
4. The method of claim 1, wherein: the environment is an environment in a vicinity of a vehicle; and the respective portions of the three-dimensional region in the environment assigned to the plurality of keypoints collectively define a respective estimated pose of one or more pedestrians in the environment.
5. The method of claim 1, wherein generating the fused representation from the respective feature vectors comprises, for each of a plurality of data points in the point cloud: computing a concatenation of the respective feature vector and a vector specifying the three-dimensional location of the data point.
6. The method of claim 1, further comprising training the pose estimation neural network, the training comprising: obtaining first training data comprising a labeled image associated with a label that specifies a plurality of ground truth keypoint locations in the image; obtaining second training data comprising an unlabeled point cloud; generating, for the unlabeled point cloud and based on the plurality of ground truth keypoint locations in the labeled image, a first pseudo label that specifies a plurality of pseudo keypoint locations in the unlabeled point cloud; and training the pose estimation neural network by using the first pseudo label as target pose estimation network output.
7. The method of claim 6, wherein the training comprises, for the unlabeled point cloud: processing, using the pose estimation neural network, the fused representation to generate a training pose estimation network output that assigns a plurality of predicted keypoints to the unlabeled point cloud; computing a first loss based on a difference between the target pose estimation network output and the training pose estimation network output; determining, based on computing a gradient of the first loss with respect to pose estimation network parameters of the pose estimation neural network, an update to current values of the pose estimation network parameters.
8. The method of claim 7, wherein the first loss is a Huber loss or a mean squared error loss.
9. The method of claim 6, wherein the pose estimation neural network comprises: a first sub network configured to receive the fused representation and to process the fused representation to generate an intermediate representation of the fused representation; and a second sub network configured to receive the intermediate representation and to process the intermediate representation to generate the pose estimation network output.
10. The method of claim 9, further comprising: generating, for the point cloud and based on the plurality of ground truth keypoint locations in the image, a second pseudo label that specifies a pseudo classification for each of the plurality of data points in the point cloud, wherein the pseudo classification specifies whether a corresponding data point should be classified as a keypoint or not; and training the first sub network and a data point classification neural network by using the pseudo classifications as target data point classification network outputs, the data point classification neural network having a plurality of data point classification network parameters and configured receive the intermediate representation generated by the first sub network of the pose estimation neural network and to process the intermediate representation to generate a data point classification network output that specifies a predicted classification for each of the plurality of data points in the point cloud.
11. The method of claim 9, wherein the training comprises: processing, using the data point classification neural network and in accordance with current values of the data point classification network parameters, the intermediate representation to generate the training data point classification network output; computing a second loss based on a difference between the target data point classification network output and the training data point classification network output; determining, based on computing a gradient of the second loss with respect to the data point classification network parameters and backpropagating the gradient of the second loss through the data point classification network parameters into network parameters of the first sub network, an update to the current values of the first sub network parameters.
12. The method of claim 11, further comprising determining an update to the current values of the data point classification network parameters.
13. The method of claim 11, wherein the second loss is a sigmoid loss.
14. The method of claim 6, wherein generating the first pseudo label that specifies a plurality of pseudo keypoint locations in the point cloud comprises: projecting the plurality of data points in the point cloud onto the image coordinate frame; identifying a subset of the plurality of projected data points that are within a predetermined distance of each of one or more of the plurality of ground truth keypoint locations in the image.
15. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud, comprising: processing the image to generate, for each of a plurality of keypoints, a respective score of each of a plurality of locations in the image, wherein the respective score represents a likelihood that the keypoint is located at the location in the image; determining, for each of a plurality of data points i
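The weakly supervised training recited in claims 6 to 8, 10 and 14 can be illustrated with a small sketch. It assumes the points have already been projected into the image coordinate frame, and it is not the patented implementation. The claims only require identifying the projected points within a predetermined distance of each ground truth keypoint; turning that subset into a single 3-D pseudo keypoint by averaging (`subset_centroid`) is an assumption made here for illustration, as are all function names and the `delta` parameter. The Huber loss follows its standard textbook definition.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[float, float]
Point3D = Tuple[float, float, float]


def points_near_keypoints(
    projected: Sequence[Pixel], keypoints_2d: Sequence[Pixel], max_dist: float
) -> List[List[int]]:
    """For each 2-D ground truth keypoint, list the indices of projected
    points lying within max_dist pixels of it (the subset in claim 14)."""
    subsets: List[List[int]] = []
    for ku, kv in keypoints_2d:
        subsets.append(
            [i for i, (u, v) in enumerate(projected)
             if math.hypot(u - ku, v - kv) <= max_dist]
        )
    return subsets


def pseudo_classification(num_points: int, subsets: Sequence[Sequence[int]]) -> List[int]:
    """Per-point 0/1 label: 1 if the point is near any keypoint (cf. claim 10)."""
    near = {i for subset in subsets for i in subset}
    return [1 if i in near else 0 for i in range(num_points)]


def subset_centroid(points_3d: Sequence[Point3D], indices: Sequence[int]) -> Optional[Point3D]:
    """Assumption (not stated in the claims): use the mean 3-D location of
    the subset as the pseudo keypoint location. Returns None if it is empty."""
    if not indices:
        return None
    n = len(indices)
    return tuple(sum(points_3d[i][axis] for i in indices) / n for axis in range(3))


def huber_loss(pred: float, target: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic near zero, linear for large residuals."""
    residual = abs(pred - target)
    if residual <= delta:
        return 0.5 * residual * residual
    return delta * (residual - 0.5 * delta)
```

In this sketch the centroids would serve as regression targets for the pose estimation output (scored with `huber_loss` per coordinate), while the 0/1 labels would serve as targets for the auxiliary per-point classifier.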
Matching configurations of points or features · CPC title
Static body considered as a whole, e.g. static pedestrian or occupant recognition · CPC title
of extracted features · CPC title
Incorporation of unlabelled data, e.g. multiple instance learning [MIL] · CPC title
using feature-based methods · CPC title