Who is the assignee on this patent?

Toyota Motor Europe, Toyota Motor Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06T7/73. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 24, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

6D pose and shape estimation method

US12340535B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12340535-B2
Application number	US-202017801077-A
Country	US
Kind code	B2
Filing date	Feb 21, 2020
Priority date	Feb 21, 2020
Publication date	Jun 24, 2025
Grant date	Jun 24, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of estimating a 6D pose and shape of one or more objects from a 2D image, comprises the steps of: detecting, within the 2D image, one or more 2D regions of interest, each 2D region of interest containing a corresponding object among the one of more objects; cropping out a corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; concatenating the corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; and inferring, for each 2D region of interest, a 4D quaternion describing a rotation of the corresponding object in the 3D rotation group, a 2D centroid, which is a projection of a 3D translation of the corresponding object onto a plane of the 2D image given a camera matrix associated to the 2D image, a distance from a viewpoint of the 2D image to the corresponding object, a size, and a class-specific latent shape vector of the corresponding object.
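
The crop-and-concatenate step described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: that the coordinate tensor holds each pixel's (x, y) position in the full image, and that the feature map has the same resolution as the image. The function names and the end-exclusive box format are hypothetical, and nested lists stand in for real tensors.

```python
def make_coordinate_tensor(height, width):
    """Assumed layout: two channels per pixel, its (x, y) position in the image."""
    return [[[float(x), float(y)] for x in range(width)] for y in range(height)]


def crop(grid, box):
    """Crop an H x W x C nested list to box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in grid[y0:y1]]


def concat_channels(*grids):
    """Concatenate equally sized H x W x C grids along the channel axis."""
    return [
        [sum((list(px) for px in pixels), []) for pixels in zip(*rows)]
        for rows in zip(*grids)
    ]


# Toy inputs: a 4 x 6 RGB image and a 2-channel feature map of the same size.
H, W = 4, 6
image = [[[0.5, 0.5, 0.5] for _ in range(W)] for _ in range(H)]
features = [[[0.0, 1.0] for _ in range(W)] for _ in range(H)]
coords = make_coordinate_tensor(H, W)

# One hypothetical region of interest, as a detector might return it.
box = (1, 1, 4, 3)
roi_input = concat_channels(crop(image, box), crop(coords, box), crop(features, box))

# Height, width, and channel count of the concatenated ROI input.
print(len(roi_input), len(roi_input[0]), len(roi_input[0][0]))
```

Keeping the coordinate channels next to the pixel values is one plausible way to let later layers know where in the image a crop came from; the resizing to a uniform array size mentioned in the claims is left out of this sketch.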

First claim

Claim text as published (claims 1 to 20).

The invention claimed is:

1. A computer-implemented method of estimating 3D position, orientation and shape of one or more objects, the method comprising: capturing, with an imaging device, a 2D image of the one or more objects; detecting, within the 2D image, one or more 2D regions of interest, each 2D region of interest containing a corresponding object among the one of more objects; cropping out a corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; concatenating the corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; inferring, for each 2D region of interest, a 4D quaternion describing a rotation of the corresponding object in the 3D rotation group, a 2D centroid, which is a projection of a 3D translation of the corresponding object onto a plane of the 2D image given a camera matrix associated to the 2D image, a distance from a viewpoint of the 2D image to the corresponding object, a size, and a class-specific latent shape vector of the corresponding object which represents an offset from a mean latent shape representation of a corresponding object class; and adding the class-specific latent shape vector to the mean latent shape representation of the corresponding object class to obtain an absolute shape vector of the corresponding object.

2. The computer-implemented method according to claim 1, wherein the cropping out a corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest also comprises resizing them into a uniform array size.

3. The computer-implemented method according to claim 2, further comprising back projecting the 2D centroid using the distance from the viewpoint and the camera matrix to compute the 3D translation.

4. The computer-implemented method according to claim 3, wherein the 4D quaternion describes the rotation in an allocentric projection space and the method further comprises computing an egocentric projection using the 4D quaternion and the 3D translation.

5. The computer-implemented method according to claim 1, comprising reconstructing an unscaled 3D point cloud, from the absolute shape vector, using a separately trained decoder neural network.

6. The computer-implemented method according to claim 1, further comprising scaling the unscaled 3D point cloud, using the inferred size, to obtain a scaled 3D point cloud of the corresponding object.

7. The computer-implemented method according to claim 6, wherein method further comprises meshing the scaled 3D point cloud to generate a triangle mesh of the scaled 3D shape.

8. The computer-implemented method according to claim 7, wherein the method further comprises merging mesh triangles of the triangle mesh, using a ball pivoting algorithm, to fill any remaining hole in the triangle mesh.

9. The computer-implemented method according to claim 8, wherein the method further comprises applying a Laplacian filter to the triangle mesh to generate a smoothed scaled 3D shape (M) of the corresponding object.

10. The computer-implemented method according to claim 1, wherein the one or more 2D regions of interest are detected within the 2D image using a feature pyramid network.

11. The computer-implemented method according to claim 10, further comprising a step of classifying each 2D region of interest using a fully convolutional neural network attached to each level of the feature pyramid network.

12. The computer-implemented method according to claim 11, further comprising a step of regressing a boundary of each 2D region of interest towards the corresponding object using another fully convolutional neural network attached to each level of the feature pyramid network.

13. The computer-implemented method according to claim 1, wherein the step of inferring, for each 2D region of interest, the 4D quaternion, 2D centroid, distance, size, and class-specific latent shape vector of the corresponding object is carried out using a separate neural network for each one of the 4D quaternion, 2D centroid, distance, size, and class-specific latent shape vector.

14. The computer-implemented method according to claim 13, wherein each separate neural network for inferring the 4D quaternion, 2D centroid, distance, size, and class-specific latent shape vector comprises multiple 2D convolution layers, each followed by a batch normalization layer and a rectified linear unit activation layer, and a fully-connected layer at the end of the separate neural network.

15. The computer-implemented method according to claim 14, wherein each one of the separate neural networks for inferring the 4D quaternion and distance comprises four 2D convolution layers followed each by a batch normalization layer and a rectified linear unit activation layer, whereas each one of the separate neural networks for inferring the 2D centroid, size, and class-specific latent shape vector comprises only two 2D convolution layers followed each by a batch normalization layer and a rectified linear unit activation layer.

16. The computer-implemented method according to claim 1, wherein the 2D image is in the form of a pixel array with at least one value for each pixel.

17. The computer-implemented method according to claim 16, wherein the pixel array has an intensity value for each of three colors for each pixel.

18. A system comprising a data processing device programmed to estimate 3D position, orientation and shape of one or more objects from a 2D image, and an imaging device connected to input the 2D image to the data processing device, wherein the data processing device is further programmed to: detect, within the 2D image, one or more 2D regions of interest, each 2D region of interest containing a corresponding object among the one of more objects; crop out a corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; concatenate the corresponding pixel value array, coordinate tensor, and feature map for each 2D region of interest; infer, for each 2D region of interest, a 4D quaternion describing a rotation of the corresponding object in the 3D rotation group, a 2D centroid, which is a projection of a 3D translation of the corresponding object onto a plane of the 2D image given a camera matrix associated to the 2D image, a distance from a viewpoint of the 2D image to the corresponding object, a size, and a class-specific latent shape vector of the corresponding object which represents an offset from a mean latent shape representation of a corresponding object class; and add the class-specific latent shape vector to the mean latent shape representation of the corresponding object class to obtain an absolute shape vector of the corresponding object.

19. The system of claim 18, further comprising a robotic manipulator connected to the data processing device, wherein the data processing device is also programmed to control the manipulator based on the estimated 3D position, orientation and shape of each object in the 2D image.

20. The system of claim 18, further comprising propulsion, steering and/or braking devices, wherein the data processing device is also programmed to control and/or assist control of the propulsion, steering and/or braking devices based on the estimated 3D position, orientation and shape of each object in the 2D image.
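
Three of the post-inference steps in the claims are simple enough to sketch in plain Python: back projecting the 2D centroid into a 3D translation, adding the latent shape offset to the class mean, and scaling an unscaled point cloud. The sketch is illustrative only and makes several assumptions not fixed by the claim text: the camera matrix is a pinhole matrix with focal lengths `fx`, `fy` and principal point `cx`, `cy`; the inferred distance is measured along the viewing ray by default (pass `along_ray=False` to treat it as depth along the optical axis); and the size is a single scalar. All function names are hypothetical, and the decoder network that turns the shape vector into a point cloud is omitted.

```python
import math


def back_project(centroid, distance, fx, fy, cx, cy, along_ray=True):
    """Recover a 3D translation from a 2D centroid, a distance, and intrinsics."""
    u, v = centroid
    # Direction of the viewing ray through pixel (u, v), with z fixed to 1.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    if along_ray:
        # Assumption: distance is the Euclidean length of the translation.
        scale = distance / math.sqrt(sum(c * c for c in ray))
    else:
        # Alternative reading: distance is the depth along the optical axis.
        scale = distance
    return tuple(scale * c for c in ray)


def absolute_shape(mean_latent, offset):
    """Add the inferred class-specific offset to the class mean latent vector."""
    return [m + o for m, o in zip(mean_latent, offset)]


def scale_points(points, size):
    """Scale an unscaled point cloud by a scalar size (an assumption)."""
    return [tuple(size * c for c in p) for p in points]


# Toy values chosen for easy checking, not taken from the patent.
translation = back_project((620.0, 240.0), 2.5, fx=400.0, fy=400.0, cx=320.0, cy=240.0)
shape_vector = absolute_shape([0.1, -0.2], [0.05, 0.2])
scaled_cloud = scale_points([(1.0, -1.0, 0.5)], 2.0)
print(translation, shape_vector, scaled_cloud)
```

Which reading of "distance" applies matters for objects far from the image center, where ray length and depth diverge; the description of the patent, not this page, would settle that.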

Assignees

Toyota Motor Europe, Toyota Motor Co Ltd

Classifications

B60W2420/403
Image sensing, e.g. optical camera
G08G1/166
for active traffic, e.g. moving vehicles, pedestrians, bikes
G08G1/165
for passive traffic, e.g. including static obstacles, trees
G06T2207/30252
Vehicle exterior; Vicinity of vehicle
G06T2207/20084
Artificial neural networks [ANN]

Patent family

Related publications grouped by family.

Patent family ID: 69701181

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12340535B2 cover?: A computer-implemented method of estimating a 6D pose and shape of one or more objects from a 2D image, comprises the steps of: detecting, within the 2D image, one or more 2D regions of interest, each 2D region of interest containing a corresponding object among the one of more objects; cropping out a corresponding pixel value array, coordinate tensor, and feature map for each 2D region of inte…