Multi-modal 3-d pose estimation

US2025037303A1 · US · A1

Patent metadata
Publication number: US-2025037303-A1
Application number: US-202418614254-A
Country: US
Kind code: A1
Filing date: Mar 22, 2024
Priority date: Nov 16, 2020
Publication date: Jan 30, 2025
Grant date: none listed

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating a 3-D pose of an object of interest from image and point cloud data. In one aspect, a method includes obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud; and processing the fused representation using a pose estimation neural network and in accordance with current values of a plurality of pose estimation network parameters to generate a pose estimation network output that specifies, for each of multiple keypoints, a respective estimated position in the three-dimensional region of the environment.
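The fusion step in the abstract is spelled out in claim 21 as per-keypoint image scores copied into per-point feature vectors. A minimal sketch of that idea follows. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, point coordinates already expressed in the camera frame, and per-keypoint score maps (heatmaps) produced by some image model. The function names are illustrative and do not come from the patent.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Heatmap = List[List[float]]  # heatmap[row][col] -> score for one keypoint


def project_to_pixel(
    point: Point, fx: float, fy: float, cx: float, cy: float
) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to integer (row, col).

    Assumption: z points forward from the camera; points with z <= 0 lie
    behind the camera and have no valid projection.
    """
    x, y, z = point
    if z <= 0.0:
        return None
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return math.floor(v), math.floor(u)


def fuse_points_with_scores(
    points: Sequence[Point],
    heatmaps: Sequence[Heatmap],
    fx: float, fy: float, cx: float, cy: float,
) -> List[List[float]]:
    """Return one feature vector per point: [x, y, z, s_1, ..., s_K].

    s_k is the image score for keypoint k at the pixel the point projects to;
    points behind the camera or outside the image get all-zero scores.
    """
    num_keypoints = len(heatmaps)
    height, width = len(heatmaps[0]), len(heatmaps[0][0])
    fused = []
    for point in points:
        scores = [0.0] * num_keypoints
        pixel = project_to_pixel(point, fx, fy, cx, cy)
        if pixel is not None:
            row, col = pixel
            if 0 <= row < height and 0 <= col < width:
                scores = [hm[row][col] for hm in heatmaps]
        fused.append([float(c) for c in point] + scores)
    return fused


# Usage: two keypoints, 4x4 score maps, three points.
hm_a = [[0.0] * 4 for _ in range(4)]
hm_b = [[0.0] * 4 for _ in range(4)]
hm_a[2][2], hm_b[2][2] = 0.9, 0.1
pts = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (10.0, 0.0, 1.0)]
fused = fuse_points_with_scores(pts, [hm_a, hm_b], fx=2.0, fy=2.0, cx=2.0, cy=2.0)
# fused[0] == [0.0, 0.0, 1.0, 0.9, 0.1]; the other two points get zero scores
```

Zeroing the scores of points that do not land in the image is only one simple choice. An implementation could instead drop such points or append an explicit visibility flag.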

First claim

Opening claim text (preview).

1-20. (canceled)

21. A method performed by one or more computers, wherein the method comprises: generating a fused representation of an image of an environment and a point cloud of a three-dimensional region of the environment, wherein the point cloud comprises a plurality of data points, and wherein the generating comprises: generating, based on the image and for each of a plurality of keypoints, a score for each of a plurality of locations in the image; and generating, for each of the plurality of data points in the point cloud, a respective feature vector that includes at least some of the scores generated based on the image; processing the fused representation to generate, for each of the plurality of keypoints, an estimated position in the three-dimensional region of the environment; and controlling an agent based at least on the estimated position for each of the plurality of keypoints.

22. The method of claim 21, wherein controlling the agent based at least on the estimated position for each of the plurality of keypoints comprises: generating, based at least on the estimated position for each of the plurality of keypoints, a planning decision for the agent; and controlling the agent to implement the planning decision by transmitting one or more electronic signals that have been generated in accordance with the planning decision to one or more control units of the agent.

23. The method of claim 21, wherein the agent comprises a vehicle, and wherein the environment is an environment in a vicinity of the vehicle.

24. The method of claim 21, wherein the plurality of keypoints collectively define an estimated pose of each of one or more pedestrians in the environment.

25. The method of claim 23, wherein the vehicle comprises an autonomous vehicle, or a semi-autonomous vehicle.

26. The method of claim 22, wherein the planning decision plans a future trajectory of the vehicle.

27. The method of claim 21, wherein the agent comprises a robot.

28. The method of claim 21, wherein processing the fused representation to generate the estimated position in the three-dimensional region of the environment comprises: processing the fused representation using a pose estimation neural network that has been trained on training data comprising both (i) labeled point cloud data that associates each of multiple point clouds with corresponding human assigned keypoints and (ii) unlabeled point cloud data for which human assigned keypoints are unavailable.

29. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: generating a fused representation of an image of an environment and a point cloud of a three-dimensional region of the environment, wherein the point cloud comprises a plurality of data points, and wherein the generating comprises: generating, based on the image and for each of a plurality of keypoints, a score for each of a plurality of locations in the image; and generating, for each of the plurality of data points in the point cloud, a respective feature vector that includes at least some of the scores generated based on the image; processing the fused representation to generate, for each of the plurality of keypoints, an estimated position in the three-dimensional region of the environment; and controlling an agent based at least on the estimated position for each of the plurality of keypoints.

30. The system of claim 29, wherein controlling the agent based at least on the estimated position for each of the plurality of keypoints comprises: generating, based at least on the estimated position for each of the plurality of keypoints, a planning decision for the agent; and controlling the agent to implement the planning decision by transmitting one or more electronic signals that have been generated in accordance with the planning decision to one or more control units of the agent.

31. The system of claim 29, wherein the agent comprises a vehicle, and wherein the environment is an environment in a vicinity of the vehicle.

32. The system of claim 29, wherein the plurality of keypoints collectively define an estimated pose of each of one or more pedestrians in the environment.

33. The system of claim 31, wherein the vehicle comprises an autonomous vehicle, or a semi-autonomous vehicle.

34. The system of claim 30, wherein the planning decision plans a future trajectory of the vehicle.

35. The system of claim 29, wherein the agent comprises a robot.

36. The system of claim 29, wherein processing the fused representation to generate the estimated position in the three-dimensional region of the environment comprises: processing the fused representation using a pose estimation neural network that has been trained on training data comprising both (i) labeled point cloud data that associates each of multiple point clouds with corresponding human assigned keypoints and (ii) unlabeled point cloud data for which human assigned keypoints are unavailable.

37. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: generating a fused representation of an image of an environment and a point cloud of a three-dimensional region of the environment, wherein the point cloud comprises a plurality of data points, and wherein the generating comprises: generating, based on the image and for each of a plurality of keypoints, a score for each of a plurality of locations in the image; and generating, for each of the plurality of data points in the point cloud, a respective feature vector that includes at least some of the scores generated based on the image; processing the fused representation to generate, for each of the plurality of keypoints, an estimated position in the three-dimensional region of the environment; and controlling an agent based at least on the estimated position for each of the plurality of keypoints.

38. The non-transitory computer-readable storage media of claim 37, wherein controlling the agent based at least on the estimated position for each of the plurality of keypoints comprises: generating, based at least on the estimated position for each of the plurality of keypoints, a planning decision for the agent; and controlling the agent to implement the planning decision by transmitting one or more electronic signals that have been generated in accordance with the planning decision to one or more control units of the agent.

39. The non-transitory computer-readable storage media of claim 37, wherein the agent comprises a vehicle, and wherein the environment is an environment in a vicinity of the vehicle.

40. The non-transitory computer-readable storage media of claim 37, wherein the plurality of keypoints collectively define an estimated pose of each of one or more pedestrians in the environment.
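Claim 21 does not say how the fused representation becomes one 3-D position per keypoint; claims 28 and 36 attribute this to a trained pose estimation neural network. As a stand-in for that network, the sketch below uses a softmax-weighted average of point coordinates (a "soft-argmax" over points) and, purely for illustration, treats each point's image score as the logit a network would output. This is an assumed technique, not the patent's disclosed method; the function name and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def soft_argmax_keypoints(
    fused: Sequence[Sequence[float]],
    num_keypoints: int,
    temperature: float = 1.0,
) -> List[Tuple[float, ...]]:
    """Estimate one 3-D position per keypoint from fused per-point features.

    Each row of `fused` is [x, y, z, s_1, ..., s_K]. Here s_k stands in for
    the logit a trained network would output for keypoint k at that point
    (an assumption for illustration; no network is trained here).
    """
    positions = []
    for k in range(num_keypoints):
        logits = [row[3 + k] / temperature for row in fused]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(logit - peak) for logit in logits]
        total = sum(weights)
        # Weighted average of the point coordinates, one axis at a time.
        position = tuple(
            sum(w * row[axis] for w, row in zip(weights, fused)) / total
            for axis in range(3)
        )
        positions.append(position)
    return positions


# Two equally scored points: the keypoint lands at their midpoint.
example = soft_argmax_keypoints([[0.0, 0.0, 0.0, 1.0], [2.0, 2.0, 2.0, 1.0]], 1)
# example == [(1.0, 1.0, 1.0)]
```

Lowering `temperature` concentrates the weight on the highest-scoring points. In a learned system, the raw image scores would be replaced by the network's own per-point outputs.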

Assignees

Waymo LLC

Inventors

Classifications

  • of input or preprocessed data · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

  • Range image; Depth image; 3D point clouds · CPC title

  • Artificial neural networks [ANN] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025037303A1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for estimating a 3-D pose of an object of interest from image and point cloud data. In one aspect, a method includes obtaining an image of an environment; obtaining a point cloud of a three-dimensional region of the environment; generating a fused representation of the image and the point cloud; a…
Who is the assignee on this patent?
Waymo LLC
What technology area does this patent fall under?
Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 30, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).