Joint 3d detection and segmentation using bird's eye view and perspective view

US2025054286A1 · US · A1

Patent metadata
Publication number: US-2025054286-A1
Application number: US-202318475988-A
Country: US
Kind code: A1
Filing date: Sep 27, 2023
Priority date: Aug 7, 2023
Publication date: Feb 13, 2025
Grant date: (not shown)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An image processing method includes performing, using images obtained from one or more sensors onboard a vehicle, a 2-dimensional (2D) feature extraction; performing, a 3-dimensional (3D) feature extraction on the images; detecting objects in the images by fusing detection results from the 2D feature extraction and the 3D feature extraction.
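To make the final fusion step of the abstract concrete, here is a minimal sketch of one possible late-fusion rule. It is not taken from the patent: the `Detection` record, the `fuse_detections` function, the greedy nearest-centre matching, and the `max_dist` threshold are all hypothetical choices made for illustration, and both branches are assumed to report object centres in a shared ground-plane frame.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    """Hypothetical detection record; the fields are assumptions for this sketch."""
    x: float      # ground-plane position in a shared vehicle frame (assumed)
    y: float
    score: float  # detector confidence in [0, 1]
    label: str


def fuse_detections(pv_dets, bev_dets, max_dist=1.0):
    """Greedy late fusion of perspective-view (2D branch) and BEV (3D branch) results.

    Each PV detection is paired with the nearest unused BEV detection of the
    same label within max_dist. Pairs are merged by score-weighted averaging
    of position; unmatched detections from either branch are kept unchanged.
    """
    fused = []
    used = set()  # indices of BEV detections already merged
    for pv in sorted(pv_dets, key=lambda d: d.score, reverse=True):
        best_j, best_dist = None, max_dist
        for j, bev in enumerate(bev_dets):
            if j in used or bev.label != pv.label:
                continue
            dist = hypot(pv.x - bev.x, pv.y - bev.y)
            if dist <= best_dist:
                best_j, best_dist = j, dist
        if best_j is None:
            fused.append(pv)  # no BEV partner: keep the PV result as-is
            continue
        used.add(best_j)
        bev = bev_dets[best_j]
        total = pv.score + bev.score
        w_pv = pv.score / total if total > 0 else 0.5  # avoid division by zero
        fused.append(Detection(
            x=w_pv * pv.x + (1.0 - w_pv) * bev.x,
            y=w_pv * pv.y + (1.0 - w_pv) * bev.y,
            score=max(pv.score, bev.score),
            label=pv.label,
        ))
    # BEV detections that no PV detection claimed
    fused.extend(b for j, b in enumerate(bev_dets) if j not in used)
    return fused


pv_results = [Detection(10.0, 2.0, 0.8, "car"), Detection(30.0, -1.0, 0.6, "pedestrian")]
bev_results = [Detection(10.4, 2.0, 0.2, "car"), Detection(50.0, 5.0, 0.7, "car")]
fused_results = fuse_detections(pv_results, bev_results)
```

In this usage example the two nearby "car" detections merge into one, while the pedestrian (PV only) and the distant car (BEV only) pass through unchanged. The dependent claims describe a tighter coupling through dual-space object queries rather than a purely post-hoc merge like this one.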

First claim

Opening claim text (preview).

1. A method of detecting objects in sensor data, comprising: performing, using images obtained from one or more sensors onboard a vehicle, a 2-dimensional (2D) feature extraction; performing, a 3-dimensional (3D) feature extraction on the images; detecting objects in the images by fusing detection results from the 2D feature extraction and the 3D feature extraction.

2. The method of claim 1, wherein the 2D feature extraction comprises a perspective view (PV) analysis of the images.

3. The method of claim 1, wherein the 3D feature extraction comprises a bird's eye view (BEV) analysis of the images.

4. The method of any claim 1, wherein the 3D feature extraction is performed by: generating 3D features from 2D features resulting from the 2D feature extraction; and the method further includes refining 3D feature estimates using dual-space object queries that include joint proposals formed based on 2D features resulting from the 2D feature extraction and 3D features resulting from the 3D feature extraction.

5. The method of claim 4, wherein the generating the 3D features from the 2D features comprises applying a back-projection model to the 2D features.

6. The method of claim 4, wherein the refining comprises performing a multi-level refinement wherein, at each layer of the multi-level refinement, a self-attention layer that acts on both the 2D features and the 3D features a first cross-attention layer that acts only on the 2D features and a second cross-attention layer that acts only on the 3D features are used.

7. The method of claim 4, wherein a shared pose is further used during the refining.

8. The method of claim 1, wherein the 2D feature extraction method comprises a 3D objection method.

9. An apparatus for detecting objects in sensor data, the apparatus comprising at least one processor configured to: perform, from the sensor data, a 2-dimensional (2D) feature extraction; perform, from the sensor data, a 3-dimensional (3D) feature extraction; detect objects in the images by fusing detection results from the 2D feature extraction and the 3D feature extraction.

10. The apparatus of claim 9, wherein the 2D feature extraction comprises a perspective view (PV) analysis of the images and the 3D feature extraction comprises a bird's eye view (BEV) analysis of the images.

11. The method of claim 9, wherein the at least one processor performs the 3D feature extraction by: generating 3D features from 2D features resulting from the 2D feature extraction; and the method further includes refining 3D feature estimates using dual-space object queries that include joint proposals formed based on 2D features resulting from the 2D feature extraction and 3D features resulting from the 3D feature extraction.

12. The apparatus of claim 11, wherein the generating the 3D features from the 2D features comprises applying a back-projection model to the 2D features.

13. The apparatus of claim 11, wherein the refining is performed using a Conv3D or a Conv2D algorithm.

14. The apparatus of claim 11, wherein the refining comprises performing a multi-level refinement wherein, at each layer of the multi-level refinement, a self-attention layer that acts on both the 2D features and the 3D features a first cross-attention layer that acts only on the 2D features and a second cross-attention layer that acts only on the 3D features are used.

15. The apparatus of claim 9, wherein the 2D feature extraction method comprises a 3D objection method.

16. The apparatus of claim 9, wherein the 3D feature extraction method comprises a dense segmentation and/or a detection method.

17. A system for deployment on an autonomous vehicle, comprising: one or more sensors configured to generate sensor data of an environment of the autonomous vehicle; and at least one processor configured to detect objects in the sensor data by: performing, using the sensor data obtained from one or more sensors, a 2-dimensional (2D) feature extraction; performing, a 3-dimensional (3D) feature extraction on the sensor data; detecting objects in the images by fusing detection results from the 2D feature extraction and the 3D feature extraction.

18. The system of claim 17, wherein the one or more processor performs the 3D feature extraction by: generating 3D features from 2D features resulting from the 2D feature extraction; and refining 3D feature estimates using dual-space object queries that include joint proposals formed based on 2D features resulting from the 2D feature extraction and 3D features resulting from the 3D feature extraction.

19. The system of claim 17, wherein hybrid detection proposals are used for querying for the objects.

20. The system of claim 17, wherein the one or more sensors include a camera and a lidar.
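The layer structure recited in claims 6 and 14 (one self-attention over both feature spaces, then one cross-attention per space) can be sketched as follows. This is one interpretation under stated assumptions, not the patent's implementation: each dual-space query is modelled as a PV half concatenated with a BEV half, attention is plain scaled dot-product attention with residual connections, and learned projections, normalisation, and positional encodings are omitted. The names `attend`, `refine_layer`, and `refine` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def add(a, b):
    """Element-wise sum of two equally shaped lists of vectors (residual connection)."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def refine_layer(queries, pv_feats, bev_feats):
    """One refinement layer over dual-space queries.

    Assumption: each query is [PV half | BEV half], with the PV half as wide
    as a PV feature vector and the BEV half as wide as a BEV feature vector.
    """
    d_pv = len(pv_feats[0])
    # 1) Self-attention over the full query, so both halves interact.
    queries = add(queries, attend(queries, queries, queries))
    pv_q = [q[:d_pv] for q in queries]
    bev_q = [q[d_pv:] for q in queries]
    # 2) First cross-attention: PV half attends only to 2D (PV) features.
    pv_q = add(pv_q, attend(pv_q, pv_feats, pv_feats))
    # 3) Second cross-attention: BEV half attends only to 3D (BEV) features.
    bev_q = add(bev_q, attend(bev_q, bev_feats, bev_feats))
    return [p + b for p, b in zip(pv_q, bev_q)]


def refine(queries, pv_feats, bev_feats, num_layers=3):
    """Multi-level refinement: apply the same layer structure num_layers times."""
    for _ in range(num_layers):
        queries = refine_layer(queries, pv_feats, bev_feats)
    return queries


# Two queries of width 4 (2 PV + 2 BEV), three PV features, four BEV features.
demo_queries = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
demo_pv = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
demo_bev = [[0.2, 0.1], [0.0, 0.3], [0.4, 0.4], [-0.1, 0.2]]
demo_out = refine(demo_queries, demo_pv, demo_bev, num_layers=2)
```

Splitting the query after self-attention keeps each cross-attention restricted to its own feature space, which is the property the claims emphasise. A trained model would add learned weights at every step.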

Assignees

Tusimple Inc

Inventors

Classifications

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • of extracted features · CPC title

  • using neural networks · CPC title

  • G06V10/806 (Primary)

    of extracted features · CPC title

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025054286A1 cover?
An image processing method includes performing, using images obtained from one or more sensors onboard a vehicle, a 2-dimensional (2D) feature extraction; performing, a 3-dimensional (3D) feature extraction on the images; detecting objects in the images by fusing detection results from the 2D feature extraction and the 3D feature extraction.
Who is the assignee on this patent?
Tusimple Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/806. Mapped technology areas include Physics.
When was this patent published?
Published Feb 13, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).