Streaming object detection and segmentation with polar pillars

US12583464B2 · US · B2

Patent metadata
Publication number: US-12583464-B2
Application number: US-202217710895-A
Country: US
Kind code: B2
Filing date: Mar 31, 2022
Priority date: May 21, 2021
Publication date: Mar 24, 2026
Grant date: Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Among other things, techniques for detecting objects in the environment surrounding a vehicle are described. A computer system is configured to receive a set of measurements from a sensor of a vehicle. The set of measurements includes a plurality of data points that represent a plurality of objects in a 3D space surrounding the vehicle. The system divides the 3D space into a plurality of pillars. The system then assigns each data point of the plurality of data points to a pillar in the plurality of pillars. The system generates a pseudo-image based on the plurality of pillars. The pseudo-image includes, for each pillar of the plurality of pillars, a corresponding feature representation of data points assigned to the pillar. The system detects the plurality of objects based on an analysis of the pseudo-image. The system then operates the vehicle based upon the detecting of the objects.
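The abstract's pipeline has three steps: divide the space into pillars, assign each data point to a pillar, and build a pseudo-image holding one feature representation per pillar. The sketch below illustrates those steps using the polar pillars named in the title. It is a minimal sketch under stated assumptions, not the patented encoder. It assumes a uniform polar grid, whereas claim 8 allows cell sizes that vary with object density. It also uses hand-picked per-pillar statistics (point count, mean height, max height) where the claims describe learned features produced by iterated convolution over stacked pillar tensors. PolarGrid, assign_to_pillars, pseudo_image, and the 50 m range with a 10 x 36 grid are hypothetical names and illustrative values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PolarGrid:
    """Hypothetical uniform polar grid on the ground plane (rings x azimuth wedges)."""
    max_range: float = 50.0  # metres; illustrative sensor range
    num_rings: int = 10      # radial bins, 5 m each with the defaults
    num_wedges: int = 36     # azimuth bins, 10 degrees each with the defaults

    def cell(self, x, y):
        """Return the (ring, wedge) index of the pillar above (x, y), or None if out of range."""
        r = math.hypot(x, y)
        if r >= self.max_range:
            return None
        ring = int(r / self.max_range * self.num_rings)
        theta = math.atan2(y, x) % (2 * math.pi)  # azimuth in [0, 2*pi)
        # The final modulo guards against theta rounding up to exactly 2*pi.
        wedge = int(theta / (2 * math.pi) * self.num_wedges) % self.num_wedges
        return ring, wedge


def assign_to_pillars(points, grid):
    """Group (x, y, z) points by polar pillar; z does not affect the assignment."""
    pillars = {}
    for x, y, z in points:
        key = grid.cell(x, y)
        if key is not None:
            pillars.setdefault(key, []).append((x, y, z))
    return pillars


def pseudo_image(pillars, grid):
    """Scatter one feature vector per non-empty pillar into a dense rings x wedges grid.

    Features here are [point count, mean height, max height]; empty pillars stay zero.
    A learned per-pillar encoder would replace these hand-picked statistics.
    """
    image = [[[0.0, 0.0, 0.0] for _ in range(grid.num_wedges)]
             for _ in range(grid.num_rings)]
    for (ring, wedge), pts in pillars.items():
        zs = [p[2] for p in pts]
        image[ring][wedge] = [float(len(pts)), sum(zs) / len(zs), max(zs)]
    return image


grid = PolarGrid()
points = [(11.0, 0.0, 0.5), (11.5, 0.2, 1.5), (-15.0, 15.0, 0.3), (60.0, 0.0, 1.0)]
pillars = assign_to_pillars(points, grid)  # the point at 60 m is out of range and dropped
image = pseudo_image(pillars, grid)
print(sorted(pillars))   # [(2, 0), (4, 13)]
print(image[2][0])       # [2.0, 1.0, 1.5]
```

In this example, the two points near 11 m straight ahead share one pillar. Storing features only for non-empty pillars mirrors the claims' focus on non-empty polar pillars. The dense scatter step is what turns the sparse pillars into an image-like grid that a 2D network can consume.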

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more computer processors; and one or more non-transitory storage media storing instructions which, when executed by the one or more computer processors, cause performance of operations comprising: dividing streamed sectors of data points representing a three-dimensional (3D) space surrounding a vehicle into a plurality of polar pillars, wherein each polar pillar of the plurality of polar pillars comprises a slice of the 3D space that extends from a two-dimensional (2D) polar grid on a ground plane comprising wedged-shaped regions corresponding to the streamed sectors in the 3D space, wherein each data point of the sectors is assigned to a polar pillar in the plurality of polar pillars; encoding the streamed sectors into a wedge-shaped region in a bird's eye view using polar pillars to obtain pillar-wise features on the polar grid, wherein 3D stacked polar pillar tensors are generated for non-empty polar pillars and convolution is iteratively applied to the 3D stacked polar pillar tensors to generate the pillar wise features; generating a feature map based on the pillar-wise features, wherein each polar pillar of the plurality of polar pillars corresponds to a polar feature representation of data points assigned to the polar pillar; inputting the feature map into a segmentation head, an object detection head, and a bounding box head of a network simultaneously; outputting per-pixel segments, object classes, and bounding boxes from the segmentation head, object detection head, and bounding box head respectively, wherein the object detection head transforms the feature representation to a Cartesian representation for object detection and the bounding box head applies kernels to the data points of the feature representation based on a range for bounding box generation; and operating the vehicle in the 3D space according to the per-pixel segments, object classes, and bounding boxes, wherein the streamed sectors are iteratively processed.

2. The system of claim 1, wherein transforming the feature representation to the Cartesian representation for object detection comprises transforming the feature representation of the data points assigned to a respective polar pillar from a polar representation in a 2D polar grid to a canonical Cartesian representation.

3. The system of claim 1, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels and normalization to data points assigned to respective polar pillars at different ranges.

4. The system of claim 1, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels to data points of the feature representation based on a range for at shared convolution layers of the segmentation head, the object detection head, or the bounding box head.

5. The system of claim 1, wherein the operations comprise performing panoptic fusion to identify different instances of a same object class and operating the vehicle in the 3D space according to panoptic segmentation.

6. The system of claim 1, wherein a backbone upsamples the feature map prior to inputting the feature map into the segmentation head of the network.

7. The system of claim 1, wherein the feature map is padded via multi-scale context padding prior to inputting the feature map into the segmentation head, object detection head, and bounding box head of the network.

8. The system of claim 1, wherein the 2D polar grid has substantially wedge shaped cells with varying cell sizes dependent upon a density of objects in a corresponding region of the 3D space surrounding the vehicle.

9. The system of claim 1, wherein the feature map is undistorted by interpolating features at Cartesian pillar locations using original pillar locations of the pillar-wise features.

10. A method, comprising: dividing, with at least one processor, streamed sectors of data points representing a three-dimensional (3D) space surrounding a vehicle into a plurality of polar pillars, wherein each polar pillar of the plurality of polar pillars comprises a slice of the 3D space that extends from a two-dimensional (2D) polar grid on a ground plane comprising wedged-shaped regions corresponding to the streamed sectors in the 3D space, wherein each data point of the sectors is assigned to a polar pillar in the plurality of polar pillars; encoding, with the at least one processor, the streamed sectors into a wedge-shaped region in a bird's eye view using polar pillars to obtain pillar-wise features on the polar grid, wherein 3D stacked polar pillar tensors are generated for non-empty polar pillars and convolution is iteratively applied to the 3D stacked polar pillar tensors to generate the pillar wise features; generating, with the at least one processor, a feature map based on the pillar-wise features, wherein each polar pillar of the plurality of polar pillars corresponds to a polar feature representation of data points assigned to the polar pillar; inputting, with the at least one processor, the feature map into a segmentation head, an object detection head, and a bounding box head of a network simultaneously; outputting, with the at least one processor, per-pixel segments, object classes, and bounding boxes from the segmentation head, object detection head, and bounding box head respectively, wherein the object detection head transforms the feature representation to a Cartesian representation for object detection and the bounding box head applies kernels to the data points of the feature representation based on a range for bounding box generation; and operating, with the at least one processor, the vehicle in the 3D space according to the per-pixel segments, object classes, and bounding boxes, wherein the streamed sectors are iteratively processed.

11. The method of claim 10, wherein transforming the feature representation to the Cartesian representation for object detection comprises transforming the feature representation of the data points assigned to a respective polar pillar from a polar representation in a 2D polar grid to a canonical Cartesian representation.

12. The method of claim 10, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels and normalization to data points assigned to respective polar pillars at different ranges.

13. The method of claim 10, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels to data points of the feature representation based on a range for at shared convolution layers of the segmentation head, the object detection head, or the bounding box head.

14. The method of claim 10, comprising performing panoptic fusion to identify different instances of a same object class and operating the vehicle in the 3D space according to panoptic segmentation.

15. The method of claim 10, wherein a backbone upsamples the feature map prior to inputting the feature map into the segmentation head of the network.

16. The method of claim 10, wherein the feature map is padded via multi-scale context padding prior to inputting the feature map into the segmentation head, object detection head, and bounding box head of the network.

17. At least one non-transitory storage media storing instructions that, when executed by at least one processor, cause the at least one proc…
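Claims 2 and 9 describe transforming the polar feature representation into a canonical Cartesian one. Claim 9 specifically undistorts the feature map by interpolating features at Cartesian pillar locations from the original pillar locations. The sketch below shows that resampling idea under stated assumptions, not the patented network. It assumes one scalar feature per polar cell, stored at the cell centre, with uniform rings and wedges. It uses plain bilinear interpolation that wraps around in azimuth and clamps in range. The function name polar_to_cartesian, the grid sizes, and the choice of bilinear weights are hypothetical.

```python
import math


def polar_to_cartesian(polar, max_range, cart_size, cell_m):
    """Resample a polar feature grid onto a square Cartesian bird's-eye-view grid.

    polar[i][j] is the (scalar) feature at the centre of ring i, wedge j, with
    uniform rings over [0, max_range) and uniform wedges over [0, 2*pi).
    The output has cart_size x cart_size cells of cell_m metres, centred on the
    vehicle; cells beyond max_range are left at 0.0.
    """
    num_rings, num_wedges = len(polar), len(polar[0])
    ring_w = max_range / num_rings        # radial width of one ring
    wedge_w = 2 * math.pi / num_wedges    # angular width of one wedge
    half = cart_size * cell_m / 2
    out = [[0.0] * cart_size for _ in range(cart_size)]
    for row in range(cart_size):
        for col in range(cart_size):
            # Cartesian pillar location: the centre of this output cell.
            x = -half + (col + 0.5) * cell_m
            y = -half + (row + 0.5) * cell_m
            r = math.hypot(x, y)
            if r >= max_range:
                continue
            theta = math.atan2(y, x) % (2 * math.pi)
            # Fractional (ring, wedge) coordinates measured from polar cell centres.
            fr = min(max(r / ring_w - 0.5, 0.0), num_rings - 1)  # clamp in range
            fw = theta / wedge_w - 0.5
            i0 = math.floor(fr)
            i1 = min(i0 + 1, num_rings - 1)
            j0 = math.floor(fw)
            tr, tw = fr - i0, fw - j0
            j0 %= num_wedges                  # azimuth wraps around
            j1 = (j0 + 1) % num_wedges
            # Bilinear blend of the four surrounding polar pillar features.
            out[row][col] = ((1 - tr) * (1 - tw) * polar[i0][j0]
                             + (1 - tr) * tw * polar[i0][j1]
                             + tr * (1 - tw) * polar[i1][j0]
                             + tr * tw * polar[i1][j1])
    return out


# A 4-ring x 8-wedge grid whose feature is simply the ring index.
ring_field = [[float(i)] * 8 for i in range(4)]
bev = polar_to_cartesian(ring_field, max_range=10.0, cart_size=4, cell_m=2.0)
```

Bilinear weights are only one simple choice. Their advantage is that the output is easy to sanity-check: a constant polar field comes back as the same constant, and a field that grows linearly with ring index comes back as a linear function of range.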

Assignees

Motional AD LLC

Inventors

Classifications

  • characterised by the type of data · CPC title

  • specially adapted for safety · CPC title

  • Details of control systems ensuring comfort, safety or stability not otherwise provided for · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12583464B2 cover?
Among other things, techniques for detecting objects in the environment surrounding a vehicle are described. A computer system is configured to receive a set of measurements from a sensor of a vehicle. The set of measurements includes a plurality of data points that represent a plurality of objects in a 3D space surrounding the vehicle. The system divides the 3D space into a plurality of pillar…
Who is the assignee on this patent?
Motional AD LLC
What technology area does this patent fall under?
Primary CPC classification G01C21/3807. Mapped technology areas include Physics.
When was this patent published?
Published on March 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).