Streaming object detection and segmentation with polar pillars
US-11798289-B2 · Oct 24, 2023 · US
US12583464B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12583464-B2 |
| Application number | US-202217710895-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 31, 2022 |
| Priority date | May 21, 2021 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Abstract
Among other things, techniques for detecting objects in the environment surrounding a vehicle are described. A computer system is configured to receive a set of measurements from a sensor of a vehicle. The set of measurements includes a plurality of data points that represent a plurality of objects in a 3D space surrounding the vehicle. The system divides the 3D space into a plurality of pillars. The system then assigns each data point of the plurality of data points to a pillar in the plurality of pillars. The system generates a pseudo-image based on the plurality of pillars. The pseudo-image includes, for each pillar of the plurality of pillars, a corresponding feature representation of data points assigned to the pillar. The system detects the plurality of objects based on an analysis of the pseudo-image. The system then operates the vehicle based upon the detecting of the objects.
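The abstract's pipeline has three steps: divide the space into pillars, assign each point to a pillar, and build a pseudo-image with one feature vector per pillar. The minimal Python sketch below illustrates those steps. It assumes a uniform Cartesian x-y grid. The abstract does not specify the per-pillar encoder, so the sketch replaces it with three hand-crafted statistics: point count, mean height and maximum height. The function names are hypothetical, and this is not the patented implementation.

```python
import math
from collections import defaultdict


def assign_to_pillars(points, x_range, y_range, cell_size):
    """Group (x, y, z) points by the vertical pillar whose x-y cell contains them.

    Pillars are unbounded in z; points outside the x-y extent are dropped.
    Returns a dict mapping (row, col) -> list of points, plus the grid shape.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = math.ceil((x_max - x_min) / cell_size)
    n_rows = math.ceil((y_max - y_min) / cell_size)
    pillars = defaultdict(list)
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        pillars[(row, col)].append((x, y, z))
    return pillars, n_rows, n_cols


def pillar_feature(pts):
    # Hypothetical hand-crafted stand-in for a learned encoder: [count, mean z, max z].
    zs = [z for _, _, z in pts]
    return [float(len(pts)), sum(zs) / len(zs), max(zs)]


def build_pseudo_image(pillars, n_rows, n_cols, n_channels=3):
    """Scatter per-pillar features into a dense n_rows x n_cols x n_channels grid.

    Cells whose pillar received no points keep an all-zero feature vector.
    """
    image = [[[0.0] * n_channels for _ in range(n_cols)] for _ in range(n_rows)]
    for (row, col), pts in pillars.items():
        image[row][col] = pillar_feature(pts)
    return image
```

Take the points (0.5, 0.5, 1.0), (0.7, 0.2, 3.0), (2.5, 1.5, 0.5) and (10.0, 0.0, 0.0) on a 4 m by 2 m grid of 1 m cells:

- The first two points share pillar (0, 0).
- The third lands in pillar (1, 2).
- The fourth falls outside the grid and is dropped.
- Every other cell of the pseudo-image stays zero.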
Claims (preview)
What is claimed is:

1. A system comprising: one or more computer processors; and one or more non-transitory storage media storing instructions which, when executed by the one or more computer processors, cause performance of operations comprising: dividing streamed sectors of data points representing a three-dimensional (3D) space surrounding a vehicle into a plurality of polar pillars, wherein each polar pillar of the plurality of polar pillars comprises a slice of the 3D space that extends from a two-dimensional (2D) polar grid on a ground plane comprising wedged-shaped regions corresponding to the streamed sectors in the 3D space, wherein each data point of the sectors is assigned to a polar pillar in the plurality of polar pillars; encoding the streamed sectors into a wedge-shaped region in a bird's eye view using polar pillars to obtain pillar-wise features on the polar grid, wherein 3D stacked polar pillar tensors are generated for non-empty polar pillars and convolution is iteratively applied to the 3D stacked polar pillar tensors to generate the pillar wise features; generating a feature map based on the pillar-wise features, wherein each polar pillar of the plurality of polar pillars corresponds to a polar feature representation of data points assigned to the polar pillar; inputting the feature map into a segmentation head, an object detection head, and a bounding box head of a network simultaneously; outputting per-pixel segments, object classes, and bounding boxes from the segmentation head, object detection head, and bounding box head respectively, wherein the object detection head transforms the feature representation to a Cartesian representation for object detection and the bounding box head applies kernels to the data points of the feature representation based on a range for bounding box generation; and operating the vehicle in the 3D space according to the per-pixel segments, object classes, and bounding boxes, wherein the streamed sectors are iteratively processed.

2. The system of claim 1, wherein transforming the feature representation to the Cartesian representation for object detection comprises transforming the feature representation of the data points assigned to a respective polar pillar from a polar representation in a 2D polar grid to a canonical Cartesian representation.

3. The system of claim 1, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels and normalization to data points assigned to respective polar pillars at different ranges.

4. The system of claim 1, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels to data points of the feature representation based on a range for at shared convolution layers of the segmentation head, the object detection head, or the bounding box head.

5. The system of claim 1, wherein the operations comprise performing panoptic fusion to identify different instances of a same object class and operating the vehicle in the 3D space according to panoptic segmentation.

6. The system of claim 1, wherein a backbone upsamples the feature map prior to inputting the feature map into the segmentation head of the network.

7. The system of claim 1, wherein the feature map is padded via multi-scale context padding prior to inputting the feature map into the segmentation head, object detection head, and bounding box head of the network.

8. The system of claim 1, wherein the 2D polar grid has substantially wedge shaped cells with varying cell sizes dependent upon a density of objects in a corresponding region of the 3D space surrounding the vehicle.

9. The system of claim 1, wherein the feature map is undistorted by interpolating features at Cartesian pillar locations using original pillar locations of the pillar-wise features.

10. A method, comprising: dividing, with at least one processor, streamed sectors of data points representing a three-dimensional (3D) space surrounding a vehicle into a plurality of polar pillars, wherein each polar pillar of the plurality of polar pillars comprises a slice of the 3D space that extends from a two-dimensional (2D) polar grid on a ground plane comprising wedged-shaped regions corresponding to the streamed sectors in the 3D space, wherein each data point of the sectors is assigned to a polar pillar in the plurality of polar pillars; encoding, with the at least one processor, the streamed sectors into a wedge-shaped region in a bird's eye view using polar pillars to obtain pillar-wise features on the polar grid, wherein 3D stacked polar pillar tensors are generated for non-empty polar pillars and convolution is iteratively applied to the 3D stacked polar pillar tensors to generate the pillar wise features; generating, with the at least one processor, a feature map based on the pillar-wise features, wherein each polar pillar of the plurality of polar pillars corresponds to a polar feature representation of data points assigned to the polar pillar; inputting, with the at least one processor, the feature map into a segmentation head, an object detection head, and a bounding box head of a network simultaneously; outputting, with the at least one processor, per-pixel segments, object classes, and bounding boxes from the segmentation head, object detection head, and bounding box head respectively, wherein the object detection head transforms the feature representation to a Cartesian representation for object detection and the bounding box head applies kernels to the data points of the feature representation based on a range for bounding box generation; and operating, with the at least one processor, the vehicle in the 3D space according to the per-pixel segments, object classes, and bounding boxes, wherein the streamed sectors are iteratively processed.

11. The method of claim 10, wherein transforming the feature representation to the Cartesian representation for object detection comprises transforming the feature representation of the data points assigned to a respective polar pillar from a polar representation in a 2D polar grid to a canonical Cartesian representation.

12. The method of claim 10, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels and normalization to data points assigned to respective polar pillars at different ranges.

13. The method of claim 10, wherein applying kernels to data points of the feature representation based on a range for bounding box generation comprises applying kernels to data points of the feature representation based on a range for at shared convolution layers of the segmentation head, the object detection head, or the bounding box head.

14. The method of claim 10, comprising performing panoptic fusion to identify different instances of a same object class and operating the vehicle in the 3D space according to panoptic segmentation.

15. The method of claim 10, wherein a backbone upsamples the feature map prior to inputting the feature map into the segmentation head of the network.

16. The method of claim 10, wherein the feature map is padded via multi-scale context padding prior to inputting the feature map into the segmentation head, object detection head, and bounding box head of the network.

17. At least one non-transitory storage media storing instructions that, when executed by at least one processor, cause the at least one proc
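The polar variant in the claims differs mainly in how points are binned. Each pillar extends upward from a wedge-shaped cell of a 2D polar grid on the ground plane, and claim 8 allows cell sizes to vary. The Python sketch below illustrates this under several assumptions:

- Radial bin edges are an arbitrary sorted list, so cells can be non-uniform.
- Azimuth bins are uniform.
- `assign_sector` simply bins whatever points arrive for one streamed sector. How sectors are delimited is not modelled.
- The polar-to-Cartesian step uses a containing-cell lookup as a simpler stand-in for the interpolation described in claims 2 and 9.

All names are hypothetical, and this is not the patented implementation.

```python
import bisect
import math
from collections import defaultdict

TWO_PI = 2.0 * math.pi


def polar_pillar_index(x, y, range_edges, n_azimuth):
    """Return (range_bin, azimuth_bin) of the wedge-shaped cell under (x, y), or None.

    range_edges is a sorted list of radial boundaries in metres; spacing need not be
    uniform. Azimuth is split into n_azimuth equal wedges starting at the +x axis.
    """
    r = math.hypot(x, y)
    if not (range_edges[0] <= r < range_edges[-1]):
        return None  # outside the modelled radial extent
    r_bin = bisect.bisect_right(range_edges, r) - 1
    theta = math.atan2(y, x) % TWO_PI  # map azimuth into [0, 2*pi)
    a_bin = int(theta / TWO_PI * n_azimuth) % n_azimuth  # % guards against rounding up to 2*pi
    return r_bin, a_bin


def assign_sector(points, range_edges, n_azimuth):
    """Group one streamed sector's (x, y, z) points by polar pillar; z is not binned."""
    pillars = defaultdict(list)
    for x, y, z in points:
        idx = polar_pillar_index(x, y, range_edges, n_azimuth)
        if idx is not None:
            pillars[idx].append((x, y, z))
    return pillars


def polar_to_cartesian(polar_grid, range_edges, n_azimuth, xs, ys):
    """Resample polar_grid[range_bin][azimuth_bin] onto Cartesian cell centres (xs, ys).

    Each Cartesian centre takes the value of the polar cell that contains it
    (containing-cell lookup); centres outside the polar grid get 0.0.
    """
    out = []
    for y in ys:
        row = []
        for x in xs:
            idx = polar_pillar_index(x, y, range_edges, n_azimuth)
            row.append(0.0 if idx is None else polar_grid[idx[0]][idx[1]])
        out.append(row)
    return out
```

Consider `range_edges = [0.0, 5.0, 10.0, 20.0]` with four azimuth wedges. The point (-7.0, 0.5) falls in range bin 1 (5 to 10 m) and azimuth bin 1 (90° to 180°). The point (30.0, 0.0) lies beyond the last edge and is not assigned to any pillar. A bilinear scheme in (r, θ) would blend neighbouring wedges instead of copying one, at the cost of handling the azimuth wrap-around at 2π.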
characterised by the type of data · CPC title
specially adapted for safety · CPC title
Details of control systems ensuring comfort, safety or stability not otherwise provided for · CPC title