Deep neural network for segmentation of road scenes and animate object instances for autonomous driving applications

US12437412B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12437412-B2
Application number: US-202318397921-A
Country: US
Kind code: B2
Filing date: Dec 27, 2023
Priority date: Jul 25, 2019
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A deep neural network(s) (DNN) may be used to perform panoptic segmentation by performing pixel-level class and instance segmentation of a scene using a single pass of the DNN. Generally, one or more images and/or other sensor data may be stitched together, stacked, and/or combined, and fed into a DNN that includes a common trunk and several heads that predict different outputs. The DNN may include a class confidence head that predicts a confidence map representing pixels that belong to particular classes, an instance regression head that predicts object instance data for detected objects, an instance clustering head that predicts a confidence map of pixels that belong to particular instances, and/or a depth head that predicts range values. These outputs may be decoded to identify bounding shapes, class labels, instance labels, and/or range values for detected objects, and used to enable safe path planning and control of an autonomous vehicle.
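The decoding step described in the abstract can be illustrated for a single pixel. The following is a minimal sketch, not the patented implementation: it assumes the class confidence head and the instance clustering head each emit one raw score per channel, and that a softmax followed by an argmax turns those scores into a label. The function names, the dictionary keys, and the use of softmax are all assumptions made for illustration; the abstract only states that the heads predict confidence maps and range values.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def decode_pixel(class_scores, instance_scores, range_value, class_names):
    """Decode hypothetical per-pixel head outputs into labels.

    class_scores:    one raw score per class channel (assumed format)
    instance_scores: one raw score per instance channel (assumed format)
    range_value:     the depth head's prediction for this pixel
    """
    class_probs = softmax(class_scores)
    instance_probs = softmax(instance_scores)
    # Pick the most confident channel from each head.
    class_idx = max(range(len(class_probs)), key=class_probs.__getitem__)
    instance_channel = max(range(len(instance_probs)), key=instance_probs.__getitem__)
    return {
        "class": class_names[class_idx],
        "class_confidence": class_probs[class_idx],
        "instance_channel": instance_channel,
        "range": range_value,
    }


# Usage: three class channels, two instance channels, one range value.
result = decode_pixel(
    class_scores=[0.1, 2.0, -1.0],
    instance_scores=[3.0, 0.0],
    range_value=12.5,
    class_names=["road", "car", "pedestrian"],
)
print(result["class"], result["instance_channel"], result["range"])
```

A real system would run this over full tensors rather than one pixel at a time; the per-pixel form is used here only to keep the logic visible.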

First claim

Opening claim text (preview).

What is claimed is:

1. One or more processors comprising processing circuitry to: generate, using a neural network and based at least on a representation of image data corresponding to an environment of an ego-object, one or more classifications of one or more pixels, the one or more classifications indicating associations of the one or more pixels with one or more unique instances corresponding to one or more respective channels of the neural network; generate, based at least on the one or more classifications, one or more bounding shapes of the one or more unique instances of one or more detected objects in the environment; and execute one or more operations of the ego-object based at least on the one or more bounding shapes.

2. The one or more processors of claim 1, wherein the neural network comprises an instance clustering head comprising a respective classification channel for each of a plurality of detectable unique instances.

3. The one or more processors of claim 1, wherein the one or more classifications comprise, for each channel of at least one of the one or more respective channels, a respective confidence map representing pixels that belong to a respective instance of the one or more unique instances.

4. The one or more processors of claim 1, wherein the processing circuitry is further to generate the one or more bounding shapes using a connected components analysis to detect one or more boundaries of one or more clusters of connected or occluded unique instances represented by the one or more classifications.

5. The one or more processors of claim 1, wherein the processing circuitry is further to identify a plurality of globally unique instances from each cluster of one or more clusters of connected or occluded unique instances detected from each classification channel of at least one of the one or more respective channels.

6. The one or more processors of claim 1, wherein the processing circuitry is further to identify a plurality of globally unique instances based at least on: using a connected components analysis to assign a plurality of disconnected regions to a first instance, and determining that the plurality of disconnected regions correspond to the plurality of globally unique instances based at least on a minimum separation between the disconnected regions.

7. The one or more processors of claim 1, wherein the one or more classifications comprise a depth-wise probability distribution per pixel representing a predicted likelihood, for each channel of a plurality of channels of the neural network, that each pixel of at least one of the one or more pixels belongs to a respective unique instance corresponding to the channel.

8. The one or more processors of claim 1, wherein the processing circuitry is further to identify at least one unique instance of the one or more unique instances based at least on joining distinct connected regions of the one or more classifications, that are separated by less than a threshold gap, into a composite region representing the at least one unique instance.

9. The one or more processors of claim 1, wherein using the neural network performs panoptic segmentation comprising class segmentation and instance regression in a single pass of the neural network.

10. The one or more processors of claim 1, wherein the one or more processors are comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using a robot; a system for generating synthetic data; a system for generating synthetic data using AI; or a system implemented at least partially using cloud computing resources.

11. A system comprising one or more processors to: generate, using a neural network and based at least on a representation of sensor data corresponding to an environment of an ego-object, one or more classifications associating one or more pixels with one or more unique instances corresponding to one or more respective channels of the neural network; and execute one or more operations of the ego-object based at least on the one or more classifications.

12. The system of claim 11, wherein the neural network comprises an instance clustering head comprising a respective classification channel for each of a plurality of detectable unique instances.

13. The system of claim 11, wherein the one or more classifications comprise, for each channel of at least one of the one or more respective channels, a respective confidence map representing pixels that belong to a respective instance of the one or more unique instances.

14. The system of claim 11, wherein the one or more processors are further to generate one or more bounding shapes of the one or more unique instances using a connected components analysis to detect one or more boundaries of one or more clusters of connected or occluded unique instances represented by the one or more classifications.

15. The system of claim 11, wherein the one or more processors are further to identify a plurality of globally unique instances from each cluster of one or more clusters of connected or occluded unique instances detected from each classification channel of at least one of the one or more respective channels.

16. The system of claim 11, wherein the one or more processors are further to identify a plurality of globally unique instances based at least on: using a connected components analysis to assign a plurality of disconnected regions to a first instance, and determining that the plurality of disconnected regions correspond to the plurality of globally unique instances based at least on a minimum separation between the disconnected regions.

17. The system of claim 11, wherein the one or more classifications comprise a depth-wise probability distribution per pixel representing a predicted likelihood, for each channel of a plurality of channels of the neural network, that each pixel of at least one of the one or more pixels belongs to a respective unique instance corresponding to the channel.

18. The system of claim 11, wherein the system is comprised in at least one of: a control system for an autonomous or semi-autonomous machine; a perception system for an autonomous or semi-autonomous machine; a system for performing simulation operations; a system for performing deep learning operations; a system for performing remote operations; a system for performing real-time streaming; a system for generating or presenting one or more of augmented reality content, virtual reality content, or mixed reality content; a system implemented using a robot; a system for generating synthetic data; a system for generating synthetic data using AI; or a system implemented at least partially using cloud computing resources.

19. A method comprising: generate, based at least on using a neural network to process a representation of sensor data corresponding to an environment of an ego-object, one or more classifications of one or more pixels into one or more unique instances corresponding to one or more respective channels of the neural network; and execute one or more operations of the ego-object based at least on the one or more c
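Claims 4 to 6 refer to a connected components analysis over the per-channel classifications. The sketch below shows one way such an analysis could turn per-channel confidence maps into bounding boxes, treating each disconnected region within a channel as its own globally unique instance. It is an illustration under stated assumptions, not the claimed method: the 0.5 threshold, 4-connectivity, axis-aligned boxes, and all function names are hypothetical, and the minimum-separation test of claim 6 and the gap-joining step of claim 8 are deliberately left out.

```python
from collections import deque


def connected_components(mask):
    """Return the 4-connected regions of truthy cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(pixels)
    return regions


def bounding_box(pixels):
    """Axis-aligned box as (min_row, min_col, max_row, max_col)."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(ys), min(xs), max(ys), max(xs))


def decode_instances(channel_maps, threshold=0.5):
    """Assign a global id and a box to every region in every channel.

    channel_maps: list of 2D confidence maps, one per instance channel
    threshold:    assumed cutoff for treating a pixel as foreground
    """
    instances = []
    for channel, conf_map in enumerate(channel_maps):
        mask = [[value >= threshold for value in row] for row in conf_map]
        for pixels in connected_components(mask):
            instances.append({
                "id": len(instances),
                "channel": channel,
                "box": bounding_box(pixels),
            })
    return instances


# Usage: channel 0 holds two separate blobs, channel 1 holds one.
maps = [
    [[0.9, 0.9, 0.0, 0.0, 0.8],
     [0.9, 0.0, 0.0, 0.0, 0.8],
     [0.0, 0.0, 0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.0, 0.0]],
]
for inst in decode_instances(maps):
    print(inst)
```

Because the two blobs in channel 0 are not connected, they receive different global ids even though they share a channel, which is the situation claims 5 and 6 address.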

Assignees

Nvidia Corp

Inventors

Classifications

  • using machine learning, e.g. neural networks

  • Handing over between on-board automatic and on-board manual control

  • exterior to a vehicle by using sensors mounted on the vehicle

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

  • using neural networks

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12437412B2 cover?
A deep neural network(s) (DNN) may be used to perform panoptic segmentation by performing pixel-level class and instance segmentation of a scene using a single pass of the DNN. Generally, one or more images and/or other sensor data may be stitched together, stacked, and/or combined, and fed into a DNN that includes a common trunk and several heads that predict different outputs. The DNN may inc…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 7, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).