Deep neural network for segmentation of road scenes and animate object instances for autonomous driving applications
US-2021026355-A1 · Jan 28, 2021 · US
US12482106B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12482106-B2 |
| Application number | US-202217983119-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 8, 2022 |
| Priority date | Oct 21, 2021 |
| Publication date | Nov 25, 2025 |
| Grant date | Nov 25, 2025 |
Official abstract text for this publication.
A method for segmenting objects in a scene by an electronic device is provided. The method includes inputting at least one input frame of the scene into a pre-trained neural network model, the scene including a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generating a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene.
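The mask-generation step in the abstract can be illustrated with a minimal sketch. It assumes the model has already produced K shape-aware prototype masks, a per-location array of K coefficients, and a list of detected object centers. The names `assemble_mask` and `segment`, the sigmoid activation, and the 0.5 threshold are illustrative assumptions; the abstract specifies none of them.

```python
import math


def assemble_mask(prototypes, coeffs, threshold=0.5):
    """Linearly combine K prototype masks (each H x W) using K coefficients.

    The sigmoid and the threshold are assumptions made for this sketch.
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Weighted sum of all prototypes at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            prob = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if prob > threshold else 0)
        mask.append(row)
    return mask


def segment(centers, coeff_map, prototypes):
    """Build one mask per detected center.

    centers:   list of (category, y, x) tuples
    coeff_map: H x W grid whose entry [y][x] is the K-vector of coefficients
    """
    return [
        (category, assemble_mask(prototypes, coeff_map[y][x]))
        for category, y, x in centers
    ]


# Toy data: two prototypes on a 2 x 3 grid (left column, right column).
prototypes = [
    [[1, 0, 0], [1, 0, 0]],
    [[0, 0, 1], [0, 0, 1]],
]
coeff_map = [[[0.0, 0.0] for _ in range(3)] for _ in range(2)]
coeff_map[0][0] = [4.0, -4.0]   # object centered at (0, 0) favors prototype 0
coeff_map[1][2] = [-4.0, 4.0]   # object centered at (1, 2) favors prototype 1
centers = [(0, 0, 0), (1, 1, 2)]

masks = segment(centers, coeff_map, prototypes)
```

Because each object reads its own coefficient vector at its center location, two objects that share the same prototypes still receive distinct masks, which is one way to read how the approach separates overlapping instances.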
Opening claim text (preview).
What is claimed is:
1. A method for segmenting objects in a scene by an electronic device, the method comprising: inputting at least one input frame of the scene into a pre-trained neural network model, the scene comprising a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generating a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene, wherein the generating the segment mask for each object of the plurality of objects comprises: obtaining semantically aware center maps and shape aware prototype masks associated with each object of the plurality of objects in the scene, determining a linear combination of the semantically aware center maps and the shape aware prototype masks weighted by corresponding coefficients of the array of coefficients on each center location, and generating the segment mask for each object of the plurality of objects based on the linear combination of the semantically aware center maps and the shape aware prototype masks.
2. The method of claim 1, wherein the method further comprises displaying the segment mask for each object in the scene that segments overlapping objects of the plurality of objects in the scene.
3. The method of claim 1, wherein the determining the position of each object of the plurality of objects in the scene using the pre-trained neural network model comprises: generating a center map using the pre-trained neural network model, wherein the center map comprises N channels that corresponds to a number of semantic categories representing each object in the scene; and determining the position of each object of the plurality of objects in the scene based on the center map.
4. The method of claim 3, wherein the generating the center map comprises: inputting the at least one input frame of the scene to the pre-trained neural network model and obtaining an N channel feature map as an output from the pre-trained neural network model, wherein N corresponds to a number of semantic categories that are supported; and obtaining the center map by predicting, based on the N channel feature map, center positions of each object of the plurality of objects in the at least one input frame input to the pre-trained neural network model.
5. The method of claim 4, wherein the predicting the center positions of each object of the plurality of objects comprises: locating a local maxima by suppressing local minimum areas and capturing only local maximums for each channel of the N channel feature map, wherein the location of the local maxima in each channel of the N channel feature map corresponds to centroid positions of the plurality of objects of that semantic category forming the center map.
6. The method of claim 3, wherein the determining the position of each object of the plurality of objects in the scene from the center map comprises: reshaping the at least one input frame by pre-processing the at least one input frame based on neural network input parameters, wherein the neural network input parameters comprise at least one of a channel dimension of input frame, a spatial resolution of input frame, and processing details; inputting the reshaped at least one input frame into a pyramidal based neural network model to generate a set of features from pyramid levels; combining the set of features from the pyramid levels to form aggregated features; passing the aggregated features through a center mask to generate semantically aware center map of shape of each object of the plurality of objects in the scene; and determining, based on the semantically aware center map, the position of each object of the plurality of objects in the scene by encoding a confidence of each position having a center of an object for each semantic category of the semantic categories.
7. The method of claim 1, wherein the determining the shape of each object of the plurality of objects in the scene using the pre-trained neural network model comprises: generating a prototype map using the pre-trained neural network model, wherein the prototype map produces a fixed number of object shape aware feature maps, which act as prototypes for final object instances; and determining the position of each object of the plurality of objects in the scene from the prototype map.
8. The method of claim 7, wherein the determining the position of each object of the plurality of objects in the scene from the prototype map comprises: reshaping by pre-processing the at least one input frame based on neural network input parameters; inputting the reshaped at least one input frame into a pyramidal based neural network model to generate a set of features from pyramid levels; combining, the set of features from the pyramid levels to form aggregated features; and determining the position of each object of the plurality of objects in the scene by passing the aggregated features through a prototype mask to generate a plurality of shape aware prototype masks for each center in the at least one input frame.
9. The method of claim 1, wherein the determining the array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model comprises: determining a first array of coefficients for a first object of the plurality of objects in the scene; and determining a second array of coefficients for a second object of the plurality of objects in the scene.
10. The method of claim 1, wherein the inputting the at least one input frame of the scene into the pre-trained neural network model comprises: displaying the scene in a preview field of at least one imaging sensor of the electronic device; obtaining the at least one input frame of the scene using the at least one imaging sensor; and inputting the at least one input frame of the scene into the pre-trained neural network model.
11. An electronic device for segmenting objects in a scene, the electronic device comprising: a memory; a display; an object segment controller communicatively coupled to the memory; and a processor configured to: input at least one input frame of the scene into a pre-trained neural network model, the scene comprising a plurality of objects; determine a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determine an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generate a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene, wherein the processor is further configured to: obtain semantically aware center maps and shape aware prototype masks associated with each object of the plurality of objects in the scene, determine a linear combination of the semantically aware center maps and the shape aware prototype masks weighted by corresponding coefficients of the array of coefficients on each center location, and generate the segment mask for each object of the plurality of objects based on the linear combination of the semantically aware center maps and the shape aware prototype masks.
12.
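Claim 5 describes locating object centers as local maxima in each channel of the N channel feature map. The following is a rough sketch of that idea only, not the patented implementation: it keeps a location when its score is the largest in a small neighbourhood. The function name `find_centers`, the 3 x 3 neighbourhood (`radius=1`), and the `min_score` cutoff are assumptions introduced for illustration.

```python
def find_centers(center_map, min_score=0.3, radius=1):
    """Return (channel, y, x, score) for every local maximum in each channel.

    center_map: list of N channels, each an H x W list of confidence scores.
    min_score and radius are illustrative assumptions, not values from the claims.
    """
    centers = []
    for channel, heat in enumerate(center_map):
        height, width = len(heat), len(heat[0])
        for y in range(height):
            for x in range(width):
                score = heat[y][x]
                if score < min_score:
                    continue  # suppress low-confidence areas
                # Collect scores in the window around (y, x), clipped at the borders.
                neighbours = [
                    heat[ny][nx]
                    for ny in range(max(0, y - radius), min(height, y + radius + 1))
                    for nx in range(max(0, x - radius), min(width, x + radius + 1))
                ]
                if score >= max(neighbours):
                    centers.append((channel, y, x, score))
    return centers


# Toy center map with N = 2 semantic categories on a 4 x 4 grid.
center_map = [
    [
        [0.1, 0.2, 0.1, 0.0],
        [0.2, 0.9, 0.2, 0.0],
        [0.1, 0.2, 0.1, 0.1],
        [0.0, 0.0, 0.1, 0.8],
    ],
    [
        [0.0, 0.0, 0.7, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ],
]

centers = find_centers(center_map)
```

In this toy input, channel 0 yields two centers and channel 1 yields one, because the 0.5 score sits next to a stronger 0.7 peak and is suppressed. Equal scores on a flat plateau would all be reported, so a production system would likely need a tie-breaking rule.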
Artificial neural networks [ANN] · CPC title
Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform · CPC title
using neural networks · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title