Method and electronic device for segmenting objects in scene

US12482106B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12482106-B2
Application number  US-202217983119-A
Country             US
Kind code           B2
Filing date         Nov 8, 2022
Priority date       Oct 21, 2021
Publication date    Nov 25, 2025
Grant date          Nov 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for segmenting objects in a scene by an electronic device is provided. The method includes inputting at least one input frame of the scene into a pre-trained neural network model, the scene including a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generating a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene.
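The final step of the abstract, building one mask per object from its position, shape, and coefficients, can be illustrated with a minimal sketch. It assumes the linear combination described in claim 1, simplified to prototype masks only. The function name `assemble_masks`, the nested-list data layout, and the zero threshold used to binarize the combined score are illustrative assumptions, not details taken from the patent.

```python
def assemble_masks(prototypes, centers, coefficients, threshold=0.0):
    """Build one binary mask per object center (hypothetical helper).

    prototypes:   list of K prototype maps, each H x W (nested lists of floats)
    centers:      list of (row, col) center locations, one per object
    coefficients: dict mapping each center to its K weights
    threshold:    assumed cut-off on the combined score (not from the patent)
    """
    height, width = len(prototypes[0]), len(prototypes[0][0])
    masks = []
    for center in centers:
        weights = coefficients[center]
        mask = [[0] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                # Linear combination of all prototypes at this pixel,
                # weighted by the coefficients predicted at the center.
                score = sum(w * p[r][c] for w, p in zip(weights, prototypes))
                mask[r][c] = 1 if score > threshold else 0
        masks.append(mask)
    return masks


# Toy input: two prototypes, one covering the left half, one the right half.
left = [[1, 1, 0, 0], [1, 1, 0, 0]]
right = [[0, 0, 1, 1], [0, 0, 1, 1]]
centers = [(0, 0), (1, 3)]
coefficients = {(0, 0): [1.0, -1.0], (1, 3): [-1.0, 1.0]}

for mask in assemble_masks([left, right], centers, coefficients):
    print(mask)
```

Because every object has its own coefficient array, two objects can reuse the same prototypes and still receive different masks, which is how overlapping objects could be separated under these assumptions.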

First claim

Opening claim text (preview).

What is claimed is:

1. A method for segmenting objects in a scene by an electronic device, the method comprising: inputting at least one input frame of the scene into a pre-trained neural network model, the scene comprising a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generating a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene, wherein the generating the segment mask for each object of the plurality of objects comprises: obtaining semantically aware center maps and shape aware prototype masks associated with each object of the plurality of objects in the scene, determining a linear combination of the semantically aware center maps and the shape aware prototype masks weighted by corresponding coefficients of the array of coefficients on each center location, and generating the segment mask for each object of the plurality of objects based on the linear combination of the semantically aware center maps and the shape aware prototype masks.

2. The method of claim 1, wherein the method further comprises displaying the segment mask for each object in the scene that segments overlapping objects of the plurality of objects in the scene.

3. The method of claim 1, wherein the determining the position of each object of the plurality of objects in the scene using the pre-trained neural network model comprises: generating a center map using the pre-trained neural network model, wherein the center map comprises N channels that corresponds to a number of semantic categories representing each object in the scene; and determining the position of each object of the plurality of objects in the scene based on the center map.

4. The method of claim 3, wherein the generating the center map comprises: inputting the at least one input frame of the scene to the pre-trained neural network model and obtaining an N channel feature map as an output from the pre-trained neural network model, wherein N corresponds to a number of semantic categories that are supported; and obtaining the center map by predicting, based on the N channel feature map, center positions of each object of the plurality of objects in the at least one input frame input to the pre-trained neural network model.

5. The method of claim 4, wherein the predicting the center positions of each object of the plurality of objects comprises: locating a local maxima by suppressing local minimum areas and capturing only local maximums for each channel of the N channel feature map, wherein the location of the local maxima in each channel of the N channel feature map corresponds to centroid positions of the plurality of objects of that semantic category forming the center map.

6. The method of claim 3, wherein the determining the position of each object of the plurality of objects in the scene from the center map comprises: reshaping the at least one input frame by pre-processing the at least one input frame based on neural network input parameters, wherein the neural network input parameters comprise at least one of a channel dimension of input frame, a spatial resolution of input frame, and processing details; inputting the reshaped at least one input frame into a pyramidal based neural network model to generate a set of features from pyramid levels; combining the set of features from the pyramid levels to form aggregated features; passing the aggregated features through a center mask to generate semantically aware center map of shape of each object of the plurality of objects in the scene; and determining, based on the semantically aware center map, the position of each object of the plurality of objects in the scene by encoding a confidence of each position having a center of an object for each semantic category of the semantic categories.

7. The method of claim 1, wherein the determining the shape of each object of the plurality of objects in the scene using the pre-trained neural network model comprises: generating a prototype map using the pre-trained neural network model, wherein the prototype map produces a fixed number of object shape aware feature maps, which act as prototypes for final object instances; and determining the position of each object of the plurality of objects in the scene from the prototype map.

8. The method of claim 7, wherein the determining the position of each object of the plurality of objects in the scene from the prototype map comprises: reshaping by pre-processing the at least one input frame based on neural network input parameters; inputting the reshaped at least one input frame into a pyramidal based neural network model to generate a set of features from pyramid levels; combining, the set of features from the pyramid levels to form aggregated features; and determining the position of each object of the plurality of objects in the scene by passing the aggregated features through a prototype mask to generate a plurality of shape aware prototype masks for each center in the at least one input frame.

9. The method of claim 1, wherein the determining the array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model comprises: determining a first array of coefficients for a first object of the plurality of objects in the scene; and determining a second array of coefficients for a second object of the plurality of objects in the scene.

10. The method of claim 1, wherein the inputting the at least one input frame of the scene into the pre-trained neural network model comprises: displaying the scene in a preview field of at least one imaging sensor of the electronic device; obtaining the at least one input frame of the scene using the at least one imaging sensor; and inputting the at least one input frame of the scene into the pre-trained neural network model.

11. An electronic device for segmenting objects in a scene, the electronic device comprising: a memory; a display; an object segment controller communicatively coupled to the memory; and a processor configured to: input at least one input frame of the scene into a pre-trained neural network model, the scene comprising a plurality of objects; determine a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determine an array of coefficients for pixels associated with each object of the plurality of objects in the scene using the pre-trained neural network model; and generate a segment mask for each object of the plurality of objects based on the position, the shape, and the array of coefficients for each object of the plurality of objects in the scene, wherein the processor is further configured to: obtain semantically aware center maps and shape aware prototype masks associated with each object of the plurality of objects in the scene, determine a linear combination of the semantically aware center maps and the shape aware prototype masks weighted by corresponding coefficients of the array of coefficients on each center location, and generate the segment mask for each object of the plurality of objects based on the linear combination of the semantically aware center maps and the shape aware prototype masks.

12.
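Claim 5 locates object centers by keeping only the local maxima in each channel of the N channel feature map. One possible reading of that step is sketched below: a pixel survives only if it is the largest value in its 3x3 neighbourhood. The names `find_centers` and `centers_per_category`, the 3x3 window, and the `min_score` cut-off are assumptions made for illustration; the claims do not specify them.

```python
def find_centers(heatmap, min_score=0.3):
    """Return (row, col, score) for each local maximum of one channel.

    A pixel is kept only if it equals the maximum of its 3x3 neighbourhood
    and its score is at least min_score (an assumed cut-off); every other
    pixel is suppressed.
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            value = heatmap[r][c]
            if value < min_score:
                continue
            # Collect the neighbourhood, clipped at the image border.
            neighbourhood = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            if value == max(neighbourhood):
                peaks.append((r, c, value))
    return peaks


def centers_per_category(center_map, min_score=0.3):
    """Apply find_centers to each of the N channels (one per category)."""
    return {
        category: find_centers(channel, min_score)
        for category, channel in enumerate(center_map)
    }


# Toy center map with N = 2 semantic categories on a 3 x 4 grid.
center_map = [
    [[0.1, 0.2, 0.1, 0.0],
     [0.2, 0.9, 0.2, 0.0],
     [0.1, 0.2, 0.1, 0.6]],
    [[0.0, 0.0, 0.0, 0.8],
     [0.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.0]],
]
print(centers_per_category(center_map))
# {0: [(1, 1, 0.9), (2, 3, 0.6)], 1: [(0, 3, 0.8)]}
```

In this sketch a flat plateau of equal scores would yield several peaks; a production implementation would need a tie-breaking rule, which the claim text leaves open.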

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Artificial neural networks [ANN]

  • Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform

  • using neural networks

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • Labelling scene content, e.g. deriving syntactic or semantic representations

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12482106B2 cover?
A method for segmenting objects in a scene by an electronic device is provided. The method includes inputting at least one input frame of the scene into a pre-trained neural network model, the scene including a plurality of objects; determining a position and a shape of each object of the plurality of objects in the scene using the pre-trained neural network model; determining an array of coeff…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 25, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).