Semantic segmentation and scene integration of 3D image frames

US2026017800A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2026017800-A1
Application number: US-202519095449-A
Country: US
Kind code: A1
Filing date: Mar 31, 2025
Priority date: Jul 15, 2024
Publication date: Jan 15, 2026
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method is provided. The aspects include deriving, by one or more processors from a pixel-wise semantic segmentation of at least two image frames, object classifications representing one or more objects in the at least two image frames. The aspects further include deriving, by the one or more processors, geometric points representing the one or more objects in the at least two image frames. The aspects also include merging, by the one or more processors, the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects. The aspects additionally include controlling movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.
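The merging step in the abstract can be illustrated with a minimal sketch. It assumes that each frame has already been reduced to a mapping from class label to a list of 3D points in a shared world frame, and that "overlap" means the fraction of the smaller point set that takes part in mutually closest pairs within a distance cutoff. The names `mutual_nearest_pairs`, `merge_frames`, `max_dist`, and `min_overlap` are hypothetical and do not come from the patent, which does not specify the metric at this level of detail.

```python
from math import dist


def mutual_nearest_pairs(a, b):
    """Return index pairs (i, j) where a[i] and b[j] are each other's nearest point."""
    if not a or not b:
        return []
    nearest_in_b = [min(range(len(b)), key=lambda j: dist(p, b[j])) for p in a]
    nearest_in_a = [min(range(len(a)), key=lambda i: dist(q, a[i])) for q in b]
    return [(i, j) for i, j in enumerate(nearest_in_b) if nearest_in_a[j] == i]


def merge_frames(frame_a, frame_b, max_dist=0.05, min_overlap=0.5):
    """Merge two frames given as {class label: [(x, y, z), ...]}.

    Returns {class label: [point cloud, ...]}. Points are only ever compared
    with points of the same class. The overlap definition is an assumption.
    """
    merged = {}
    for label in frame_a.keys() | frame_b.keys():
        pts_a = frame_a.get(label, [])
        pts_b = frame_b.get(label, [])
        # Keep only mutually closest pairs that are also close in absolute terms.
        pairs = [
            (i, j)
            for i, j in mutual_nearest_pairs(pts_a, pts_b)
            if dist(pts_a[i], pts_b[j]) <= max_dist
        ]
        smaller = min(len(pts_a), len(pts_b))
        overlap = len(pairs) / smaller if smaller else 0.0
        if overlap >= min_overlap:
            # Treat both observations as one object: average the matched
            # pairs and keep the unmatched points from either frame.
            matched_a = {i for i, _ in pairs}
            matched_b = {j for _, j in pairs}
            fused = [
                tuple((u + v) / 2 for u, v in zip(pts_a[i], pts_b[j]))
                for i, j in pairs
            ]
            fused += [p for i, p in enumerate(pts_a) if i not in matched_a]
            fused += [q for j, q in enumerate(pts_b) if j not in matched_b]
            merged[label] = [fused]
        else:
            # Not enough overlap: keep the observations as separate clouds.
            merged[label] = [cloud for cloud in (pts_a, pts_b) if cloud]
    return merged
```

Restricting the search to points that share a class label keeps the nearest-neighbour comparison small and prevents, for example, a cup from being fused with the table it sits on. The brute-force search is quadratic in the number of points; a spatial index would be the usual replacement for larger clouds.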

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: deriving, by one or more processors from a pixel-wise semantic segmentation of at least two different image frames, object classifications representing one or more objects in the at least two different image frames; deriving, by the one or more processors, geometric points representing the one or more objects in the at least two different image frames; merging, by the one or more processors, the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects; and sending, by the one or more processors, instructions to control a movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.

2. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged in a process that is restricted to merging only objects in a same class.

3. The computer-implemented method in accordance with claim 2, wherein restricted to merging only objects in the same class skips any of the one or more objects that are in other classes from a particular merging of a given class.

4. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged at a point cloud level.

5. The computer-implemented method in accordance with claim 4, wherein point cloud data for the at least one of the one or more objects in the at least different two image frames are mergeable only when the point cloud data for the at least one of the one or more objects in the at least different two image frames include an overlap by a threshold amount with respect to the mutually closest geometric point metric.

6. The computer-implemented method in accordance with claim 5, wherein the threshold amount is user adjustable.

7. The computer-implemented method in accordance with claim 5, further comprising determining the overlap using at least one respective mask for each the one or more objects in the at least two different image frames.

8. The computer-implemented method in accordance with claim 1, further comprising comparing pairs of the geometric points in different ones of the at least two different frames to identify pairs of mutually closest points in the at least two different frames for overlap evaluation.

9. The computer-implemented method in accordance with claim 1, further comprising performing the semantic segmentation using a closed set comprising segmentations and labels for the segmentations.

10. The computer-implemented method in accordance with claim 9, wherein the segmentations comprise X segmentations and the labels comprise Y labels, and wherein X and Y are integers greater than one and capable of being any of equal or different.

11. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged further based on depth data.

12. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged further based on camera pose data.

13. The computer-implemented method in accordance with claim 12, further comprising using the camera pose data to limit the geometric points that can be compared to each other for correspondence to have a same semantic label and to belong in a field-of-view of all camera poses under consideration.

14. The computer-implemented method in accordance with claim 1, wherein the geometric points are represented by image meshes and merged into scene meshes.

15. The computer-implemented method in accordance with claim 1, further comprising performing a voxel down-sampling operation by forming a grid over the geometric points, averaging all points with a same respective box of the grid to combine pixels in the same box of the grid into a resultant averaged pixel.

16. The computer-implemented method in accordance with claim 1, wherein controlling movement of the autonomous object comprises controlling movement of a robot to achieve the task responsive to the merged geometric points.

17. The computer-implemented method in accordance with claim 16, wherein the task comprises avoiding an obstacle.

18. The computer-implemented method in accordance with claim 16, wherein the task comprises moving an object from a first location to a second location.

19. A pipeline, comprising: one or more processors operatively coupled to one or more memories and configured to derive, from a pixel-wise semantic segmentation of at least two different image frames, object classifications representing one or more objects in the at least two different image frames, derive geometric points representing the one or more objects in the at least two different image frames, merge the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects, and send instructions to control a movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.

20. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to implement a semantic segmentation branch and perform semantic segmentation on an image frame to output segmentations of the image frame and class labels for the segmentations from a closed set of segmentations and class labels, responsive to red, green, blue (RGB) image data.

21. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to implement a mask branch configured to perform mask-based segmentation to output mask-based segmentations without class labels.

22. The pipeline in accordance with claim 21, wherein the one or more processors are further configured to perform semantic voting to output final segmentations with a finer granularity than the mask-based segmentation with final class labels, responsive to inputs comprising the segmentations and the labels for the segmentations output from the semantic segmentation and the mask-based segmentations output from the mask-based segmentation.

23. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to generate three-dimensional (3D) scenes from perception that comprises color and depth information.

24. The pipeline in accordance with claim 19, wherein one or more processors are further configured to perform, along with the semantic segmentation, meshing, and scene integration.

25. The pipeline in accordance with claim 19, wherein the semantic segmentation is combined with a traditional Segment Anything Model to output fine-grained segmentations with class labels by exploiting a fine grained pipeline providing the fine grained segmentations with the semantic segmentation providing coarse grained segmentations and labels for the coarse grained segmentations applicable to the fine grained segmentations.

26. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to perform point cloud merging by leveraging segmentations per semantic class to limit overlap evaluation to be between a same semantic mask of frames.
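The voxel down-sampling operation recited in claim 15 (form a grid over the points, then average everything that falls in the same box) can be sketched as follows. This is an illustrative reading, not the applicant's implementation: the function name `voxel_downsample`, the cubic grid, and the `voxel_size` parameter are assumptions, and points are represented as plain coordinate tuples.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points that share a grid box with their average.

    points: iterable of equal-length coordinate tuples, e.g. (x, y, z).
    voxel_size: edge length of each cubic grid box (assumed uniform).
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    boxes = defaultdict(list)
    for p in points:
        # Integer grid index of the box that contains this point.
        key = tuple(floor(c / voxel_size) for c in p)
        boxes[key].append(p)
    # One averaged point per occupied box, in order of first occupancy.
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for group in boxes.values()
    ]
```

A larger `voxel_size` yields a sparser cloud and cheaper downstream merging at the cost of geometric detail. If per-point attributes such as color were carried along, they could be averaged in the same way by extending the tuples.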

Assignees

Analog Devices Inc

Inventors

Classifications

  • Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title

  • G06V10/764 (Primary)

    using classification, e.g. of video objects · CPC title

  • Avoiding collision or forbidden zones · CPC title

  • Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title

  • G06T7/12 (Primary)

    Edge-based segmentation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026017800A1 cover?
A computer-implemented method is provided. The aspects include deriving, by one or more processors from a pixel-wise semantic segmentation of at least two image frames, object classifications representing one or more objects in the at least two image frames. The aspects further include deriving, by the one or more processors, geometric points representing the one or more objects in the at least…
Who is the assignee on this patent?
Analog Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06V10/764. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 15, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).