US2026017800A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2026017800-A1 |
| Application number | US-202519095449-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 31, 2025 |
| Priority date | Jul 15, 2024 |
| Publication date | Jan 15, 2026 |
| Grant date | — |
Official abstract text for this publication.
A computer-implemented method is provided. The aspects include deriving, by one or more processors from a pixel-wise semantic segmentation of at least two image frames, object classifications representing one or more objects in the at least two image frames. The aspects further include deriving, by the one or more processors, geometric points representing the one or more objects in the at least two image frames. The aspects also include merging, by the one or more processors, the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects. The aspects additionally include controlling movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.
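The merging step in the abstract can be illustrated with a minimal sketch. This is not the applicant's implementation. It assumes each frame has already been reduced to a dictionary mapping a class label to a list of 3D points in a shared world frame, and it interprets the "mutually closest geometric point metric" as the fraction of the smaller cloud covered by mutual nearest-neighbour pairs. The names `mutual_overlap` and `merge_frames`, and the parameters `max_dist` and `threshold`, are hypothetical.

```python
from math import dist


def nearest_index(p, candidates):
    """Index of the candidate closest to p (brute force, for clarity only)."""
    return min(range(len(candidates)), key=lambda j: dist(p, candidates[j]))


def mutual_overlap(a, b, max_dist):
    """Fraction of the smaller cloud covered by mutually closest pairs.

    A pair (i, j) counts when b[j] is the nearest point to a[i], a[i] is the
    nearest point to b[j], and the two lie within max_dist of each other.
    This definition is an assumption made for illustration.
    """
    if not a or not b:
        return 0.0
    a_to_b = [nearest_index(p, b) for p in a]
    b_to_a = [nearest_index(q, a) for q in b]
    mutual = sum(
        1
        for i, j in enumerate(a_to_b)
        if b_to_a[j] == i and dist(a[i], b[j]) <= max_dist
    )
    return mutual / min(len(a), len(b))


def merge_frames(frame_a, frame_b, max_dist=0.05, threshold=0.5):
    """Merge per-class point clouds from two frames.

    Only clouds with the same class label are compared (compare claim 2).
    Clouds are merged when their overlap reaches the threshold (compare
    claim 5); otherwise they are kept as separate objects.
    Returns a list of (label, points) tuples.
    """
    merged = []
    for label in sorted(set(frame_a) | set(frame_b)):
        a = frame_a.get(label, [])
        b = frame_b.get(label, [])
        if mutual_overlap(a, b, max_dist) >= threshold:
            merged.append((label, a + b))
        else:
            merged.extend((label, pts) for pts in (a, b) if pts)
    return merged


frame_a = {"chair": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "table": [(5.0, 5.0, 5.0)]}
frame_b = {"chair": [(0.01, 0.0, 0.0), (1.01, 0.0, 0.0)], "table": [(9.0, 9.0, 9.0)]}
objects = merge_frames(frame_a, frame_b)
```

In this example the two "chair" clouds nearly coincide and are merged into one object, while the two "table" clouds are far apart and stay separate. A practical system would likely replace the brute-force nearest-neighbour search with a spatial index.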
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method, comprising: deriving, by one or more processors from a pixel-wise semantic segmentation of at least two different image frames, object classifications representing one or more objects in the at least two different image frames; deriving, by the one or more processors, geometric points representing the one or more objects in the at least two different image frames; merging, by the one or more processors, the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects; and sending, by the one or more processors, instructions to control a movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.

2. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged in a process that is restricted to merging only objects in a same class.

3. The computer-implemented method in accordance with claim 2, wherein restricted to merging only objects in the same class skips any of the one or more objects that are in other classes from a particular merging of a given class.

4. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged at a point cloud level.

5. The computer-implemented method in accordance with claim 4, wherein point cloud data for the at least one of the one or more objects in the at least different two image frames are mergeable only when the point cloud data for the at least one of the one or more objects in the at least different two image frames include an overlap by a threshold amount with respect to the mutually closest geometric point metric.

6. The computer-implemented method in accordance with claim 5, wherein the threshold amount is user adjustable.

7. The computer-implemented method in accordance with claim 5, further comprising determining the overlap using at least one respective mask for each the one or more objects in the at least two different image frames.

8. The computer-implemented method in accordance with claim 1, further comprising comparing pairs of the geometric points in different ones of the at least two different frames to identify pairs of mutually closest points in the at least two different frames for overlap evaluation.

9. The computer-implemented method in accordance with claim 1, further comprising performing the semantic segmentation using a closed set comprising segmentations and labels for the segmentations.

10. The computer-implemented method in accordance with claim 9, wherein the segmentations comprise X segmentations and the labels comprise Y labels, and wherein X and Y are integers greater than one and capable of being any of equal or different.

11. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged further based on depth data.

12. The computer-implemented method in accordance with claim 1, wherein the geometric points are merged further based on camera pose data.

13. The computer-implemented method in accordance with claim 12, further comprising using the camera pose data to limit the geometric points that can be compared to each other for correspondence to have a same semantic label and to belong in a field-of-view of all camera poses under consideration.

14. The computer-implemented method in accordance with claim 1, wherein the geometric points are represented by image meshes and merged into scene meshes.

15. The computer-implemented method in accordance with claim 1, further comprising performing a voxel down-sampling operation by forming a grid over the geometric points, averaging all points with a same respective box of the grid to combine pixels in the same box of the grid into a resultant averaged pixel.

16. The computer-implemented method in accordance with claim 1, wherein controlling movement of the autonomous object comprises controlling movement of a robot to achieve the task responsive to the merged geometric points.

17. The computer-implemented method in accordance with claim 16, wherein the task comprises avoiding an obstacle.

18. The computer-implemented method in accordance with claim 16, wherein the task comprises moving an object from a first location to a second location.

19. A pipeline, comprising: one or more processors operatively coupled to one or more memories and configured to derive, from a pixel-wise semantic segmentation of at least two different image frames, object classifications representing one or more objects in the at least two different image frames, derive geometric points representing the one or more objects in the at least two different image frames, merge the geometric points based on the object classifications that match and a mutually closest geometric point metric to obtain merged geometric points for each of the one or more objects, and send instructions to control a movement of an autonomous object to achieve a task responsive to at least one of the one or more objects represented by the merged geometric points.

20. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to implement a semantic segmentation branch and perform semantic segmentation on an image frame to output segmentations of the image frame and class labels for the segmentations from a closed set of segmentations and class labels, responsive to red, green, blue (RGB) image data.

21. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to implement a mask branch configured to perform mask-based segmentation to output mask-based segmentations without class labels.

22. The pipeline in accordance with claim 21, wherein the one or more processors are further configured to perform semantic voting to output final segmentations with a finer granularity than the mask-based segmentation with final class labels, responsive to inputs comprising the segmentations and the labels for the segmentations output from the semantic segmentation and the mask-based segmentations output from the mask-based segmentation.

23. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to generate three-dimensional (3D) scenes from perception that comprises color and depth information.

24. The pipeline in accordance with claim 19, wherein one or more processors are further configured to perform, along with the semantic segmentation, meshing, and scene integration.

25. The pipeline in accordance with claim 19, wherein the semantic segmentation is combined with a traditional Segment Anything Model to output fine-grained segmentations with class labels by exploiting a fine grained pipeline providing the fine grained segmentations with the semantic segmentation providing coarse grained segmentations and labels for the coarse grained segmentations applicable to the fine grained segmentations.

26. The pipeline in accordance with claim 19, wherein the one or more processors are further configured to perform point cloud merging by leveraging segmentations per semantic class to limit overlap evaluation to be between a same semantic mask of frames.
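The voxel down-sampling operation recited in claim 15 (form a grid over the points, then average all points that fall in the same box) can be sketched as follows. This is an illustrative reading of the claim language, not the applicant's code. The function name `voxel_downsample`, the cubic cell shape, and the `voxel_size` parameter are assumptions.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, voxel_size):
    """Replace all points in the same cubic grid cell with their average.

    points: iterable of (x, y, z) tuples.
    voxel_size: edge length of each grid cell (hypothetical parameter).
    Returns one averaged point per occupied cell, ordered by cell index.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the grid cell that contains them.
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(c / voxel_size) for c in p)
        cells[key].append(p)

    # Average each coordinate axis over the points in a cell.
    return [
        tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for _, pts in sorted(cells.items())
    ]


cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.5, 0.0, 0.0)]
reduced = voxel_downsample(cloud, voxel_size=1.0)
```

Here the first two points share the cell with index (0, 0, 0) and collapse to their mean, while the third point sits alone in cell (1, 0, 0) and is returned unchanged. A larger `voxel_size` gives a coarser, smaller cloud.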
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
using classification, e.g. of video objects · CPC title
Avoiding collision or forbidden zones · CPC title
Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title
Edge-based segmentation · CPC title