Manhattan layout estimation using geometric and semantic information

US12505618B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12505618-B2
Application number  US-202217981156-A
Country             US
Kind code           B2
Filing date         Nov 4, 2022
Priority date       Feb 2, 2022
Publication date    Dec 23, 2025
Grant date          Dec 23, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A plurality of two-dimensional (2D) images of the scene is received. Geometric information and semantic information of each of the plurality of 2D images is determined. The geometric information indicates a detected line and a reference direction in the respective 2D image. The semantic information includes classification information of pixels in the respective 2D image. A layout estimation associated with the respective 2D image of the scene is determined based on the geometric information and the semantic information of the respective 2D image. A combined layout estimation associated with the scene is determined based on a plurality of the determined layout estimations associated with the plurality of 2D images of the scene. The Manhattan layout associated with the scene is generated based on the combined layout estimation. The Manhattan layout includes at least a three-dimensional (3D) shape of the scene that includes wall faces orthogonal with respect to each other.
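
The abstract's final step, a 3D shape whose wall faces are orthogonal to each other, can be illustrated with a minimal Python sketch. It assumes the combined layout is already available as a closed 2D floor polygon and that the room has a single ceiling height. The names `extrude_walls`, `wall_normal`, and `is_manhattan` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
Wall = List[Point3]


def extrude_walls(floor: List[Point2], height: float) -> List[Wall]:
    """Return one rectangular wall face (four 3D corners) per floor-polygon edge."""
    walls = []
    n = len(floor)
    for i in range(n):
        (x0, y0), (x1, y1) = floor[i], floor[(i + 1) % n]
        walls.append([
            (x0, y0, 0.0), (x1, y1, 0.0),        # floor borderline
            (x1, y1, height), (x0, y0, height),  # ceiling borderline
        ])
    return walls


def wall_normal(wall: Wall) -> Tuple[float, float]:
    """Horizontal (unnormalized) normal: the floor edge direction rotated by 90 degrees."""
    (x0, y0, _), (x1, y1, _) = wall[0], wall[1]
    return (y1 - y0, -(x1 - x0))


def is_manhattan(walls: List[Wall], tol: float = 1e-9) -> bool:
    """True if every pair of adjacent walls meets at a right angle.

    Assumption: in a closed polygon, right angles at every corner imply that
    any two walls are either parallel or orthogonal.
    """
    n = len(walls)
    for i in range(n):
        ax, ay = wall_normal(walls[i])
        bx, by = wall_normal(walls[(i + 1) % n])
        if abs(ax * bx + ay * by) > tol:  # dot product of adjacent normals
            return False
    return True


# Hypothetical L-shaped room, 2.5 units high.
l_shaped_room = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 3), (0, 3)]
walls = extrude_walls(l_shaped_room, height=2.5)
triangle_walls = extrude_walls([(0, 0), (1, 0), (0, 1)], height=2.5)
```

Here `is_manhattan(walls)` holds for the L-shaped room, while the triangular footprint fails the check because two of its corners are not right angles.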

First claim

Opening claim text (preview).

What is claimed is:

1. A method for estimating a Manhattan layout associated with a scene, the method comprising: receiving a plurality of two-dimensional (2D) images of the scene; determining geometric information and semantic information of each of the plurality of 2D images, the geometric information indicating a detected line and a reference direction in the respective 2D image, the semantic information including classification information of pixels in the respective 2D image; determining a plurality of layout estimations associated with the plurality of 2D images of the scene, each layout estimation being associated with the respective 2D image of the scene based on the geometric information and the semantic information of the respective 2D image; determining a combined layout estimation associated with the scene based on a shrunk polygon that is generated based on a plurality of candidate edges, the shrunk polygon being a portion of a base polygon, the base polygon being formed by combining the plurality of the determined layout estimations associated with the plurality of 2D images of the scene, which one of the plurality of candidate edges is selected for the shrunk polygon being determined based on whether (i) the one of the plurality of candidate edges is parallel to a corresponding edge of the base polygon, (ii) a projected overlapping portion between the one of the plurality of candidate edges and the corresponding edge of the base polygon is larger than a threshold, and (iii) the one of the plurality of candidate edges is closer to an original view position associated with a corresponding 2D image than the corresponding edge of the base polygon; and generating the Manhattan layout associated with the scene based on the combined layout estimation, the Manhattan layout including at least a three-dimensional (3D) shape of the scene that includes wall faces orthogonal with respect to each other.

2. The method of claim 1, wherein the determining the geometric information and the semantic information further comprises: extracting first geometric information of a first 2D image of the plurality of 2D images, the first geometric information including at least one of detected lines, reference directions of the first 2D image, a ratio of a first distance from a ceiling to a ground and a second distance from a camera to the ground, or a relative pose between the first 2D image and a second 2D image of the plurality of 2D images; and labeling pixels of the first 2D image to generate first semantic information, the first semantic information indicating first structure information of the pixels in the first 2D image.

3. The method of claim 2, wherein the determining the layout estimation associated with the respective 2D image of the scene further comprises: determining a first layout estimation of the plurality of the determined layout estimations associated with the scene based on the first geometric information and the first semantic information of the first 2D image; and the determining the first layout estimation further comprises: determining whether each of the detected lines is a borderline that corresponds to a wall border in the scene; aligning the borderlines of the detected lines with the reference directions of the first 2D image; and generating a first polygon that indicates the first layout estimation based on the aligned borderlines with one of a 2D polygon denoising and a staircase removal.

4. The method of claim 3, wherein the generating the first polygon further comprises completing a plurality of incomplete borderlines of the borderlines based on one of: estimating the plurality of incomplete borderlines based on a combination of a ceiling borderline and a floor borderline of the borderlines; and connecting a pair of incomplete borderlines of the plurality of incomplete borderlines based on one of (i) adding a perpendicular line to the pair of incomplete borderlines when the pair of incomplete borderlines are parallel and (ii) extending at least one of the pair of incomplete borderlines such that an intersection of the pair of incomplete borderlines is positioned on the extended pair of incomplete borderlines.

5. The method of claim 3, wherein the determining the combined layout estimation associated with the scene further comprises: determining the base polygon by combining a plurality of polygons via a polygon union algorithm, each of the plurality of polygons corresponding to a respective layout estimation of the plurality of the determined layout estimations; determining the shrunk polygon based on the base polygon, the shrunk polygon including updated edges that are updated from edges of the base polygon; and determining a final polygon based on the shrunk polygon with one of the 2D polygon denoising and the staircase removal, the final polygon corresponding to the combined layout estimation associated with the scene.

6. The method of claim 5, wherein the determining the shrunk polygon further comprises: determining the plurality of candidate edges from the plurality of polygons for the edges of the base polygon, each of the plurality of candidate edges corresponding to a respective edge of the base polygon; and generating the updated edges of the shrunk polygon by replacing one or more edges of the base polygon with the corresponding one or more candidate edges when the one or more candidate edges are closer to original view positions in the plurality of 2D images than the corresponding one or more edges of the base polygon.

7. The method of claim 5, wherein the determining the combined layout estimation associated with the scene further comprises: determining an edge set that includes edges of the final polygon; generating a plurality of edge groups based on the edge set; and generating a plurality of internal edges of the final polygon that is indicated by a plurality of average edges of one or more edge groups of the edge set, each of the one or more edge groups of the plurality of edge groups including a respective number of edges that is greater than a target value, each of the plurality of average edges being obtained by averaging edges of a respective one of the one or more edge groups.

8. The method of claim 7, wherein: the plurality of edge groups includes a first edge group, and the first edge group further includes a first edge and a second edge, the first edge and the second edge being parallel, a distance between the first edge and the second edge being less than a first threshold, and a projected overlapping region between the first edge and the second edge being greater than a second threshold.

9. The method of claim 1, wherein the generating the Manhattan layout associated with the scene further comprises: generating the Manhattan layout associated with the scene based on one of triangle meshes triangulated from the combined layout estimation, quadrilateral meshes quadrangulated from the combined layout estimation, sampling points sampled from one of the triangle meshes and the quadrilateral meshes, or discrete grids generated from one of the triangle meshes and the quadrilateral meshes via voxelization.

10. The method of claim 9, wherein: the Manhattan layout associated with the scene is generated based on the triangle meshes triangulated from the combined layout estimation, and the generating the Manhattan layout associated with the scene further comprises: generating a ceiling face and a floor face in the scene by triangulating the combined layout estimation; generating the wall faces in the scene by triangulating rectangles that surround a ceiling borderline and a floor borderline in the scene; and g
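
The three-part test in claim 1 for selecting a candidate edge (parallel to the base edge, projected overlap above a threshold, closer to the original view position) can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: edges are non-degenerate 2D segments, "closer" is measured as point-to-segment distance from the view position, and the function names, the parallelism tolerance, and the threshold value are all hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Edge = Tuple[Point, Point]


def _direction(edge: Edge) -> Point:
    """Unit direction of an edge (assumes the edge has non-zero length)."""
    (x0, y0), (x1, y1) = edge
    length = math.hypot(x1 - x0, y1 - y0)
    return ((x1 - x0) / length, (y1 - y0) / length)


def is_parallel(a: Edge, b: Edge, tol: float = 1e-6) -> bool:
    """Condition (i): the 2D cross product of the unit directions is near zero."""
    (ax, ay), (bx, by) = _direction(a), _direction(b)
    return abs(ax * by - ay * bx) <= tol


def projected_overlap(candidate: Edge, base: Edge) -> float:
    """Condition (ii): overlap length after projecting both edges onto the base edge's line."""
    dx, dy = _direction(base)
    ox, oy = base[0]

    def along(p: Point) -> float:
        return (p[0] - ox) * dx + (p[1] - oy) * dy

    b0, b1 = sorted((along(base[0]), along(base[1])))
    c0, c1 = sorted((along(candidate[0]), along(candidate[1])))
    return max(0.0, min(b1, c1) - max(b0, c0))


def point_to_segment_distance(p: Point, edge: Edge) -> float:
    """Distance from a point to the nearest point on a segment."""
    (x0, y0), (x1, y1) = edge
    vx, vy = x1 - x0, y1 - y0
    s = ((p[0] - x0) * vx + (p[1] - y0) * vy) / (vx * vx + vy * vy)
    s = max(0.0, min(1.0, s))  # clamp to the segment
    return math.hypot(p[0] - (x0 + s * vx), p[1] - (y0 + s * vy))


def should_replace(candidate: Edge, base: Edge, view: Point, overlap_threshold: float) -> bool:
    """Apply conditions (i), (ii), and (iii) in order."""
    return (
        is_parallel(candidate, base)
        and projected_overlap(candidate, base) > overlap_threshold
        and point_to_segment_distance(view, candidate)
        < point_to_segment_distance(view, base)
    )


# Hypothetical example: camera at (2, 0), base-polygon wall edge at y = 5.
base_edge = ((0.0, 5.0), (4.0, 5.0))
view_position = (2.0, 0.0)
nearer = ((1.0, 3.0), (3.0, 3.0))         # parallel, overlap 2.0, closer to the view
farther = ((1.0, 7.0), (3.0, 7.0))        # parallel, but behind the base edge
perpendicular = ((2.0, 1.0), (2.0, 3.0))  # fails condition (i)
barely_overlapping = ((3.5, 3.0), (6.0, 3.0))  # overlap 0.5, fails condition (ii)
```

With an overlap threshold of 1.0, only `nearer` passes all three conditions. The parallelism, distance, and projected-overlap predicates recited for edge grouping in claim 8 have the same general form, so the helper functions could plausibly be reused there.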

Assignees

Tencent America LLC

Inventors

Classifications

  • G06T11/23 (Primary)

    using straight lines or curves · CPC title

  • from multiple images · CPC title

  • using feature-based methods · CPC title

  • Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title

  • Ray-tracing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12505618B2 cover?
A plurality of two-dimensional (2D) images of the scene is received. Geometric information and semantic information of each of the plurality of 2D images is determined. The geometric information indicates a detected line and a reference direction in the respective 2D image. The semantic information includes classification information of pixels in the respective 2D image. A layout estimation ass…
Who is the assignee on this patent?
Tencent America LLC
What technology area does this patent fall under?
Primary CPC classification G06T11/23. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 23, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).