Pixel classification to reduce depth-estimation error
US-2021035303-A1 · Feb 4, 2021 · US
US11263810B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11263810-B2 |
| Application number | US-202016875779-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 15, 2020 |
| Priority date | Apr 19, 2018 |
| Publication date | Mar 1, 2022 |
| Grant date | Mar 1, 2022 |
Official abstract text for this publication.
Optimizations are provided for reconstructing geometric surfaces for an environment that includes moving objects. Multiple depth maps for the environment are created, where some of the depth maps correspond to different perspectives of the environment. A motion state identifier is assigned to at least some pixels in at least some of the depth maps corresponding to moving objects in the environment. A composite 3D mesh is built using at least some of the multiple depth maps, by incorporating pixel information from the depth maps, while omitting pixel information identified by the motion state identifiers as being associated with moving objects.
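The omission step described in the abstract can be illustrated with a minimal sketch. It assumes the depth maps are already registered to a common viewpoint and that each pixel carries a Boolean motion state identifier (one of the forms recited in the claims). The per-pixel averaging is a simplified stand-in for building a composite 3D mesh, and `fuse_static_depth` and the type aliases are hypothetical names, not taken from the patent.

```python
from typing import List, Optional

DepthMap = List[List[float]]   # depth in metres per pixel (assumed unit)
MotionMap = List[List[bool]]   # True = pixel belongs to a dynamic object


def fuse_static_depth(depth_maps: List[DepthMap],
                      motion_maps: List[MotionMap]) -> List[List[Optional[float]]]:
    """Average depth per pixel across maps, skipping pixels flagged dynamic.

    Assumes all maps share one viewpoint and resolution (an assumption of
    this sketch). Returns None where no map contributed a static sample.
    """
    rows, cols = len(depth_maps[0]), len(depth_maps[0][0])
    fused: List[List[Optional[float]]] = []
    for r in range(rows):
        fused_row: List[Optional[float]] = []
        for c in range(cols):
            # Keep only samples whose motion state identifier says "static".
            samples = [d[r][c]
                       for d, m in zip(depth_maps, motion_maps)
                       if not m[r][c]]
            fused_row.append(sum(samples) / len(samples) if samples else None)
        fused.append(fused_row)
    return fused


# Two 2x2 depth maps; each flags one pixel as dynamic.
depth_a = [[2.0, 2.0], [1.0, 2.0]]
motion_a = [[False, False], [True, False]]
depth_b = [[3.0, 2.0], [2.0, 2.0]]
motion_b = [[False, False], [False, True]]

fused = fuse_static_depth([depth_a, depth_b], [motion_a, motion_b])
```

In this example the pixel at row 1, column 0 takes its depth only from the second map, because the first map marks it as dynamic. A pixel flagged dynamic in every map is left empty instead of receiving a guessed depth.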
Opening claim text (preview).
What is claimed is:
1. A computer system configured to facilitate improvements in how surface reconstruction of an environment is performed, said computer system comprising: one or more processors; and one or more computer-readable hardware storage devices storing instructions that are executable by the one or more processors to cause the computer system to at least: obtain images of a real-world environment, at least two of the images being generated at different points in time; provide the images as input to a machine learning (ML) algorithm, the ML algorithm being trained to classify image objects as dynamic or static; identify that the ML algorithm classified a substantially stationary object embodied in the at least two images as being dynamic even though any movement detected for the stationary object, as detected between the at least two images, falls below and thereby satisfies a maximum movement threshold used for determining whether objects are potentially static; based on one or more of the at least two images, generate a depth map that includes depth identifying pixels, each one of said pixels being assigned a corresponding motion state identifier indicating whether each one of said pixels is reflective of a corresponding dynamic object or a corresponding static object, wherein a group of said pixels corresponds to the stationary object and are assigned motion state identifiers reflecting the stationary object as being dynamic; and based at least partially on the depth map, generate a three-dimensional (3D) mesh of the real-world environment, said generating being performed by including depth information from pixels having motion state identifiers corresponding to static objects while omitting depth information from pixels having motion state identifiers corresponding to dynamic objects, and such that depth information corresponding to the stationary object is omitted from the 3D mesh even though said any movement detected for the stationary object is determined to fall below the maximum movement threshold.
2. The computer system of claim 1, wherein image objects classified as dynamic are determined to satisfy a volatility degree while image objects classified as static are determined to not satisfy the volatility degree.
3. The computer system of claim 1, wherein pose information is also provided as input to the ML algorithm.
4. The computer system of claim 1, wherein the ML algorithm generates, as output, a label map detailing whether objects are dynamic or static.
5. The computer system of claim 1, wherein assigning motion state identifiers includes performing skeleton tracking to classify objects.
6. The computer system of claim 1, wherein morphological dilation is performed to generate a buffer surrounding the stationary object.
7. The computer system of claim 6, wherein depth information for the buffer is also refrained from being included in the 3D mesh.
8. The computer system of claim 1, wherein multiple depth maps are used to generate the 3D mesh.
9. The computer system of claim 1, wherein motion state identifiers are Boolean values.
10. The computer system of claim 1, wherein a confidence level is included as a part of each motion state identifier for each pixel of the depth map, said confidence level indicating a level of confidence regarding whether that pixel's corresponding object is dynamic or static.
11. A method for facilitating improvements in how surface reconstruction of an environment is performed, said method comprising: obtaining images of a real-world environment, at least two of the images being generated at different points in time; providing the images as input to a machine learning (ML) algorithm, the ML algorithm being trained to classify image objects as dynamic or static; identifying that the ML algorithm classified a substantially stationary object embodied in the at least two images as being dynamic even though any movement detected for the stationary object, as detected between the at least two images, falls below and thereby satisfies a maximum movement threshold used for determining whether objects are potentially static; based on one or more of the at least two images, generating a depth map that includes depth identifying pixels, each one of said pixels being assigned a corresponding motion state identifier indicating whether each one of said pixels is reflective of a corresponding dynamic object or a corresponding static object, wherein a group of said pixels corresponds to the stationary object and are assigned motion state identifiers reflecting the stationary object as being dynamic; and based at least partially on the depth map, generating a three-dimensional (3D) mesh of the real-world environment, said generating being performed by including depth information from pixels having motion state identifiers corresponding to static objects while omitting depth information from pixels having motion state identifiers corresponding to dynamic objects, and such that depth information corresponding to the stationary object is omitted from the 3D mesh even though said any movement detected for the stationary object is determined to fall below the maximum movement threshold.
12. The method of claim 11, wherein the images capture different perspectives of the real-world environment.
13. The method of claim 12, wherein, to capture the different perspectives of the real-world environment, cameras used to generate the images are physically positioned at different locations within the real-world environment.
14. The method of claim 12, wherein, to capture the different perspectives of the real-world environment, re-projections are performed on one or more of the images to obtain one of more of the different perspectives.
15. The method of claim 11, wherein image objects classified as dynamic are determined to satisfy a volatility degree while image objects classified as static are determined to not satisfy the volatility degree.
16. The method of claim 11, wherein pose information is also provided as input to the ML algorithm.
17. The method of claim 11, wherein the ML algorithm generates, as output, a label map detailing whether objects are dynamic or static.
18. The method of claim 11, wherein assigning motion state identifiers includes performing skeleton tracking to classify objects.
19. The method of claim 11, wherein morphological dilation is performed to generate a buffer surrounding the object.
20. A computer system comprising: one or more processors; and one or more computer-readable hardware storage devices that store computer-executable instructions that are executable by the one or more processors to cause the computer system to at least: obtain images of a real-world environment, at least two of the images being generated at different points in time, wherein the images of the real-world environment include one or more of a visible light image or an infrared light image; provide the images as input to a machine learning (ML) algorithm, the ML algorithm being trained to classify image objects as dynamic or static; identify that the ML algorithm classified a substantially stationary object embodied in the at least two images as being dynamic even though any movement detected for the stationary object, as detected between the at least two images, falls below and thereby satisfies a maximum movement threshold used for determining whether objects are potentially static; based on one or more of the at least two images, generate a depth map that inc
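Claims 6, 7 and 19 recite morphological dilation to generate a buffer around an object, with depth information for that buffer also left out of the 3D mesh. A minimal sketch of that idea follows, assuming a Boolean mask in which True marks pixels labeled dynamic and a square structuring element. `dilate` and `radius` are hypothetical names, and the element shape is an assumption, since the claims do not specify one.

```python
from typing import List

Mask = List[List[bool]]  # True = pixel labeled dynamic


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a (2*radius + 1) x (2*radius + 1) square element.

    Every True pixel spreads to its neighbours within `radius`, so the
    result covers the object plus a surrounding buffer.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Clamp the neighbourhood to the image bounds.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    out[rr][cc] = True
    return out


# A single dynamic pixel in the centre of a 5x5 mask.
mask = [[False] * 5 for _ in range(5)]
mask[2][2] = True

buffered = dilate(mask, radius=1)
# Pixels in the buffer only: covered by the dilated mask but not the original.
buffer_only = [[b and not m for b, m in zip(brow, mrow)]
               for brow, mrow in zip(buffered, mask)]
```

Omitting depth wherever `buffered` is True removes both the object and its margin, which guards against object edges leaking into the reconstruction when the label boundary is slightly off.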
Image-based rendering · CPC title
Aligning objects, relative positioning of parts · CPC title
Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
General purpose rendering architectures · CPC title