3D object segmentation
US-2016196659-A1 · Jul 7, 2016 · US
US10373380B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10373380-B2 |
| Application number | US-201615046614-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 18, 2016 |
| Priority date | Feb 18, 2016 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Official abstract text for this publication.
Techniques are provided for 3D analysis of a scene including detection, segmentation and registration of objects within the scene. The analysis results may be used to implement augmented reality operations including removal and insertion of objects and the generation of blueprints. An example method may include receiving 3D image frames of the scene, each frame associated with a pose of a depth camera, and creating a 3D reconstruction of the scene based on depth pixels that are projected and accumulated into a global coordinate system. The method may also include detecting objects, and associated locations within the scene, based on the 3D reconstruction, the camera pose and the image frames. The method may further include segmenting the detected objects into points of the 3D reconstruction corresponding to contours of the object and registering the segmented objects to 3D models of the objects to determine their alignment.
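As an illustration of the reconstruction step described in the abstract (depth pixels projected into a global coordinate system using the camera pose, then accumulated), the following is a minimal sketch only, not the patent's implementation. It assumes a pinhole camera model with intrinsics `fx`, `fy`, `cx`, `cy`, a pose given as a 4x4 camera-to-world matrix, depth in metres with 0 meaning "no measurement", and a plain point list as the accumulated reconstruction. None of these specifics come from the publication, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 4x4 camera-to-world pose (assumed convention)


def backproject_depth(depth: Sequence[Sequence[float]], pose: Matrix,
                      fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Project valid depth pixels of one frame into global coordinates."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # assumed marker for a missing depth measurement
                continue
            # Pinhole model: pixel (u, v) at depth z -> point in the camera frame.
            xc = (u - cx) * z / fx
            yc = (v - cy) * z / fy
            # Rigid transform from the camera frame to the global frame.
            xw = pose[0][0] * xc + pose[0][1] * yc + pose[0][2] * z + pose[0][3]
            yw = pose[1][0] * xc + pose[1][1] * yc + pose[1][2] * z + pose[1][3]
            zw = pose[2][0] * xc + pose[2][1] * yc + pose[2][2] * z + pose[2][3]
            points.append((xw, yw, zw))
    return points


def accumulate(frames: Sequence[Tuple[Sequence[Sequence[float]], Matrix]],
               fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Accumulate the projected points of all frames into one global point list."""
    cloud: List[Point] = []
    for depth, pose in frames:
        cloud.extend(backproject_depth(depth, pose, fx, fy, cx, cy))
    return cloud


IDENTITY = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
# Same orientation, camera moved 1 m along the global x axis.
SHIFTED = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
```

A practical system would likely fuse the points into a denser structure (for example a voxel grid) rather than a flat list; the list is used here only to keep the projection and accumulation steps visible.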
Opening claim text (preview).
What is claimed is:

1. A processor-implemented method for 3-Dimensional (3D) scene analysis, the method comprising: receiving, by a processor, a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames; projecting, by the processor, the depth pixels into points in a global coordinate system based on the camera pose; accumulating, by the processor, the projected points into a 3D reconstruction of the scene; detecting, by the processor, objects and associated locations in the scene, for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame; associating a label with the detected object; calculating a 2-Dimensional (2D) bounding box containing the detected object, and a 3D location of the center of the 2D bounding box; matching the detected object to an existing object boundary set created from a previously received 3D image frame, the matching based on the label and the 3D location of the center of the 2D bounding box; segmenting, by the processor, each of the detected objects in the scene, the segmented objects comprising the points of the 3D reconstruction corresponding to contours of the associated detected object; and registering, by the processor, the segmented objects to a 3D model of the associated detected object to determine an alignment of the detected object in the scene.

2. The method of claim 1, further comprising deleting a selected object from the scene by: capturing a new RGB image frame that includes the selected object; generating a 2D mask based on the camera pose associated with the new RGB image frame and the registration corresponding to the selected object; replacing pixels associated with the selected object within the 2D mask, with values based on pixels associated with neighboring regions in the new RGB image frame; and applying the mask to the new RGB image frame.

3. The method of claim 1, further comprising adding a selected object to the scene by: capturing a new RGB image frame that includes a region where the selected object is to be added; generating a 2D RGB image of the selected object based on the camera pose associated with the new RGB image frame and a 3D model of the selected object; and rendering the 2D RGB image of the selected object onto the new RGB image frame.

4. The method of claim 1, further comprising generating a blueprint of the scene based on the registered objects and the associated locations of the detected objects.

5. The method of claim 1, wherein each pose of the depth camera is calculated by one of: using a transformation of the camera based on an Iterative Closest Point (ICP) matching operation performed on the depth pixels of the depth map frame; or using a Simultaneous Localization and Mapping (SLAM) operation performed on the color pixels of the RGB image frame; or based on data provided by inertial sensors of the depth camera.

6. The method of claim 1, wherein the object detection is based on at least one of template matching, classification using a bag-of-words vision model, and classification using a convolutional neural network.

7. The method of claim 1, wherein the object segmentation is based on detecting and removing surface planes from the scene to generate a processed scene; and performing a connected component clustering operation on the processed scene to generate the segmented objects.

8. The method of claim 1, wherein the object segmentation further comprises: in response to a failure of the matching, creating a new object boundary set associated with the detected object, wherein the new object boundary set comprises 3D positions of pixels in the 2D bounding box corresponding to the boundary of the object, and further comprises vectors associated with the pixels, the vectors specifying a ray from the position of the depth camera associated with the corresponding pose, to each of the pixels; and adjusting the new object boundary set to remove duplicate pixels generated from different poses of the depth camera, the removal based on the distance of the pixels from the camera and further based on the direction of the associated vectors.

9. A system for 3-Dimensional (3D) scene analysis, the system comprising: a 3D reconstruction circuit to receive a plurality of 3D image frames of a scene, each frame comprising a red-green-blue (RGB) image frame comprising color pixels and a depth map frame comprising depth pixels, wherein each of the 3D image frames is associated with a pose of a depth camera that generated the 3D image frames, the 3D reconstruction circuit further to project the depth pixels into points in a global coordinate system based on the camera pose and accumulate the projected points into a 3D reconstruction of the scene; an object detection circuit to: detect objects and associated locations in the scene, for each 3D image frame, based on the camera pose, the 3D reconstruction, the RGB image frame and the depth map frame; associate a label with the detected object; calculate a 2-Dimensional (2D) bounding box containing the detected object, and a 3D location of the center of the 2D bounding box; and match the detected object to an existing object boundary set created from a previously received 3D image frame, the match based on the label and the 3D location of the center of the 2D bounding box; a 3D segmentation circuit to segment each of the detected objects in the scene, the segmented objects comprising the points of the 3D reconstruction corresponding to contours of the associated detected object; and a 3D registration circuit to register the segmented objects to a 3D model of the associated detected object to determine an alignment of the detected object in the scene.

10. The system of claim 9, further comprising an augmented reality (AR) manipulation circuit to delete a selected object from the scene by: capturing a new RGB image frame that includes the selected object; generating a 2D mask based on the camera pose associated with the new RGB image frame and the registration corresponding to the selected object; replacing pixels associated with the selected object within the 2D mask, with values based on pixels associated with neighboring regions in the new RGB image frame; and applying the mask to the new RGB image frame.

11. The system of claim 9, further comprising an AR manipulation circuit to add a selected object to the scene by: capturing a new RGB image frame that includes a region where the selected object is to be added; generating a 2D RGB image of the selected object based on the camera pose associated with the new RGB image frame and a 3D model of the selected object; and rendering the 2D RGB image of the selected object onto the new RGB image frame.

12. The system of claim 9, further comprising an AR manipulation circuit to generate a blueprint of the scene based on the registered objects and the associated locations of the detected objects.

13. The system of claim 9, wherein each pose of the depth camera is calculated by one of: using a transformation of the camera based on an Iterative Closest Point (ICP) matching operation performed on the depth pixels of the depth map frame; or using a Simultaneous Localization and Mapping (SLAM) operation performed on the color pixels of the RGB image frame; or based on data provided by inertial sensors of the depth camera, and wherein the object detection is based on at least one of templat
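The matching step recited in claims 1 and 9 pairs a newly detected object with an existing object boundary set using the label and the 3D location of the center of the 2D bounding box. The sketch below shows one way such a match could be expressed. The `BoundarySet` type, the `match_detection` function, the Euclidean distance test, and the `max_distance` threshold of 0.5 are all illustrative assumptions; the claims do not specify a distance measure or a threshold.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass
class BoundarySet:
    """Hypothetical record for an object boundary set from an earlier frame."""
    label: str
    center: Point  # 3D location of the bounding-box center
    boundary_points: List[Point] = field(default_factory=list)


def match_detection(label: str, center: Point,
                    existing: Sequence[BoundarySet],
                    max_distance: float = 0.5) -> Optional[BoundarySet]:
    """Return the closest existing set with the same label, or None.

    max_distance (in scene units) is an assumed tuning parameter.
    """
    best: Optional[BoundarySet] = None
    best_dist = max_distance
    for candidate in existing:
        if candidate.label != label:
            continue  # labels must agree before locations are compared
        d = math.dist(candidate.center, center)
        if d <= best_dist:
            best, best_dist = candidate, d
    return best
```

When no match is returned, claim 8 describes the follow-up: a new object boundary set is created for the detected object.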
Stereoscopic video; Stereoscopic image sequence · CPC title
Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title
Color image · CPC title
Mixed reality (object pose determination, tracking or camera calibration for mixed reality G06T7/00) · CPC title
Range image; Depth image; 3D point clouds · CPC title