Systems and methods for unsupervised learning of geometry from images using depth-normal consistency
US-2019139179-A1 · May 9, 2019 · US
US10885659B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10885659-B2 |
| Application number | US-201816161243-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 16, 2018 |
| Priority date | Jan 15, 2018 |
| Publication date | Jan 5, 2021 |
| Grant date | Jan 5, 2021 |
Official abstract text for this publication.
Disclosed is an object pose estimating method and apparatus. The pose estimating method includes acquiring a two-dimensional (2D) image corresponding to an object, extracting a global visual feature and a local geometric feature of the object in the 2D image, and estimating a three-dimensional (3D) pose of the object based on the global visual feature and the local geometric feature.
Opening claim text (preview).
What is claimed is:
1. A processor implemented pose estimating method, comprising: acquiring a two-dimensional (2D) image corresponding to an object; extracting a global visual feature and a local geometric feature of the object in the 2D image; and estimating a three-dimensional (3D) pose of the object based on the global visual feature and the local geometric feature, wherein the extracting comprises: extracting a first feature based on the 2D image and depth information of the 2D image; extracting a second feature based on the 2D image; and extracting the global visual feature by applying a feature approximation strategy to the first feature and the second feature.
2. The pose estimating method of claim 1, wherein the global visual feature is a visual feature of the object in its entirety and the local geometric feature is a geometric feature of a portion of the object.
3. The pose estimating method of claim 2, wherein another portion of the object in the geometric feature is occluded or truncated.
4. The pose estimating method of claim 1, wherein the acquiring comprises acquiring the 2D image by performing object region segmentation on an image.
5. The pose estimating method of claim 1, wherein the local geometric feature includes a local key component of the object or a key point of the object.
6. The pose estimating method of claim 1, wherein the extracting comprises: extracting the global visual feature of the 2D image through a first deep learning network; and extracting the local geometric feature of the 2D image through a second deep learning network.
7. The pose estimating method of claim 6, wherein the extracting of the global visual feature through the first deep learning network comprises: training a third deep learning network based on the 2D image and the depth information of the 2D image; and applying the feature approximation strategy to the first deep learning network based on the 2D image and an output of the third deep learning network.
8. The pose estimating method of claim 7, wherein the applying comprises: calculating a loss term which is a difference between a feature obtained from the first deep learning network and a feature obtained from the third deep learning network; and approximating the feature obtained from the first deep learning network to the feature obtained from the third deep learning network based on the loss term.
9. The pose estimating method of claim 6, wherein the extracting of the local geometric feature through the second deep learning network comprises: training a fourth deep learning network based on the 2D image and pixel information of the 2D image; and applying a feature approximation strategy to the second deep learning network based on the 2D image and an output of the fourth deep learning network.
10. The pose estimating method of claim 9, wherein the applying comprises: calculating a loss term which is a difference between a feature obtained from the second deep learning network and a feature obtained from the fourth deep learning network; and approximating the feature obtained from the second deep learning network to the feature obtained from the fourth deep learning network based on the loss term.
11. A processor implemented depth image generating method, comprising: acquiring dense depth images corresponding to a plurality of objects based on the 3D pose estimated by the pose estimating method of claim 1; and generating a dense depth value of the 2D image by integrating the dense depth images.
12. A pose estimating apparatus, comprising: a receiver configured to receive an image; and a controller configured to: acquire a two-dimensional (2D) image corresponding to an object from the image, extract a global visual feature and a local geometric feature of the object in the 2D image, and estimate a three-dimensional (3D) pose of the object based on the global visual feature and the local geometric feature, wherein the controller further configured to: extract a first feature based on the 2D image and depth information of the 2D image; extract a second feature based on the 2D image; and extract the global visual feature by applying a feature approximation strategy to the first feature and the second feature.
13. The pose estimating apparatus of claim 12, wherein the global visual feature is a visual feature of the object in its entirety and the local geometric feature is a geometric feature of a portion of the object.
14. The pose estimating apparatus of claim 13, wherein another portion of the object in the geometric feature is occluded or truncated.
15. The pose estimating apparatus of claim 12, wherein the controller is further configured to acquire the 2D image by performing object region segmentation on the image.
16. The pose estimating apparatus of claim 12, wherein the local geometric feature includes a local key component of the object or a key point of the object.
17. The pose estimating apparatus of claim 12, wherein the controller is further configured to extract the global visual feature of the 2D image through a first deep learning network, and extract the local geometric feature of the 2D image through a second deep learning network.
18. The pose estimating apparatus of claim 17, wherein the controller is further configured to train a third deep learning network based on the 2D image and the depth information of the 2D image, and apply the feature approximation strategy to the first deep learning network based on the 2D image and an output of the third deep learning network.
19. The pose estimating apparatus of claim 18, wherein the controller is further configured to calculate a loss term which is a difference between a feature obtained from the first deep learning network and a feature obtained from the third deep learning network, and approximate the feature obtained from the first deep learning network to the feature obtained from the third deep learning network based on the loss term.
20. The pose estimating apparatus of claim 17, wherein the controller is further configured to train a fourth deep learning network based on the 2D image and pixel information of the 2D image, and apply a feature approximation strategy to the second deep learning network based on the 2D image and an output of the fourth deep learning network.
21. The pose estimating apparatus of claim 20, wherein the controller is further configured to calculate a loss term which is a difference between a feature obtained from the second deep learning network and a feature obtained from the fourth deep learning network, and approximate the feature obtained from the second deep learning network to the feature obtained from the fourth deep learning network based on the loss term.
22. The pose estimating apparatus of claim 12, further comprising: a depth image generator configured to acquire dense depth images corresponding to a plurality of objects based on the 3D pose, and generate a dense depth value of the image by integrating the dense depth images.
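The feature approximation strategy recited in claims 8, 10, 19, and 21 (a loss term that is the difference between two networks' features, used to pull one network's feature toward the other's) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the "networks" are single linear layers, the loss is a mean squared difference, the depth value is a synthetic linear function of the image features, and all names (`feature_approximation_loss`, `linear_features`, `train_student`) are hypothetical. The "teacher" stands in for the third network (image plus depth) and the "student" for the first network (image only).

```python
import random


def feature_approximation_loss(student_feat, teacher_feat):
    """Mean squared difference between two equally sized feature vectors."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def linear_features(weights, x):
    """Toy stand-in for a network: one linear layer mapping x to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_student(student_w, inputs, teacher_feats, lr=0.05, epochs=200):
    """Per-sample gradient descent on the loss, updating student_w in place.

    The teacher features are treated as fixed targets; only the student changes.
    """
    for _ in range(epochs):
        for x, t in zip(inputs, teacher_feats):
            s = linear_features(student_w, x)
            n = len(s)
            for i, row in enumerate(student_w):
                grad_scale = 2.0 * (s[i] - t[i]) / n  # d(loss)/d(s[i])
                for j in range(len(row)):
                    row[j] -= lr * grad_scale * x[j]
    return student_w


def mean_loss(weights, inputs, teacher_feats):
    """Average feature approximation loss over a dataset."""
    total = sum(
        feature_approximation_loss(linear_features(weights, x), t)
        for x, t in zip(inputs, teacher_feats)
    )
    return total / len(inputs)


random.seed(0)
# Hypothetical 3-value "image" descriptors and a synthetic depth value per sample.
rgb_inputs = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
depths = [0.5 * x[0] - 0.2 * x[1] for x in rgb_inputs]

# Teacher sees image + depth (4 inputs); its weights are fixed, as if already trained.
teacher_w = [[0.3, -0.1, 0.2, 0.7], [0.0, 0.4, -0.5, 0.1]]
teacher_feats = [linear_features(teacher_w, x + [d]) for x, d in zip(rgb_inputs, depths)]

# Student sees the image only (3 inputs) and starts from zero weights.
student_w = [[0.0] * 3 for _ in range(2)]
loss_before = mean_loss(student_w, rgb_inputs, teacher_feats)
train_student(student_w, rgb_inputs, teacher_feats)
loss_after = mean_loss(student_w, rgb_inputs, teacher_feats)
print(loss_before, loss_after)
```

In this toy setup the student can match the teacher closely only because the synthetic depth is a linear function of the image features; with real images the approximation would generally be imperfect, and the choice of loss and network architecture is left open by the claims.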
Combinations of networks · CPC title
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Still image; Photographic image · CPC title
Depth or shape recovery · CPC title