Dense correspondence estimation with multi-level metric learning and hierarchical matching
US-2019066373-A1 · Feb 28, 2019 · US
US10740897B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10740897-B2 |
| Application number | US-201815871878-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 15, 2018 |
| Priority date | Sep 12, 2017 |
| Publication date | Aug 11, 2020 |
| Grant date | Aug 11, 2020 |
Official abstract text for this publication.
Embodiments of the present invention provide a method and a device for three-dimensional feature-embedded image object component-level semantic segmentation. The method includes: acquiring three-dimensional feature information of a target two-dimensional image; and performing a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image. In the technical solution of the present application, not only the two-dimensional feature information of the image but also the three-dimensional feature information of the image is taken into consideration when performing the component-level semantic segmentation on the image, thereby improving the accuracy of the image component-level semantic segmentation.
Opening claim text (preview).
What is claimed is:

1. A method for three-dimensional feature-embedded image object component-level semantic segmentation, comprising: acquiring three-dimensional feature information of a target two-dimensional image; performing a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image; wherein the acquiring the three-dimensional feature information of the target two-dimensional image specifically comprises: acquiring a two-dimensional image corresponding to a respective three-dimensional model in a three-dimensional model library and a three-dimensional voxel model corresponding to the respective three-dimensional model in a three-dimensional model library; training a first neural network model by taking the respective three-dimensional voxel model as an input of the first neural network model, and a three-dimensional feature corresponding to the respective three-dimensional model as an ideal output of the first neural network model; training a second neural network model by taking the respective two-dimensional image as an input of the second neural network model, and output of each layer of the trained first neural network model as an ideal output of a corresponding layer of the second neural network model; inputting the target two-dimensional image into the trained second neural network model to acquire the three-dimensional feature information of the target two-dimensional image.

2. The method according to claim 1, wherein both the first neural network model and the second neural network model are two-dimensional convolution-based neural network models.

3. The method according to claim 2, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: designing the first neural network model based on a residual network and a convolution with holes; before taking the respective two-dimensional image as the input of the second neural network model, the method further comprises: designing the second neural network model according to the first neural network model, and the second neural network model approximates to the first neural network model.

4. The method according to claim 3, wherein the taking the respective three-dimensional voxel model as the input of the first neural network model specifically comprises: segmenting the three-dimensional voxel model in a depth direction of the three-dimensional voxel model to acquire two-dimensional voxel images in different depth directions, and taking the respective two-dimensional voxel image as the input of the first neural network model.

5. The method according to claim 1, wherein the acquiring the two-dimensional image corresponding to the respective three-dimensional model in the three-dimensional model library and the three-dimensional voxel model corresponding to the respective three-dimensional model specifically comprises: acquiring the two-dimensional image corresponding to the respective three-dimensional model according to a perspective projection method; acquiring the three-dimensional voxel model corresponding to the respective three-dimensional model according to a three-dimensional perspective voxelization method; wherein the three-dimensional perspective voxelization method comprises: when a voxel corresponding to the three-dimensional model is inside the three-dimensional model or intersects with a surface of the three-dimensional model, the voxel is set as 1, otherwise the voxel is set as 0.

6. The method according to claim 1, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

7. The method according to claim 2, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

8. The method according to claim 3, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

9. The method according to claim 4, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

10. The method according to claim 5, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

11. The method according to claim 1, wherein both the output of the each layer of the trained first neural network model and the output of the corresponding layer of the second neural network model satisfy a mean square error loss.

12. The method of claim 3, wherein the first neural network model comprises n full pre-activation units; the second neural network model comprises a convolutional layer, a Batch Norm layer, an activation function layer, a maximum pooled layer and m full pre-activation units, wherein n is greater than m, both n and m are a positive integers greater than or equal to 1.

13. A device for image object component-level semantic segmentation, comprising a processor, configured to: acquire three-dimensional feature information of a target two-dimensional image; perform a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image; wherein the processor is further configured to: acquire a two-dimensional image corresponding to a respective three-dimensional model in a three-dimensional model library and a three-dimensional voxel model corresponding to the respective three-dimensional model in a three-dimensional model library; train a first neural network model by taking the respective three-dimensional voxel model as an input of the first neural network model, and a three-dimensional feature corresponding to the respective three-dimensional model as an ideal output of the first neural network model; train a second neural network model by taking the respective two-dimensional image as an input of the second neural network model, and output of each layer of the trained first neural network model as an ideal output of a corresponding layer of the second neural network model; input the target two-dimensional image into the trained second neural network model to acquire the three-dimensional feature information of the target two-dimensional image.
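
Three of the data-handling steps named in the claims can be illustrated in a few lines of plain Python: the binary occupancy rule of claim 5, the depth-direction slicing of claim 4, and the per-layer mean square error of claim 11. The sketch below is illustrative only and is not taken from the patent. It assumes a solid sphere as a stand-in for a library model, uses an ordinary axis-aligned grid rather than the perspective voxelization the claim refers to (whose geometry is not detailed here), and sums the per-layer errors with equal weight. The names `axis_gap`, `voxelize_sphere`, `depth_slices`, and `layerwise_mse` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[List[int]]]  # indexed as grid[z][y][x]; z is the depth axis


def axis_gap(lo: float, centre: float) -> float:
    """Distance along one axis from `centre` to the unit interval [lo, lo + 1]."""
    return max(lo - centre, 0.0, centre - (lo + 1.0))


def voxelize_sphere(size: int, radius: float) -> Grid:
    """Binary occupancy of a solid sphere centred in a size^3 grid.

    A voxel is 1 when it lies inside the sphere or intersects its surface,
    and 0 otherwise. The sphere is an assumed stand-in for a 3D model.
    """
    centre = size / 2.0
    grid: Grid = []
    for z in range(size):
        plane = []
        for y in range(size):
            row = []
            for x in range(size):
                # Squared distance from the sphere centre to the nearest
                # point of the unit voxel cube at (x, y, z).
                d2 = (axis_gap(x, centre) ** 2
                      + axis_gap(y, centre) ** 2
                      + axis_gap(z, centre) ** 2)
                row.append(1 if d2 <= radius ** 2 else 0)
            plane.append(row)
        grid.append(plane)
    return grid


def depth_slices(grid: Grid) -> List[List[List[int]]]:
    """Cut the voxel grid along the depth axis into 2D voxel images."""
    return [[row[:] for row in plane] for plane in grid]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layerwise_mse(student: Sequence[Sequence[float]],
                  teacher: Sequence[Sequence[float]]) -> float:
    """Sum of per-layer MSE between corresponding layer outputs.

    `teacher` plays the role of the trained first (voxel) network and
    `student` the second (image) network; equal layer weights are assumed.
    """
    if len(student) != len(teacher):
        raise ValueError("both networks must expose the same number of layers")
    return sum(mse(s, t) for s, t in zip(student, teacher))
```

For example, `voxelize_sphere(4, 1.0)` marks 32 of the 64 voxels as occupied, and `depth_slices` turns that grid into four 4×4 binary images that a two-dimensional convolutional network could consume one at a time. In a real training setup the layer outputs passed to `layerwise_mse` would be feature maps produced by a deep learning framework rather than flat Python lists.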
Segmentation; Edge detection (motion-based segmentation G06T7/215) · CPC title
Labelling scene content, e.g. deriving syntactic or semantic representations · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Combinations of networks · CPC title