Method and device for three-dimensional feature-embedded image object component-level semantic segmentation

US10740897B2 · US · B2

Patent metadata
Publication number: US-10740897-B2
Application number: US-201815871878-A
Country: US
Kind code: B2
Filing date: Jan 15, 2018
Priority date: Sep 12, 2017
Publication date: Aug 11, 2020
Grant date: Aug 11, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present invention provide a method and a device for three-dimensional feature-embedded image object component-level semantic segmentation, the method includes: acquiring three-dimensional feature information of a target two-dimensional image; performing a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image. In the technical solution of the present application, not only the two-dimensional feature information of the image but also the three-dimensional feature information of the image are taken into consideration when performing the component-level semantic segmentation on the image, thereby improving the accuracy of the image component-level semantic segmentation.
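The abstract states that both two-dimensional and three-dimensional feature information feed the segmentation, but it does not say how the two are combined. The sketch below illustrates one plausible reading under loud assumptions: the features are per-pixel maps of the same spatial size, they are fused by channel concatenation, and a linear per-pixel classifier assigns component labels. The function name `fuse_and_segment`, the array shapes, and the linear classifier are all hypothetical and are not taken from the patent.

```python
import numpy as np

def fuse_and_segment(feat2d, feat3d, weights, bias):
    """Label each pixel from concatenated 2D and 3D feature maps.

    feat2d:  (H, W, C2) per-pixel 2D features (assumed layout)
    feat3d:  (H, W, C3) per-pixel 3D features (assumed layout)
    weights: (C2 + C3, K) linear classifier, K component classes (assumption)
    bias:    (K,)
    Returns an (H, W) integer map of component labels.
    """
    # Assumed fusion: stack the two feature sets along the channel axis.
    fused = np.concatenate([feat2d, feat3d], axis=-1)   # (H, W, C2 + C3)
    # Per-pixel linear scores for each component class.
    logits = fused @ weights + bias                      # (H, W, K)
    return logits.argmax(axis=-1)

# Illustrative call on random data; sizes are arbitrary.
rng = np.random.default_rng(0)
H, W, C2, C3, K = 4, 5, 8, 6, 3
labels = fuse_and_segment(
    rng.normal(size=(H, W, C2)),
    rng.normal(size=(H, W, C3)),
    rng.normal(size=(C2 + C3, K)),
    np.zeros(K),
)
```

In this sketch, zeroing the rows of `weights` that correspond to `feat3d` reduces the classifier to a 2D-only baseline, which is one simple way to see what the 3D features contribute.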

First claim

Opening claim text (preview).

What is claimed is:

1. A method for three-dimensional feature-embedded image object component-level semantic segmentation, comprising: acquiring three-dimensional feature information of a target two-dimensional image; performing a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image; wherein the acquiring the three-dimensional feature information of the target two-dimensional image specifically comprises: acquiring a two-dimensional image corresponding to a respective three-dimensional model in a three-dimensional model library and a three-dimensional voxel model corresponding to the respective three-dimensional model in a three-dimensional model library; training a first neural network model by taking the respective three-dimensional voxel model as an input of the first neural network model, and a three-dimensional feature corresponding to the respective three-dimensional model as an ideal output of the first neural network model; training a second neural network model by taking the respective two-dimensional image as an input of the second neural network model, and output of each layer of the trained first neural network model as an ideal output of a corresponding layer of the second neural network model; inputting the target two-dimensional image into the trained second neural network model to acquire the three-dimensional feature information of the target two-dimensional image.

2. The method according to claim 1, wherein both the first neural network model and the second neural network model are two-dimensional convolution-based neural network models.

3. The method according to claim 2, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: designing the first neural network model based on a residual network and a convolution with holes; before taking the respective two-dimensional image as the input of the second neural network model, the method further comprises: designing the second neural network model according to the first neural network model, and the second neural network model approximates to the first neural network model.

4. The method according to claim 3, wherein the taking the respective three-dimensional voxel model as the input of the first neural network model specifically comprises: segmenting the three-dimensional voxel model in a depth direction of the three-dimensional voxel model to acquire two-dimensional voxel images in different depth directions, and taking the respective two-dimensional voxel image as the input of the first neural network model.

5. The method according to claim 1, wherein the acquiring the two-dimensional image corresponding to the respective three-dimensional model in the three-dimensional model library and the three-dimensional voxel model corresponding to the respective three-dimensional model specifically comprises: acquiring the two-dimensional image corresponding to the respective three-dimensional model according to a perspective projection method; acquiring the three-dimensional voxel model corresponding to the respective three-dimensional model according to a three-dimensional perspective voxelization method; wherein the three-dimensional perspective voxelization method comprises: when a voxel corresponding to the three-dimensional model is inside the three-dimensional model or intersects with a surface of the three-dimensional model, the voxel is set as 1, otherwise the voxel is set as 0.

6. The method according to claim 1, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

7. The method according to claim 2, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

8. The method according to claim 3, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

9. The method according to claim 4, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

10. The method according to claim 5, wherein before taking the respective three-dimensional voxel model as the input of the first neural network model, the method further comprises: compressing the respective three-dimensional voxel model, and outputting the compressed respective three-dimensional voxel model into the first neural network model.

11. The method according to claim 1, wherein both the output of the each layer of the trained first neural network model and the output of the corresponding layer of the second neural network model satisfy a mean square error loss.

12. The method of claim 3, wherein the first neural network model comprises n full pre-activation units; the second neural network model comprises a convolutional layer, a Batch Norm layer, an activation function layer, a maximum pooled layer and m full pre-activation units, wherein n is greater than m, both n and m are a positive integers greater than or equal to 1.

13. A device for image object component-level semantic segmentation, comprising a processor, configured to: acquire three-dimensional feature information of a target two-dimensional image; perform a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature information of the target two-dimensional image and two-dimensional feature information of the target two-dimensional image; wherein the processor is further configured to: acquire a two-dimensional image corresponding to a respective three-dimensional model in a three-dimensional model library and a three-dimensional voxel model corresponding to the respective three-dimensional model in a three-dimensional model library; train a first neural network model by taking the respective three-dimensional voxel model as an input of the first neural network model, and a three-dimensional feature corresponding to the respective three-dimensional model as an ideal output of the first neural network model; train a second neural network model by taking the respective two-dimensional image as an input of the second neural network model, and output of each layer of the trained first neural network model as an ideal output of a corresponding layer of the second neural network model; input the target two-dimensional image into the trained second neural network model to acquire the three-dimensional feature information of the target two-dimensional image.
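Three mechanisms in the claims lend themselves to a small illustration: the occupancy rule of claim 5 (a voxel is 1 when it lies inside the model or intersects its surface, otherwise 0), the depth-direction slicing of claim 4, and the layer-by-layer mean square error of claims 1 and 11. The sketch below is a minimal, hedged illustration and not the patented implementation. It assumes a solid ball as a stand-in for a three-dimensional model, uses an ordinary axis-aligned grid rather than the perspective voxelization the claims name, and sums the per-layer errors into one loss, which the claims do not specify. All function names are hypothetical.

```python
import numpy as np

def voxelize_ball(n, radius):
    """Occupancy grid of a solid ball centred in an n x n x n grid.

    A voxel is 1 if its cube lies inside the ball or touches its surface,
    else 0. Assumption: plain axis-aligned grid, not perspective voxelization.
    """
    center = n / 2.0
    lo = np.arange(n, dtype=float)                 # lower corner of each voxel
    # Closest coordinate to the ball centre within each voxel's extent.
    nearest = np.minimum(np.maximum(center, lo), lo + 1.0)
    d2 = (nearest - center) ** 2                   # squared distance per axis
    dist2 = d2[:, None, None] + d2[None, :, None] + d2[None, None, :]
    return (dist2 <= radius ** 2).astype(np.uint8)

def depth_slices(voxels):
    """Split an (H, W, D) voxel grid into D two-dimensional images.

    Assumption: the last axis is the depth direction.
    """
    return np.moveaxis(voxels, -1, 0)              # (D, H, W)

def layerwise_mse(student_outputs, teacher_outputs):
    """Sum of mean square errors between corresponding layer outputs.

    Summing across layers is an assumption; the claims only require a
    mean square error between each pair of corresponding layers.
    """
    return float(sum(np.mean((s - t) ** 2)
                     for s, t in zip(student_outputs, teacher_outputs)))

vox = voxelize_ball(n=8, radius=3.0)
slices = depth_slices(vox)                         # input images for the first network
teacher = [np.ones((2, 2)), np.zeros(3)]           # stand-in layer outputs
student = [np.zeros((2, 2)), np.zeros(3)]
loss = layerwise_mse(student, teacher)
```

In a training loop along the lines of claim 1, `teacher` would hold the per-layer outputs of the trained first network for a voxel model, and `student` the outputs of the second network for the matching two-dimensional image.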

Assignees

Univ Beihang

Inventors

Classifications

  • G06T7/10 (Primary)

    Segmentation; Edge detection (motion-based segmentation G06T7/215)

  • Labelling scene content, e.g. deriving syntactic or semantic representations

  • G06V10/82 (Primary)

    using neural networks

  • using classification, e.g. of video objects

  • Combinations of networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10740897B2 cover?
Embodiments of the present invention provide a method and a device for three-dimensional feature-embedded image object component-level semantic segmentation, the method includes: acquiring three-dimensional feature information of a target two-dimensional image; performing a component-level semantic segmentation on the target two-dimensional image according to the three-dimensional feature infor…
Who is the assignee on this patent?
Univ Beihang
What technology area does this patent fall under?
Primary CPC classification G06T7/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 11, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).