Method and apparatus for detecting target object in image

US11776155B2 · US · B2

Patent metadata
Publication number: US-11776155-B2
Application number: US-202016894123-A
Country: US
Kind code: B2
Filing date: Jun 5, 2020
Priority date: Dec 10, 2019
Publication date: Oct 3, 2023
Grant date: Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a method and apparatus for detecting a target object in an image. The method includes: performing following prediction operations using a pre-trained neural network: detecting a target object in a two-dimensional image to determine a two-dimensional bounding box of the target object; and determining a relative position constraint relationship between the two-dimensional bounding box of the target object and a three-dimensional projection bounding box obtained by projecting a three-dimensional bounding box of the target object into the two-dimensional image; and the method further including: determining the three-dimensional projection bounding box of the target object, based on the two-dimensional bounding box of the target object and the relative position constraint relationship between the two-dimensional bounding box of the target object and the three-dimensional projection bounding box.
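
The decoding step described in the abstract can be illustrated with a minimal sketch. It assumes that each "relative position" is a normalized fraction between two opposite edges of the two-dimensional bounding box; the page does not give the patent's exact parameterization, so this is an assumption. The names `Box2D`, `decode_vertex`, and `decode_coordinate` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box2D:
    """Axis-aligned 2D bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: t = 0 gives a, t = 1 gives b."""
    return a + t * (b - a)


def decode_vertex(box: Box2D, rel_x: float, rel_y: float) -> Tuple[float, float]:
    """Decode one projected 3D-box vertex from a 'first parameter pair'.

    Assumption: rel_x is the vertex position as a fraction of the box width
    and rel_y is its position as a fraction of the box height.
    """
    return (lerp(box.x_min, box.x_max, rel_x), lerp(box.y_min, box.y_max, rel_y))


def decode_coordinate(box: Box2D, rel: float, axis: str) -> float:
    """Decode a single coordinate from a 'second parameter'.

    Assumption: a second parameter fixes a vertex along only one direction
    ("width" or "height") of the 2D box.
    """
    if axis == "width":
        return lerp(box.x_min, box.x_max, rel)
    if axis == "height":
        return lerp(box.y_min, box.y_max, rel)
    raise ValueError("axis must be 'width' or 'height'")


# Illustrative values only; these are not taken from the patent.
box = Box2D(x_min=100.0, y_min=50.0, x_max=300.0, y_max=150.0)
vertex = decode_vertex(box, rel_x=0.25, rel_y=1.0)
partial_y = decode_coordinate(box, rel=0.5, axis="height")
print(vertex, partial_y)  # (150.0, 150.0) 100.0
```

Under this assumption the network only has to regress a few bounded fractions per object; the remaining vertices would then follow from projection geometry, as claim 3 describes.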

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting a target object in an image, the method comprising: performing following prediction operations using a pre-trained neural network: detecting the target object in a two-dimensional image to determine a two-dimensional bounding box of the target object; and determining a relative position constraint relationship between the two-dimensional bounding box of the target object and a three-dimensional projection bounding box obtained by projecting a three-dimensional bounding box of the target object into the two-dimensional image; and the method further comprising: determining the three-dimensional projection bounding box of the target object, based on the two-dimensional bounding box of the target object and the relative position constraint relationship between the two-dimensional bounding box of the target object and the three-dimensional projection bounding box; wherein determining the relative position constraint relationship between the two-dimensional bounding box of the target object and the three-dimensional projection bounding box obtained by projecting the three-dimensional bounding box of the target object into the two-dimensional image, comprises: determining values of parameters in a preset parameter group corresponding to the target object; wherein, the preset parameter group comprises at least two first parameter pairs and at least four second parameters; wherein each of the first parameter pairs respectively represents a relative position of a vertex of the three-dimensional bounding box and the two-dimensional bounding box, and two parameters in the first parameter pair respectively represent: a relative position of a vertex on the three-dimensional bounding box and two vertices in a height direction of the two-dimensional bounding box, and a relative position of a vertex on the three-dimensional bounding box and two vertexes in a width direction of the two-dimensional bounding box; and wherein each of the second parameters respectively represents a relative position of a vertex of the three-dimensional projection bounding box in a width or height direction of the two-dimensional bounding box, and two vertices of the two-dimensional bounding box in a same direction, and any one of the first parameter pairs and any one of the second parameters represent positions of different vertices of the three-dimensional projection bounding box relative to the two-dimensional bounding box.

2. The method according to claim 1, wherein determining the relative position constraint relationship between the two-dimensional bounding box of the target object and the three-dimensional projection bounding box obtained by projecting the three-dimensional bounding box of the target object into the two-dimensional image, further comprises: determining a posture type of the target object from at least two preset posture types, wherein the posture type of the target object is related to a number of vertices blocked by the target object among vertices of the three-dimensional projection bounding box of the target object; and determining the preset parameter group corresponding to the target object according to the posture type of the target object.

3. The method according to claim 2, wherein the posture type of the target object is further related to an orientation of the target object, and wherein determining the three-dimensional projection bounding box of the target object, based on the two-dimensional bounding box of the target object and the relative position constraint relationship between the two-dimensional bounding box of the target object and the three-dimensional projection bounding box, comprises: determining coordinates of part of vertices of the three-dimensional projection bounding box based on coordinates of the vertices of the two-dimensional bounding box, the values of the parameters in the preset parameter group, and the posture type of the target object; and calculating coordinates of other vertices of the three-dimensional projection bounding box, based on the determined coordinates of the part of vertices of the three-dimensional projection bounding box, and a projection geometric relationship between the three-dimensional projection bounding box and the corresponding three-dimensional bounding box.

4. The method according to claim 1, wherein the prediction operations further comprise: classifying the target object to determine a category of the target object.

5. The method according to claim 1, wherein the pre-trained neural network is trained by: acquiring sample data, the sample data comprising a sample image of a three-dimensional projection bounding box labeling the target object included in the three-dimensional projection bounding box, the three-dimensional projection bounding box being a projection of a corresponding three-dimensional bounding box in the sample image; and performing multiple iteration training on the neural network for detecting the target object based on the sample data; the iteration training comprising: using the current neural network for detecting the target object to perform following operations: detecting the target object in the sample image to obtain a detection result of a two-dimensional bounding box of the target object; and determining a relative position constraint relationship between the two-dimensional bounding box and the three-dimensional projection bounding box of the target object in the sample image; determining a detection result of the three-dimensional projection bounding box of the target object in the sample image, based on the detection result of the two-dimensional bounding box of the target object in the sample image and the relative position constraint relationship between the two-dimensional bounding box and the three-dimensional projection bounding box of the target object in the sample image; and updating parameters of the neural network for detecting the target object, based on a difference between the detection result of the three-dimensional projection bounding box of the target object in the sample image and the three-dimensional projection bounding box of the target object in the sample image.

6. The method according to claim 5, wherein the neural network for detecting the target object comprises a two-dimensional regression branch and a three-dimensional regression branch, wherein the two-dimensional regression branch outputs the detection result of the two-dimensional bounding box of the target object in the sample image, and the three-dimensional regression branch determines the relative position constraint relationship between the two-dimensional bounding box and the three-dimensional projection bounding box of the target object in the sample image.

7. The method according to claim 6, wherein the neural network for detecting the target object further comprises a three-dimensional classification branch, and wherein the iteration training further comprises: determining a posture type of the target object using the three-dimensional classification branch, the posture type of the target object being related to a number of vertices blocked by the target object in vertices of the three-dimensional projection bounding box of the target object, and/or an orientation of the target object; and determining the relative position constraint relationship between the two-dimensional bounding box and the three-dimensional projection bounding box of the target object in the sample image according to the posture type of the target object by the three-dimensional regression branch.

8. The method according to claim 6, wherein the sample data further comprises category labeling information of the target object in the sample image, a
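
Claim 5 updates the network based on a "difference" between the predicted and the labeled three-dimensional projection bounding box, but the claim text shown here does not say how that difference is measured. The sketch below uses a mean absolute (L1) difference over vertex coordinates as one plausible choice; this is an assumption, and the function name `projection_box_l1` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def projection_box_l1(predicted: Sequence[Point], labeled: Sequence[Point]) -> float:
    """Mean absolute difference over all vertex coordinates.

    Assumption: both sequences list the same projected vertices in the same
    order. The patent may use a different difference measure.
    """
    if len(predicted) != len(labeled) or not predicted:
        raise ValueError("vertex lists must be non-empty and of equal length")
    total = sum(
        abs(px - lx) + abs(py - ly)
        for (px, py), (lx, ly) in zip(predicted, labeled)
    )
    # Two coordinates (x and y) per vertex.
    return total / (2 * len(predicted))


# Illustrative values only; these are not taken from the patent.
labeled = [(150.0, 150.0), (280.0, 140.0)]
predicted = [(152.0, 149.0), (277.0, 140.0)]
loss = projection_box_l1(predicted, labeled)
print(loss)  # 1.5
```

In a real training loop this scalar would be computed in a differentiable framework and backpropagated through the regression branches; the plain-Python version only shows what is being compared.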

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G06T7/73 (Primary)

    using feature-based methods · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Classification techniques · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11776155B2 cover?
Embodiments of the present disclosure provide a method and apparatus for detecting a target object in an image. The method includes: performing following prediction operations using a pre-trained neural network: detecting a target object in a two-dimensional image to determine a two-dimensional bounding box of the target object; and determining a relative position constraint relationship betwee…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 3, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).