AR-based labeling tool for 3D object detection model training

US11869257B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11869257-B2
Application number: US-202117206255-A
Country: US
Kind code: B2
Filing date: Mar 19, 2021
Priority date: Mar 19, 2021
Publication date: Jan 9, 2024
Grant date: Jan 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for detecting and labeling a target object in a 2D image includes receiving a plurality of 2D images from a visual sensor, manually marking points of the target object on each of the 2D images, generating from the 2D images a 3D world coordinate system of the environment surrounding the target object, mapping each of the marked points on the 2D images to the 3D world coordinate system using a simultaneous localization and mapping (SLAM) engine, automatically generating a 3D bounding box covering all the marked points mapped to the 3D world coordinate system, mapping the 3D bounding box to each of the 2D images, generating a label for the target object on each of the 2D images using a machine learning object detection model, and training the machine learning object detection model based on the generated label for the target object.
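
The two geometric steps in the abstract (a 3D bounding box covering the mapped points, then mapping that box back to each 2D image) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the marked points have already been placed in world coordinates (the SLAM step is not shown), uses an axis-aligned box, and projects with a plain pinhole camera model. All names here (`bounding_box_3d`, `box_corners`, `project_point`, `project_box`) and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from itertools import product


def bounding_box_3d(points):
    """Axis-aligned box (min_corner, max_corner) covering all 3D points."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def box_corners(min_corner, max_corner):
    """The 8 corners of the box, as (x, y, z) tuples."""
    return list(product(*zip(min_corner, max_corner)))


def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a world point to pixel coordinates (assumed pinhole model).

    rotation is a 3x3 world-to-camera matrix (list of rows) and
    translation a 3-vector, so p_cam = R * p_world + t.
    Returns None for points at or behind the camera plane.
    """
    xc, yc, zc = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def project_box(points, rotation, translation, fx, fy, cx, cy):
    """Fit a box around the 3D points and project its corners into one image."""
    lo, hi = bounding_box_3d(points)
    return [
        project_point(c, rotation, translation, fx, fy, cx, cy)
        for c in box_corners(lo, hi)
    ]


# Hypothetical marked points in world coordinates, viewed by a camera at the origin.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
marked = [(0.0, 0.0, 4.0), (1.0, 2.0, 5.0), (-1.0, 1.0, 6.0)]
pixels = project_box(marked, IDENTITY, (0.0, 0.0, 0.0),
                     fx=100.0, fy=100.0, cx=50.0, cy=50.0)
```

Calling `project_box` once per camera pose gives the 2D outline of the same 3D box in every image, which is the property that lets one set of marks label many frames.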

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method for detecting and labeling an object in a 2D image comprising: receiving a plurality of 2D images from a visual sensor, each image of the plurality of 2D images includes an image of a target object in an surrounding environment; detecting features and extracting feature points from the plurality of 2D images; manually marking visible feature points of the target object on each image of the plurality of 2D images; estimating occluded feature points of the target object in at least one of the plurality of 2D images by defining axis lines starting from visible marked points; generating from the plurality of 2D images a 3D world coordinate system of the environment surrounding the target object; mapping each of the marked feature points on the plurality of 2D images to the 3D world coordinate system; generating a 3D map of the surrounding environment using the extracted feature points; automatically generating a 3D bounding box for the target object covering all the marked points mapped to the 3D world coordinate system; determining a ground plane of the 3D world coordinate system; automatically fitting, in the 3D world coordinate system, the 3D bounding box for the target object in each of the plurality of 2D images based on the visible and estimated occluded points, the axis lines and the ground plane; mapping the 3D bounding box back to the plurality of 2D images; and generating a label for the target object surrounded by the 3D bounding box on each of the plurality of 2D images using a machine learning object detection model and projecting the label to corresponding feature points of the target object in the 3D map of the surrounding environment.

2. The computer implemented method of claim 1, further comprising fitting the 3D bounding box for each of the plurality of 2D images on the ground plane in the 3D world coordinate system.

3. The computer implemented method of claim 1, further comprising marking two points in each of the plurality of 2D images that define a main axis of the target object and using the main axis when generating the 3D bounding box.

4. The computer implemented method of claim 1, wherein defining axis lines starting from visible marked points includes defining a first line between two visible marked points, defining a second line between a visible point and an occluded point.

5. The computer implemented method of claim 4, further including forcing the second line to be parallel to the first line.

6. The computer implemented method of claim 1, further comprising training the machine learning object detection model based on the generated label for the target object.

7. The computer implemented method of claim 6, wherein training the machine learning object detection model includes determining correspondence between 2D feature points in the plurality of 2D images and 3D feature points in the 3D map, determining that a set of 2D feature points in the plurality of 2D images belongs to the target object, labeling the set of 2D feature points with the corresponding object and projecting the object label of the 2D feature points to the corresponding set of 3D feature points.

8. A computer system for detecting and labeling an object in a 2D image, comprising: one or more computer processors; one or more non-transitory computer-readable storage media; program instructions, stored on the one or more non-transitory computer-readable storage media, which when implemented by the one or more processors, cause the computer system to perform the steps of: receiving a plurality of 2D images from a visual sensor, each image of the plurality of 2D images includes an image of a target object in an surrounding environment; detecting features and extracting feature points from the plurality of 2D images; manually marking visible feature points of the target object on each image of the plurality of 2D images; estimating occluded feature points of the target object in at least one of the plurality of 2D images by defining axis lines starting from visible marked points; generating from the plurality of 2D images a 3D world coordinate system of the environment surrounding the target object; mapping each of the marked feature points on the plurality of 2D images to the 3D world coordinate system; generating a 3D map of the surrounding environment using the extracted feature points; automatically generating a 3D bounding box for the target object covering all the marked points mapped to the 3D world coordinate system; determining a ground plane of the 3D world coordinate system; automatically fitting, in the 3D world coordinate system, the 3D bounding box for the target object in each of the plurality of 2D images based on the visible and estimated occluded points, the axis lines and the ground plane; mapping the 3D bounding box back to the plurality of 2D images; and generating a label for the target object surrounded by the 3D bounding box on each of the plurality of 2D images using a machine learning object detection model and projecting the label to corresponding feature points of the target object in the 3D map of the surrounding environment.

9. The computer system of claim 8, further comprising fitting the 3D bounding box for each of the plurality of 2D images on the ground plane in the 3D world coordinate system.

10. The computer system of claim 8, further comprising marking two points in each of the plurality of 2D images that define a main axis of the target object and using the main axis when generating the 3D bounding box.

11. The computer system of claim 8, wherein defining axis lines starting from visible marked points includes defining a first line between two visible marked points, defining a second line between a visible point and an occluded point.

12. The computer system of claim 11, further including forcing the second line to be parallel to the first line.

13. The computer system of claim 8, further comprising training the machine learning object detection model based on the generated label for the target object.

14. The computer system of claim 13, wherein training the machine learning object detection model includes determining correspondence between 2D feature points in the plurality of 2D images and 3D feature points in the 3D map, determining that a set of 2D feature points in the plurality of 2D images belongs to the target object, labeling the set of 2D feature points with the corresponding object and projecting the object label of the 2D feature points to the corresponding set of 3D feature points.

15. A computer program product comprising: program instructions on a computer-readable storage medium, where execution of the program instructions using a computer causes the computer to perform a method for detecting and labeling an object in a 2D image, comprising: receiving a plurality of 2D images from a visual sensor, each image of the plurality of 2D images includes an image of a target object in an surrounding environment; detecting features and extracting feature points from the plurality of 2D images; manually marking visible feature points of the target object on each image of the plurality of 2D images; estimating occluded feature points of the target object in at least one of the plurality of 2D images by defining axis lines starting from visible marked points; generating from the plurality of 2D images a 3D world coordinate system of the environment surrounding the target object; mapping each of the marked feature points on the plurality of 2D images to the 3D world coordinate system; au
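
Claims 4 and 5 describe estimating an occluded point by drawing a second line from a visible point and forcing it parallel to a first line between two visible points. The claims do not say how long the second line is, so the sketch below makes an explicit assumption: the second line has the same length as the first, as it would for opposite edges of a box. The function names (`estimate_occluded_point`, `are_parallel`) are hypothetical, and the code works on 2D or 3D tuples alike.

```python
def estimate_occluded_point(a, b, c):
    """Estimate an occluded point d such that line c->d is parallel to a->b.

    a and b are two visible marked points (the first line); c is a third
    visible point from which the second line starts.
    Assumption (not from the claims): c->d has the same length as a->b.
    """
    direction = tuple(bi - ai for ai, bi in zip(a, b))
    return tuple(ci + di for ci, di in zip(c, direction))


def are_parallel(u, v, tol=1e-9):
    """True if vectors u and v are parallel (Cauchy-Schwarz equality test)."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    uu = sum(ui * ui for ui in u)
    vv = sum(vi * vi for vi in v)
    return abs(dot * dot - uu * vv) <= tol * max(1.0, uu * vv)


# Hypothetical marks: a and b lie on a visible edge, c is a visible corner.
a, b, c = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
d = estimate_occluded_point(a, b, c)

first_line = tuple(bi - ai for ai, bi in zip(a, b))
second_line = tuple(di - ci for ci, di in zip(c, d))
```

With these inputs the estimated point is `(5.0, 3.0)`, and `are_parallel(first_line, second_line)` holds by construction. A real tool would likely let the annotator adjust the length of the second line rather than fixing it.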

Assignees

Inventors

Classifications

  • G06V20/80 (Primary)

    Recognising image objects characterised by unique random patterns · CPC title

  • Edge-based segmentation · CPC title

  • Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title

  • using neural networks · CPC title

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11869257B2 cover?
A method for detecting and labeling a target object in a 2D image includes receiving a plurality of 2D images from a visual sensor, manually marking points of the target object on each of the 2D images, generating from the 2D images a 3D world coordinate system of the environment surrounding the target object, mapping each of the marked points on the 2D images to the 3D world coordinate system …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06V20/80. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 9, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).