Method and system for automatic object annotation using deep network

US10936905B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10936905-B2
Application number: US-201916504095-A
Country: US
Kind code: B2
Filing date: Jul 5, 2019
Priority date: Jul 6, 2018
Publication date: Mar 2, 2021
Grant date: Mar 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Object annotation in images is a tedious, time-consuming task when a large volume of data needs to be annotated. Existing methods are limited to semi-automatic approaches for annotation. The embodiments herein provide a method and system for a deep network based architecture for automatic object annotation. The deep network utilized is a two-stage network. The first stage is an annotation model comprising a Faster Region-based Convolutional Neural Network (F-RCNN) and a Region-based Fully Convolutional Network (RFCN), providing two-class classification to generate annotated images from a set of single object test images. The newly annotated test object images are then used to synthetically generate cluttered images and their corresponding annotations, which are used to train the second stage of the deep network: a multi-class object detection/classification model designed using the F-RCNN and the RFCN as base networks to automatically annotate an input test image in real time.
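The clutter-generation step summarized in the abstract (and detailed in claims 3 and 7 as grid division, mask-based cropping, random pasting, and distinct mask values per object) can be illustrated with a minimal sketch. It is not the patented implementation. The function names `box_from_mask` and `make_clutter` are hypothetical, images are plain nested lists of grayscale values, each object is assumed to fit inside one grid cell, and at most one object is pasted per cell. Bounding boxes use inclusive pixel coordinates.

```python
import random

def box_from_mask(mask):
    """Tight (xmin, ymin, xmax, ymax) box around the non-zero pixels of a 2D mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (cols[0], rows[0], cols[-1], rows[-1])

def make_clutter(background, objects, grid=(2, 2), seed=None):
    """Paste each (pixels, mask) object into a distinct random grid cell.

    Returns the composed image, a label mask holding a different value per
    object, and one (label, xmin, ymin, xmax, ymax) annotation per object.
    Assumes (for illustration only) that every object fits inside one cell.
    """
    rng = random.Random(seed)
    height, width = len(background), len(background[0])
    cell_h, cell_w = height // grid[0], width // grid[1]
    image = [row[:] for row in background]          # copy; keep the input intact
    labels = [[0] * width for _ in range(height)]   # 0 means background
    cells = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    annotations = []
    chosen = rng.sample(cells, len(objects))        # one distinct cell per object
    for label, ((pixels, mask), (r, c)) in enumerate(zip(objects, chosen), start=1):
        top, left = r * cell_h, c * cell_w
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:                          # copy only foreground pixels
                    image[top + y][left + x] = pixels[y][x]
                    labels[top + y][left + x] = label
        xmin, ymin, xmax, ymax = box_from_mask(mask)
        annotations.append((label, left + xmin, top + ymin, left + xmax, top + ymax))
    return image, labels, annotations
```

Because the paste position and the object mask are both known at composition time, the annotation for every pasted object comes for free, which is the property that lets the second-stage detector be trained without further manual labelling.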

First claim

Opening claim text (preview).

What is claimed is:

1. A processor implemented method for automatic object annotation using deep network, the method comprising: receiving a manually annotated image set with each image comprising a single annotated object on a known background; generating a plurality of synthetic single object images by applying an affine transformation and a colour augmentation on each image from the manually annotated image set, wherein the generated plurality of synthetic single object images are annotated automatically in accordance with a corresponding manually annotated image; training an annotation model for two class object detection and classification using the synthetically generated single object images and manually annotated single object images to detect a foreground Region Of Interest (ROI) corresponding to the object in an image, wherein the annotation model comprises of a Faster Region-based Convolutional Neural Networks (F-RCNN) and Region-based Fully Convolutional Networks (RFCN); analyzing a set of single object test images comprising unknown objects placed on the known background using the trained annotation model to generate a set of annotated images; synthetically generating a plurality of clutter images with corresponding annotations using the set of annotated images; and utilizing the plurality of clutter images and corresponding annotations for training a multi-class object detection and classification model designed using the RCNN and the RFCN as base networks, wherein the multi-class object detection framework annotates input test image in real time by: identifying one or more ROIs corresponding to one or more objects in the input test image and class labels associated with the one or more objects, wherein the input test image is one of an single object input image or a clutter input image, wherein each ROI is defined by a bounding box with position coordinates comprising xmin, ymin, xmax, ymax.

2. The method of claim 1, wherein training the annotation model comprises: a first training stage for creating a plurality of region proposals providing a plurality of possible foreground ROIs defined by a plurality of bounding boxes in a test image; and a second training stage for identifying the foreground ROI defined by the bounding box among the plurality of possible foreground ROIs.

3. The method of claim 1, wherein generating the plurality of clutter images comprising a plurality objects from the manually annotated image set and the plurality of synthetic single object images comprises: for each clutter image to be generated: selecting a background image; dividing the background image into a plurality of grids; cropping the objects from manually annotated image set and the plurality of synthetic single object images using manually generated masks; randomly pasting the cropped objects on the plurality of grids; and assigning different binary values to the generated masks with for different objects in order to distinctly obtain foreground ROI in each clutter image generated.

4. The method of claim 1, wherein the method further comprises using a multi-resolution multi-camera set up with each camera mounted on a rotating platform for capturing: a set of images for generating the manually annotated images; a set of test images of unknown objects; an input test image for the real time testing; and a background image for creating clutter image.

5. A system for automatic object annotation using deep network, comprising: a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more processors coupled to the memory via the one or more I/O interfaces, wherein the one or more processors are configured by the instructions to: receive a manually annotated image set with each image comprising a single annotated object on a known background; generate a plurality of synthetic single object images by applying affine transformation and colour augmentation on each image from the manually annotated image set, wherein the generated plurality of synthetic single object images are annotated automatically in accordance with a corresponding manually annotated image; train an annotation model for two class object detection and classification using the synthetically generated single object images and manually annotated single object images to detect a foreground Region of Interest (ROI) corresponding to the object in an image, wherein the annotation model comprises of a Faster Region-based Convolutional Neural Networks (F-RCNN) and Region-based Fully Convolutional Networks (RFCN); analyze a set of single object test images comprising unknown objects placed on the known background using the trained annotated model to generate a set of annotated images; synthetically generate a plurality of clutter images with corresponding annotations using the set of annotated images; and utilize the plurality of clutter images and corresponding annotations for training a multi-class object detection and classification model designed using the Region-based Fully Convolutional Networks (RCNN) and the Region-based Fully Convolutional Networks (RFCN) as base networks, wherein the multi-class object detection framework annotates input test image in real time by: identifying one or more ROIs corresponding to one or more objects in the input test image and class labels associated with the one or more objects, wherein the input test image is one of an single object input image or a clutter input image, wherein each ROI is defined by a bounding box with position coordinates comprising xmin, ymin, xmax, ymax.

6. The system of claim 5, wherein the one or more processors are configured to train the annotation model based on a: a first training stage for creating a plurality of region proposals providing a plurality of possible foreground ROIs defined by a plurality of bounding boxes in a test image; and a second training stage for identifying the foreground ROI defined by the bounding box among the plurality of possible foreground ROIs.

7. The system of claim 5, wherein the one or more processors are configured to generate the plurality of clutter images comprising a plurality objects from the manually annotated image set and the plurality of synthetic single object images by: for each clutter image to be generated: selecting a background image; dividing the background image into a plurality of grids; cropping the objects from manually annotated image set and the plurality of synthetic single object images using manually generated masks; randomly pasting the cropped objects on the plurality of grids; and assigning different binary values to the generated masks with for different objects in order to distinctly obtain foreground ROI in each clutter image generated.

8. The system of claim 5, wherein the one or more processors are further configured to receive: a set of images for generating the manually annotated images; a set of test images of unknown objects; an input test image for the real time testing; and a background image for creating clutter image captured by a multi-resolution multi-camera set up with each camera mounted on a rotating platform.

9. A non-transitory computer readable medium, the non-transitory computer-readable medium stores instructions which, when executed by a hardware processor, cause the hardware processor to perform actions comprising: receiving a manually annotated image set with each image comprising a single annotated object on a known background; generating a plurality of synthetic single object images by applying an affine transformation and a colour augmentation on each image from the manually annotated image set, wherein the generated plurality of synthetic s
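Claim 1 states that each synthetic single object image is "annotated automatically in accordance with a corresponding manually annotated image". One plausible way to do this, shown here only as an assumption-laden sketch, is to push the manual bounding box through the same affine matrix that was applied to the image and take the axis-aligned box around the transformed corners. The helper names `rotation_about` and `transform_box` are hypothetical, and the claims do not say whether the annotation is transferred via box corners or via a transformed mask. A colour augmentation changes pixel values only, so it leaves the box untouched.

```python
import math

def rotation_about(cx, cy, degrees, scale=1.0):
    """2x3 affine matrix: rotate by `degrees` and scale about the point (cx, cy)."""
    t = math.radians(degrees)
    a, b = scale * math.cos(t), scale * math.sin(t)
    return [[a, -b, cx - a * cx + b * cy],
            [b,  a, cy - b * cx - a * cy]]

def transform_box(box, m):
    """Map an (xmin, ymin, xmax, ymax) box through the affine matrix `m` and
    return the axis-aligned box enclosing the four transformed corners."""
    xmin, ymin, xmax, ymax = box
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    xs = [m[0][0] * x + m[0][1] * y + m[0][2] for x, y in corners]
    ys = [m[1][0] * x + m[1][1] * y + m[1][2] for x, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

# Example: rotate a 20x40 box by 90 degrees about its own centre.
matrix = rotation_about(20, 40, 90)
new_box = transform_box((10, 20, 30, 60), matrix)
```

For rotations that are not multiples of 90 degrees, the enclosing box of the rotated corners is looser than the object itself. Transforming the object mask and recomputing the box from it would give a tighter annotation.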

Assignees

Tata Consultancy Services Ltd

Inventors

Classifications

  • G06V10/82 · Primary

    using neural networks · CPC title

  • Validation; Performance evaluation · CPC title

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • G06F18/241 · Primary

    relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

  • Selection of the most significant subset of features · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10936905B2 cover?
Object annotation in images is a tedious, time-consuming task when a large volume of data needs to be annotated. Existing methods are limited to semi-automatic approaches for annotation. The embodiments herein provide a method and system for a deep network based architecture for automatic object annotation. The deep network utilized is a two-stage network with the first stage as an annotation model comprising a F…
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 2, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).