Apparatuses and methods for semantic image labeling

US10699170B2 · US · B2

Patent metadata
Publication number: US-10699170-B2
Application number: US-201815864142-A
Country: US
Kind code: B2
Filing date: Jan 8, 2018
Priority date: Jul 8, 2015
Publication date: Jun 30, 2020
Grant date: Jun 30, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a method for generating a semantic image labeling model, comprising: forming a first CNN and a second CNN, respectively; randomly initializing the first CNN; inputting a raw image and predetermined label ground truth annotations to the first CNN to iteratively update weights thereof so that a category label probability for the image, which is output from the first CNN, approaches the predetermined label ground truth annotations; randomly initializing the second CNN; inputting the category label probability to the second CNN to correct the input category label probability so as to determine classification errors of the category label probabilities; updating the second CNN by back-propagating the classification errors; concatenating the updated first and second CNNs; classifying each pixel in the raw image into one of general object categories; and back-propagating classification errors through the concatenated CNN to update weights thereof until the classification errors less than a predetermined threshold.
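The training schedule in the abstract (train the first network, train the second on the first network's output, concatenate them, then fine-tune jointly until the error falls below a threshold) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `unary` and `refine` are tiny hand-written stand-ins for the two CNNs, the task is binary labeling of an 8-pixel 1-D "image", and finite-difference gradients replace back-propagation to keep the code short. It is not the patented implementation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def unary(params, feats):
    # Hypothetical stand-in for the first CNN: per-pixel label probability.
    w, b = params
    return [sigmoid(w * x + b) for x in feats]


def refine(params, probs):
    # Hypothetical stand-in for the second CNN: corrects each probability
    # using the mean probability of its immediate neighbors.
    a, c, d = params
    out = []
    for i, p in enumerate(probs):
        nbrs = [probs[j] for j in (i - 1, i + 1) if 0 <= j < len(probs)]
        ctx = sum(nbrs) / len(nbrs)
        out.append(sigmoid(a * (p - 0.5) + c * (ctx - 0.5) + d))
    return out


def cross_entropy(probs, labels):
    # Mean binary cross-entropy, clamped to avoid log(0).
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def numeric_grad(loss_fn, params, eps=1e-5):
    # Central finite differences; an assumption standing in for back-propagation.
    grad = []
    for i in range(len(params)):
        hi = params[:i] + [params[i] + eps] + params[i + 1:]
        lo = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss_fn(hi) - loss_fn(lo)) / (2 * eps))
    return grad


def train(params, loss_fn, lr=1.0, threshold=0.05, max_steps=500):
    # Gradient descent that stops once the loss drops below the threshold.
    # A step is kept only if it lowers the loss; otherwise the step size halves.
    loss = loss_fn(params)
    for _ in range(max_steps):
        if loss < threshold:
            break
        grad = numeric_grad(loss_fn, params)
        candidate = [p - lr * g for p, g in zip(params, grad)]
        cand_loss = loss_fn(candidate)
        if cand_loss < loss:
            params, loss = candidate, cand_loss
        else:
            lr *= 0.5
    return params, loss


rng = random.Random(0)
feats = [0.1, 0.2, 0.15, 0.3, 0.8, 0.9, 0.7, 0.85]  # toy per-pixel features
labels = [0, 0, 0, 0, 1, 1, 1, 1]                   # toy ground-truth labels

# Stage 1: randomly initialize and train the first model against ground truth.
p1_init = [rng.uniform(-0.1, 0.1) for _ in range(2)]
loss1_init = cross_entropy(unary(p1_init, feats), labels)
p1, loss1 = train(p1_init, lambda p: cross_entropy(unary(p, feats), labels))

# Stage 2: randomly initialize the second model and train it on the
# (now fixed) probabilities produced by the first model.
stage1_probs = unary(p1, feats)
p2_init = [rng.uniform(-0.1, 0.1) for _ in range(3)]
p2, loss2 = train(p2_init, lambda p: cross_entropy(refine(p, stage1_probs), labels))


# Stage 3: concatenate both parameter sets and fine-tune them jointly.
def joint_loss_fn(p):
    return cross_entropy(refine(p[2:], unary(p[:2], feats)), labels)


joint, joint_loss = train(p1 + p2, joint_loss_fn)
final_probs = refine(joint[2:], unary(joint[:2], feats))
```

Because `train` only accepts steps that lower the loss, the joint stage starts from the stage-2 loss and can only match or improve on it. A real system would use multi-class CNNs and back-propagation in a deep-learning framework.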

First claim

Opening claim text (preview).

What is claimed is:

1. A method for generating a semantic image labeling model, comprising: forming a first Convolutional Neural Network (CNN) and a second CNN, respectively, the first CNN being different from the second CNN; training the first CNN by: randomly initializing the first CNN; and inputting a raw image and a plurality of predetermined label ground truth annotations to the first CNN to iteratively update weights of the first CNN so that a category label probability for the raw image, which is output from the first CNN, approaches the predetermined label ground truth annotations; wherein the first CNN is trained by utilizing one item of: per-pixel category label maps, category bounding boxes, image-level tags, or image-level descriptive sentences as a supervision; training the second CNN by: randomly initializing the second CNN; inputting the category label probability to the second CNN to correct the input category label probability so as to determine classification errors of the category label probabilities; and updating the second CNN by back-propagating the classification errors; concatenating the updated first CNN and the updated second CNN; classifying each pixel in the raw image into one of a plurality of general object categories; and back-propagating classification errors through the concatenated CNN to update weights of the concatenated CNN until the classification errors less than a predetermined threshold, wherein the second CNN is configured to determine contextual information for each pixel in spatial domain (triple penalty) from the category label probability.

2. The method of claim 1, wherein the second CNN is configured to compute a similarity relationship of a current reference pixel in the image with its neighboring pixels, wherein, the computed similarity relationship changes for a different reference pixel, the second CNN utilizes a plurality of locally-shared filters to update the similarity relationships, such that similar pixels have similar category labels.

3. The method of claim 2, wherein the second CNN utilizes a plurality of globally-shared filters to update local label contexts of the pixels; wherein each globally-shared filter produces a matching cost of the label contexts, and the globally-shared filter with minimum matching cost represents one type of local label context.

4. A method for semantic image labeling, comprising: determining, by a first pre-trained Convolutional Neural Network (CNN), category label probabilities for each pixel in an image; determining, by a second pre-trained CNN, contextual information for each pixel in spatial domain from the category label probabilities based on a similarity relationship of a current reference pixel with its neighboring pixels; determining local label contexts for each pixel from the category label probabilities, the local label contexts being shared across different positions of the image; multiplying the determined contextual information by the determined local label contexts to obtain adjustments to the determined category label probabilities; and applying the adjustments to the category label probabilities to update the category label probabilities.

5. The method of claim 4, wherein the first CNN comprises at least one convolutional layer and at least one pooling layer, wherein, the first CNN is trained by: initializing weights of each of the layers randomly; classifying each pixel in the image into one of a plurality of general object categories to calculate a classification error; and back-propagating iteratively the classification error through the first CNN to update the weights, until a newly calculated classification error is less than a predetermined threshold.

6. The method of claim 4, wherein the second pre-trained CNN has a different architecture from the first CNN.

7. The method of claim 6, wherein the method further comprises training the second CNN by: receiving an image and ground truth category labels; using a first pre-trained CNN to compare each pixel in the received image with the ground truth category labels so as to predict a category label for each pixel in the received image to obtain category label probabilities that the certain label was assigned to this pixel; and feeding the ground truth category labels and the obtained category label probabilities into the second pre-trained CNN to update the second CNN.

8. The method of claim 7, further comprising: concatenating the updated first CNN and the updated second CNN; classifying each pixel in the raw image into one of a plurality of general object categories to obtain a classification error; and back-propagating classification errors through the concatenated CNN to update weights of the concatenated CNN until the classification error is less than a predetermined threshold.

9. The method of claim 4, wherein the determining contextual information for each pixel in spatial domain from the category label probabilities further comprise: computing a similarity relationship of a current reference pixel with its neighboring pixels, wherein, the computed similarity relationship changes for a different reference pixel, the second CNN utilizes a plurality of locally-shared filters to update the similarity relationships, such that similar pixels should have similar category labels.

10. The method of claim 9, wherein in the determining local label contexts for each pixel from the category label probabilities, the second CNN utilizes a plurality of globally-shared filters to update the local label contexts; wherein each globally-shared filter produces a matching cost of the label contexts, and the globally-shared filter with minimum matching cost is just the label contexts.

11. An apparatus for semantic image labeling, comprising: a processor; and a memory storing instructions, the instructions when executed by the processor, cause the processor to perform operations, the operations comprising: determining, by a first pre-trained Convolutional Neural Network (CNN), category label probabilities for each pixel in an image; determining, by a second pre-trained CNN, contextual information for each pixel in spatial domain from the category label probabilities based on a similarity relationship of a current reference pixel with its neighboring pixels; and determining local label contexts for each pixel from the category label probabilities, the local label contexts being shared across different positions of the image; and multiplying the determined contextual information by the determined local label contexts to obtain adjustments to the determined category label probabilities; and applying the adjustments to the category label probabilities to update the category label probabilities.

12. The apparatus of claim 11, wherein the determining contextual information for each pixel in spatial domain from the category label probabilities comprises computing a similarity relationship of a current reference pixel with its neighboring pixels, wherein, the computed similarity relationship changes for a different reference pixel, the second CNN utilizes a plurality of locally-shared filters to model the similarity relationships, such that similar pixels have similar category labels.

13. The apparatus of claim 12, wherein the second CNN utilizes a plurality of globally-shared filters to model the local label contexts; wherein each globally-shared filter produces a matching cost of the label contexts, and the globally-shared filter with minimum matching cost represents one type of local label context.

14. The apparatus of claim 11, wher

Assignees

Beijing Sensetime Tech Development Co Ltd

Inventors

Classifications

  • Validation; Performance evaluation

  • Syntactic or semantic context, e.g. balancing

  • Validation; Performance evaluation; Active pattern learning techniques

  • Combinations of networks

  • Recurrent networks, e.g. Hopfield networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10699170B2 cover?
Disclosed is a method for generating a semantic image labeling model, comprising: forming a first CNN and a second CNN, respectively; randomly initializing the first CNN; inputting a raw image and predetermined label ground truth annotations to the first CNN to iteratively update weights thereof so that a category label probability for the image, which is output from the first CNN, approaches t…
Who is the assignee on this patent?
Beijing Sensetime Tech Development Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V30/1916. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 30, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).