Training constrained deconvolutional networks for road scene semantic segmentation

US9916522B2 · US · B2

Patent metadata
Publication number: US-9916522-B2
Application number: US-201615090984-A
Country: US
Kind code: B2
Filing date: Apr 5, 2016
Priority date: Mar 11, 2016
Publication date: Mar 13, 2018
Grant date: Mar 13, 2018

Abstract

A source deconvolutional network (S-Net) is adaptively trained to perform semantic segmentation. Image data is then input to the source deconvolutional network and the outputs of the S-Net are measured. The same image data and the measured outputs of the source deconvolutional network are then used to train a target deconvolutional network. The target deconvolutional network is defined by substantially fewer numerical parameters than the source deconvolutional network.
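
The abstract's three steps (train a source network, record its outputs on image data, train a smaller target network on those outputs) closely resemble what is commonly called knowledge distillation. The following is a minimal NumPy sketch of the last two steps under loudly stated assumptions, not the patented implementation: each "network" is reduced to a per-pixel classifier over a hypothetical feature vector, the already-trained S-Net is stood in for by a fixed, randomly initialized two-layer model, and the target is a linear softmax model fitted by gradient descent to the source's probability outputs with a cross-entropy loss (the probability-vector variant of claims 3 and 4). The names `source_net`, `distill` and `num_params` are illustrative only.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def source_net(x, params):
    # Hypothetical stand-in for a trained source network (S-Net):
    # a per-pixel two-layer model with a tanh hidden layer.
    w1, b1, w2, b2 = params
    return softmax(np.tanh(x @ w1 + b1) @ w2 + b2)

def num_params(params):
    # Total count of numerical values defining a model.
    return sum(p.size for p in params)

def distill(x, soft_targets, num_classes, lr=0.5, steps=200):
    # Fit a smaller per-pixel linear softmax "target network" so that its
    # outputs match the recorded source outputs under cross-entropy.
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    losses = []
    for _ in range(steps):
        p = softmax(x @ w + b)
        losses.append(-np.mean(np.sum(soft_targets * np.log(p + 1e-12), axis=1)))
        grad = (p - soft_targets) / n  # gradient of the mean loss w.r.t. the logits
        w -= lr * x.T @ grad
        b -= lr * grad.sum(axis=0)
    return (w, b), losses

rng = np.random.default_rng(0)
num_pixels, num_features, hidden, num_classes = 1000, 6, 64, 4
x = rng.normal(size=(num_pixels, num_features))  # stand-in for the "second image data"
source_params = (rng.normal(size=(num_features, hidden)), np.zeros(hidden),
                 rng.normal(size=(hidden, num_classes)), np.zeros(num_classes))
soft = source_net(x, source_params)  # collected output data: one probability vector per pixel
target_params, losses = distill(x, soft, num_classes)
print(num_params(source_params), num_params(target_params))  # prints: 708 28
```

With these illustrative sizes the source model has 708 values and the target 28, mirroring the claimed condition that the target network is defined by fewer values than the source network.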

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for forming a computer system for producing label data for corresponding areas of an input image, the label data being one of a set of predetermined values and being indicative that the corresponding area of the image is an image of an object which is in a respective one of a set of object categories, the method comprising: adaptively generating a source deconvolutional network defined by a plurality of first values, by supervised learning using training data comprising (i) first image data encoding training images and (ii) for each training image a corresponding set of annotation data, the set of annotation data for each training image indicating, for a plurality of corresponding areas of the training image, that the area of the training image is an image of an object which is in a respective one of the set of object categories; inputting second image data encoding training images to the source deconvolutional network and collecting corresponding output data describing one or more outputs of the source deconvolutional network; using the second image data and the output data to generate adaptively a target deconvolutional network defined by a plurality of second values, the cardinality of the second values being lower than that of the first values; and forming a computer system which implements the target deconvolutional network.

2. A method according to claim 1 in which the collected output data for a given training image is a set of labels indicating, for respective regions of the training image, that the region of the training image shows an object which is a corresponding predefined object category.

3. A method according to claim 1 in which the collected output data for a given training image is a set of vectors, each vector having a number of components equal to the number of object categories, and indicating, for respective region of the training image, a probability value that the region shows an object is in the corresponding object category.

4. A method according to claim 3 in which the adaptive generation of the target deconvolutional network is performed using a cross-entropy loss function which, for a given area of one of the training images encoded by the second image data, is indicative of the cross-entropy between the corresponding outputs of the target deconvolutional network when presented with the training image, and the corresponding outputs of the source deconvolutional network.

5. A method according to claim 4 in which the cross-entropy loss function is calculated by calculating a sum over the object categories of the product of: (a) a term representative of the similarity of a corresponding output of the target deconvolutional network and a source deconvolutional network, and (b) a weighting term for the object category, wherein the weighting term for the object category decreases for increasing frequency of objects of the corresponding object category in the training data.

6. A method according to claim 1 in which, in the adaptive generation of at least one of the source deconvolutional network and the target deconvolutional network, is by a backpropagation algorithm, and wherein, during the algorithm, successive subsets of the values are randomly selected, and an effect on the output of corresponding network of each selected subset of values is successively neglected.

7. A method according to claim 1 in which at least some of the training images of the second image data are training images of the first image data.

8. A method according to claim 1 in which the training data includes a first portion for which the annotation data has a relatively high density, and a second portion for which the annotation data has a relatively low density.

9. A method according to claim 8 in which the step of generating the source deconvolutional network comprises generating a first network component using the first portion of the training data, and generating a second network component using the second portion of the training data, the source deconvolutional network being adapted (i) to transmit image data which is input to the source deconvolutional network, to each of the first and second network components, and (ii) to generate the one or more outputs using the outputs of the first and second network components.

10. A method according to claim 4 in which at least one of the source deconvolutional network and the target deconvolutional network is generated using successive batches of the training data, each batch of training data comprising a plurality of relatively densely-sampled images and a plurality of relatively sparsely-sampled images, and said generation uses a cost function having a first component derived from the relatively densely-sampled images and a second component derived from the relatively sparsely-sampled images.

11. A method according to claim 10 in which the relative importance of the two cost components of the cost function is determined by a weighting parameter.

12. A method according to claim 1 in which the generation of at least one of the source deconvolutional network and the target deconvolutional network uses a cost function which, for each of a plurality of predefined classes of objects, varies inversely with a measure of the frequency of occurrence in the images of objects in each class.

13. A method according to claim 1 in which the computer system comprises an integrated circuit, the method comprising forming the integrated circuit to implement a computational model.

14. A method according to claim 13 in which the integrated circuit is a programmable integrated circuit, the step of forming the integrated circuit to implement the computational model comprising programming the integrated circuit using the second values.

15. A method according to claim 1 in which the image data encodes images of respective road transportation scenes.

16. A computer apparatus for designing a computer system for producing label data for corresponding areas of an input image, the label data being one of a set of predetermined values and being indicative that the corresponding area of the image is an image of an object which is in a respective one of a set of object categories, the computer apparatus comprising a processor and a data storage device storing computer program instructions operative, when followed by the processor, to cause the processor: to generate a source deconvolutional network defined by a plurality of first values, by supervised learning using training data comprising (i) first image data encoding training images and (ii) for each training image a corresponding set of annotation data, the set of annotation data for each training image indicating, for a plurality of corresponding areas of the training image, that the area of the training image is an image of an object which is in a respective one of the set of object categories; to calculate output data describing one or more outputs of the source deconvolutional network upon inputting to the source deconvolutional network second image data encoding training images; to use the second image data and the output data to generate adaptively a target deconvolutional network defined by a plurality of second values, the cardinality of the second values being lower than that of the first values.

17. A computer apparatus according to claim 16 in which the program instructions are operative to cause the processor to collect the output data for a given training image as a set of labels indicating, for respective regions of the training i
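
Claims 4, 5 and 12 describe a per-area cross-entropy in which each object category's term is weighted, with weights that fall as the category becomes more frequent in the training data; claims 10 and 11 describe a cost with one component from densely annotated images and one from sparsely annotated images, balanced by a weighting parameter. The NumPy sketch below fills in forms the claims leave open, all of them assumptions: median-frequency balancing for the category weights (w_c = median(f) / f_c, which varies inversely with f_c), a boolean mask marking which pixels of a sparsely annotated image carry labels, and the combination L = L_dense + lam * L_sparse. The function names, the parameter `lam`, and the pixel counts are hypothetical.

```python
import numpy as np

def class_weights(label_counts):
    # Assumed median-frequency balancing: w_c = median(f) / f_c, so the weight
    # of a category decreases as that category becomes more frequent.
    freq = label_counts / label_counts.sum()
    return np.median(freq) / freq

def weighted_cross_entropy(pred_probs, ref_probs, weights, mask=None):
    # Per pixel: sum over categories of w_c * (-q_c * log p_c), where q is the
    # reference distribution (source-network outputs or one-hot annotations)
    # and p is the output of the network being trained.
    per_pixel = -np.sum(weights * ref_probs * np.log(pred_probs + 1e-12), axis=1)
    if mask is not None:
        # Sparse annotation (assumed representation): average only over
        # pixels that actually carry a label.
        per_pixel = per_pixel[mask]
    return per_pixel.mean()

def batch_cost(dense_loss, sparse_loss, lam):
    # Assumed two-component cost: lam sets the relative importance of the
    # sparsely annotated images against the densely annotated ones.
    return dense_loss + lam * sparse_loss

counts = np.array([700.0, 200.0, 100.0])  # illustrative pixel counts for three categories
weights = class_weights(counts)           # approx. [0.286, 1.0, 2.0]
ref = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
pred = np.array([[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])
dense = weighted_cross_entropy(pred, ref, weights)
sparse = weighted_cross_entropy(pred, ref, weights, mask=np.array([True, False]))
print(weights, dense, sparse, batch_cost(dense, sparse, lam=0.5))
```

Because the rarest category receives the largest weight, an error on a rarely seen class (for example a small road-scene object) contributes more to the cost than the same error on a dominant class.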

Assignees

Toshiba Kk

Classifications

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06N3/084 (primary)

    Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Who is the assignee on this patent?
Toshiba Kk
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 13, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.