Image semantic segmentation

US9865042B2 · US · B2

Patent metadata
Publication number: US-9865042-B2
Application number: US-201514801839-A
Country: US
Kind code: B2
Filing date: Jul 17, 2015
Priority date: Jun 8, 2015
Publication date: Jan 9, 2018
Grant date: Jan 9, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In implementations of the subject matter described herein, the feature maps are obtained by convoluting an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
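The abstract's central step is to mask the convolutional feature maps rather than the raw image, which lets one set of feature maps be reused for every candidate segment. The sketch below shows that step in pure Python. It assumes that the feature maps are a C x H x W nested list and that each binary mask has already been brought to the feature maps' H x W resolution (claims 2 to 4 describe one way to do this). The names `mask_feature_maps` and `segment_features` are hypothetical and do not come from the patent.

```python
from typing import List

# Assumed layout (hypothetical): C channels, each an H x W grid of activations.
FeatureMaps = List[List[List[float]]]
# A binary mask already at the feature maps' H x W resolution (1 = inside the segment).
Mask = List[List[int]]


def mask_feature_maps(features: FeatureMaps, mask: Mask) -> FeatureMaps:
    """Zero every activation outside the mask, applying the same mask to all channels."""
    h, w = len(mask), len(mask[0])
    for channel in features:
        if len(channel) != h or any(len(row) != w for row in channel):
            raise ValueError("mask must match the spatial size of the feature maps")
    # Build new lists so the shared feature maps stay untouched for the next segment.
    return [
        [[value * mask[y][x] for x, value in enumerate(row)] for y, row in enumerate(channel)]
        for channel in features
    ]


def segment_features(features: FeatureMaps, masks: List[Mask]) -> List[FeatureMaps]:
    """Mask one shared set of feature maps once per candidate segment.

    The convolution that produced `features` runs a single time; only this
    element-wise masking is repeated for each candidate.
    """
    return [mask_feature_maps(features, mask) for mask in masks]
```

With two 2 x 2 channels and two candidate masks, `segment_features` returns two masked copies of the same maps. A later pooling and classification stage would turn each copy into a fixed-length descriptor; claims 5 to 9 describe several variants of that stage.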

First claim

Full claim text (claims 1 to 20). Claims 1, 10, and 14 are independent.

What is claimed is:

1. A method comprising: applying a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; masking the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determining a semantic category for each pixel in the image at least in part based on the segment features.

2. The method of claim 1, wherein masking the feature maps comprises: generating low-resolution binary masks based on the binary masks and the feature maps; and applying the low-resolution binary masks onto the feature maps to generate the segment features.

3. The method of claim 2, wherein generating the low-resolution binary masks comprises: projecting each of the activations on the feature maps to a center of the respective region on the image; associating each pixel in the binary masks with the nearest center; and assigning each pixel in the binary masks to one of the activations on the feature maps based on the associated center.

4. The method of claim 3, wherein generating the low-resolution binary masks further comprises: averaging values of pixels assigned to each of the activations; and generating the low-resolution binary masks by comparing the averaged values and a predetermined threshold.

5. The method of claim 1, wherein masking the feature maps comprises: directly masking the feature maps to generate the segment features, and wherein determining the semantic category for each pixel in the image comprises: pooling the segment features; and connecting the pooled segment features.

6. The method of claim 5, wherein determining the semantic category for each pixel in the image further comprises: pooling regional features on the feature maps, each of the regional features being represented by a bounding box; connecting the pooled regional features; and determining the semantic category for each pixel in the image based on a concatenation of the connected segment features and the connected regional features.

7. The method of claim 5, wherein at least one of the segment features and the regional features are pooled by spatial pyramid pooling (SPP).

8. The method of claim 1, wherein masking the feature maps comprises: pooling the generated feature maps by spatial pyramid pooling (SPP) to obtain multiple levels of a pooled feature map; and masking the pooled feature map of a tiny level from the multiple levels to generate the segment features.

9. The method of claim 8, wherein determining the semantic category for each pixel in the image comprises: connecting the segment features and the pooled feature map of other levels among from the multiple levels.

10. A computer program product being tangibly stored on a non-transient machine-readable medium and comprising machine-executable instructions, the instructions, when executed on a device, causing the device to: apply a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; mask the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determine a semantic category for each pixel in the image at least in part based on the segment features.

11. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: generate low-resolution binary masks based on the binary masks and the feature maps; and apply the low-resolution binary masks onto the feature maps to generate the segment features.

12. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: project each of the activations on the feature maps to a center of the respective region on the image; associate each pixel in the binary masks with the nearest center; and assign each pixel in the binary masks to one of the activations on the feature maps based on the associated center.

13. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: average values of pixels assigned to each of the activations; and generate the low-resolution binary masks by comparing the averaged values and a predetermined threshold.

14. A computing device, comprising: at least one memory and at least one processor, wherein the at least one memory and the at least one memory are respectively configured to store and execute instructions for causing the computing device to perform operations, the operations including: applying a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; masking the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determining a semantic category for each pixel in the image at least in part based on the segment features.

15. The computing device of claim 14, wherein masking the feature maps comprises: generating low-resolution binary masks based on the binary masks and the feature maps; and applying the low-resolution binary masks onto the feature maps to generate the segment features.

16. The computing device of claim 15, wherein generating the low-resolution binary masks comprises: projecting each of the activations on the feature maps to a center of the respective region on the image; associating each pixel in the binary masks with the nearest center; and assigning each pixel in the binary masks to one of the activations on the feature maps based on the associated center.

17. The computing device of claim 16, wherein generating the low-resolution binary masks further comprises: averaging values of pixels assigned to each of the activations; and generating the low-resolution binary masks by comparing the averaged values and a predetermined threshold.

18. The computing device of claim 16, wherein determining the semantic category for each pixel in the image further comprises: pooling regional features on the feature maps, each of the regional features being represented by a bounding box; connecting the pooled regional features; and determining the semantic category for each pixel in the image based on a concatenation of the connected segment features and the connected regional features.

19. The computing device of claim 16, wherein at least one of the segment features and the regional features are pooled by spatial pyramid pooling (SPP).

20. The computing device of claim 14, wherein masking the feature maps comprises: directly masking the feature maps to generate the segment features, and wherein determining the semantic category for each pixel in the image comprises: pooling the segment features; and connecting the pooled segment features.
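Claims 3 and 4, and their device counterparts, claims 16 and 17, describe how a full-resolution binary mask is reduced to the resolution of the feature maps. The steps are to project each activation to the center of its image region, associate each mask pixel with the nearest center, average the pixel values assigned to each activation, and threshold the averages. The sketch below follows those steps under stated assumptions. It assumes a single hypothetical total `stride` for the network and places each activation's center in the middle of a `stride` x `stride` cell; real receptive-field centers depend on padding and kernel sizes. The default threshold of 0.5 is a placeholder, because the claims only say "a predetermined threshold". All function names are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]
FeatureMaps = List[List[List[float]]]


def activation_centers(fh: int, fw: int, stride: int) -> List[Tuple[float, float]]:
    """Project each activation (i, j) to an assumed center in image coordinates.

    Assumption: each activation covers a stride x stride cell, and its center is
    the middle of that cell.
    """
    offset = (stride - 1) / 2.0
    return [(i * stride + offset, j * stride + offset) for i in range(fh) for j in range(fw)]


def nearest_center(centers: List[Tuple[float, float]], y: int, x: int) -> int:
    """Index of the center closest to pixel (y, x); ties go to the lowest index."""
    return min(range(len(centers)), key=lambda c: (centers[c][0] - y) ** 2 + (centers[c][1] - x) ** 2)


def low_res_mask(mask: Mask, fh: int, fw: int, stride: int, threshold: float = 0.5) -> Mask:
    """Reduce a full-resolution binary mask to the fh x fw feature-map grid."""
    centers = activation_centers(fh, fw, stride)
    sums = [0.0] * len(centers)
    counts = [0] * len(centers)
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            # The nearest center identifies the activation this pixel is assigned to.
            k = nearest_center(centers, y, x)
            sums[k] += value
            counts[k] += 1
    low = [[0] * fw for _ in range(fh)]
    for k, (total, count) in enumerate(zip(sums, counts)):
        # Average the pixels assigned to this activation and compare with the threshold.
        if count and total / count >= threshold:
            low[k // fw][k % fw] = 1
    return low


def apply_mask(features: FeatureMaps, low: Mask) -> FeatureMaps:
    """Apply the low-resolution mask to every channel of the feature maps (claim 2)."""
    return [[[v * low[y][x] for x, v in enumerate(row)] for y, row in enumerate(ch)] for ch in features]
```

For example, take a 4 x 4 mask whose top-left 2 x 2 block is all ones and whose bottom-right block has three ones out of four. With a stride of 2, `low_res_mask(mask, 2, 2, 2)` returns `[[1, 0], [0, 1]]`, because the bottom-right average of 0.75 clears the assumed 0.5 threshold. The nearest-center search is brute force so that the code mirrors the claim wording. On a regular grid it could be replaced by integer division of the pixel coordinates.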

Assignees

Microsoft Technology Licensing, LLC

Classifications

Primary CPC classification: G06T5/10 (mapped technology area: Physics)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9865042B2 cover?
It covers a semantic image segmentation method. Convolutional feature maps are computed once for the whole image. These maps are then masked with binary masks that represent candidate segments, producing segment features, and each pixel is assigned a semantic category at least in part from those segment features. The same method is also claimed as a computer program product (claim 10) and as a computing device (claim 14).
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06T5/10. Mapped technology areas include Physics.
When was this patent published?
It was published on January 9, 2018 (kind code B2), the same day the patent was granted. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).