Method for Semantically Labeling an Image of a Scene using Recursive Context Propagation
US-2016055237-A1 · Feb 25, 2016 · US
US9865042B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9865042-B2 |
| Application number | US-201514801839-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 17, 2015 |
| Priority date | Jun 8, 2015 |
| Publication date | Jan 9, 2018 |
| Grant date | Jan 9, 2018 |
Official abstract text for this publication.
In implementations of the subject matter described herein, the feature maps are obtained by convoluting an input image using a plurality of layers of convolution filters. The feature maps record semantic information for respective regions on the image and only need to be computed once. Segment features of the image are extracted from the convolutional feature maps. Particularly, the binary masks may be obtained from a set of candidate segments of the image. The binary masks are used to mask the feature maps instead of the raw image. The masked feature maps define the segment features. The semantic segmentation of the image is done by determining a semantic category for each pixel in the image at least in part based on the resulting segment features.
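The masking step described in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the function names `mask_feature_maps` and `global_max_pool`, the toy values, and the choice of a single global max-pooling bin are all assumptions made for illustration. The sketch also assumes non-negative activations (for example, after a ReLU), so that zeroed-out positions never win the max.

```python
def mask_feature_maps(feature_maps, mask):
    """Zero out activations that fall outside a candidate segment.

    feature_maps: list of C channels, each an H x W list of floats.
    mask: H x W list of 0/1 values, already at feature-map resolution.
    (Both names and layouts are hypothetical, chosen for this sketch.)
    """
    return [
        [[value * keep for value, keep in zip(row, mask_row)]
         for row, mask_row in zip(channel, mask)]
        for channel in feature_maps
    ]


def global_max_pool(feature_maps):
    """Collapse each channel to one value (a single pooling bin)."""
    return [max(max(row) for row in channel) for channel in feature_maps]


# Toy input: 2 channels on a 2 x 3 feature grid (made-up values).
feature_maps = [
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]],
    [[0.5, 0.1, 0.9],
     [0.2, 0.8, 0.3]],
]
# Candidate segment covering the two left-hand columns.
mask = [
    [1, 1, 0],
    [1, 1, 0],
]

# The feature maps are computed once; each segment only needs its own mask.
masked = mask_feature_maps(feature_maps, mask)
segment_feature = global_max_pool(masked)
print(segment_feature)  # [5.0, 0.8]
```

Because the convolutional feature maps are shared, evaluating another candidate segment in this sketch only repeats the cheap masking and pooling calls, not the convolution.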
Opening claim text (preview).
What is claimed is:

1. A method comprising: applying a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; masking the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determining a semantic category for each pixel in the image at least in part based on the segment features.
2. The method of claim 1, wherein masking the feature maps comprises: generating low-resolution binary masks based on the binary masks and the feature maps; and applying the low-resolution binary masks onto the feature maps to generate the segment features.
3. The method of claim 2, wherein generating the low-resolution binary masks comprises: projecting each of the activations on the feature maps to a center of the respective region on the image; associating each pixel in the binary masks with the nearest center; and assigning each pixel in the binary masks to one of the activations on the feature maps based on the associated center.
4. The method of claim 3, wherein generating the low-resolution binary masks further comprises: averaging values of pixels assigned to each of the activations; and generating the low-resolution binary masks by comparing the averaged values and a predetermined threshold.
5. The method of claim 1, wherein masking the feature maps comprises: directly masking the feature maps to generate the segment features, and wherein determining the semantic category for each pixel in the image comprises: pooling the segment features; and connecting the pooled segment features.
6. The method of claim 5, wherein determining the semantic category for each pixel in the image further comprises: pooling regional features on the feature maps, each of the regional features being represented by a bounding box; connecting the pooled regional features; and determining the semantic category for each pixel in the image based on a concatenation of the connected segment features and the connected regional features.
7. The method of claim 5, wherein at least one of the segment features and the regional features are pooled by spatial pyramid pooling (SPP).
8. The method of claim 1, wherein masking the feature maps comprises: pooling the generated feature maps by spatial pyramid pooling (SPP) to obtain multiple levels of a pooled feature map; and masking the pooled feature map of a tiny level from the multiple levels to generate the segment features.
9. The method of claim 8, wherein determining the semantic category for each pixel in the image comprises: connecting the segment features and the pooled feature map of other levels among from the multiple levels.
10. A computer program product being tangibly stored on a non-transient machine-readable medium and comprising machine-executable instructions, the instructions, when executed on a device, causing the device to: apply a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; mask the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determine a semantic category for each pixel in the image at least in part based on the segment features.
11. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: generate low-resolution binary masks based on the binary masks and the feature maps; and apply the low-resolution binary masks onto the feature maps to generate the segment features.
12. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: project each of the activations on the feature maps to a center of the respective region on the image; associate each pixel in the binary masks with the nearest center; and assign each pixel in the binary masks to one of the activations on the feature maps based on the associated center.
13. The computer program product of claim 10, wherein the instructions, when executed on the device, cause the device to: average values of pixels assigned to each of the activations; and generate the low-resolution binary masks by comparing the averaged values and a predetermined threshold.
14. A computing device, comprising: at least one memory and at least one processor, wherein the at least one memory and the at least one memory are respectively configured to store and execute instructions for causing the computing device to perform operations, the operations including: applying a sequence of convolution filtering on an image to obtain feature maps, the feature maps including a plurality of activations, each of the activations representing semantic information for a region on the image; masking the feature maps with binary masks to generate segment features of the image, each of the binary masks representing a candidate segment of the image; and determining a semantic category for each pixel in the image at least in part based on the segment features.
15. The computing device of claim 14, wherein masking the feature maps comprises: generating low-resolution binary masks based on the binary masks and the feature maps; and applying the low-resolution binary masks onto the feature maps to generate the segment features.
16. The computing device of claim 15, wherein generating the low-resolution binary masks comprises: projecting each of the activations on the feature maps to a center of the respective region on the image; associating each pixel in the binary masks with the nearest center; and assigning each pixel in the binary masks to one of the activations on the feature maps based on the associated center.
17. The computing device of claim 16, wherein generating the low-resolution binary masks further comprises: averaging values of pixels assigned to each of the activations; and generating the low-resolution binary masks by comparing the averaged values and a predetermined threshold.
18. The computing device of claim 16, wherein determining the semantic category for each pixel in the image further comprises: pooling regional features on the feature maps, each of the regional features being represented by a bounding box; connecting the pooled regional features; and determining the semantic category for each pixel in the image based on a concatenation of the connected segment features and the connected regional features.
19. The computing device of claim 16, wherein at least one of the segment features and the regional features are pooled by spatial pyramid pooling (SPP).
20. The computing device of claim 14, wherein masking the feature maps comprises: directly masking the feature maps to generate the segment features, and wherein determining the semantic category for each pixel in the image comprises: pooling the segment features; and connecting the pooled segment features.
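The low-resolution mask generation recited in claims 2 to 4 (and mirrored in claims 11 to 13 and 15 to 17) can be sketched as follows. This is a hedged illustration only: the function name `low_res_mask`, the regular grid of centers defined by `stride` and `offset`, the `>=` comparison, and the default threshold of 0.5 are assumptions, since the claims only state that averaged values are compared with a predetermined threshold.

```python
def low_res_mask(mask, grid_h, grid_w, stride, offset, threshold=0.5):
    """Downsample an image-resolution binary mask to feature-map resolution.

    Assumption: activation (i, j) projects to the image-domain center
    (offset + i * stride, offset + j * stride). A real network would derive
    these centers from its receptive fields.
    """
    img_h, img_w = len(mask), len(mask[0])
    centers_y = [offset + i * stride for i in range(grid_h)]
    centers_x = [offset + j * stride for j in range(grid_w)]

    def nearest(coord, centers):
        # On an axis-aligned grid the nearest center can be found per axis.
        return min(range(len(centers)), key=lambda k: abs(coord - centers[k]))

    sums = [[0.0] * grid_w for _ in range(grid_h)]
    counts = [[0] * grid_w for _ in range(grid_h)]
    for y in range(img_h):
        i = nearest(y, centers_y)
        for x in range(img_w):
            j = nearest(x, centers_x)
            sums[i][j] += mask[y][x]   # assign the pixel to activation (i, j)
            counts[i][j] += 1

    # Average the assigned pixel values and compare with the threshold.
    return [
        [1 if counts[i][j] and sums[i][j] / counts[i][j] >= threshold else 0
         for j in range(grid_w)]
        for i in range(grid_h)
    ]


# Toy 4 x 4 segment mask mapped onto a 2 x 2 activation grid.
segment_mask = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
result = low_res_mask(segment_mask, grid_h=2, grid_w=2, stride=2, offset=0.5)
print(result)  # [[1, 0], [0, 0]]
```

With these toy settings each activation collects a 2 x 2 block of pixels. Only the top-left block averages at or above 0.5, so only that activation is kept in the low-resolution mask.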