Spatial and temporal information for semantic segmentation
US-10176388-B1 · Jan 8, 2019 · US
US2019094875A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019094875-A1 |
| Application number | US-201816146202-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 28, 2018 |
| Priority date | Sep 28, 2017 |
| Publication date | Mar 28, 2019 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods for generating an occlusion-aware bird's eye view map of a road scene include identifying foreground objects and background objects in an input image to extract foreground features and background features corresponding to the foreground objects and the background objects, respectively. The foreground objects are masked from the input image with a mask. Occluded objects and depths of the occluded objects are inferred by predicting semantic features and depths in masked areas of the masked image according to contextual information related to the background features visible in the masked image. The foreground objects and the background objects are mapped to a three-dimensional space according to locations of each of the foreground objects, the background objects and occluded objects using the inferred depths. A bird's eye view is generated from the three-dimensional space and displayed with a display device.
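The masking step in the abstract can be illustrated with a minimal sketch. It assumes a per-pixel class-id map is already available from some segmentation model. The class ids, the `FOREGROUND_CLASSES` tuple, and the `mask_foreground` helper are hypothetical names chosen for illustration and do not come from the patent.

```python
import numpy as np

# Hypothetical class ids; the patent does not specify a label set.
FOREGROUND_CLASSES = (1, 2)  # e.g. car, pedestrian (assumed)


def mask_foreground(image, labels, foreground_classes=FOREGROUND_CLASSES):
    """Return (masked_image, mask); mask is 1 where a foreground pixel was removed."""
    # Boolean map of pixels whose class id is one of the foreground classes.
    mask = np.isin(labels, foreground_classes)
    # Work on a copy so the input image is left untouched.
    masked = image.copy()
    # Zero out every channel of the foreground pixels.
    masked[mask] = 0
    return masked, mask.astype(np.uint8)


# Tiny 2x3 RGB image and a matching label map (0 and 3 are background here).
image = np.full((2, 3, 3), 255, dtype=np.uint8)
labels = np.array([[0, 1, 0],
                   [0, 2, 3]])
masked_image, mask = mask_foreground(image, labels)
```

The masked image and the mask are the two inputs that the in-painting stage described in the abstract would consume; how those networks fill in the masked areas is outside this sketch.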
Opening claim text (preview).
What is claimed is:

1. A method for autonomous navigation with an occlusion-aware bird's eye view map of a road scene, the method comprising: capturing an image of a road scene with background objects and foreground objects; identifying foreground objects and background objects in the image by using a semantic segmentation network to extract foreground features corresponding to the foreground objects and background features corresponding to the background objects; masking the foreground objects from the image with a mask to generate a masked image; inferring occluded objects by predicting semantic features in masked areas of the masked image with a semantic in-painting network according to contextual information related to the identified background features visible in the masked image; inferring depths of the occluded objects by predicting depths in masked areas of the masked image with a depth in-painting network according to the contextual information; mapping the foreground objects and the background objects to a three-dimensional space with a background mapping system according to locations of each of the foreground objects, the background objects and occluded objects using the inferred depths; generating a bird's eye view from the three-dimensional space; and adjusting a steering, a throttle and one or more brakes of a vehicle to navigate roads of the road scene while avoiding collisions.
2. The method as recited by claim 1, further including identifying the foreground objects and the background objects by joint feature extraction with an encoder to produce a joint feature map.
3. The method as recited by claim 2, further including: predicting depth probabilities for each pixel of the image by decoding the joint feature map with a depth decoder; and predicting class probabilities corresponding to the foreground objects and the background objects for each pixel of the image by decoding the joint feature map with a semantic decoder.
4. The method as recited by claim 1, further including inferring the occluded objects by: encoding the masked image with a masked image encoder to produce a masked image feature map; encoding the mask with a mask encoder to produce a mask feature map; fusing the masked image feature map with the mask feature map to produce a fused feature map; and decoding the fused feature map with a semantic decoder to predict class probabilities for each pixel of the masked areas corresponding to the occluded objects.
5. The method as recited by claim 4, further including inferring the depths by decoding the fused feature map with a depth decoder to predict depth probabilities for each pixel of the masked areas corresponding to the occluded objects.
6. The method as recited in claim 1, further including mapping the three-dimensional space to a two-dimensional space corresponding to the bird's eye view with a view converter, including: assigning three coordinate values corresponding to three coordinate axes to each point in the three-dimensional space, one of the coordinate axes including a z coordinate perpendicular to a ground plane of the road scene; and removing the z coordinate from the three coordinate values of each of the points to reduce the three-dimensional space to the two-dimensional space corresponding to a bird's eye view of the road scene.
7. The method as recited in claim 1, further including refining the bird's eye view with a refinement network, including: encoding the bird's eye view with an encoder to generate a bird's eye view feature map; and decoding the bird's eye view feature map with a decoder to generate a refined bird's eye view.
8. The method as recited in claim 7, further including training the refinement network, including: simulating background object shapes by modeling the background objects of the bird's eye view with a simulator; and determining an adversarial error between the background object shapes and shapes of the background objects corresponding to roads with an adversarial loss unit.
9. The method as recited in claim 8, further including modifying the adversarial error, including: determining a self-reconstruction error by comparing the refined bird's eye view to the bird's eye view with a self-loss unit; and combining the self-reconstruction error and the adversarial error.
10. The method as recited in claim 7, further including training the refinement network, including: warping a semantic aerial image of the road scene to align with the bird's eye view to produce a warped aerial image; and determining a reconstruction loss between the warped aerial image and the refined bird's eye view with a reconstruction loss unit.
11. An autonomous vehicle for autonomous navigation with an occlusion-aware bird's eye view map of a road scene, the vehicle comprising: an image capture device that captures an image of a road scene with background objects and foreground objects; a semantic segmentation network that identifies foreground objects and background objects in the image by extracting foreground features corresponding to the foreground objects and background features corresponding to the background objects, the foreground object occluding the background objects from a view of a camera that captured the image; a masking network that masks the foreground objects from the image with a mask to generate a masked image; a semantic in-painting network that infers occluded objects by predicting semantic features in masked areas of the masked image according to contextual information related to the identified background features visible in the masked image; a depth in-painting network that infers depths of the occluded objects by predicting depths in masked areas of the masked image according to the contextual information; a background mapping system that maps the foreground objects and the background objects to a three-dimensional space according to locations of each of the foreground objects, the background objects and occluded object using the inferred depths; a view converter that generates a bird's eye view from the three-dimensional space; and a control unit that adjusts a steering, a throttle and one or more brakes of the vehicle to navigate roads of the road scene while avoiding collisions.
12. The computer processing system as recited by claim 11, further including an encoder to identify the foreground objects and the background objects by joint feature extraction to produce a joint feature map.
13. The computer processing system as recited by claim 12, further including: a depth decoder that predicts depth probabilities for each pixel of the input image by decoding the joint feature map; and a semantic decoder that predicts class probabilities corresponding to the foreground objects and the background objects for each pixel of the image by decoding the joint feature map.
14. The computer processing system as recited by claim 11, further including an in-painting network that infers the occluded object, including: a masking image encoder that encodes the masked image to produce a masked image feature map; a mask encoder that encodes the mask to produce a mask feature map; a fuser that fuses the masked image feature map with the mask feature map to produce a fused feature map; and a semantic decoder that decodes the fused feature map to predict class probabilities for each pixel of the masked areas corresponding to the occluded objects.
15. The computer processing system as recited by claim 14, further including a depth decoder that infers the d
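
The view conversion recited in claim 6 (assign three coordinates to each point, then remove the z coordinate perpendicular to the ground plane) can be sketched as follows. This is only an illustration under stated assumptions: points are assumed to already be in a frame whose z axis is perpendicular to the ground, and the grid extents, cell size, class ids, and the `to_birds_eye` helper are hypothetical and not taken from the patent.

```python
import numpy as np


def to_birds_eye(points_xyz, class_ids, x_range, y_range, cell_size, empty=-1):
    """Drop z and rasterize the remaining (x, y) points into a 2D grid of class ids."""
    # Removing the z coordinate leaves the two ground-plane coordinates.
    xy = points_xyz[:, :2]
    # Convert metric coordinates to integer grid cells (assumed layout).
    cols = np.floor((xy[:, 0] - x_range[0]) / cell_size).astype(int)
    rows = np.floor((xy[:, 1] - y_range[0]) / cell_size).astype(int)
    n_cols = int(round((x_range[1] - x_range[0]) / cell_size))
    n_rows = int(round((y_range[1] - y_range[0]) / cell_size))
    # Cells that receive no point keep the `empty` marker.
    grid = np.full((n_rows, n_cols), empty, dtype=int)
    # Ignore points that fall outside the grid extents.
    inside = (cols >= 0) & (cols < n_cols) & (rows >= 0) & (rows < n_rows)
    # If several points share a cell, one of them overwrites the others.
    grid[rows[inside], cols[inside]] = class_ids[inside]
    return grid


points = np.array([[0.5, 0.5, 0.0],
                   [1.5, 0.5, 2.0],
                   [5.0, 5.0, 1.0]])  # the last point lies outside the grid
classes = np.array([7, 3, 9])  # hypothetical class ids
bev = to_birds_eye(points, classes, x_range=(0.0, 2.0), y_range=(0.0, 2.0), cell_size=1.0)
```

Points at different heights map to cells purely by their ground-plane position, which is the effect of discarding z. The claims do not say how overlapping points are resolved; this sketch takes no position on that.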
Perspective computation · CPC title
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
from specularities · CPC title
Vehicle exterior; Vicinity of vehicle · CPC title
Region-based segmentation · CPC title