Generating occlusion-aware bird eye view representations of complex road scenes

US2019094875A1 · US · A1

Patent metadata
Publication number: US-2019094875-A1
Application number: US-201816146202-A
Country: US
Kind code: A1
Filing date: Sep 28, 2018
Priority date: Sep 28, 2017
Publication date: Mar 28, 2019
Grant date: not listed

Abstract

Official abstract text for this publication.

Systems and methods for generating an occlusion-aware bird's eye view map of a road scene include identifying foreground objects and background objects in an input image to extract foreground features and background features corresponding to the foreground objects and the background objects, respectively. The foreground objects are masked from the input image with a mask. Occluded objects and depths of the occluded objects are inferred by predicting semantic features and depths in masked areas of the masked image according to contextual information related to the background features visible in the masked image. The foreground objects and the background objects are mapped to a three-dimensional space according to locations of each of the foreground objects, the background objects and occluded objects using the inferred depths. A bird's eye view is generated from the three-dimensional space and displayed with a display device.
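The masking and three-dimensional mapping steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the abstract names no class list, camera model, or data layout, so the `FOREGROUND` label set, the pinhole back-projection, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made here for illustration.

```python
from typing import List, Tuple

# Hypothetical foreground classes; the abstract does not enumerate them.
FOREGROUND = {"car", "pedestrian"}


def mask_foreground(labels: List[List[str]]) -> List[List[int]]:
    """Return a binary mask with 1 where a pixel belongs to a foreground class."""
    return [[1 if lab in FOREGROUND else 0 for lab in row] for row in labels]


def apply_mask(image: List[List[int]], mask: List[List[int]], fill: int = 0) -> List[List[int]]:
    """Blank out masked pixels so only background context remains visible."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Assumed pinhole model: lift pixel (u, v) with a depth value to camera-frame 3D."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


labels = [["road", "car"], ["sidewalk", "road"]]
image = [[10, 20], [30, 40]]
mask = mask_foreground(labels)
masked_image = apply_mask(image, mask)
point = back_project(2, 1, 4.0, fx=2.0, fy=2.0, cx=1.0, cy=1.0)
```

In the described system, the masked regions would then be filled by the in-painting step, which predicts semantic features and depths from the visible background. The sketch stops before that step because it depends on trained networks.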

First claim

Opening claim text (preview).

What is claimed is:

1. A method for autonomous navigation with an occlusion-aware bird's eye view map of a road scene, the method comprising: capturing an image of a road scene with background objects and foreground objects; identifying foreground objects and background objects in the image by using a semantic segmentation network to extract foreground features corresponding to the foreground objects and background features corresponding to the background objects; masking the foreground objects from the image with a mask to generate a masked image; inferring occluded objects by predicting semantic features in masked areas of the masked image with a semantic in-painting network according to contextual information related to the identified background features visible in the masked image; inferring depths of the occluded objects by predicting depths in masked areas of the masked image with a depth in-painting network according to the contextual information; mapping the foreground objects and the background objects to a three-dimensional space with a background mapping system according to locations of each of the foreground objects, the background objects and occluded objects using the inferred depths; generating a bird's eye view from the three-dimensional space; and adjusting a steering, a throttle and one or more brakes of a vehicle to navigate roads of the road scene while avoiding collisions.

2. The method as recited by claim 1, further including identifying the foreground objects and the background objects by joint feature extraction with an encoder to produce a joint feature map.

3. The method as recited by claim 2, further including: predicting depth probabilities for each pixel of the image by decoding the joint feature map with a depth decoder; and predicting class probabilities corresponding to the foreground objects and the background objects for each pixel of the image by decoding the joint feature map with a semantic decoder.

4. The method as recited by claim 1, further including inferring the occluded objects by: encoding the masked image with a masked image encoder to produce a masked image feature map: encoding the mask with a mask encoder to produce a mask feature map; fusing the masked image feature map with the mask feature map to produce a fused feature map; and decoding the fused feature map with a semantic decoder to predict class probabilities for each pixel of the masked areas corresponding to the occluded objects.

5. The method as recited by claim 4, further including inferring the depths by decoding the fused feature map with a depth decoder to predict depth probabilities for each pixel of the masked areas corresponding to the occluded objects.

6. The method as recited in claim 1, further including mapping the three-dimensional space to a two-dimensional space corresponding to the bird's eye view with a view converter, including: assigning three coordinate values corresponding to three coordinate axes to each point in the three-dimensional space, one of the coordinate axes including a z coordinate perpendicular to a ground plane of the road scene; and removing the z coordinate from the three coordinate values of each of the points to reduce the three-dimensional space to the two-dimensional space corresponding to a bird's eye view of the road scene.

7. The method as recited in claim 1, further including refining the bird's eye view with a refinement network, including: encoding the bird's eye view with an encoder to generate a bird's eye view feature map; and decoding the bird's eye view feature map with a decoder to generate a refined bird's eye view.

8. The method as recited in claim 7, further including training the refinement network, including: simulating background object shapes by modeling the background objects of the bird's eye view with a simulator; and determining an adversarial error between the background object shapes and shapes of the background objects corresponding to roads with an adversarial loss unit.

9. The method as recited in claim 8, further including modifying the adversarial error, including: determining a self-reconstruction error by comparing the refined bird's eye view to the bird's eye view with a self-loss unit; and combining the self-reconstruction error and the adversarial error.

10. The method as recited in claim 7, further including training the refinement network, including: warping a semantic aerial image of the road scene to align with the bird's eye view to produce a warped aerial image; and determining a reconstruction loss between the warped aerial image and the refined bird's eye view with a reconstruction loss unit.

11. An autonomous vehicle for autonomous navigation with an occlusion-aware bird's eye view map of a road scene, the vehicle comprising: an image capture device that captures an image of a road scene with background objects and foreground objects; a semantic segmentation network that identifies foreground objects and background objects in the image by extracting foreground features corresponding to the foreground objects and background features corresponding to the background objects, the foreground object occluding the background objects from a view of a camera that captured the image; a masking network that masks the foreground objects from the image with a mask to generate a masked image; a semantic in-painting network that infers occluded objects by predicting semantic features in masked areas of the masked image according to contextual information related to the identified background features visible in the masked image; a depth in-painting network that infers depths of the occluded objects by predicting depths in masked areas of the masked image according to the contextual information; a background mapping system that maps the foreground objects and the background objects to a three-dimensional space according to locations of each of the foreground objects, the background objects and occluded object using the inferred depths; a view converter that generates a bird's eye view from the three-dimensional space; and a control unit that adjusts a steering, a throttle and one or more brakes of the vehicle to navigate roads of the road scene while avoiding collisions.

12. The computer processing system as recited by claim 11, further including an encoder to identify the foreground objects and the background objects by joint feature extraction to produce a joint feature map.

13. The computer processing system as recited by claim 12, further including: a depth decoder that predicts depth probabilities for each pixel of the input image by decoding the joint feature map; and a semantic decoder that predicts predict class probabilities corresponding to the foreground objects and the background objects for each pixel of the image by decoding the joint feature map.

14. The computer processing system as recited by claim 11, further including an in-painting network that infers the occluded object, including: a masking image encoder that encodes the masked image to produce a masked image feature map: a mask encoder that encodes the mask to produce a mask feature map; a fuser that fuses the masked image feature map with the mask feature map to produce a fused feature map; and a semantic decoder that decodes the fused feature map to predict class probabilities for each pixel of the masked areas corresponding to the occluded objects.

15. The computer processing system as recited by claim 14, further including a depth decoder that infers the d
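Claim 6 describes the view conversion as assigning three coordinates to each point and then removing the z coordinate, which is perpendicular to the ground plane. The sketch below shows one way this could look. The grid rasterization, the `cell` size, the `size` parameter, and the rule that the last point written to a cell wins are all assumptions added here and are not part of the claim text.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float, str]  # (x, y, z, semantic label); z is height above ground


def drop_z(points: List[Point3D]) -> List[Tuple[float, float, str]]:
    """Remove the z coordinate from every point, as recited in claim 6."""
    return [(x, y, label) for x, y, _z, label in points]


def to_birds_eye(points: List[Point3D], cell: float = 1.0,
                 size: int = 4) -> List[List[Optional[str]]]:
    """Rasterize ground-plane positions into a size x size label grid.

    Assumed convention: rows index y, columns index x, and points
    falling outside the grid are ignored.
    """
    grid: List[List[Optional[str]]] = [[None] * size for _ in range(size)]
    for x, y, label in drop_z(points):
        col = int(x // cell)
        row = int(y // cell)
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = label  # later points overwrite earlier ones
    return grid


points = [
    (0.5, 1.2, 0.0, "road"),
    (2.9, 3.1, 1.5, "car"),
    (9.0, 0.0, 0.0, "road"),  # outside the 4 x 4 grid, so it is dropped
]
bev = to_birds_eye(points)
```

Discarding z means points at different heights above the same ground location land in the same cell. A real system would need a rule for resolving such collisions; the overwrite rule used here is only a placeholder.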

Assignees

Nec Lab America Inc

Inventors

Classifications

  • G06T15/20 (Primary)

    Perspective computation · CPC title

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • from specularities · CPC title

  • Vehicle exterior; Vicinity of vehicle · CPC title

  • Region-based segmentation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019094875A1 cover?
Systems and methods for generating an occlusion-aware bird's eye view map of a road scene include identifying foreground objects and background objects in an input image to extract foreground features and background features corresponding to the foreground objects and the background objects, respectively. The foreground objects are masked from the input image with a mask. Occluded objects and d…
Who is the assignee on this patent?
Nec Lab America Inc
What technology area does this patent fall under?
Primary CPC classification G06T15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 28, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).