Distance to obstacle detection in autonomous machine applications

US11308338B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11308338-B2
Application number: US-201916728595-A
Country: US
Kind code: B2
Filing date: Dec 27, 2019
Priority date: Dec 28, 2018
Publication date: Apr 19, 2022
Grant date: Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, a deep neural network (DNN) is trained to accurately predict, in deployment, distances to objects and obstacles using image data alone. The DNN may be trained with ground truth data that is generated and encoded using sensor data from any number of depth predicting sensors, such as, without limitation, RADAR sensors, LIDAR sensors, and/or SONAR sensors. Camera adaptation algorithms may be used in various embodiments to adapt the DNN for use with image data generated by cameras with varying parameters—such as varying fields of view. In some examples, a post-processing safety bounds operation may be executed on the predictions of the DNN to ensure that the predictions fall within a safety-permissible range.
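
The post-processing safety bounds operation described in the abstract can be pictured as a clamp on each predicted distance. The Python sketch below is illustrative only and is not taken from the patent: the function name `clamp_depth`, the use of meters, and the example bounds are all assumptions, and the way the bounds themselves are derived is not shown.

```python
from typing import Optional


def clamp_depth(predicted: float,
                min_depth: Optional[float] = None,
                max_depth: Optional[float] = None) -> float:
    """Clamp a predicted depth into [min_depth, max_depth].

    All names and units here are hypothetical. A bound of None means
    that side of the range is unconstrained.
    """
    if max_depth is not None and predicted > max_depth:
        return max_depth  # prediction is farther than the permitted maximum
    if min_depth is not None and predicted < min_depth:
        return min_depth  # prediction is nearer than the permitted minimum
    return predicted      # already inside the permitted range


# Example with made-up bounds of 5 m and 80 m.
print(clamp_depth(120.0, min_depth=5.0, max_depth=80.0))
print(clamp_depth(2.0, min_depth=5.0, max_depth=80.0))
print(clamp_depth(42.0, min_depth=5.0, max_depth=80.0))
```

A prediction inside the range passes through unchanged, so the clamp only affects outputs that would otherwise fall outside the permitted bounds.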

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: applying, to a neural network, first data representative of an image of a field of view of an image sensor and second data representative of a distortion map generated to correspond to a field-of-view of the image sensor, the neural network trained based at least in part on ground truth information generated using at least one of a LIDAR sensor or a RADAR sensor; computing, using the neural network and based at least in part on the first data and the second data, third data representative of one or more depth values corresponding to the image; determining one or more pixels of the image that correspond to an object depicted in the image; and associating, with the object, a depth value of the one or more depth values that corresponds to the one or more pixels.

2. The method of claim 1, wherein the determining the one or more pixels includes: receiving fourth data representative of pixel locations of a bounding shape associated with the object; and determining the one or more locations using the fourth data.

3. The method of claim 1, further comprising: computing, using the neural network and based at least in part on the first data and the second data, fourth data representative of pixel locations within the image corresponding to one or more bounding shapes, wherein the determining the one or more pixels includes determining, based at least in part on the fourth data, the one or more pixels.

4. The method of claim 3, wherein, for each bounding shape of the one or more bounding shapes, the pixel locations correspond to a center pixel of the bounding shape and a pixel distance from the center pixel to at least one edge of the bounding shape.

5. The method of claim 4, wherein the fourth data is further representative of confidences corresponding to a plurality of pixels of the image, the confidences corresponding to a likelihood that each of the plurality of pixels correspond to center pixels of the one or more bounding shapes, wherein the center pixel of the bounding shape is determined based at least in part on the confidences.

6. The method of claim 3, wherein: the pixel locations correspond to two or more bounding shapes associated with the object; a density-based spatial clustering of application with noise (DBSCAN) algorithm is used to determine final pixel locations for the bounding shape corresponding to the object; and the one or more pixels of the image that correspond to the bounding shape of the object are determined based at least in part on the final pixel locations.

7. The method of claim 1, wherein: the image sensor corresponds to a deployed camera; the neural network is further trained according to a reference camera parameter of a reference camera; a scaling factor is applied to the one or more depth values to generate one or more final depth values based on a deployed camera parameter of the deployed camera being different from the reference camera parameter; and the depth value of the one or more depth values includes a final depth value of the one or more final depth values.

8. The method of claim 7, wherein the reference camera parameter includes a first angle of a first field-of-view (FOV) of the reference camera and the deployed camera parameter includes a second angle of a second FOV of the deployed camera.

9. The method of claim 1, further comprising performing one or more operations for controlling an autonomous machine based at least in part on the depth value associated with the object.

10. The method of claim 1, further comprising: based at least in part on at least one of a bounding shape of an object or a shape of a driving surface, determining at least one of a maximum depth value or a minimum depth value for the object; and one of: clamping the depth value to the maximum depth value when the depth value exceeds the maximum depth value; or clamping the depth value to the minimum depth value when the depth value is less than the maximum depth value.

11. A method comprising: receiving first data representative of LIDAR information, second data representative of RADAR information, and third data representative of an image; receiving fourth data representative of a bounding shape corresponding to an object depicted in the image; correlating, with the bounding shape, depth information determined based at least in part on at least one of the LIDAR information or the RADAR information; generating ground truth data based at least in part on converting the depth information to a depth map by: determining a representative pixel within the bounding shape; selecting, from a shape within the bounding shape centered at the representative pixel, a subset of pixels of the bounding shape; and encoding, to each of the subset of pixels, a depth value of the depth values corresponding to the representative pixel; and training a neural network to compute a predicted depth map using the ground truth data.

12. The method of claim 11, wherein the correlating the depth information with the bounding shape includes: projecting the bounding shape and the at least one of the LIDAR information or the RADAR information into the image; and determining an overlap between the bounding shape and the at least one of the LIDAR information or the RADAR information.

13. The method of claim 11, wherein the correlating the depth information with the bounding shape includes: generating a final bounding shape by cropping an initial bounding shape, the cropping including removing a first percentage from a top of the initial bounding shape and a second percentage different from the first percentage from a bottom of the bounding shape.

14. The method of claim 13, wherein the first percentage is greater than the second percentage.

15. The method of claim 13, wherein the first percentage is determined based at least in part on the depth information corresponding to the initial bounding shape such that the first percentage increases as a depth increases.

16. The method of claim 11, wherein the correlating the depth information includes using a noisiness threshold to determine whether to use the LIDAR information or the RADAR information.

17. The method of claim 11, wherein the shape is one of an ellipse or a circle, and the representative pixel is a center pixel of the bounding shape.

18. The method of claim 11, further comprising: determining, based at least in part on an angle of a field-of-view of a camera that captured the image, fifth data representative of a distortion map, wherein the neural network is further trained using the fifth data.

19. The method of claim 11, wherein the ground truth data further represents a scaling factor determined based at least in part on a field-of-view of a camera that captured the image.

20. A system comprising: an image sensor to generate first data representative of an image of an environment; a computing device including one or more processing devices and one or more memory devices communicatively coupled to the one or more processing devices and storing programmed instructions thereon that, when executed using the one or more processing devices, cause the instantiation of: a depth determiner to compute depth values using a neural network and based at least in part on the first data and second data representative of a distortion map generated to correspond to a field-of-view of the image sensor, the neural network trained using at least one of LIDAR data or RADAR data as ground truth
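
The ground truth encoding recited in claims 11 and 17 (pick a representative pixel in the bounding shape, select the pixels of an ellipse or circle centered on it, and write one depth value to each of them) might look roughly like the Python sketch below. This is an assumption-laden illustration rather than the patented implementation: the function name `encode_ground_truth`, the `(x_min, y_min, x_max, y_max)` box layout, the `scale` parameter that sizes the ellipse, and the use of 0.0 for pixels with no ground truth are all hypothetical choices.

```python
from typing import List, Tuple


def encode_ground_truth(height: int, width: int,
                        box: Tuple[float, float, float, float],
                        depth: float,
                        scale: float = 0.5) -> List[List[float]]:
    """Write `depth` into an ellipse centered in `box`; other pixels stay 0.0.

    `box` is (x_min, y_min, x_max, y_max) in pixel coordinates. `scale`
    (hypothetical) sets the ellipse axes as a fraction of the box size.
    """
    x_min, y_min, x_max, y_max = box
    # Representative pixel: the center of the bounding box (as in claim 17).
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Semi-axes of the ellipse, kept at least half a pixel wide.
    rx = max((x_max - x_min) * scale / 2.0, 0.5)
    ry = max((y_max - y_min) * scale / 2.0, 0.5)

    depth_map = [[0.0] * width for _ in range(height)]
    # Only visit pixels inside the box and inside the image.
    for y in range(max(0, int(y_min)), min(height, int(y_max) + 1)):
        for x in range(max(0, int(x_min)), min(width, int(x_max) + 1)):
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                depth_map[y][x] = depth  # same depth for every selected pixel
    return depth_map


# Example: a 12x12 image, one object whose sensor-derived depth is 25 m.
gt = encode_ground_truth(12, 12, box=(2, 2, 10, 10), depth=25.0)
encoded = sum(1 for row in gt for value in row if value == 25.0)
print(encoded)
```

Encoding only a central region, rather than the whole box, keeps background pixels near the box edges from being labeled with the object's depth; how much of the box to use is a design choice that this sketch exposes through `scale`.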

Assignees

Nvidia Corp

Inventors

Classifications

  • using pattern recognition or machine learning (optical pattern recognition or electronic computations therefor G06V10/88)

  • using neural networks

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

  • Determination of region of interest [ROI] or a volume of interest [VOI]

  • Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11308338B2 cover?
In various examples, a deep neural network (DNN) is trained to accurately predict, in deployment, distances to objects and obstacles using image data alone. The DNN may be trained with ground truth data that is generated and encoded using sensor data from any number of depth predicting sensors, such as, without limitation, RADAR sensors, LIDAR sensors, and/or SONAR sensors. Camera adaptation al…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 19, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).