Object detection and detection confidence suitable for autonomous driving

US11210537B2 · US · B2

Patent metadata
Publication number: US-11210537-B2
Application number: US-201916277895-A
Country: US
Kind code: B2
Filing date: Feb 15, 2019
Priority date: Feb 18, 2018
Publication date: Dec 28, 2021
Grant date: Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, detected object data representative of locations of detected objects in a field of view may be determined. One or more clusters of the detected objects may be generated based at least in part on the locations and features of the cluster may be determined for use as inputs to a machine learning model(s). A confidence score, computed by the machine learning model(s) based at least in part on the inputs, may be received, where the confidence score may be representative of a probability that the cluster corresponds to an object depicted at least partially in the field of view. Further examples provide approaches for determining ground truth data for training object detectors, such as for determining coverage values for ground truth objects using associated shapes, and for determining soft coverage values for ground truth objects.
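The clustering step described in the abstract can be illustrated with a minimal sketch. It assumes each detection is an axis-aligned box `(x1, y1, x2, y2)` and groups detections whose centers lie within a hypothetical `max_dist` threshold, using single-linkage grouping with a union-find structure. The abstract does not prescribe this algorithm; the function name, box format, and threshold are assumptions made purely for illustration.

```python
import math


def cluster_detections(boxes, max_dist):
    """Group boxes whose centers are within max_dist (single linkage).

    boxes: list of (x1, y1, x2, y2) tuples (assumed format).
    max_dist: hypothetical distance threshold between box centers.
    Returns a list of clusters, each a list of box indices.
    """
    centers = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of detections whose centers are close enough.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if math.dist(centers[i], centers[j]) <= max_dist:
                parent[find(j)] = find(i)

    # Collect indices by root so each cluster lists its member detections.
    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

With two nearly overlapping boxes and one distant box, the two overlapping boxes fall into one cluster and the distant box forms its own. Each resulting cluster would then be summarized as features and scored by a separate model.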

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: applying, to a first neural network, sensor data representative of a frame that depicts a field of view of at least one sensor of a vehicle in an environment; receiving, from the first neural network, detected object data representative of detections of an object and locations of the detections in the frame, the detections corresponding to an output class of the first neural network; clustering the detections into one or more clusters based at least in part on distances between the locations; determining features of a cluster of the one or more clusters for use as inputs of a second neural network; receiving output data representative of a confidence score computed by the second neural network based at least in part on the inputs, the confidence score representative of a probability that the cluster corresponds to the object in the frame and that the cluster represents a positive detection of the output class in the frame.

2. The method of claim 1, further comprising: determining at least a first detection and a second detection are a same object depicted across sequential frames represented by the sensor data; and computing at least one value of the object based at least in part on the first detection and the second detection, wherein at least one of the features corresponds to the at least one value based at least in part on the cluster being associated with the same object.

3. The method of claim 1, further comprising: determining, from the cluster, a bounding shape of a region of the object in the frame from a plurality of the locations that correspond to a plurality of the detections in the cluster: computing a statistic corresponding to at least the plurality of the locations and a quantity of at least the plurality of the detections, wherein at least a first of the features corresponds to the statistic, at least a second of the features corresponds to one or more dimensions of the bounding shape, and the confidence score is further representative of a probability the bounding shape and the region corresponds to the object in the frame.

4. The method of claim 1, wherein one or more of the features is based at least in part on vehicle state data representative of a state of the vehicle based at least in part on additional sensor data received from one or more of the at least one sensor or at least one alternative sensor.

5. The method of claim 1, wherein the detections of the cluster comprise a detected object region, and one or more of the features is based at least in part on computing a statistic of one or more of input pixels to the first neural network used to determine at least one of: the detected object data, or features of at least one layer of the first neural network.

6. The method of claim 1, wherein the clustering is based at least in part on coverage values of the detections, each coverage value indicating a likelihood a detection of the detections corresponds to the object depicted in the field of view, and the method further comprises computing a statistical value corresponding to an average of the coverage values, wherein the statistical value is included in the features of the cluster provided as the inputs to the second neural network.

7. A method comprising: determining, based at least in part on sensor data representative of a frame that depicts a field of view of at least one sensor, detected object data representative of locations of detections of an object in the frame, the detections corresponding to an output class of one or more machine learning models; generating a cluster of the detections based at least in part on distances between the locations; determining features of the cluster for use as inputs to a neural network; and receiving output data representative of a confidence score computed by the neural network based at least in part on the inputs, the confidence score representative of a probability that the cluster corresponds to the object in the frame and that the cluster represents a positive detection of the output class in the frame.

8. The method of claim 7, wherein the neural network is a multi-layer perceptron neural network.

9. The method of claim 7, wherein the locations of the detections are represented by outputs of a convolutional neural network that determines the locations based at least in part on the sensor data.

10. The method of claim 7, wherein the at least one sensor is of a vehicle and one or more of the features is based at least in part on distance data representative of a distance of the vehicle from the object, the distance data based at least in part on additional sensor data received from one or more of the at least one sensor or at least one alternative sensor of the vehicle.

11. The method of claim 7, wherein at least one of the features is based at least in part on coverage values of the detections of the cluster, each coverage value indicating, for a detection of the detections, a likelihood the detection corresponds to an object depiction in the field of view.

12. The method of claim 7, wherein at least one of the features represents a height of a detected object region that corresponds to the detections of the cluster, a width of the detected object region, or a central location of the detected object region.

13. The method of claim 7, wherein one or more of the features is based at least in part on at least one estimated parameter of a ground plane in the field of view.

14. The method of claim 7, wherein the detections of the cluster are detected in a same frame representing the field of view.

15. The method of claim 7, wherein the generating the cluster of the detections comprises clustering the detections into a plurality of clusters using a clustering algorithm that is based at least on similarities between the locations and the cluster is of the plurality of clusters.

16. The method of claim 7, wherein the neural network is a second neural network and the detected object data is output from a first neural network that is trained to output multiple detections for a same object within a frame of input data.

17. A computer-implemented system comprising: one or more processors; and one or more memory devices that store instructions that, when executed by the one or more processors, cause the one or more processors to execute operations comprising: grouping detections of an object in a field of view of at least one sensor into an aggregated detection of the object that corresponds to a plurality of the detections based at least on locations of the plurality of the detections, the detections corresponding to an output class of one or more machine learning models; determining, from the locations of the plurality of the detections, a bounding shape of the aggregated detection; determining a feature of the aggregated detection from the plurality of the detections for use as an input of a neural network; and receiving output data representative of a confidence score computed by the neural network based at least in part on the input, the confidence score representative of a probability that the bounding shape corresponds to the object within the field of view and that the aggregated detection is a positive detection of the output class in the frame.

18. The system of claim 17, wherein each of the detections correspond to a same frame of the field of view and the probability is that the bounding shape corresponds to the object within the same frame of the field of view.

19. The system of claim 17,
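Read together, claims 3, 6, 8, and 12 suggest a per-cluster feature vector (detection count, average coverage value, bounding-shape dimensions and center) scored by a multi-layer perceptron. The sketch below is one possible reading, not the claimed implementation. The feature ordering, the use of an element-wise mean box as the bounding shape, the single hidden layer, and all function and parameter names are assumptions; any weights passed in are placeholders, since the claim text gives no trained values.

```python
import math


def cluster_features(boxes, coverages):
    """Build a feature vector for one cluster of detections.

    boxes: list of (x1, y1, x2, y2) for the detections in the cluster.
    coverages: per-detection coverage values (assumed to lie in [0, 1]).
    The bounding shape is taken as the element-wise mean box (an assumption).
    """
    n = len(boxes)
    x1, y1, x2, y2 = (sum(b[k] for b in boxes) / n for k in range(4))
    mean_cov = sum(coverages) / n
    # [count, mean coverage, width, height, center x, center y]
    return [float(n), mean_cov, x2 - x1, y2 - y1, (x1 + x2) / 2.0, (y1 + y2) / 2.0]


def mlp_confidence(features, w1, b1, w2, b2):
    """Score features with one hidden ReLU layer and a sigmoid output.

    w1: list of hidden-unit weight rows; b1: hidden biases.
    w2: output weights (one per hidden unit); b2: output bias.
    Returns a value in (0, 1) interpreted as a confidence score.
    """
    hidden = [
        max(0.0, sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))
```

A caller would compute `cluster_features(...)` for each cluster and pass the result to `mlp_confidence(...)` with learned weights. Other features named in the claims, such as vehicle state, distance to the object, or ground-plane parameters, could be appended to the same vector.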

Assignees

  • Nvidia Corp

Inventors

Classifications

  • G01S7/417 (Primary)

    involving the use of neural networks · CPC title

  • using clustering, e.g. of similar faces in social networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • of vehicle lights or traffic lights · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210537B2 cover?
In various examples, detected object data representative of locations of detected objects in a field of view may be determined. One or more clusters of the detected objects may be generated based at least in part on the locations and features of the cluster may be determined for use as inputs to a machine learning model(s). A confidence score, computed by the machine learning model(s) based at …
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G01S7/417. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 28, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).