What technology area does this patent fall under?

Primary CPC classification: G06N3/084 (Backpropagation, e.g. using gradient descent). Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Leveraging multidimensional sensor data for computationally efficient object detection for autonomous machine applications

US11468582B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11468582-B2
Application number	US-202016818860-A
Country	US
Kind code	B2
Filing date	Mar 13, 2020
Priority date	Mar 16, 2019
Publication date	Oct 11, 2022
Grant date	Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various examples, a two-dimensional (2D) and three-dimensional (3D) deep neural network (DNN) is implemented to fuse 2D and 3D object detection results for classifying objects. For example, regions of interest (ROIs) and/or bounding shapes corresponding thereto may be determined using one or more region proposal networks (RPNs)—such as an image-based RPN and/or a depth-based RPN. Each ROI may be extended into a frustum in 3D world-space, and a point cloud may be filtered to include only points from within the frustum. The remaining points may be voxelated to generate a volume in 3D world space, and the volume may be applied to a 3D DNN to generate one or more vectors. The one or more vectors, in addition to one or more additional vectors generated using a 2D DNN processing image data, may be applied to a classifier network to generate a classification for an object.
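The frustum filtering and voxelization steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a simple pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), points already expressed in the camera frame with z pointing forward, and a hypothetical `near`/`far` depth range. The function names `filter_to_frustum` and `voxelize` are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]        # (x, y, z) in the camera frame, z forward
Box = Tuple[float, float, float, float]   # (u_min, v_min, u_max, v_max) in pixels
Voxel = Tuple[int, int, int]              # integer voxel grid index


def filter_to_frustum(points: Iterable[Point], box: Box,
                      fx: float, fy: float, cx: float, cy: float,
                      near: float = 0.1, far: float = 100.0) -> List[Point]:
    """Keep points whose pinhole projection falls inside the 2D box.

    The set of all such points is the frustum obtained by extending
    the box from the camera into 3D space (assumed pinhole model).
    """
    u_min, v_min, u_max, v_max = box
    kept = []
    for x, y, z in points:
        if not (near <= z <= far):
            continue  # behind the camera or outside the assumed depth range
        u = fx * x / z + cx  # projected pixel column
        v = fy * y / z + cy  # projected pixel row
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept


def voxelize(points: Iterable[Point], voxel_size: float) -> Dict[Voxel, int]:
    """Map each occupied voxel index to the number of points inside it."""
    counts: Counter = Counter()
    for x, y, z in points:
        index = (math.floor(x / voxel_size),
                 math.floor(y / voxel_size),
                 math.floor(z / voxel_size))
        counts[index] += 1
    return dict(counts)


# Hypothetical point cloud and bounding box, for illustration only.
cloud = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.2, 0.1, 10.0),
         (0.0, 0.0, -1.0), (0.25, 0.0, 5.0)]
kept = filter_to_frustum(cloud, box=(40, 40, 60, 60),
                         fx=100.0, fy=100.0, cx=50.0, cy=50.0)
volume = voxelize(kept, voxel_size=0.5)
```

Here `volume` is a sparse occupancy count. A dense 3D array would be the more likely input to a 3D convolutional network, but the sparse form keeps the sketch short.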

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: computing, using a machine learning model and based at least in part on image data representative of an environment, first data representative of a location in image-space of a bounding shape corresponding to an object within the environment; generating a frustum in world-space based at least in part on the location in image-space of the bounding shape; receiving sensor data representative of depth information within the environment; determining a subset of the sensor data corresponding to an area of the environment within the frustum; voxelizing the subset of the sensor data to generate voxelated sensor data; applying the voxelated sensor data to a first neural network; computing, using the first neural network and based at least in part on the voxelated sensor data, second data representative of one or more vectors; and applying the one or more vectors to a classifier network to determine a classification for the object.

2. The method of claim 1, wherein the generating the frustum is based at least in part on at least one of intrinsic parameters or extrinsic parameters of a camera that generated the image data.

3. The method of claim 1, further comprising: converting the depth information from world-space to image-space to generate third data representative of a depth map; applying the third data to another machine learning model; and computing, using the another machine learning model and based at least in part on the third data, fourth data representative of another location in image space of another bounding shape corresponding to the object, wherein the generating the frustum is further based at least in part on the another image-space location of the another bounding shape.

4. The method of claim 3, further comprising: generating a final location for a final bounding shape corresponding to the object based at least in part on the location in image space and the another image-space location, wherein the generating the frustum is using the final location.

5. The method of claim 1, wherein the machine learning model is a region-proposal network (RPN), and the computing the first data representative of the location in image space of the bounding shape corresponding to the object within the environment includes: computing, using the RPN, third data representative of region proposals and fourth data representative of a confidence for each region proposal that the region proposal corresponds to the object; and selecting the bounding shape based at least in part on the region proposals and the confidence for each region proposal.

6. The method of claim 5, further comprising: applying the third data representative of the region proposals to a two-dimensional (2D) neural network; computing, using the 2D neural network and based at least in part on the third data, fourth data representative of one or more additional vectors, wherein the applying the one or more vectors further includes applying the one or more additional vectors.

7. The method of claim 5, further comprising: computing, using a 2D neural network and based at least in part on the third data, fourth data representative of one or more additional vectors; applying the second data and the fourth data to one or more fully connected layers of the classifier network; and computing, using the one or more fully connected layers and based at least in part on the second data and the fourth data, a final classification of the object.

8. The method of claim 1, wherein the first neural network is a three-dimensional (3D) neural network, the method further comprising computing, using the 3D neural network and based at least in part on the voxelated sensor data, third data representative of a world-space location of the object.

9. A method comprising: generating a three-dimensional (3D) frustum at least partially defined by a two-dimensional (2D) bounding shape corresponding to an object in an environment; receiving sensor data generated by a depth sensor and representative of a point cloud corresponding to the object in the environment; generating a filtered point cloud based at least in part on filtering out portions of the point cloud not within the frustum; voxelizing each point of the filtered point cloud to generate a volume corresponding to the filtered point cloud; computing, using a 3D neural network and based at least in part on the volume, first data representative of one or more first vectors; computing, using a 2D neural network and based at least in part on image data representative of the environment, second data representative of one or more second vectors; and computing, based at least in part on the first data and the second data, a classification of the object.

10. The method of claim 9, wherein the computing the classification includes: generating a combined vector from the one or more first vectors and the one or more second vectors; and applying third data representative of the combined vector to one or more fully connected layers.

11. The method of claim 9, further comprising: applying the image data to another neural network; and computing, using the another neural network, a location of the 2D bounding shape, wherein the generating the 3D frustum is at least partially defined by the location of the 2D bounding shape.

12. The method of claim 9, wherein the depth sensor includes one or more of a LIDAR sensor, a RADAR sensor, a SONAR sensor, or an ultrasonic sensor.

13. The method of claim 9, wherein the computing the second data includes: computing, using an RPN and based at least in part on the image data, third data representative of region proposals and fourth data representative of feature maps; applying the third data and the fourth data to the 2D neural network; and computing, using the 2D neural network, the second data.

14. The method of claim 9, wherein the 2D neural network includes at least one 2D convolutional layer and the 3D neural network includes at least one 3D convolutional layer.

15. The method of claim 9, further comprising: determining a first location of the 2D bounding shape; rendering the point cloud in image space to generate third data representative of a depth map; applying the third data to a machine learning model; computing, using the machine learning model and based at least in part on the third data, fourth data representative of a second location of another bounding shape corresponding to the object; and determining a final location of a final bounding shape based at least in part on the first location and the second location, wherein the generating the 3D frustum is further based at least in part on the final location.

16. A system comprising: an image sensor to generate image data representative of a field of view of the image sensor in an environment; a depth sensor to generate sensor data representative of depth information corresponding to a sensory field of the depth sensor within the environment; a computing device including one or more processing devices and one or more memory devices communicatively coupled to the one or more processing devices storing programmed instructions thereon, which when executed by the one or more processing devices causes instantiation of: a frustum generator to generate a three-dimensional (3D) frustum at least partially defined by a two-dimensional (2D) bounding shape corresponding to an object in the environment; a sensor data filter to generate filtered sensor data based at least in part on filtering out portions of the sensor da…
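Claims 7 and 10 describe combining the vectors from the 3D and 2D networks and applying them to one or more fully connected layers. The sketch below shows one plausible reading: concatenate the two feature vectors, apply a single fully connected layer, and take the most probable class. The softmax step, the single-layer depth, and all names, weights, and labels are assumptions for illustration; none of them come from the patent text.

```python
import math
from typing import List, Sequence, Tuple


def fuse(vec_3d: Sequence[float], vec_2d: Sequence[float]) -> List[float]:
    """Build a combined vector by concatenating the 3D and 2D feature vectors."""
    return list(vec_3d) + list(vec_2d)


def fully_connected(x: Sequence[float],
                    weights: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """One dense layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (max is subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(vec_3d: Sequence[float], vec_2d: Sequence[float],
             weights: Sequence[Sequence[float]], bias: Sequence[float],
             labels: Sequence[str]) -> Tuple[str, List[float]]:
    """Fuse both vectors, apply the dense layer, and return the top label."""
    probs = softmax(fully_connected(fuse(vec_3d, vec_2d), weights, bias))
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical feature vectors, weights, and class labels.
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
bias = [0.0, 0.0]
label, probs = classify([1.0, 0.0], [0.0, 2.0], weights, bias,
                        labels=["car", "pedestrian"])
```

In a trained system the weights would be learned rather than hand-written, and the classifier could stack several such layers.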

Assignees

Nvidia Corp

Inventors

Classifications

G06N3/084 · Primary
Backpropagation, e.g. using gradient descent · CPC title
G06V10/80
Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level (multimodal speaker identification or verification G10L17/10) · CPC title
G06V20/58
Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads · CPC title
G06V10/454
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
G06V10/25
Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title

Patent family

Related publications grouped by family.

View patent family 72423881

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468582B2 cover?: In various examples, a two-dimensional (2D) and three-dimensional (3D) deep neural network (DNN) is implemented to fuse 2D and 3D object detection results for classifying objects. For example, regions of interest (ROIs) and/or bounding shapes corresponding thereto may be determined using one or more region proposal networks (RPNs), such as an image-based RPN and/or a depth-based RPN.
Who is the assignee on this patent?: Nvidia Corp

Related patents

Robotic workspace layout planning

Method for programmable timeouts of tree traversal mechanisms in hardware

Enhancing robot learning

Neural style transfer for image varietization and recognition
