Three-dimensional (3D) pose estimation from a monocular camera
US-10929654-B2 · Feb 23, 2021 · US
US12530788B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12530788-B2 |
| Application number | US-202017786065-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 24, 2020 |
| Priority date | Dec 27, 2019 |
| Publication date | Jan 20, 2026 |
| Grant date | Jan 20, 2026 |
Official abstract text for this publication.
A system includes a computing device. The computing device is configured to perform a set of functions. The set of functions includes receiving an image, wherein the image comprises a two-dimensional array of data. The set of functions includes extracting, by a two-dimensional neural network, a plurality of two-dimensional features from the two-dimensional array of data. The set of functions includes generating a linear combination of the plurality of two-dimensional features to form a single three-dimensional input feature. The set of functions includes extracting, by a three-dimensional neural network, a plurality of three-dimensional features from the single three-dimensional input feature. The set of functions includes determining a two-dimensional depth map. The two-dimensional depth map contains depth information corresponding to the plurality of three-dimensional features.
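The abstract's first four steps can be illustrated with a small NumPy sketch. This is only one possible reading, not the patented implementation: the random kernels stand in for trained network weights, and the choice to build each depth slice of the three-dimensional input feature as a weighted sum of the two-dimensional feature maps is an assumption. The function names `extract_2d_features`, `combine_to_3d`, and `extract_3d_features`, and all array sizes, are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_2d_features(image, kernels):
    """Stand-in for the 2D network: slide each kernel in two directions (rows, columns)."""
    kh, kw = kernels.shape[1:]
    windows = sliding_window_view(image, (kh, kw))  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijab,kab->kij", windows, kernels)  # (K, H', W')


def combine_to_3d(features, weights):
    """Assumed reading: depth slice d is the linear combination sum_k weights[d, k] * features[k]."""
    return np.einsum("dk,khw->dhw", weights, features)  # (D, H', W')


def extract_3d_features(volume, kernels3d):
    """Stand-in for the 3D network: slide each kernel in three directions (depth, rows, columns)."""
    kd, kh, kw = kernels3d.shape[1:]
    windows = sliding_window_view(volume, (kd, kh, kw))
    return np.einsum("dijabc,kabc->kdij", windows, kernels3d)  # (K3, D', H'', W'')


rng = np.random.default_rng(0)
image = rng.random((8, 8))                    # a two-dimensional array of data
kernels = rng.standard_normal((4, 3, 3))      # hypothetical: 4 untrained 2D filters
weights = rng.standard_normal((5, 4))         # hypothetical: 5 depth slices from 4 features
kernels3d = rng.standard_normal((2, 3, 3, 3)) # hypothetical: 2 untrained 3D filters

features_2d = extract_2d_features(image, kernels)
volume = combine_to_3d(features_2d, weights)
features_3d = extract_3d_features(volume, kernels3d)
```

With these sizes, `features_2d` has shape `(4, 6, 6)`, `volume` has shape `(5, 6, 6)`, and `features_3d` has shape `(2, 3, 4, 4)`. The final step, determining the depth map from the three-dimensional features, is not covered by this sketch.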
Opening claim text (preview).
What is claimed is:

1. A system for determining a two-dimensional depth map from a single monocular image, the system comprising: a computing device, wherein the computing device is configured to perform a set of functions comprising: receiving a single monocular image, wherein the image comprises a two-dimensional array of data; extracting, by a two-dimensional neural network, a plurality of two-dimensional features from the two-dimensional array of data; generating a linear combination of the plurality of two-dimensional features to form a single three-dimensional input feature; extracting, by a three-dimensional neural network, a plurality of three-dimensional features from the single three-dimensional input feature; and determining a two-dimensional depth map, wherein the two-dimensional depth map contains depth information corresponding to the plurality of three-dimensional features.

2. The system of claim 1, wherein the computing device is a first computing device of a plurality of computing devices, and wherein the two-dimensional neural network and the three-dimensional neural network correspond to at least a second computing device of the plurality of computing devices.

3. The system of claim 1, wherein the two-dimensional neural network comprises a two-dimensional convolutional neural network, wherein the three-dimensional neural network comprises a three-dimensional convolutional neural network, and wherein extracting the plurality of two-dimensional features comprises using the two-dimensional convolutional neural network as a two-dimensional filter that operates in two directions within the two-dimensional array of data to output the plurality of two-dimensional features.

4. The system of claim 3, the set of functions further comprising: prior to extracting the plurality of two-dimensional features, training the two-dimensional convolutional neural network using a plurality of images representing objects such that different nodes within the two-dimensional convolutional neural network operate to output different types of two-dimensional features corresponding to different objects.

5. The system of claim 1, wherein generating the linear combination of the plurality of two-dimensional features to form the single three-dimensional input feature comprises: classifying a two-dimensional feature of the plurality of two-dimensional features in accordance with an object associated with training the two-dimensional convolutional neural network; and generating the linear combination of the plurality of two-dimensional features based on classifying the two-dimensional feature.

6. The system of claim 1, wherein the two-dimensional neural network comprises a two-dimensional convolutional neural network, wherein the three-dimensional neural network comprises a three-dimensional convolutional neural network, and wherein extracting the plurality of three-dimensional features comprises using the three-dimensional convolutional neural network as a three-dimensional filter that operates in three directions within the three-dimensional input feature to output the plurality of three-dimensional features.

7. The system of claim 1, wherein the two-dimensional neural network comprises a two-dimensional convolutional neural network, wherein the three-dimensional neural network comprises a three-dimensional convolutional neural network, and wherein extracting the plurality of three-dimensional features from the single three-dimensional input feature comprises extracting a plurality of sets of voxels, wherein each voxel indicates a level of opaqueness.

8. A method for determining a two-dimensional depth map from a single monocular image, the method comprising: receiving a single monocular image, wherein the image comprises a two-dimensional array of data; extracting, by a two-dimensional neural network, a plurality of two-dimensional features from the two-dimensional array of data; generating a linear combination of the plurality of two-dimensional features to form a single three-dimensional input feature; extracting, by a three-dimensional neural network, a plurality of three-dimensional features from the single three-dimensional input feature; and determining a two-dimensional depth map, wherein the two-dimensional depth map contains depth information corresponding to the plurality of three-dimensional features.

9. The method of claim 8, further comprising: determining, based on the plurality of three-dimensional features extracted by the three-dimensional neural network, a three-dimensional array of voxels, each voxel of the array indicating a respective level of opaqueness, and wherein determining a two-dimensional depth map comprises, for a plurality of pixels of the two-dimensional depth map, determining respective distances between a capture device location and a respective closest opaque voxel of the array of voxels along a respective different path from the capture device location.

10. The method of claim 8, wherein each two-dimensional feature extracted by the two-dimensional neural network represents a respective different objects in the image.

11. The method of claim 10, wherein generating the linear combination of the plurality of two-dimensional features to form the single three-dimensional input feature comprises ordering the two-dimensional features based on overlap between the respective different objects in the image.

12. The method of claim 8, wherein the two-dimensional neural network comprises a two-dimensional convolutional neural network, wherein the three-dimensional neural network comprises a three-dimensional convolutional neural network, and wherein extracting the plurality of two-dimensional features comprises using the two-dimensional convolutional neural network as a two-dimensional filter that operates in two directions within the two-dimensional array of data to output the plurality of two-dimensional features.

13. The method of claim 12, further comprising: prior to extracting the plurality of two-dimensional features, training the two-dimensional convolutional neural network using a plurality of images representing objects such that different nodes within the two-dimensional convolutional neural network operate to output different types of two-dimensional features corresponding to different objects.

14. The method of claim 13, wherein generating the linear combination of the plurality of two-dimensional features to form the single three-dimensional input feature comprises: classifying a two-dimensional feature of the plurality of two-dimensional features in accordance with an object associated with training the two-dimensional convolutional neural network; and generating the linear combination of the plurality of two-dimensional features based on classifying the two-dimensional feature.

15. The method of claim 8, wherein the two-dimensional neural network comprises a two-dimensional convolutional neural network, wherein the three-dimensional neural network comprises a three-dimensional convolutional neural network, and wherein extracting the plurality of three-dimensional features comprises using the three-dimensional convolutional neural network as a three-dimensional filter that operates in three directions within the three-dimensional input feature to output the plurality of three-dimensional features.

16. The method of claim 8, wherein extracting the plurality of three-dimensional features from the single three-dimensional input feature comprises extracting a plurality of sets of voxels, wherein ea
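
Claim 9 describes deriving each depth-map pixel from the closest opaque voxel along a path from the capture device. The sketch below illustrates that idea under simplifying assumptions that are not in the claim text: the paths are parallel rays along the first array axis (an orthographic camera instead of rays fanning out from one location), a voxel counts as opaque when its value reaches a hypothetical `threshold`, and distance is measured in units of a hypothetical `voxel_size`. The function name `depth_from_voxels` is also illustrative.

```python
import numpy as np


def depth_from_voxels(opacity, threshold=0.5, voxel_size=1.0):
    """Return an (H, W) depth map from a (D, H, W) opacity grid.

    Axis 0 is assumed to point away from the camera, with index 0 nearest.
    Pixels whose ray meets no opaque voxel are set to infinity.
    """
    opaque = opacity >= threshold     # boolean grid of opaque voxels
    first = opaque.argmax(axis=0)     # index of the first opaque voxel per ray (0 if none)
    hit = opaque.any(axis=0)          # whether the ray meets any opaque voxel
    return np.where(hit, first * voxel_size, np.inf)


# Tiny worked example: a 4-deep grid viewed through a 2x2 image.
opacity = np.zeros((4, 2, 2))
opacity[2, 0, 0] = 0.9  # pixel (0, 0): first opaque voxel at depth index 2
opacity[1, 0, 1] = 0.7  # pixel (0, 1): nearest opaque voxel at index 1 ...
opacity[3, 0, 1] = 1.0  # ... a farther opaque voxel on the same ray is ignored
opacity[0, 1, 0] = 0.6  # pixel (1, 0): opaque at the nearest voxel
depth = depth_from_voxels(opacity)
print(depth)
# [[ 2.  1.]
#  [ 0. inf]]
```

A perspective version would replace the fixed axis with a per-pixel ray direction and step through the grid along each ray; the closest-hit logic stays the same.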
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Range image; Depth image; 3D point clouds · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Combinations of networks · CPC title