Depth and motion estimations in machine learning environments

US11024041B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11024041-B2
Application number	US-201816215348-A
Country	US
Kind code	B2
Filing date	Dec 10, 2018
Priority date	Dec 10, 2018
Publication date	Jun 1, 2021
Grant date	Jun 1, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mechanism is described for facilitating depth and motion estimation in machine learning environments, according to one embodiment. A method of embodiments, as described herein, includes receiving a frame associated with a scene captured by one or more cameras of a computing device; processing the frame using a deep recurrent neural network architecture, wherein processing includes simultaneously predicating values associated with multiple loss functions corresponding to the frame; and estimating depth and motion based the predicted values.
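As a rough illustration of what it can mean to predict several quantities at once, each with its own loss, the sketch below combines per-task losses into one cumulated cost as a weighted sum. The task names follow the quantities listed in the claims; the weighted-sum form, the weights, and the function name `cumulated_cost` are assumptions for illustration and are not taken from the patent text.

```python
from typing import Dict


def cumulated_cost(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Fuse per-task losses into one scalar as a weighted sum.

    Tasks with no entry in `weights` default to a weight of 1.0.
    The weighted sum is an assumed fusion rule, not the patent's.
    """
    return sum(weights.get(name, 1.0) * value for name, value in losses.items())


# Hypothetical per-task loss values for a single frame.
losses = {
    "depth": 0.5,
    "velocity": 0.25,
    "segmentation": 1.0,
    "optical_flow": 0.5,
}
# Hypothetical task weights.
weights = {
    "depth": 1.0,
    "velocity": 2.0,
    "segmentation": 0.5,
    "optical_flow": 1.0,
}

total = cumulated_cost(losses, weights)
print(total)
```

In a training setup, a single scalar like `total` is what an optimizer would minimize, so that all task outputs are learned jointly rather than one at a time.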

First claim

Opening claim text (preview).

What is claimed is:
1. At least one non-transitory machine-readable medium comprising instructions which, when executed by a computing device, cause the computing device to perform operations comprising: receiving a frame associated with a scene captured by one or more cameras; processing the frame using a deep recurrent neural network architecture, wherein processing includes simultaneously predicating values associated with multiple loss functions corresponding to the frame; and estimating depth and motion based on the predicted values.
2. The non-transitory machine-readable medium of claim 1, wherein the simultaneously predicted values comprise two or more of pixel depth, pixel velocity, pixel class and segmentation, and pixel optical flow.
3. The non-transitory machine-readable medium of claim 1, wherein the deep recurrent neural network architecture is further to receive and process one or more of one or more previous frames for convolutional long short-term memory (LSTM) and odometry-based translation length, wherein the deep recurrent neural network architecture includes one or more of a deep recurrent neural network and one or more convolutional LSTM layers.
4. The non-transitory machine-readable medium of claim 1, wherein the operations further comprise fusing together the multiple loss functions associated with the simultaneously predicted values in a cumulated cost function, wherein fusing is performed using the deep recurrent neural network architecture.
5. The non-transitory machine-readable medium of claim 1, wherein the operations further comprise estimating, based on the deep recurrent neural network architecture, rotation matrixes and translation vectors for an object in the scene and each of the one or more cameras, wherein rotation matrixes and the translation vectors are fused together in a supervised form.
6. The non-transitory machine-readable medium of claim 1, wherein the operations further comprise constraining, based on odometry information, one or more of the rotation matrixes, translation vectors, and the simultaneously predicted values to estimate the depth of the scene and the motion of the one or more cameras.
7. The non-transitory machine-readable medium of claim 1, wherein the computing device comprises one or more processors comprising one or more of a graphics processor and an application processor, wherein the one or more processors are co-located on a common semiconductor package.
8. A method comprising: receiving a frame associated with a scene captured by one or more cameras of a computing device; processing the frame using a deep recurrent neural network architecture, wherein processing includes simultaneously predicating values associated with multiple loss functions corresponding to the frame; and estimating depth and motion based on the predicted values.
9. The method of claim 8, wherein the simultaneously predicted values comprise two or more of pixel depth, pixel velocity, pixel class and segmentation, and pixel optical flow.
10. The method of claim 8, wherein the deep recurrent neural network architecture is further to receive and process one or more of one or more previous frames for convolutional long short-term memory (LSTM) and odometry-based translation length, wherein the deep recurrent neural network architecture includes one or more of a deep recurrent neural network and one or more convolutional LSTM layers.
11. The method of claim 8, further comprising fusing together the multiple loss functions associated with the simultaneously predicted values in a cumulated cost function, wherein fusing is performed using the deep recurrent neural network architecture.
12. The method of claim 8, further comprising estimating, based on the deep recurrent neural network architecture, rotation matrixes and translation vectors for an object in the scene and each of the one or more cameras, wherein rotation matrixes and the translation vectors are fused together in a supervised form.
13. The method of claim 8, further comprising constraining, based on odometry information, one or more of the rotation matrixes, translation vectors, and the simultaneously predicted values to estimate the depth of the scene and the motion of the one or more cameras, wherein the computing device comprises one or more processors comprising one or more of a graphics processor and an application processor, wherein the one or more processors are co-located on a common semiconductor package.
14. An apparatus comprising: one or more processors to: receive a frame associated with a scene captured by one or more cameras of a computing device; process the frame using a deep recurrent neural network architecture, wherein processing includes simultaneously predicating values associated with multiple loss functions corresponding to the frame; and estimate depth and motion based on the predicted values.
15. The apparatus of claim 14, wherein the simultaneously predicted values comprise two or more of pixel depth, pixel velocity, pixel class and segmentation, and pixel optical flow.
16. The apparatus of claim 14, wherein the deep recurrent neural network architecture is further to receive and process one or more of one or more previous frames for convolutional long short-term memory (LSTM) and odometry-based translation length, wherein the deep recurrent neural network architecture includes one or more of a deep recurrent neural network and one or more convolutional LSTM layers.
17. The apparatus of claim 14, wherein the one or more processors are further to fuse together the multiple loss functions associated with the simultaneously predicted values in a cumulated cost function, wherein fusing is performed using the deep recurrent neural network architecture.
18. The apparatus of claim 14, wherein the one or more processors are further to estimate, based on the deep recurrent neural network architecture, rotation matrixes and translation vectors for an object in the scene and each of the one or more cameras, wherein rotation matrixes and the translation vectors are fused together in a supervised form.
19. The apparatus of claim 14, wherein the one or more processors are further to constrain, based on odometry information, one or more of the rotation matrixes, translation vectors, and the simultaneously predicted values to estimate the depth of the scene and the motion of the one or more cameras.
20. The apparatus of claim 14, wherein the computing device comprises one or more processors comprising one or more of a graphics processor and an application processor, wherein the one or more processors are co-located on a common semiconductor package.
21. A data processing system comprising: memory; one or more processors coupled to the memory, the one or more processors to: receive a frame associated with a scene captured by one or more cameras of a computing device; process the frame using a deep recurrent neural network architecture, wherein processing includes simultaneously predicating values associated with multiple loss functions corresponding to the frame; and estimate depth and motion based on the predicted values.
22. The data processing system of claim 21, wherein the simultaneously predicted values comprise two or more of pixel depth, pixel velocity, pixel class and segmentation, and pixel optical flow, wherein the deep recurrent neural network architecture is further to receive and process one or more of one or more previous frames for convolu
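Several claims refer to rotation matrixes and translation vectors for objects and cameras. For readers unfamiliar with these terms, the sketch below shows how a rotation matrix R and a translation vector t are commonly applied to move a 3D point, using the convention p' = R p + t. The convention, the choice of a rotation about the z-axis, and the helper names `rotation_z` and `transform` are assumptions for illustration; the claim text shown here does not specify them.

```python
import math
from typing import List, Sequence


def rotation_z(theta: float) -> List[List[float]]:
    """3x3 rotation matrix for a counter-clockwise turn of theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        [c, -s, 0.0],
        [s, c, 0.0],
        [0.0, 0.0, 1.0],
    ]


def transform(
    R: Sequence[Sequence[float]],
    t: Sequence[float],
    p: Sequence[float],
) -> List[float]:
    """Apply the rigid motion p' = R p + t to a 3D point (assumed convention)."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


# Hypothetical motion: quarter turn about z, then a shift of one unit along z.
R = rotation_z(math.pi / 2)
t = [0.0, 0.0, 1.0]
moved = transform(R, t, [1.0, 0.0, 0.0])
print(moved)  # approximately [0.0, 1.0, 1.0], up to floating-point rounding
```

A network that estimates camera or object motion would output the parameters of such an R and t per frame; how the patent parameterizes or fuses them is not stated in the preview above.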

Assignees

Intel Corp

Inventors

Classifications

G06V10/80
Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level (multimodal speaker identification or verification G10L17/10) · CPC title
G06V10/82 · Primary
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06T7/251 · Primary
involving models · CPC title
G06F18/25
Fusion techniques · CPC title

Patent family

Related publications grouped by family.

View patent family 66171144

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11024041B2 cover?: A mechanism is described for facilitating depth and motion estimation in machine learning environments, according to one embodiment. A method of embodiments, as described herein, includes receiving a frame associated with a scene captured by one or more cameras of a computing device; processing the frame using a deep recurrent neural network architecture, wherein processing includes simultaneou…
Who is the assignee on this patent?: Intel Corp
What technology area does this patent fall under?: Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 1, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Scene flow estimation using shared features

Learning rigidity of dynamic scenes for three-dimensional scene flow estimation

Semantic video encoding

Method, device, and non-transitory computer readable storage medium for image processing