Who is the assignee on this patent?

Microsoft Corp, Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06T7/337. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 3, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Real-time three-dimensional reconstruction of a scene from a single camera

US9779508B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9779508-B2
Application number	US-201414226732-A
Country	US
Kind code	B2
Filing date	Mar 26, 2014
Priority date	Mar 26, 2014
Publication date	Oct 3, 2017
Grant date	Oct 3, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A combination of three computational components may provide memory and computational efficiency while producing results with little latency, e.g., output can begin with the second frame of video being processed. Memory usage may be reduced by maintaining key frames of video and pose information for each frame of video. Additionally, only one global volumetric structure may be maintained for the frames of video being processed. To be computationally efficient, only depth information may be computed from each frame. Through fusion of multiple depth maps from different frames into a single volumetric structure, errors may average out over several frames, leading to a final output with high quality.
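The claim that errors "average out" through fusion can be illustrated with a small sketch. This is not the patent's implementation: it assumes a running per-voxel average of truncated signed distances (a common way to fuse depth maps), and it collapses the volume to a single ray so the arithmetic stays visible. The class name `Volume1D`, the voxel size, and the truncation band are all hypothetical choices made for illustration.

```python
class Volume1D:
    """Hypothetical 1-D stand-in for a single global volumetric structure.

    Each voxel stores a running average of truncated signed distances
    and the number of depth samples that contributed to it.
    """

    def __init__(self, size, voxel, trunc):
        self.size = size      # number of voxels along the ray
        self.voxel = voxel    # voxel length (assumed unit: metres)
        self.trunc = trunc    # truncation band around the surface
        self.sdf = [0.0] * size
        self.weight = [0.0] * size

    def fuse(self, depth):
        """Merge one depth sample into the volume with a running average."""
        for i in range(self.size):
            centre = (i + 0.5) * self.voxel
            d = depth - centre  # positive in front of the surface
            if d < -self.trunc:
                continue  # far behind the surface: treated as unobserved
            d = min(d, self.trunc)
            w = self.weight[i]
            self.sdf[i] = (w * self.sdf[i] + d) / (w + 1.0)
            self.weight[i] = w + 1.0

    def surface(self):
        """Return the interpolated zero crossing, or None if there is none."""
        for i in range(self.size - 1):
            a, b = self.sdf[i], self.sdf[i + 1]
            observed = self.weight[i] > 0 and self.weight[i + 1] > 0
            if observed and a > 0 >= b:
                t = a / (a - b)  # linear interpolation between voxel centres
                return (i + 0.5 + t) * self.voxel
        return None


# Four noisy depth samples of a surface assumed to lie at 1.0.
volume = Volume1D(size=20, voxel=0.1, trunc=0.3)
for sample in [0.98, 1.02, 1.01, 0.99]:
    volume.fuse(sample)
print(volume.surface())
```

Because each voxel keeps only an average and a count, memory stays fixed no matter how many frames are fused, which is consistent with the abstract's point about maintaining one global structure.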

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: a camera tracking circuit comprising an input that receives a sequence of images from a single camera at an input rate, the camera tracking circuit being operative to process each current image from the sequence of images at the input rate, and comprising a first output providing a pose for the current image, and a second output providing the current image for storing in a storage designating the current image as a key frame based on the pose for the image and poses for other stored key frames from the storage, and a third output providing an indication of one or more key frames other than the current image from the storage selected for the current image, wherein the pose, the indication of the one or more selected key frames and the designation of the current image as a key frame, are output successively at the input rate for each current image; a depth map estimation circuit having a first input that receives the current image from the sequence of images, and the indication of the one or more selected key frames for the current image, and a second input receiving the selected one or more key frames from the storage, and having an output providing a depth map for the current image based on the current image and the one or more selected key frames; and a volumetric fusion circuit having an input receiving the depth map for the current image and an output providing a three-dimensional model as a fusion of depth maps received from the depth map estimation circuit for the sequence of images; wherein the depth map estimation circuit and the volumetric fusion circuit further process inputs and provide outputs successively for each current image at the input rate.

2. The device of claim 1, wherein the camera tracking circuit, the depth map estimation circuit, and the volumetric fusion circuit are implemented within a single general purpose integrated circuit.

3. The device of claim 1, wherein the depth map for an image comprises a measure of depth for each pixel in the current image.

4. The device of claim 1, wherein the pose for the current image comprises rotation and translation of the camera with respect to a fixed coordinate system.

5. The device of claim 1, wherein the three-dimensional model is defined in a virtual volume, wherein an initial pose of the camera is defined at a point in the virtual volume.

6. The device of claim 1, wherein the three-dimensional model is defined by representing a surface using a signed distance field in the virtual volume.

7. The device of claim 1, wherein the volumetric fusion circuit outputs the three-dimensional model after processing as few as the first two images from the sequence of images.

8. A process for generating a three-dimensional model from a sequence of images from a single camera, the process comprising: receiving the sequence of images from the camera into memory at an input rate, to successively provide at the input rate a current image in memory for processing; successively processing the current image in the memory at the input rate by repeating the following steps for each current image: determining, using logic circuitry, a pose for the current image; determining, using logic circuitry, whether the current image is to be designated as a key frame based on the pose of the current image and poses for other stored key frames from storage; in response to determining that the current image is to be designated as a key frame, storing, the current image and the pose for the current image in the storage as a key frame; selecting, using logic circuitry, a key frame other than the current image, from among the other key frames stored in the storage from the sequence of images; computing, using logic circuitry, a depth map for the current image using the current image and the selected key frame for the current image; merging, in memory, the depth map computed for the current image into a volumetric representation of a scene represented in the sequence of images; and outputting, at the input rate, the volumetric representation of the scene from the memory.

9. The process of claim 8, further comprising determining a scale of depth by a sensor.

10. The process of claim 8, wherein the depth map determined for the current image comprises measures of depth for pixels in the current image.

11. The process of claim 8, wherein the pose for the current image comprises rotation and translation of the camera with respect to a fixed coordinate system.

12. The process of claim 8, wherein the three-dimensional model is defined in a virtual volume, wherein an initial pose of the camera is defined at a point in the virtual volume.

13. The process of claim 8, wherein the three-dimensional model is defined by representing a surface using a signed distance field in the virtual volume.

14. The process of claim 8, wherein outputting the three-dimensional model begins after processing as few as a first two images from the sequence of images.

15. A computer program product, comprising: a computer storage device; computer program instructions stored in the computer storage device that when read from the storage device and processed by a processor of a computer instruct the computer to perform a process for generating a three-dimensional model from a sequence of images from a single camera, the process comprising: receiving the sequence of images from the camera into memory at an input rate, to successively provide at the input rate a current image in memory for processing; successively processing the current image in the memory at the input rate by repeating the following steps for each current image: determining, using logic circuitry, a pose for the current image; determining, using logic circuitry, whether the current image is to be designated as a key frame based on the pose of the current image and poses for other stored key frames from storage; in response to determining that the current image is to be designated as a key frame, storing, the current image and the pose for the current image in the storage as a key frame; selecting, using logic circuitry, a key frame other than the current image, from among the other key frames stored in the storage from the sequence of images; computing, using logic circuitry, a depth map for the current image using the current image and the selected key frame for the current image; merging, in memory, the depth map computed for the current image into a volumetric representation of a scene represented in the sequence of images; and outputting, at the input rate, the volumetric representation of the scene from the memory.

16. The computer program product of claim 15, wherein the processor is housed in a mobile device that incorporates the camera.

17. The computer program product of claim 15, wherein the depth map for the current image comprises measures of depth for pixels in the current image.

18. The computer program product of claim 15, wherein the pose for the current image comprises rotation and translation of the camera with respect to a fixed coordinate system.

19. The computer program product of claim 15, wherein the three-dimensional model is defined in a virtual volume, wherein an initial pose of the camera is defined at a point in the virtual volume.

20. The computer program product of claim 15, wherein the three-dimensional model is defined by representing a surface using a signed distance field in the virtual volume.
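The claims say a key frame is designated "based on the pose of the current image and poses for other stored key frames", and that a stored key frame is selected for each current image, but this page does not give the actual criteria. The sketch below assumes one plausible rule purely for illustration: an image becomes a key frame when no stored key frame is close to it in both translation and rotation, and the selected key frame is the one nearest in translation. The function names and the thresholds (`min_dist`, `min_angle`) are hypothetical and do not come from the patent.

```python
import math


def rot_z(theta):
    """Rotation matrix about the z axis, as nested lists (helper for demos)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_angle(ra, rb):
    """Angle in radians of the relative rotation ra^T * rb."""
    trace = sum(ra[k][i] * rb[k][i] for i in range(3) for k in range(3))
    # Clamp to guard against rounding slightly outside acos's domain.
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def is_key_frame(pose, key_poses, min_dist=0.10, min_angle=math.radians(10)):
    """Assumed rule: key frame if no stored key frame has a similar pose.

    A pose is a (rotation, translation) pair in a fixed coordinate system.
    """
    rotation, translation = pose
    for key_rotation, key_translation in key_poses:
        near = math.dist(translation, key_translation) < min_dist
        aligned = rotation_angle(rotation, key_rotation) < min_angle
        if near and aligned:
            return False  # an existing key frame already covers this view
    return True


def select_key_frame(pose, key_poses):
    """Assumed rule: index of the stored key frame nearest in translation."""
    _, translation = pose
    return min(
        range(len(key_poses)),
        key=lambda i: math.dist(translation, key_poses[i][1]),
    )


# The first image always becomes a key frame under this rule.
stored = []
first = (rot_z(0.0), (0.0, 0.0, 0.0))
if is_key_frame(first, stored):
    stored.append(first)
```

Only poses are compared here, not pixels, so the check is cheap enough to run on every incoming image, in keeping with the per-image processing at the input rate that the claims require.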

Assignees

Inventors

Classifications

G06T7/337 · Primary
involving reference images or patches · CPC title
G06T17/00 · Primary
Three-dimensional [3D] modelling for computer graphics · CPC title
G06T2207/10016
Video; Image sequence · CPC title
G06T7/593
from stereo images · CPC title
G06T2207/30244
Camera pose · CPC title

Patent family

Related publications grouped by family.

View patent family 54191137

External sources
