Camera/object pose from predicted coordinates

US9940553B2 · US · B2

Patent metadata
Field | Value
--- | ---
Publication number | US-9940553-B2
Application number | US-201313774145-A
Country | US
Kind code | B2
Filing date | Feb 22, 2013
Priority date | Feb 22, 2013
Publication date | Apr 10, 2018
Grant date | Apr 10, 2018

Abstract

Camera or object pose calculation is described, for example, to relocalize a mobile camera (such as on a smart phone) in a known environment or to compute the pose of an object moving relative to a fixed camera. The pose information is useful for robotics, augmented reality, navigation and other applications. In various embodiments where camera pose is calculated, a trained machine learning system associates image elements from an image of a scene with points in the scene's 3D world coordinate frame. In examples where the camera is fixed and the pose of an object is to be calculated, the trained machine learning system associates image elements from an image of the object with points in an object coordinate frame. In examples, the image elements may be noisy and incomplete, and a pose inference engine calculates an accurate estimate of the pose.
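As a rough illustration of the association step described in the abstract, the sketch below descends a regression tree for each image element and collects the 3D scene-coordinate modes stored at the leaf it reaches; the claims refer to scene points "predicted by at least one tree in at least one random decision forest". Everything concrete here is an assumption: the `Node` class, the `predict_modes` and `forest_associations` helpers, the depth-difference split test, and the hand-built tree are hypothetical stand-ins for a trained forest, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Point3 = Tuple[float, float, float]
Pixel = Dict[str, float]  # hypothetical per-pixel features, e.g. depths at two probe offsets


@dataclass
class Node:
    """Split node if `test` is set; otherwise a leaf holding predicted 3D scene points."""
    test: Optional[Callable[[Pixel], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    modes: List[Point3] = field(default_factory=list)


def predict_modes(tree: Node, pixel: Pixel) -> List[Point3]:
    """Descend one tree (test True -> left) and return the leaf's scene-coordinate modes."""
    node = tree
    while node.test is not None:
        node = node.left if node.test(pixel) else node.right
    return node.modes


def forest_associations(forest: List[Node], pixels: List[Pixel]) -> List[List[Point3]]:
    """For each image element i, pool the modes from every tree into one candidate set M_i."""
    return [[m for tree in forest for m in predict_modes(tree, p)] for p in pixels]


# Hand-built toy tree (not trained): split on a depth difference between two probes.
leaf_near = Node(modes=[(1.0, 0.0, 2.0)])
leaf_far = Node(modes=[(0.0, 1.0, 3.0), (0.0, 1.2, 3.1)])
tree = Node(test=lambda p: p["d1"] - p["d0"] < 0.1, left=leaf_near, right=leaf_far)

pixels = [{"d0": 2.0, "d1": 2.05}, {"d0": 2.0, "d1": 2.5}]
print(forest_associations([tree], pixels))
```

Each inner list plays the role of M_i, the set of candidate scene points for image element i. Pooling modes across all trees is just one simple way to combine a forest, chosen here for brevity.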

Claims

The invention claimed is:

1. A method of calculating pose of an entity comprising: receiving, at a processor, at least one image where the image is of a scene captured by an entity comprising a mobile camera; applying image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space; determining whether a pose of the entity has been calculated; based on a determination that the pose has been calculated, refining the pose of the entity from the plurality of associations and the optimized function; based on a determination that the pose of the entity has not been calculated, calculating an initial pose of the entity from the plurality of associations and the optimized function; and generating map display data based at least in part on the initial pose of the entity, wherein the energy function comprises:

$E(H) = \sum_{i \in I} \rho\left(\min_{m \in M_i} \lVert m - H x_i \rVert_2\right)$

wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.

2. A method as claimed in claim 1, further comprising calculating the initial pose of the entity as parameters having six degrees of freedom, three indicating rotation of the entity and three indicating position of the entity.

3. A method as claimed in claim 1, the machine learning system having been trained using images with image elements labeled with scene coordinates.

4. A method as claimed in claim 1, wherein the machine learning system comprises a plurality of trained random forests and the method further comprises: applying the image elements of the at least one image to the plurality of trained random forests, the trained random forests having been trained using images from a different one of a plurality of scenes; and calculating which of the scenes the mobile camera was in when the at least one image was captured.

5. A method as claimed in claim 1, wherein the machine learning system is trained using images of a plurality of scenes with image elements labeled with scene identifiers and labeled with scene coordinates of points in the scene the image elements depict.

6. A method as claimed in claim 1, further comprising calculating the pose by searching amongst a set of possible pose candidates and using samples of the plurality of associations between image elements and points to assess the set of possible pose candidates.

7. A method as claimed in claim 1, further comprising receiving at the processor a stream of images, and calculating the pose by searching amongst a set of possible pose candidates which includes a second pose calculated from another image in the stream.

8. A method as claimed in claim 1 at least partially carried out using hardware logic selected from one or more of the following: a field-programmable gate array, a program-specific integrated circuit, a program-specific standard product, a system-on-a-chip, a complex programmable logic device, and a graphics processing unit.

9. A method as claimed in claim 1, wherein the entity is a mobile camera and the pose of the mobile camera is calculated, the method further comprising accessing a 3D model of the scene and refining the pose of the mobile camera using the accessed 3D model.

10. A pose tracker comprising: a processor arranged to: receive at least one image of a scene captured by an entity comprising a mobile camera; and apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between image elements and three-dimensional (3D) points in a scene space; and a pose inference engine arranged to: optimize an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space; determine whether a pose of the entity has been calculated; based on a determination that the pose has been calculated, refine the pose of the entity from the plurality of associations and the optimized function; and based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the mobile camera from the plurality of associations, the calculation being based at least in part on the optimized function; wherein the energy function comprises:

$E(H) = \sum_{i \in I} \rho\left(\min_{m \in M_i} \lVert m - H x_i \rVert_2\right)$

wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.

11. The pose tracker as claimed in claim 10, the pose inference engine further arranged to calculate the initial pose by searching amongst a set of possible pose candidates and using samples of the plurality of associations between image elements and points in scene coordinates to assess the set of possible pose candidates.

12. The pose tracker as claimed in claim 10, the processor further arranged to receive a stream of images, and the pose tracker further comprising a pose inference engine arranged to calculate the initial pose by searching amongst a set of possible pose candidates which includes a second pose calculated from another image in the stream of images.

13. The pose tracker as claimed in claim 10 at least partially implemented using hardware logic selected from one or more of the following: a field-programmable gate array, a program-specific integrated circuit, a program-specific standard product, a system-on-a-chip, a complex programmable logic device, and a graphics processing unit.

14. The method as claimed in claim 1, further comprising, prior to applying the image elements, removing a set of image elements that are spurious or noisy image elements.

15. One or more computer-readable storage devices having computer-executable instructions that when executed by a processor, cause the processor to: receive at least one image that is of a scene captured by an entity comprising a mobile camera; apply image elements of the at least one image to a trained machine learning system to obtain a plurality of associations between a set of image elements and three-dimensional (3D) points in a scene space, the trained machine learning system optimizing an energy function comprising the 3D points in the scene space predicted by at least one tree in at least one random decision forest and 3D coordinates in camera space; determine whether a pose of the entity has been calculated; based on a determination that the pose has been calculated, refine the pose of the entity from the plurality of associations and the optimized function; based on a determination that the pose of the entity has not been calculated, calculate an initial pose of the entity from the plurality of associations and the optimized function; and generate map display data based at least in part on the initial pose of the entity; wherein the energy function comprises:

$E(H) = \sum_{i \in I} \rho\left(\min_{m \in M_i} \lVert m - H x_i \rVert_2\right)$

wherein $i \in I$ is an index of the image elements, $\rho$ is an error function, $m \in M_i$ represents the predicted 3D points in the scene space, $x_i$ are the 3D coordinates in the camera space, and $H$ is the pose of the entity.
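The energy E(H) in the claims can be evaluated directly once each image element i has a set of predicted scene points M_i and a camera-space point x_i. The sketch below does that and, in the spirit of claims 6 and 7, scores a small set of pose candidates on a random sample of associations and keeps the one with the lowest energy. It is a minimal sketch under stated assumptions: the claims only call ρ "an error function", so the top-hat `top_hat` with threshold `tau=0.1` is an assumption, as is the Euler-angle form of the six-degree-of-freedom pose from claim 2; the helper names `rotation`, `apply_pose`, `energy`, and `best_candidate` are hypothetical. Candidate generation and the subsequent pose refinement are not shown.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
# Assumed 6-DoF parameterization: three rotation angles (radians), three translations.
Pose = Tuple[float, float, float, float, float, float]


def rotation(rx: float, ry: float, rz: float) -> List[List[float]]:
    """3x3 matrix R = Rz @ Ry @ Rx (assumed Euler-angle convention)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def apply_pose(pose: Pose, x: Point3) -> Point3:
    """Compute H x: rotate the camera-space point, then translate it into scene space."""
    R = rotation(*pose[:3])
    t = pose[3:]
    return tuple(sum(R[r][c] * x[c] for c in range(3)) + t[r] for r in range(3))


def top_hat(e: float, tau: float = 0.1) -> float:
    """Assumed robust error rho: 0 for inliers (e < tau), 1 for outliers."""
    return 0.0 if e < tau else 1.0


def energy(pose: Pose, modes: Sequence[Sequence[Point3]], cam_points: Sequence[Point3],
           idx: Optional[Sequence[int]] = None) -> float:
    """E(H) = sum over i of rho(min over m in M_i of ||m - H x_i||_2), optionally on subset idx."""
    indices = range(len(cam_points)) if idx is None else idx
    total = 0.0
    for i in indices:
        hx = apply_pose(pose, cam_points[i])
        total += top_hat(min(math.dist(m, hx) for m in modes[i]))
    return total


def best_candidate(candidates: Sequence[Pose], modes, cam_points,
                   sample_size: int = 50, seed: int = 0) -> Pose:
    """Score every candidate on the same random sample of associations; return the lowest E(H)."""
    rng = random.Random(seed)
    n = len(cam_points)
    idx = rng.sample(range(n), min(sample_size, n))
    return min(candidates, key=lambda h: energy(h, modes, cam_points, idx))


# Toy data: the true pose is a pure translation of +1 along x.
cam = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (0.0, 1.0, 3.0)]
true_pose: Pose = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
modes = [[apply_pose(true_pose, x)] for x in cam]  # one predicted scene point per element
candidates = [(0.0, 0.0, 0.0, 0.0, 0.0, 0.0), true_pose]
print(best_candidate(candidates, modes, cam))
```

Including the pose from a previous frame among the candidates, as claim 7 describes, amounts to appending it to `candidates`. With the top-hat error the score is simply a count of outlying associations, so a few badly predicted scene points cannot dominate it.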

Assignees

Microsoft Technology Licensing Llc

Classifications

  • Hierarchical techniques, i.e. dividing or merging patterns to obtain a tree-like representation; Dendograms · CPC title

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • using classification, e.g. of video objects · CPC title

  • G06V20/20 (primary): in augmented reality scenes · CPC title

  • Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram · CPC title

Frequently asked questions

What does patent US9940553B2 cover?
It covers calculating the pose of a mobile camera (for example, on a smart phone) in a known environment, or of an object moving relative to a fixed camera. A trained machine learning system (random decision forests, in the claims) associates image elements with 3D points in the scene's or the object's coordinate frame, and a pose inference engine estimates the pose from those possibly noisy associations.
Who is the assignee on this patent?
Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06V20/20. Mapped technology areas include Physics.
When was this patent published?
Published April 10, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.