Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06V20/64. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 29, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Aligning input image data with model input data to generate image annotations

US11514648B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11514648-B2
Application number	US-202017133493-A
Country	US
Kind code	B2
Filing date	Dec 23, 2020
Priority date	Dec 23, 2020
Publication date	Nov 29, 2022
Grant date	Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An image data annotation system automatically annotates a physical object within individual images frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is suitable for reliably training a DNN object detection architecture. Exemplary object annotations that the image data annotation system can automatically apply to individual image frames include, inter alia, object pose, image pose, object masks, 3D bounding boxes composited over the physical object, 2D bounding boxes composited over the physical object, and/or depth map information. Annotating the individual image frames may be accomplished by aligning the 3D model of the physical object with a multi-view reconstruction of the physical object that is generated by inputting an image sequence into a Structure-from-Motion and/or Multi-view Stereo pipeline.
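One of the annotation types the abstract lists is a 2D bounding box derived from the object's 3D extent. The sketch below illustrates how such a box could be computed once an object pose is known for a frame. It is an illustration under stated assumptions, not the patent's implementation: it assumes a simple pinhole camera, an object-centred box given by half extents, and a pose that maps object coordinates to camera coordinates. The function name `project_box_to_2d` and all parameter names are hypothetical.

```python
from itertools import product


def project_box_to_2d(half_extents, rotation, translation, fx, fy, cx, cy):
    """Project the 8 corners of an object-centred 3D box into the image and
    return the enclosing 2D box as (u_min, v_min, u_max, v_max).

    rotation is a 3x3 nested list and translation a 3-tuple; together they
    map object coordinates to camera coordinates (assumed convention).
    fx, fy, cx, cy are pinhole intrinsics in pixels (assumed camera model).
    """
    hx, hy, hz = half_extents
    us, vs = [], []
    for sx, sy, sz in product((-1, 1), repeat=3):
        corner = (sx * hx, sy * hy, sz * hz)
        # Camera-frame point: R @ corner + t
        xc, yc, zc = (
            sum(rotation[i][j] * corner[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if zc <= 0:
            raise ValueError("box corner lies behind the camera")
        # Pinhole projection to pixel coordinates
        us.append(fx * xc / zc + cx)
        vs.append(fy * yc / zc + cy)
    return min(us), min(vs), max(us), max(vs)
```

Calling the function with an identity rotation and a translation along the optical axis gives a box centred on the principal point, which is a quick sanity check before using real poses.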

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method, comprising: receiving image input data that includes a plurality of image frames that graphically represent multiple viewpoints of a physical object within a real-world environment; accessing model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generating a multi-view reconstruction of the physical object based on the multiple viewpoints of the physical object that are graphically represented within the plurality of image frames; receiving a first pose trajectory that corresponds to generation of the plurality of image frames that graphically represent the multiple viewpoints of the physical object within the real-world environment; receiving a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints of the physical object; determining a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generating, in association with individual image frames of the plurality of image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction, of the physical object, to the 3D model of the physical object; and generating, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.
2. The computer-implemented method of claim 1, wherein determining the scale corresponding to the multi-view reconstruction comprises aligning the first pose trajectory to the second pose trajectory.
3. The computer-implemented method of claim 1, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.
4. The computer-implemented method of claim 1, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.
5. The computer-implemented method of claim 1, wherein the object annotations that represent the orientation or the location of the physical object within the real-world environment include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.
6. The computer-implemented method of claim 1, wherein the plurality of image frames are a plurality of depth images that define individual depth values for individual pixels of a pixel array.
7. A system comprising: one or more processing units; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the one or more processing units, cause the one or more processing units to: receive image input data that includes a sequence of image frames that represent a physical object, within a real-world environment, from multiple viewpoints; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the multiple viewpoints; receive a first pose trajectory that corresponds to generation of the sequence of image frames that represent the physical object from the multiple viewpoints; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the 3D model in association with individual image frames of the sequence of image frames; generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object, or an orientation of the physical object.
8. The system of claim 7, wherein the multiple viewpoints of the physical object are graphically represented within the plurality of image frames, wherein the individual alignments are determined based on orientations of the multi-view reconstruction, of the physical object, associated with the individual image frames.
9. The system of claim 8, wherein the computer-executable instructions further cause the one or more processing units to: identify the model input data that defines the 3D model of the physical object based on the multi-view reconstruction of the physical object.
10. The system of claim 8, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.
11. The system of claim 7, wherein the individual image frames of the sequence of image frames are depth images that define individual depth values for individual pixels of a pixel array.
12. The system of claim 7, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.
13. The system of claim 7, wherein the object annotations that represent the orientation or the location of the physical object include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.
14. A computer-readable storage media having instructions stored thereupon which, when executed by a processor, cause a computing device to: receive image input data that includes image frames showing a physical object within a real-world environment; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the image input data; receive a first pose trajectory that corresponds to generation of the image frames; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, in association with individual ones of the image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction to the 3D model; and generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.
15. The computer-readable storage media of claim 14, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on multiple viewpoints.
16. The c
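Claims 1 and 2 recite determining a scale for the multi-view reconstruction from two pose trajectories. The claims do not specify an estimator, so the following is only a minimal sketch of one plausible approach. It assumes both trajectories contain camera positions for the same frames in the same order, that the first trajectory is expressed in real-world units (for example from device tracking), and that the two differ by a similarity transform. Under those assumptions the ratio of the trajectories' RMS spreads about their centroids recovers the scale, because rotation and translation do not change the spread. The function names are hypothetical, and a full trajectory alignment (for example Umeyama's method) would also recover rotation and translation.

```python
import math


def centroid(points):
    """Mean of a list of 3D points given as (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def rms_spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    total = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(total / len(points))


def estimate_scale(first_positions, second_positions):
    """Estimate the factor that converts reconstruction units into the
    units of the first trajectory.

    first_positions: camera positions from the first pose trajectory
        (assumed to be in real-world units).
    second_positions: camera positions for the same frames taken from the
        multi-view reconstruction (arbitrary units).
    """
    if len(first_positions) != len(second_positions):
        raise ValueError("trajectories must have one position per frame")
    if len(first_positions) < 2:
        raise ValueError("at least two positions are required")
    spread = rms_spread(second_positions)
    if spread == 0:
        raise ValueError("reconstruction trajectory has no spatial extent")
    return rms_spread(first_positions) / spread
```

This estimator is sensitive to noise and outlier poses because it uses every position with equal weight; a robust variant would be a natural refinement for real capture data.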

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06N3/045
Combinations of networks · CPC title
G06T2219/2004
Aligning objects, relative positioning of parts · CPC title
G06T2207/10028
Range image; Depth image; 3D point clouds · CPC title
G06T7/246
using feature-based methods, e.g. the tracking of corners or segments · CPC title

Patent family

Related publications grouped by family.

View patent family 78957370

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11514648B2 cover?: An image data annotation system automatically annotates a physical object within individual images frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is sui…

Related patents

Outputting warped images from captured video data

Localization of a surveying instrument

Resilient Dynamic Projection Mapping System and Methods

Ar-enabled labeling using aligned cad models

Automated data capture

Selecting exterior images of a structure based on capture positions of indoor images associated with the structure
