Aligning input image data with model input data to generate image annotations

US11514648B2 · US · B2

Patent metadata
Publication number: US-11514648-B2
Application number: US-202017133493-A
Country: US
Kind code: B2
Filing date: Dec 23, 2020
Priority date: Dec 23, 2020
Publication date: Nov 29, 2022
Grant date: Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An image data annotation system automatically annotates a physical object within individual image frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is suitable for reliably training a deep neural network (DNN) object detection architecture. Exemplary object annotations that the image data annotation system can automatically apply to individual image frames include, inter alia, object pose, image pose, object masks, 3D bounding boxes composited over the physical object, 2D bounding boxes composited over the physical object, and/or depth map information. Annotating the individual image frames may be accomplished by aligning the 3D model of the physical object with a multi-view reconstruction of the physical object that is generated by inputting an image sequence into a Structure-from-Motion and/or Multi-view Stereo pipeline.
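To illustrate how one of the listed annotations can follow from an object pose: once the 3D model is aligned to a frame, a tight 2D bounding box can be obtained by projecting the eight corners of the object's 3D bounding box into that image and taking the extremes. The sketch below assumes a pinhole camera with intrinsics (fx, fy, cx, cy), an object-to-camera pose given as a 3x3 rotation R and translation t, and a box described by half-extents in the object's own frame. All function names are hypothetical, and this is not presented as the patent's implementation.

```python
from itertools import product


def box_corners(half_extents):
    """Eight corners of an axis-aligned box centred on the object origin."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz) for sx, sy, sz in product((-1, 1), repeat=3)]


def transform(R, t, p):
    """Apply the assumed object-to-camera rigid transform x_cam = R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point; z must be positive."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def bbox_2d(half_extents, R, t, intrinsics):
    """Tight 2D box (x_min, y_min, x_max, y_max) around the projected 3D box."""
    pixels = [project(transform(R, t, c), *intrinsics) for c in box_corners(half_extents)]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A 20 cm cube placed 2 m straight in front of the camera.
    print(bbox_2d((0.1, 0.1, 0.1), identity, (0.0, 0.0, 2.0), (500.0, 500.0, 320.0, 240.0)))
```

In that example the box is centred on the principal point (320, 240) and spans roughly 52.6 pixels on each axis, because the face nearest the camera (z = 1.9 m) sets the extremes.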

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving image input data that includes a plurality of image frames that graphically represent multiple viewpoints of a physical object within a real-world environment; accessing model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generating a multi-view reconstruction of the physical object based on the multiple viewpoints of the physical object that are graphically represented within the plurality of image frames; receiving a first pose trajectory that corresponds to generation of the plurality of image frames that graphically represent the multiple viewpoints of the physical object within the real-world environment; receiving a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints of the physical object; determining a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generating, in association with individual image frames of the plurality of image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction, of the physical object, to the 3D model of the physical object; and generating, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.

2. The computer-implemented method of claim 1, wherein determining the scale corresponding to the multi-view reconstruction comprises aligning the first pose trajectory to the second pose trajectory.

3. The computer-implemented method of claim 1, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.

4. The computer-implemented method of claim 1, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.

5. The computer-implemented method of claim 1, wherein the object annotations that represent the orientation or the location of the physical object within the real-world environment include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.

6. The computer-implemented method of claim 1, wherein the plurality of image frames are a plurality of depth images that define individual depth values for individual pixels of a pixel array.

7. A system comprising: one or more processing units; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the one or more processing units, cause the one or more processing units to: receive image input data that includes a sequence of image frames that represent a physical object, within a real-world environment, from multiple viewpoints; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the multiple viewpoints; receive a first pose trajectory that corresponds to generation of the sequence of image frames that represent the physical object from the multiple viewpoints; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the 3D model in association with individual image frames of the sequence of image frames; generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object, or an orientation of the physical object.

8. The system of claim 7, wherein the multiple viewpoints of the physical object are graphically represented within the plurality of image frames, wherein the individual alignments are determined based on orientations of the multi-view reconstruction, of the physical object, associated with the individual image frames.

9. The system of claim 8, wherein the computer-executable instructions further cause the one or more processing units to: identify the model input data that defines the 3D model of the physical object based on the multi-view reconstruction of the physical object.

10. The system of claim 8, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.

11. The system of claim 7, wherein the individual image frames of the sequence of image frames are depth images that define individual depth values for individual pixels of a pixel array.

12. The system of claim 7, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.

13. The system of claim 7, wherein the object annotations that represent the orientation or the location of the physical object include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.

14. A computer-readable storage media having instructions stored thereupon which, when executed by a processor, cause a computing device to: receive image input data that includes image frames showing a physical object within a real-world environment; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the image input data; receive a first pose trajectory that corresponds to generation of the image frames; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, in association with individual ones of the image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction to the 3D model; and generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.

15. The computer-readable storage media of claim 14, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on multiple viewpoints.

16. The c
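Claims 1, 7 and 14 determine the reconstruction's scale from two pose trajectories, and claim 2 does so by aligning them. Monocular Structure-from-Motion output is only defined up to an unknown scale, so if the first trajectory (from the capturing device) is in metric units, which is an assumption here, comparing it with the second trajectory recovers that scale. A minimal sketch, assuming both trajectories are frame-synchronised lists of camera positions related by a similarity transform: rotation and translation leave the spread about the centroid unchanged, so the ratio of the two spreads isolates the scale. This is a simplified stand-in for a full similarity alignment (for example Umeyama's method), not the claimed procedure itself, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def spread(points):
    """Sum of squared distances of the points from their centroid."""
    c = centroid(points)
    return sum(sum((p[k] - c[k]) ** 2 for k in range(3)) for p in points)


def estimate_scale(metric_traj, recon_traj):
    """Scale s such that metric_traj ~= s * R * recon_traj + t.

    Assumes both lists hold camera positions for the same frames in the same
    order. Rotation and translation do not change the spread about the
    centroid, so spread(metric) / spread(recon) equals s ** 2.
    """
    if len(metric_traj) != len(recon_traj) or len(metric_traj) < 2:
        raise ValueError("need two equally long trajectories of at least 2 poses")
    recon_spread = spread(recon_traj)
    if recon_spread == 0:
        raise ValueError("reconstruction trajectory has no spatial extent")
    return math.sqrt(spread(metric_traj) / recon_spread)


def rescale(points, s):
    """Apply the recovered scale to reconstruction points, e.g. a point cloud."""
    return [tuple(s * x for x in p) for p in points]
```

For noise-free trajectories this recovers the scale exactly; with real tracking noise, fitting a full similarity transform (rotation, translation and scale) by least squares is the usual refinement.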

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Combinations of networks · CPC title

  • Aligning objects, relative positioning of parts · CPC title

  • Range image; Depth image; 3D point clouds · CPC title

  • using feature-based methods, e.g. the tracking of corners or segments · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11514648B2 cover?
An image data annotation system automatically annotates a physical object within individual image frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is sui…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06V20/64. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 29, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).