Outputting warped images from captured video data
US-2021366075-A1 · Nov 25, 2021 · US
US11514648B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11514648-B2 |
| Application number | US-202017133493-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 23, 2020 |
| Priority date | Dec 23, 2020 |
| Publication date | Nov 29, 2022 |
| Grant date | Nov 29, 2022 |
Official abstract text for this publication.
An image data annotation system automatically annotates a physical object within individual image frames of an image sequence with relevant object annotations based on a three-dimensional (3D) model of the physical object. Annotating the individual image frames with object annotations includes updating individual image frames within image input data to generate annotated image data that is suitable for reliably training a DNN object detection architecture. Exemplary object annotations that the image data annotation system can automatically apply to individual image frames include, inter alia, object pose, image pose, object masks, 3D bounding boxes composited over the physical object, 2D bounding boxes composited over the physical object, and/or depth map information. Annotating the individual image frames may be accomplished by aligning the 3D model of the physical object with a multi-view reconstruction of the physical object that is generated by inputting an image sequence into a Structure-from-Motion and/or Multi-view Stereo pipeline.
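As an illustration of how a 2D bounding box annotation could follow from an aligned 3D model, the sketch below projects the eight corners of a 3D bounding box into an image and takes their extent. It is a minimal sketch, not the method of the patent: it assumes a pinhole camera without lens distortion, an object pose given as a rotation matrix `R` and translation `t` from object to camera coordinates, and a box centered on the object origin. All function and parameter names (`box_corners`, `bbox_2d`, `half_extents`, `fx`, `fy`, `cx`, `cy`) are hypothetical.

```python
from itertools import product


def box_corners(half_extents):
    """Return the 8 corners of an origin-centered 3D box (object coordinates)."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz) for sx, sy, sz in product((-1, 1), repeat=3)]


def to_camera(R, t, p):
    """Apply the assumed object-to-camera pose: p_cam = R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p_cam, fx, fy, cx, cy):
    """Pinhole projection of a camera-space point to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def bbox_2d(R, t, half_extents, fx, fy, cx, cy):
    """2D box (u_min, v_min, u_max, v_max) enclosing the projected 3D box corners."""
    pixels = [project(to_camera(R, t, c), fx, fy, cx, cy) for c in box_corners(half_extents)]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Unit cube placed 2 units in front of the camera (illustrative values only).
    print(bbox_2d(identity, (0, 0, 2), (0.5, 0.5, 0.5), 100, 100, 50, 50))
```

The box encloses the projected corners rather than the object silhouette, so it is generally looser than a box derived from an object mask.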
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: receiving image input data that includes a plurality of image frames that graphically represent multiple viewpoints of a physical object within a real-world environment; accessing model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generating a multi-view reconstruction of the physical object based on the multiple viewpoints of the physical object that are graphically represented within the plurality of image frames; receiving a first pose trajectory that corresponds to generation of the plurality of image frames that graphically represent the multiple viewpoints of the physical object within the real-world environment; receiving a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints of the physical object; determining a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generating, in association with individual image frames of the plurality of image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction, of the physical object, to the 3D model of the physical object; and generating, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.
2. The computer-implemented method of claim 1, wherein determining the scale corresponding to the multi-view reconstruction comprises aligning the first pose trajectory to the second pose trajectory.
3. The computer-implemented method of claim 1, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.
4. The computer-implemented method of claim 1, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.
5. The computer-implemented method of claim 1, wherein the object annotations that represent the orientation or the location of the physical object within the real-world environment include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.
6. The computer-implemented method of claim 1, wherein the plurality of image frames are a plurality of depth images that define individual depth values for individual pixels of a pixel array.
7. A system comprising: one or more processing units; and a computer-readable storage medium having computer-executable instructions stored thereupon, which, when executed by the one or more processing units, cause the one or more processing units to: receive image input data that includes a sequence of image frames that represent a physical object, within a real-world environment, from multiple viewpoints; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the multiple viewpoints; receive a first pose trajectory that corresponds to generation of the sequence of image frames that represent the physical object from the multiple viewpoints; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction based on the multiple viewpoints; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the 3D model in association with individual image frames of the sequence of image frames; generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object, or an orientation of the physical object.
8. The system of claim 7, wherein the multiple viewpoints of the physical object are graphically represented within the plurality of image frames, wherein the individual alignments are determined based on orientations of the multi-view reconstruction, of the physical object, associated with the individual image frames.
9. The system of claim 8, wherein the computer-executable instructions further cause the one or more processing units to: identify the model input data that defines the 3D model of the physical object based on the multi-view reconstruction of the physical object.
10. The system of claim 8, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on the multiple viewpoints.
11. The system of claim 7, wherein the individual image frames of the sequence of image frames are depth images that define individual depth values for individual pixels of a pixel array.
12. The system of claim 7, wherein the annotated image data is formatted in accordance with a predefined training data format corresponding to one or more artificial neural network models.
13. The system of claim 7, wherein the object annotations that represent the orientation or the location of the physical object include at least one of: individual object poses of the physical object within the individual images; object masks within the individual images; or individual 3D bounding boxes corresponding to the physical object within the individual images.
14. A computer-readable storage media having instructions stored thereupon which, when executed by a processor, cause a computing device to: receive image input data that includes image frames showing a physical object within a real-world environment; access model input data that defines a three-dimensional (3D) model of the physical object, the model input data being separate from the image input data; generate a multi-view reconstruction of the physical object based on the image input data; receive a first pose trajectory that corresponds to generation of the image frames; receive a second pose trajectory that corresponds to generation of the multi-view reconstruction; determine a scale corresponding to the multi-view reconstruction based on the first pose trajectory and the second pose trajectory; generate, in association with individual ones of the image frames and based on the scale corresponding to the multi-view reconstruction, alignment data that defines individual alignments of the multi-view reconstruction to the 3D model; and generate, based on the alignment data, annotated image data by updating the individual image frames to include object annotations that represent at least one of: a location of the physical object within the real-world environment, or an orientation of the physical object within the real-world environment.
15. The computer-readable storage media of claim 14, wherein the generating the multi-view reconstruction includes generating a point cloud representation of the physical object based on multiple viewpoints.
16. The c
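The independent claims recite determining a scale for the multi-view reconstruction from two pose trajectories. One simple way such a scale could be estimated is sketched below, under stated assumptions: the first trajectory is taken to be in metric units (for example from device tracking), the second is taken to come from the reconstruction pipeline in arbitrary units, and both list camera positions for the same frames in the same order. The estimator is the ratio of each trajectory's root-mean-square spread about its centroid, which is unaffected by rotation and translation between the two coordinate frames. This is an illustrative stand-in, not the procedure of the patent, and the names `rms_spread` and `estimate_scale` are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def rms_spread(points):
    """Root-mean-square distance of the points from their centroid."""
    c = centroid(points)
    total = sum(sum((p[i] - c[i]) ** 2 for i in range(3)) for p in points)
    return math.sqrt(total / len(points))


def estimate_scale(metric_positions, recon_positions):
    """Scale factor that maps reconstruction units to metric units.

    Assumes both trajectories hold camera positions for the same frames.
    """
    if len(metric_positions) != len(recon_positions) or len(metric_positions) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    recon_spread = rms_spread(recon_positions)
    if recon_spread == 0:
        raise ValueError("reconstruction trajectory has no motion")
    return rms_spread(metric_positions) / recon_spread


if __name__ == "__main__":
    recon = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 2)]
    # Synthetic metric trajectory: rotate 90 degrees about z, scale by 2.5, translate.
    metric = [(-2.5 * y + 1, 2.5 * x - 2, 2.5 * z + 3) for x, y, z in recon]
    print(estimate_scale(metric, recon))
```

With noisy trajectories, a least-squares similarity alignment that solves for rotation, translation, and scale together would usually be more robust than this ratio.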
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Combinations of networks · CPC title
Aligning objects, relative positioning of parts · CPC title
Range image; Depth image; 3D point clouds · CPC title
using feature-based methods, e.g. the tracking of corners or segments · CPC title