Systems and methods for virtual and augmented reality
US-2021150252-A1 · May 20, 2021 · US
US12488597B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12488597-B2 |
| Application number | US-202117167570-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 4, 2021 |
| Priority date | Feb 4, 2021 |
| Publication date | Dec 2, 2025 |
| Grant date | Dec 2, 2025 |
Official abstract text for this publication.
A method for semantic keypoint detection is described. The method includes linking, using a keypoint graph neural network (KGNN), semantic keypoints of an object within a first image of a video stream into a 2D graph structure corresponding to a category of the object. The method also includes embedding descriptors within the semantic keypoints of the 2D graph structure corresponding to the category of the object. The method further includes tracking the object within subsequent images of the video stream using the embedded descriptors within the semantic keypoints of the 2D graph structure corresponding to the category of the object.
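As an illustration of the three steps in the abstract (linking keypoints into a category-specific 2D graph, embedding descriptors, and tracking with those descriptors), the following is a minimal sketch. It is not the patented KGNN: the edge template, the keypoint names, the two-dimensional descriptors, and the nearest-descriptor matching rule are all assumptions introduced here for illustration, and a learned network would replace the hand-written parts.

```python
from dataclasses import dataclass, field
from math import sqrt

# Hypothetical category template: which named keypoints are linked by an edge.
VEHICLE_EDGES = [
    ("front_left_wheel", "front_right_wheel"),
    ("front_left_wheel", "rear_left_wheel"),
    ("front_right_wheel", "rear_right_wheel"),
    ("rear_left_wheel", "rear_right_wheel"),
]

@dataclass
class Keypoint:
    name: str
    xy: tuple          # 2D pixel location in the frame
    descriptor: list   # descriptor embedded in this keypoint (toy values here)

@dataclass
class KeypointGraph:
    category: str
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

def link_keypoints(category, detections, edge_template):
    """Link detected keypoints into a 2D graph that follows the category template."""
    graph = KeypointGraph(category=category)
    for kp in detections:
        graph.nodes[kp.name] = kp
    # Keep only template edges whose two endpoints were actually detected.
    graph.edges = [(a, b) for a, b in edge_template
                   if a in graph.nodes and b in graph.nodes]
    return graph

def l2(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def track(graph, candidates):
    """Assign each graph node the location of the candidate with the closest descriptor."""
    matches = {}
    for name, kp in graph.nodes.items():
        best = min(candidates, key=lambda c: l2(kp.descriptor, c["descriptor"]))
        matches[name] = best["xy"]
    return matches

# First frame: two detected keypoints with toy descriptors.
first = link_keypoints("vehicle", [
    Keypoint("front_left_wheel", (10, 50), [1.0, 0.0]),
    Keypoint("front_right_wheel", (40, 50), [0.0, 1.0]),
], VEHICLE_EDGES)

# Subsequent frame: unlabeled candidate keypoints.
next_frame = [
    {"xy": (43, 51), "descriptor": [0.1, 0.9]},
    {"xy": (13, 51), "descriptor": [0.9, 0.1]},
]
tracked = track(first, next_frame)
```

Here each keypoint keeps its identity across frames because its descriptor, not its position, drives the match; the graph edges record the category geometry that the abstract attributes to the 2D graph structure.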
Opening claim text (preview).
What is claimed is:

1. A method for semantic keypoint detection, comprising: linking, using a first keypoint graph neural network (KGNN) encoder, semantic keypoints of an object within a first frame at time T of a video stream into a first 2D graph structure corresponding to a category of the object; embedding descriptors within the semantic keypoints of the first 2D graph structure corresponding to the category of the object; linking, using a second KGNN encoder, semantic keypoints in a second 2D graph structure representing the object category of the object in a second frame at time T+1 of the video stream; generating, using a shared differential keypoint flow model, an estimated second 2D graph structure corresponding to the category of the object in the second frame at time T+1 of the video stream according to the embedded descriptors in the semantic keypoints of the first 2D graph structure; and tracking the object within subsequent frames of the video stream according to a regression loss from comparing, using a shared matching layer, the estimated 2D graph structure according to the embedded descriptor in the semantic keypoints of the first 2D graph structure with the second 2D graph structure of the object in the second frame at time T+1.

2. The method of claim 1, in which linking the semantic keypoints comprises: linking, using the first KGNN encoder, interest keypoints of the object within the first image of the video stream into the 2D graph structure corresponding to the category of the object within the first frame of the video stream; and detecting, using a first KGNN detector, the semantic keypoints from the linked, interest keypoint within the first 2D graph structure corresponding to the category of the object within the first frame at time T of the video stream.

3. The method of claim 1, in which the first 2D graph structure and the second 2D graph structure are based on a geometric structure of the category associated with the object.

4. The method of claim 1, in which linking comprises: extracting, using a shared image backbone, interest keypoints within the first frame of the video stream based on relevant appearance and geometric features of the first frame; and generating a keypoint heatmap based on the extracted interest keypoints.

5. The method of claim 1, in which embedding comprises: generating descriptors of the semantic keypoints; and embedding, using a first KGNN descriptor head, the generated descriptors within the semantic keypoints of the first 2D graph structure.

6. The method of claim 1, in which the object comprises a vehicle represented by the estimated 2D graph structure to depict geometry/spatial relationships of a rigid-body of the vehicle according to the category of the vehicle.

7. A non-transitory computer-readable medium having program code recorded thereon for semantic keypoint detection, the program code being executed by a processor and comprising: program code to link, using a first keypoint graph neural network (KGNN) encoder, semantic keypoints of an object within a first frame at time T of a video stream into a first 2D graph structure corresponding to a category of the object; program code to embed descriptors within the semantic keypoints of the first 2D graph structure corresponding to the category of the object; linking, using a second KGNN encoder, semantic keypoints in a second 2D graph structure representing the object category of the object in a second frame at time T+1 of the video stream; program code to generate, using a shared differential keypoint flow model, an estimated second 2D graph structure corresponding to the category of the object in the second frame at time T+1 of the video stream according to the embedded descriptors in the semantic keypoints of the first 2D graph structure; and program code to track the object within subsequent frames of the video stream according to a regression loss from comparing, using a shared matching layer, the estimated 2D graph structure according to the embedded descriptor in the semantic keypoints of the first 2D graph structure with the second 2D graph structure of the object in the second frame at time T+1.

8. The non-transitory computer-readable medium of claim 7, in which linking the semantic keypoints comprises: program code to link, using the first KGNN encoder, interest keypoints of the object within the first frame of the video stream into the first 2D graph structure corresponding to the category of the object within the first frame of the video stream; and program code to detect, using a first KGNN detector, the semantic keypoints from the linked, interest keypoint within the first 2D graph structure corresponding to the category of the object within the first frame of the video stream.

9. The non-transitory computer-readable medium of claim 7, in which the first 2D graph structure and the second 2D graph structure are based on a geometric structure of the category associated with the object.

10. The non-transitory computer-readable medium of claim 7, in which the program code to link comprises: program code to extract, using a shared image backbone, interest keypoints within the first frame of the video stream based on relevant appearance and geometric features of the first frame; and program code to generate a keypoint heatmap based on the extracted interest keypoints.

11. The non-transitory computer-readable medium of claim 7, in which the program code to embed comprises: program code to generate the descriptors of the semantic keypoints; and program code to embed, using a first KGNN descriptor head, the descriptors within the semantic keypoints of the first 2D graph structure.

12. The non-transitory computer-readable medium of claim 7, in which the object comprises a vehicle represented by the estimated 2D graph structure to depict geometry/spatial relationships of a rigid-body of the vehicle according to the category of the vehicle.

13. A system for semantic keypoint detection, the system comprising: a semantic keypoint detection module to link, using a first keypoint graph neural network (KGNN) encoder, semantic keypoints of an object within a first frame at time T of a video stream into a first 2D graph structure corresponding to a category of the object, and to link, using a second KGNN encoder, semantic keypoints in a second 2D graph structure representing the object category of the object in a second frame at time T+1 of the video stream; a semantic keypoint descriptor module to embed descriptors within the semantic keypoints of the first 2D graph structure corresponding to the category of the object, and to generate, using a shared differential keypoint flow model, an estimated second 2D graph structure corresponding to the category of the object in the second frame at time T+1 of the video stream according to the embedded descriptors-within in the semantic keypoints of the first 2D graph structure; and a semantic keypoint tracking module to track the object within subsequent frames of the video stream according to a regression loss from comparing, using a shared matching layer, the estimated 2D graph structure according to the embedded descriptor in the semantic keypoints of the first 2D graph structure with the second 2D graph structure of the object in the second frame at time T+1.

14. The system of claim 13, in which the object comprises a vehicle represented by the estimated 2D graph structure to depict geometry/spatial relationships of a rigid-body of the vehicle according to the category of the vehicle.
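The final two steps of claim 1 (estimating the graph at time T+1 from the graph at time T, then comparing the estimate with the graph observed at T+1 to obtain a regression loss) can be sketched as follows. This is a toy under stated assumptions, not the claimed models: the per-keypoint flow vectors are supplied by hand rather than produced by a learned keypoint flow model, correspondence by keypoint name stands in for the shared matching layer, and mean squared distance is one possible choice of regression loss that the claims do not specify.

```python
def estimate_next_graph(nodes_t, flow):
    """Shift each keypoint at time T by its 2D flow vector to estimate its T+1 location."""
    return {
        name: (x + flow[name][0], y + flow[name][1])
        for name, (x, y) in nodes_t.items()
    }

def regression_loss(estimated, observed):
    """Mean squared distance over keypoints present in both graphs (assumed loss form)."""
    shared = [name for name in estimated if name in observed]
    if not shared:
        raise ValueError("no shared keypoints to compare")
    total = 0.0
    for name in shared:
        ex, ey = estimated[name]
        ox, oy = observed[name]
        total += (ex - ox) ** 2 + (ey - oy) ** 2
    return total / len(shared)

# Keypoint locations in the first graph (frame at time T); names are hypothetical.
nodes_t = {"front_left_wheel": (10.0, 50.0), "front_right_wheel": (40.0, 50.0)}

# Hand-written flow standing in for the output of a learned flow model.
flow = {"front_left_wheel": (3.0, 1.0), "front_right_wheel": (3.0, 1.0)}

# Keypoint locations in the second graph (frame at time T+1).
observed_t1 = {"front_left_wheel": (13.0, 51.0), "front_right_wheel": (44.0, 51.0)}

estimated_t1 = estimate_next_graph(nodes_t, flow)
loss = regression_loss(estimated_t1, observed_t1)  # 0.5 for these toy values
```

In this toy case the first keypoint is predicted exactly and the second is off by one pixel horizontally, so the loss is (0 + 1) / 2 = 0.5. In a trained system a loss of this kind would be the signal that pulls the flow estimate toward the observed second graph.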
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
considering possible movement changes · CPC title
Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title
Architecture, e.g. interconnection topology · CPC title
Spatial relation or speed relative to objects · CPC title