System and method for calibrating moving camera capturing broadcast video
US-2020279398-A1 · Sep 3, 2020 · US
US12579602B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12579602-B2 |
| Application number | US-202318513966-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 20, 2023 |
| Priority date | Apr 10, 2020 |
| Publication date | Mar 17, 2026 |
| Grant date | Mar 17, 2026 |
Abstract
A system and method of calibrating a broadcast video feed are disclosed herein. A computing system retrieves a plurality of broadcast video feeds that include a plurality of video frames. The computing system generates a trained neural network by generating a plurality of training data sets based on the broadcast video feeds and learning, by the neural network, to generate a homography matrix for each frame of the plurality of frames. The computing system receives a target broadcast video feed for a target sporting event. The computing system partitions the target broadcast video feed into a plurality of target frames. The computing system generates, for each target frame in the plurality of target frames and via the neural network, a target homography matrix. The computing system calibrates the target broadcast video feed by warping each target frame by a respective target homography matrix.
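The abstract's final step calibrates the feed by warping each target frame by its own homography matrix. The following minimal pure-Python sketch shows only that step. It assumes the following:

- Each frame is a small 2D grid of pixel values.
- Each homography is a 3x3 nested list that maps frame pixel coordinates to calibrated-view coordinates.
- Warping uses backward mapping with nearest-neighbour sampling. This is a common textbook choice, not necessarily what the patented system does.

All function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]   # 3x3 homography as nested lists (assumed layout)
Frame = List[List[float]]    # frame[row][col] = pixel value


def invert_3x3(m: Matrix) -> Matrix:
    """Invert a 3x3 matrix with the adjugate formula."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[v / det for v in row] for row in adj]


def apply_homography(hm: Matrix, x: float, y: float) -> Optional[Tuple[float, float]]:
    """Map (x, y) through hm in homogeneous coordinates; None if w is ~0."""
    xp = hm[0][0] * x + hm[0][1] * y + hm[0][2]
    yp = hm[1][0] * x + hm[1][1] * y + hm[1][2]
    w = hm[2][0] * x + hm[2][1] * y + hm[2][2]
    if abs(w) < 1e-12:
        return None
    return xp / w, yp / w


def warp_frame(frame: Frame, hm: Matrix, out_w: int, out_h: int, fill: float = 0.0) -> Frame:
    """Backward-warp a frame into the calibrated view defined by hm.

    hm maps frame pixel (x, y) to output pixel (u, v). For each output pixel we
    apply hm^-1 to find its source location and copy the nearest source pixel,
    or `fill` if that location falls outside the frame.
    """
    inv = invert_3x3(hm)
    src_h, src_w = len(frame), len(frame[0])
    out: Frame = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            src = apply_homography(inv, u, v)
            value = fill
            if src is not None:
                sx = math.floor(src[0] + 0.5)  # round to nearest pixel
                sy = math.floor(src[1] + 0.5)
                if 0 <= sx < src_w and 0 <= sy < src_h:
                    value = frame[sy][sx]
            row.append(value)
        out.append(row)
    return out


def calibrate_feed(frames: List[Frame], homographies: List[Matrix],
                   out_w: int, out_h: int) -> List[Frame]:
    """Warp every frame by its own per-frame homography."""
    return [warp_frame(f, hm, out_w, out_h) for f, hm in zip(frames, homographies)]
```

For example, a pure translation `[[1, 0, 1], [0, 1, 0], [0, 0, 1]]` shifts a frame one pixel to the right and fills the vacated left column with `fill`.

Backward mapping is used so that every output pixel receives a value. Forward mapping can leave holes in the output. A production system would more likely call a library routine such as OpenCV's `cv2.warpPerspective`, which interpolates bilinearly by default.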
Claims
The invention claimed is:

1. A method of calibrating a broadcast video feed, comprising: receiving, by a computing system, a broadcast video feed for a sporting event, the broadcast video feed including a plurality of frames; inputting, by the computing system, the plurality of frames into a trained neural network, wherein the trained neural network is configured to generate a corresponding homography matrix for each of the plurality of frames; calibrating, by the computing system, the broadcast video feed, wherein calibrating the broadcast video feed includes: warping each of the plurality of frames based on the corresponding homography matrix to a high occupancy perspective and a low occupancy perspective, wherein warping each of the plurality of frames includes generating one or more images and one or more semantic labels; and computing a loss function between the plurality of warped frames, wherein computing the loss function includes weighting the plurality of warped frames according to the high occupancy perspective and the low occupancy perspective; and utilizing, by the computing system, the one or more images, the one or more semantic labels, and the loss function to further train the trained neural network.

2. The method of claim 1, wherein the trained neural network comprises: a semantic segmentation module; a camera pose initialization module; and a homography refinement module.

3. The method of claim 2, wherein the semantic segmentation module is configured to generate a semantic map.

4. The method of claim 3, wherein the camera pose initialization module is configured to select a template from a set of templates using the semantic map to generate the homography matrix.

5. The method of claim 4, wherein the homography refinement module is configured to generate the homography matrix based on the template and the semantic map.

6. The method of claim 4, wherein the camera pose initialization module utilizes a Siamese network to select the template from the set of templates.

7. The method of claim 1, wherein the homography matrix registers a target ground-plane surface of any of the plurality of frames with a top view field model.

8. The method of claim 1, wherein the broadcast video includes an overhead view model, and wherein the overhead view model includes projected one or more three-dimensional locations of at least one of: one or more players or a ball onto a two-dimensional overhead view of a court of the sporting event.

9. The method of claim 8, wherein the warping each of the plurality of frames includes warping the overhead view model with the homography matrix.

10. A system for calibrating a broadcast video feed, comprising: a processor; and a memory having programming instructions stored thereon, which, when executed by the processor, performs one or more operations, comprising: receiving a broadcast video feed for a sporting event, the broadcast video feed including a plurality of frames; inputting the plurality of frames into a trained neural network, wherein the trained neural network is configured to generate a corresponding homography matrix for each of the plurality of frames; calibrating the broadcast video feed, wherein calibrating the broadcast video feed includes: warping each of the plurality of frames based on the corresponding homography matrix to a high occupancy perspective and a low occupancy perspective, wherein warping each of the plurality of frames includes generating one or more images and one or more semantic labels; and computing a loss function between the plurality of warped frames, wherein computing the loss function includes weighting the plurality of warped frames according to the high occupancy perspective and the low occupancy perspective; and utilizing the one or more images, the one or more semantic labels, and the loss function to further train the trained neural network.

11. The system of claim 10, wherein the trained neural network comprises: a semantic segmentation module; a camera pose initialization module; and a homography refinement module.

12. The system of claim 11, wherein the semantic segmentation module is configured to generate a semantic map.

13. The system of claim 12, wherein the camera pose initialization module is configured to select a template from a set of templates using the semantic map to generate the homography matrix.

14. The system of claim 13, wherein the camera pose initialization module utilizes a Siamese network to select the template from the set of templates.

15. The system of claim 10, wherein the homography matrix registers a target ground-plane surface of any of the plurality of frames with a top view field model.

16. The system of claim 10, wherein the broadcast video includes an overhead view model, and wherein the overhead view model includes projected one or more three-dimensional locations of at least one of: one or more players or a ball onto a two-dimensional overhead view of a court of the sporting event.

17. The system of claim 16, wherein the warping each of the plurality of frames includes warping the overhead view model with the homography matrix.

18. A non-transitory computer readable medium including one or more sequences of instructions that, when executed by one or more processors, causes: receiving, by a computing system, a broadcast video feed for a sporting event, the broadcast video feed including a plurality of frames; inputting, by the computing system, the plurality of frames into a trained neural network, wherein the trained neural network is configured to generate a corresponding homography matrix for each of the plurality of frames; calibrating, by the computing system, the broadcast video feed, wherein calibrating the broadcast video feed includes: warping each of the plurality of frames based on the corresponding homography matrix to a high occupancy perspective and a low occupancy perspective, wherein warping each of the plurality of frames includes generating one or more images and one or more semantic labels; and computing a loss function between the plurality of warped frames, wherein computing the loss function includes weighting the plurality of warped frames according to the high occupancy perspective and the low occupancy perspective; and utilizing, by the computing system, the one or more images, the one or more semantic labels, and the loss function to further train the trained neural network.

19. The non-transitory computer readable medium of claim 18, wherein the homography matrix registers a target ground-plane surface of any of the plurality of frames with a top view field model.

20. The non-transitory computer readable medium of claim 18, wherein the broadcast video includes an overhead view model, and wherein the overhead view model includes projected one or more three-dimensional locations of at least one of: one or more players or a ball onto a two-dimensional overhead view of a court of the sporting event.
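Claims 4 and 6 (and their system counterparts, 13 and 14) say the camera pose initialization module uses a Siamese network to select a template from a set of templates based on the semantic map. The claims do not say how candidates are compared. A common Siamese retrieval pattern passes the query and every candidate through the same encoder and keeps the nearest embedding. The minimal sketch below assumes that pattern. It also makes these illustrative assumptions:

- `encode` is a hypothetical hand-written stand-in for the learned encoder.
- The label encoding, grid size, and Euclidean distance are chosen for illustration only.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical label encoding: 0 = background, 1 = court surface, 2 = court line.
SemanticMap = List[List[int]]


def encode(sem_map: SemanticMap, num_classes: int = 3, grid: int = 2) -> List[float]:
    """Toy stand-in for the shared (Siamese) encoder.

    Splits the map into grid x grid cells and records the fraction of each
    class in each cell, giving a fixed-length embedding.
    """
    rows, cols = len(sem_map), len(sem_map[0])
    feats: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [
                sem_map[r][c]
                for r in range(gy * rows // grid, (gy + 1) * rows // grid)
                for c in range(gx * cols // grid, (gx + 1) * cols // grid)
            ]
            total = len(cell)
            feats.extend(cell.count(k) / total if total else 0.0 for k in range(num_classes))
    return feats


def select_template(query: SemanticMap, templates: Dict[str, SemanticMap]) -> Tuple[str, float]:
    """Return the name of the template whose embedding is closest to the query's.

    Query and templates go through the same encoder (the defining property of
    a Siamese setup), and candidates are ranked by Euclidean distance.
    """
    q = encode(query)
    best_name, best_dist = "", math.inf
    for name, tmpl in templates.items():
        dist = math.dist(q, encode(tmpl))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist
```

The selected template then serves as the initial camera pose. Per claim 5, the homography refinement module generates the homography matrix from the template and the semantic map. This sketch does not model that refinement step.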
CPC classification titles:
- Image warping, e.g. rearranging pixels individually
- Aligning, centring, orientation detection or correction of the image
- using neural networks
- Classification techniques
- Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes