US11763467B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11763467-B2 |
| Application number | US-201817255837-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 28, 2018 |
| Priority date | Sep 28, 2018 |
| Publication date | Sep 19, 2023 |
| Grant date | Sep 19, 2023 |
Official abstract text for this publication.
A multi-camera architecture for detecting and tracking a ball in real-time. The multi-camera architecture includes network interface circuitry to receive a plurality of real-time videos taken from a plurality of high-resolution cameras. Each of the high-resolution cameras simultaneously captures a sports event, wherein each of the plurality of high-resolution cameras includes a viewpoint that covers an entire playing field where the sports event is played. The multi-camera architecture further includes one or more processors coupled to the network interface circuitry and one or more memory devices coupled to the one or more processors. The one or more memory devices includes instructions to determine the location of the ball for each frame of the plurality of real-time videos, which when executed by the one or more processors, cause the multi-camera architecture to simultaneously perform one of a detection scheme or a tracking scheme on a frame from each of the plurality of real-time videos to detect the ball used in the sports event and perform a multi-camera build to determine a location of the ball in 3D for the frame from each of the plurality of real-time videos using one of detection or tracking results for each of the cameras.
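The multi-camera build summarized in the abstract can be illustrated with a small sketch. It is not the patented implementation: it assumes a hypothetical `Camera` model (an axis-aligned pinhole camera looking down +Z), substitutes simple two-ray midpoint triangulation for the epipolar matching and bundle adjustment named in claim 5, and takes the sampled camera pair as an argument instead of drawing it at random. It only shows the general idea of building a 3D point from two views and accepting cameras whose re-projection distance falls below a threshold.

```python
from dataclasses import dataclass
from math import dist

Vec3 = tuple[float, float, float]
Vec2 = tuple[float, float]


@dataclass(frozen=True)
class Camera:
    """Hypothetical pinhole camera at `center`, looking down +Z (an assumption)."""
    center: Vec3
    focal: float  # focal length in pixels

    def project(self, p: Vec3) -> Vec2:
        x, y, z = (p[i] - self.center[i] for i in range(3))
        return (self.focal * x / z, self.focal * y / z)

    def ray(self, uv: Vec2) -> Vec3:
        # Direction of the viewing ray through pixel `uv`.
        return (uv[0] / self.focal, uv[1] / self.focal, 1.0)


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate(c1: Camera, uv1: Vec2, c2: Camera, uv2: Vec2) -> Vec3:
    """Midpoint of the closest points on the two viewing rays."""
    d1, d2 = c1.ray(uv1), c2.ray(uv2)
    w = tuple(c1.center[i] - c2.center[i] for i in range(3))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero only if the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return tuple(
        (c1.center[i] + s * d1[i] + c2.center[i] + t * d2[i]) / 2.0
        for i in range(3)
    )


def multi_camera_build(cameras, detections, pair, max_px):
    """Triangulate from `pair`, then keep cameras with a small re-projection distance."""
    i, j = pair
    point = triangulate(cameras[i], detections[i], cameras[j], detections[j])
    inliers = [
        k for k, (cam, uv) in enumerate(zip(cameras, detections))
        if dist(cam.project(point), uv) < max_px
    ]
    return point, inliers
```

A caller would typically repeat this with different camera pairs and keep the result with the most inliers, which loosely corresponds to the repeated sampling described in claims 5 and 6.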
Opening claim text (preview).
What is claimed is:
1. A multi-camera architecture, comprising: network interface circuitry to receive a plurality of real-time videos taken from a plurality of high-resolution cameras, each of the high-resolution cameras simultaneously capturing a sports event, wherein each of the plurality of high-resolution cameras includes a viewpoint that covers an entire playing field where the sports event is played; one or more processors coupled to the network interface circuitry; one or more memory devices coupled to the one or more processors, the one or more memory devices including instructions to determine a location of a ball for each frame of the plurality of real-time videos, which when executed by the one or more processors, cause the multi-camera architecture to: simultaneously perform one of a detection scheme or a tracking scheme on a frame from each of the plurality of real-time videos to detect the ball used in the sports event; and perform a multi-camera build to determine the location of the ball in 3D (3-Dimensions) for the frame from each of the plurality of real-time videos using one of detection or tracking results for each camera; wherein the tracking scheme comprises instructions, that when executed by the one or more processors, cause the multi-camera architecture to perform tracking-by-detection when the ball was detected in a previous frame, wherein instructions to perform tracking-by-detection further comprise instructions to only perform detection on a single tile, the single tile being set using a ball center of the previous frame in which the ball was detected or tracked as a tile center for the single tile.
2. The multi-camera architecture of claim 1, wherein the detection scheme comprises instructions, which when executed by the one or more processors, cause the multi-camera architecture to: retrieve the frame for each of the cameras; retrieve a background image from each of the cameras; remove the background image from the frame to obtain a foreground mask for each of the cameras; partition the foreground mask into tiles to obtain a partitioned foreground mask for each of the cameras; perform motion filtering on the partitioned foreground mask to obtain a motion filtered foreground mask for each of the cameras; perform detection of the ball for each tile in the frame of the motion filtered foreground mask that indicates motion is occurring for each of the cameras; and collect detection results from all of the tiles in the frame for each of the cameras.
3. The multi-camera architecture of claim 2, wherein detection includes detection of the ball using one of YOLO (You Only Look Once), Faster RCNN (Faster Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), and any other object detection technique used to detect small objects.
4. The multi-camera architecture of claim 1, wherein detection includes detection of the ball using one of YOLO (You Only Look Once), Faster RCNN (Faster Region-based Convolutional Neural Network), SSD (Single Shot MultiBox Detector), and any other object detection technique used to detect small objects.
5. The multi-camera architecture of claim 1, wherein the multi-camera build comprises instructions, that when executed by the one or more processors, cause the multi-camera architecture to: perform a multi-camera cross validation, the multi-camera cross validation including instructions to sample the detection results from a set of cameras, wherein the set of cameras are selected using a random sampling method; and calculate a matching error along an epipolar line for the set of cameras randomly selected; when the matching error is equal to or greater than a predetermined threshold, a miss or false detection has occurred, wherein instructions further comprise to repeat the multi-camera cross validation instructions until the matching error is less than the predetermined threshold; and when the matching error is less than the predetermined threshold, multi-camera build instructions further comprise instructions to determine a 3D ball location using the sample cameras, re-project the 3D ball location onto each of the cameras, and determine a distance between a detected position of the ball and a re-projection position of the ball for each of the cameras, wherein if the distance is less than a pre-determined threshold, the results from the detection of the ball are correct using the set of cameras, wherein the multi-camera build instructions further comprise instructions to place the set of cameras on an inner list and apply bundle adjustment to get an optimized 3D ball location.
6. The multi-camera architecture of claim 5, wherein the multi-camera build further comprises instructions to repeat all of the multi-camera build instructions N times to obtain an optimal result with minimal re-project error.
7. The multi-camera architecture of claim 1, wherein when the multi-camera build is successful, the tracking scheme is used in the next frame of each of the videos to locate the ball; and wherein when the multi-camera build is unsuccessful, the detection scheme is used in the next frame of each of the videos to locate the ball.
8. The multi-camera architecture of claim 1, wherein further instructions, which when executed by the one or more processors, cause the multi-camera architecture to: project the 3D ball location onto each of the results of the plurality of cameras in 2D (2-Dimensions); perform the detection around a projected position to obtain a more accurate location of the ball for the frame from each of the cameras; and continuously advance each of the plurality of real-time videos to a next frame to repeat the instructions to determine the location of the ball for the next frame until the plurality of real-time videos end.
9. The multi-camera architecture of claim 1, wherein the plurality of high-resolution cameras comprises twelve (12) high-resolution cameras, wherein at least three (3) of the 12 high-resolution cameras capture every pixel in the entire playing field.
10. A method comprising: simultaneously performing one of a detection scheme or a tracking scheme on a frame from each of a plurality of videos captured from at least twelve high-resolution cameras to detect a ball used in a sports event; and performing a multi-camera build to determine a location of the ball in 3D (3-Dimensions) for the frame from each of the plurality of videos using one of detection or tracking results for each camera; wherein the tracking scheme comprises performing tracking-by-detection when the ball was detected in a previous frame, wherein tracking-by-detection comprises only performing detection on a single tile, the single tile being set using a ball center of the previous frame in which the ball was detected or tracked as a tile center for the single tile.
11. The method of claim 10, wherein the detection scheme comprises: retrieving the frame for each of the cameras; retrieving a background image from each of the cameras; removing the background image from the frame to obtain a foreground mask for each of the cameras; partitioning the foreground mask into tiles to obtain a partitioned foreground mask for each of the cameras; performing motion filtering on the partitioned foreground mask to obtain a motion filtered foreground mask for each of the cameras; performing detection of the ball for each tile in the frame of the motion filtered foreground mask that indicates motion is occurring for each of the cameras; and collecting detection results from all of the tiles in the frame for each of the cameras.
12. The method of claim 10, wherein the multi-camera build compri
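The per-camera steps recited in the claims (the single tracking tile of claims 1 and 10, the tiled foreground mask of claims 2 and 11, and the mode switch of claim 7) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, the mask is a plain list of 0/1 rows, "motion filtering" is approximated by a foreground-pixel count per tile, and shifting the tracking tile to stay inside the frame is an added assumption that the claims do not address.

```python
def tracking_tile(prev_center, tile, frame):
    """Single tile centered on the previous ball center.

    Assumption: near the frame border the tile is shifted so it stays
    fully inside the frame, so it is no longer exactly centered there.
    """
    (cx, cy), (tw, th), (fw, fh) = prev_center, tile, frame
    x0 = min(max(cx - tw // 2, 0), fw - tw)
    y0 = min(max(cy - th // 2, 0), fh - th)
    return (x0, y0, tw, th)


def moving_tiles(mask, tile, min_pixels=1):
    """Partition a binary foreground mask into tiles and keep the active ones.

    Assumption: a tile "indicates motion" when it holds at least
    `min_pixels` foreground pixels. Only these tiles would be passed
    to the ball detector.
    """
    tw, th = tile
    rows, cols = len(mask), len(mask[0])
    kept = []
    for y0 in range(0, rows, th):
        for x0 in range(0, cols, tw):
            count = sum(sum(row[x0:x0 + tw]) for row in mask[y0:y0 + th])
            if count >= min_pixels:
                kept.append((x0, y0, tw, th))
    return kept


def next_mode(build_succeeded):
    """Claim 7: track after a successful multi-camera build, else detect."""
    return "tracking" if build_succeeded else "detection"
```

Restricting detection to one tile while tracking, and to active tiles while detecting, reduces how much of each high-resolution frame the detector has to process.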
Multi-camera tracking · CPC title
using feature-based methods · CPC title
Video; Image sequence · CPC title
Dividing image into blocks, subimages or windows · CPC title
Artificial neural networks [ANN] · CPC title