US2021374419A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021374419-A1 |
| Application number | US-202117303365-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 27, 2021 |
| Priority date | Jun 2, 2020 |
| Publication date | Dec 2, 2021 |
| Grant date | — |
Official abstract text for this publication.
A method and system of generating agent and actions prediction based on multi-agent tracking data are disclosed herein. A computing system retrieves tracking data from a data store. The computing system generates a trained neural network by generating a plurality of training data sets based on the tracking data by converting each frame of data into a matrix representation of the data contained in the frame and learning, by the neural network, a start frame and end frame of each action contained in the frame and its associated actor. The computing system receives target tracking data associated with an event. The target tracking data includes a plurality of actors and a plurality of actions. The computing system generates, via the trained neural network, a target start frame and a target end frame of each action identified in the tracking data and a corresponding actor.
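As a rough illustration of the two data transformations the abstract mentions, the sketch below flattens tracking frames into a matrix and collapses per-frame labels into start and end frames. The publication does not specify the matrix layout or the label format here, so the row-per-frame layout, the `(x, y)` coordinate pairs, and the helper names `frames_to_matrix` and `labels_to_segments` are all assumptions made for illustration.

```python
from itertools import groupby


def frames_to_matrix(frames, agent_ids):
    """Build one row per frame: [x1, y1, x2, y2, ...] in a fixed agent order.

    Assumed layout; the publication only says each frame is converted
    into a matrix representation.
    """
    matrix = []
    for frame in frames:
        row = []
        for agent in agent_ids:
            x, y = frame[agent]
            row.extend([x, y])
        matrix.append(row)
    return matrix


def labels_to_segments(frame_labels):
    """Collapse per-frame (action, actor) labels into
    (action, actor, start_frame, end_frame) tuples.

    Frames labelled None are treated as "no action" (an assumption).
    """
    segments = []
    index = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        if label is not None:
            action, actor = label
            segments.append((action, actor, index, index + length - 1))
        index += length
    return segments


# Hypothetical two-agent example.
frames = [
    {"p1": (0.0, 0.0), "p2": (5.0, 1.0)},
    {"p1": (0.5, 0.1), "p2": (5.0, 1.2)},
]
matrix = frames_to_matrix(frames, ["p1", "p2"])
# matrix == [[0.0, 0.0, 5.0, 1.0], [0.5, 0.1, 5.0, 1.2]]

labels = [None, ("pass", "p1"), ("pass", "p1"), ("shot", "p2"), None]
segments = labels_to_segments(labels)
# segments == [("pass", "p1", 1, 2), ("shot", "p2", 3, 3)]
```

Keeping the agent order fixed across frames means each column always refers to the same actor, which is what lets a later stage attribute an action to a specific actor.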
Opening claim text (preview).
1. A method of generating agent and actions prediction based on multi-agent tracking data, comprising: retrieving, by a computing system, tracking data from a data store, the tracking data comprising a plurality of frames of data for a plurality of events across a plurality of seasons; generating, by the computing system, a trained neural network, by: generating a plurality of training data sets based on the tracking data by converting each frame of data into a matrix representation of the data contained in the frame; and learning, by the neural network, a start frame and end frame of each action contained in the frame and its associated actor; receiving, by the computing system, target tracking data associated with an event, the target tracking data comprising a plurality of actors and a plurality of actions; converting, by the computing system, the tracking data into a matrix representation of the tracking data; generating, by the computing system via the trained neural network, a target start frame and a target end frame of each action identified in the tracking data and a corresponding actor; and presenting, by the computing system, the predicted target start frame, predicted target end frame, and predicted associated actor to one or more end users.

2. The method of claim 1, wherein the neural network comprises: a spatial attention sub-network; a per-agent convolution network comprising parallel convolutional streams; and a mutual attention sub-network comprising a first multilayer perceptron and a second multilayer perceptron.

3. The method of claim 2, wherein generating, by the computing system via the trained neural network, the target start frame and the target end frame of each action identified in the tracking data and a corresponding actor; comprises: inputting the matrix representation of the tracking data into the spatial attention sub-network to generate spatial attention coefficients.

4. The method of claim 3, further comprising: generating a weighted spatiotemporal matrix by multiplying the matrix representation of the tracking data by the spatial attention coefficients; and inputting the weighted spatiotemporal matrix into a first convolutional stream of the parallel convolutional stream and a second convolutional stream of the parallel convolutional stream.

5. The method of claim 4, further comprising: combining, by the mutual attention sub-network, a first output from the first convolutional stream and a second output from the second convolutional stream to generate a mutual attention layer; and passing the mutual attention layer through the first multilayer perceptron to generate an action prediction; and passing the mutual attention layer through the second multilayer perceptron to generate an actor prediction.

6. The method of claim 1, wherein each action may comprise a plurality of sub-actions.

7. The method of claim 6, wherein learning, by the neural network, the start frame and the end frame of each action contained in the frame and its associated actor, comprises: initializing a sub-action to action-actor mapping function that is used as a ground truth, frame level; and optimizing the neural network by reducing cross entropy between a predicted sub-action, a predicted actor, and the ground truth, frame level.

8. The method of claim 1, wherein the target tracking data comprises raw positional data for the plurality of actors.

9. The method of claim 8, further comprising: annotating, by the computing system, the raw positional data for the plurality of actors with player position information.

10. The method of claim 8, further comprising: fusing, by the computing system, the raw positional data for the plurality of actors with manually annotated tracking data.

11. The method of claim 1, further comprising: inputting, by the computing system, the predicted target start frame, the predicted target end frame, and the predicted associated actor into a refinement module; and generating, by the computing system via the refinement module, a refined predicted target start frame, a refined predicted target end frame, and a refined predicted associated actor.

12. A system for generating agent and actions prediction based on multi-agent tracking data, comprising: a processor; and a memory having programming instructions stored thereon, which, when executed by the processor, performs one or more operations, comprising: retrieving tracking data from a data store, the tracking data comprising a plurality of frames of data for a plurality of events across a plurality of seasons; generating a trained neural network, by: generating a plurality of training data sets based on the tracking data by converting each frame of data into a matrix representation of the data contained in the frame; and learning, by the neural network, a start frame and end frame of each action contained in the frame and its associated actor; receiving target tracking data associated with an event, the target tracking data comprising a plurality of actors and a plurality of actions; converting the tracking data into a matrix representation of the tracking data; generating, via the trained neural network, a target start frame and a target end frame of each action identified in the tracking data and a corresponding actor; and presenting the predicted target start frame, the predicted target end frame, and the predicted associated actor to one or more end users.

13. The system of claim 12, wherein the neural network comprises: a spatial attention sub-network; a per-agent convolution network comprising parallel convolutional streams; and a mutual attention sub-network comprising a first multilayer perceptron and a second multilayer perceptron.

14. The system of claim 13, wherein generating, via the trained neural network, the target start frame and the target end frame of each action identified in the tracking data and the corresponding actor; comprises: inputting the matrix representation of the tracking data into the spatial attention sub-network to generate spatial attention coefficients.

15. The system of claim 14, further comprising: generating a weighted spatiotemporal matrix by multiplying the matrix representation of the tracking data by the spatial attention coefficients; and inputting the weighted spatiotemporal matrix into a first convolutional stream of the parallel convolutional stream and a second convolutional stream of the parallel convolutional stream.

16. The system of claim 15, further comprising: combining, by the mutual attention sub-network, a first output from the first convolutional stream and a second output from the second convolutional stream to generate a mutual attention layer; and passing the mutual attention layer through the first multilayer perceptron to generate an action prediction; and passing the mutual attention layer through the second multilayer perceptron to generate an actor prediction.

17. The system of claim 12, wherein each action may comprise a plurality of sub-actions.

18. The system of claim 17, wherein learning, by the neural network, the start frame and the end frame of each action contained in the frame and its associated actor, comprises: initializing a sub-action to action-actor mapping function that is used as a ground truth, frame level; and optimizing the neural network by reducing cross entropy between a predicted sub-action, a predicted actor, and the ground truth, frame level.
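The data flow recited in claims 2 to 5 and 7 (spatial attention coefficients, a weighted matrix, two parallel convolutional streams, a mutual attention layer, two multilayer perceptron heads, and a frame-level cross-entropy objective) can be sketched in plain Python as follows. This is a minimal, untrained sketch and not the claimed network: the scoring rule inside `spatial_attention`, the two kernels, the gating used in `mutual_attention`, the hidden size of 8, the random weights, and the one-scalar-per-agent input matrix are all assumptions chosen to keep the example short. The loss also uses direct action and actor labels rather than the sub-action mapping function of claim 7.

```python
import math
import random


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(matrix, agent_weights):
    """One coefficient per agent (assumed rule: softmax of weighted column means)."""
    n_frames = len(matrix)
    means = [sum(row[a] for row in matrix) / n_frames for a in range(len(matrix[0]))]
    return softmax([w * m for w, m in zip(agent_weights, means)])


def temporal_conv(matrix, kernel):
    """Zero-padded sliding window along the frame axis, per agent column, then ReLU."""
    half = len(kernel) // 2
    n_frames, n_agents = len(matrix), len(matrix[0])
    out = []
    for t in range(n_frames):
        row = []
        for a in range(n_agents):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = t + k - half
                if 0 <= src < n_frames:
                    acc += w * matrix[src][a]
            row.append(max(acc, 0.0))
        out.append(row)
    return out


def mutual_attention(first, second):
    """Combine both streams per frame (assumed rule: softmax gate on their product)."""
    fused = []
    for f_row, s_row in zip(first, second):
        gate = softmax([f * s for f, s in zip(f_row, s_row)])
        fused.append([g * (f + s) for g, f, s in zip(gate, f_row, s_row)])
    return fused


def mlp(features, w_hidden, w_out):
    """Single hidden layer with ReLU, softmax output."""
    hidden = [max(sum(w * x for w, x in zip(ws, features)), 0.0) for ws in w_hidden]
    return softmax([sum(w * h for w, h in zip(ws, hidden)) for ws in w_out])


def random_weights(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def forward(matrix, n_actions, seed=0):
    """Return (attention coefficients, per-frame action probs, per-frame actor probs)."""
    rng = random.Random(seed)
    n_agents = len(matrix[0])
    coeffs = spatial_attention(matrix, [rng.uniform(-1.0, 1.0) for _ in range(n_agents)])
    weighted = [[x * c for x, c in zip(row, coeffs)] for row in matrix]
    first = temporal_conv(weighted, [0.25, 0.5, 0.25])   # smoothing stream
    second = temporal_conv(weighted, [-1.0, 0.0, 1.0])   # change-sensitive stream
    fused = mutual_attention(first, second)
    action_head = (random_weights(8, n_agents, rng), random_weights(n_actions, 8, rng))
    actor_head = (random_weights(8, n_agents, rng), random_weights(n_agents, 8, rng))
    actions = [mlp(row, *action_head) for row in fused]
    actors = [mlp(row, *actor_head) for row in fused]
    return coeffs, actions, actors


def frame_loss(action_probs, actor_probs, true_actions, true_actors):
    """Mean per-frame cross entropy, summed over the action and actor heads."""
    total = 0.0
    for a_p, r_p, a_t, r_t in zip(action_probs, actor_probs, true_actions, true_actors):
        total -= math.log(max(a_p[a_t], 1e-12)) + math.log(max(r_p[r_t], 1e-12))
    return total / len(action_probs)


# Hypothetical input: 4 frames x 3 agents, one scalar feature per agent.
demo = [[1.0, 2.0, 0.5], [1.5, 2.0, 0.0], [2.0, 1.0, 0.5], [2.5, 0.5, 1.0]]
coeffs, actions, actors = forward(demo, n_actions=4)
loss = frame_loss(actions, actors, [0, 1, 1, 2], [0, 0, 1, 2])
```

Each frame yields one probability vector over actions and one over actors, so taking the most likely entry per frame and grouping consecutive frames with the same result is one plausible way to arrive at the start and end frames the claims refer to.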
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
using neural networks · CPC title
of sport video content · CPC title
Validation; Performance evaluation; Active pattern learning techniques · CPC title