Video content processing using selected machine-learning models
US-2024046515-A1 · Feb 8, 2024 · US
US12354306B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12354306-B2 |
| Application number | US-202217881423-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 4, 2022 |
| Priority date | Aug 4, 2022 |
| Publication date | Jul 8, 2025 |
| Grant date | Jul 8, 2025 |
Official abstract text for this publication.
A video file may be presented via a user application that displays one or more video frames of the video file. A user request to perform an object detection for objects of a specific object type in a video frame of the video file may be received from the user application. A machine-learning model of a plurality of machine-learning models that is configured to detect objects of the specific object type may be applied to the video frame to detect an object of the specific object type in the video frame. Each of the plurality of machine-learning models may be trained to detect objects of a corresponding object type. Subsequently, an object tracking algorithm may be applied to one or more additional video frames of the video file to track the object of the specific object type across the one or more additional video frames.
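The flow in the abstract (pick the model that matches the requested object type, detect in one frame, then hand the detections to a tracker for later frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ModelRegistry`, `handle_request`, the `(x, y, width, height)` box convention, and the callable-based detector and tracker interfaces are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical conventions for this sketch (not taken from the patent):
Box = Tuple[int, int, int, int]          # (x, y, width, height) in pixels
Frame = object                           # opaque stand-in for image data
Detector = Callable[[Frame], List[Box]]  # one detector per object type
Tracker = Callable[[Box, Sequence[Frame]], List[Box]]


@dataclass
class ModelRegistry:
    """Maps an object type name to the detector trained for that type."""
    detectors: Dict[str, Detector]

    def select(self, object_type: str) -> Detector:
        # Choose the single model configured for the requested object type.
        try:
            return self.detectors[object_type]
        except KeyError:
            raise ValueError(f"no model registered for object type {object_type!r}")


def handle_request(
    registry: ModelRegistry,
    frames: Sequence[Frame],
    frame_index: int,
    object_type: str,
    track: Tracker,
) -> Tuple[List[Box], Dict[int, List[Box]]]:
    """Detect objects of one type in one frame, then track each in later frames."""
    detector = registry.select(object_type)
    boxes = detector(frames[frame_index])
    later_frames = frames[frame_index + 1:]
    # Each detection gets its own track: one box per additional frame.
    tracks = {i: track(box, later_frames) for i, box in enumerate(boxes)}
    return boxes, tracks
```

In this sketch the detector runs only on the frame the user selected, and the tracker alone produces boxes for the additional frames, which mirrors the detect-then-track split described in the abstract.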
Opening claim text (preview).
What is claimed is:
1. One or more non-transitory computer-readable media storing computer-executable instructions that, upon execution, cause one or more processors to perform operations for tracking objects in videos, the operations comprising: presenting a video file via a user application, the video file including a plurality of video frames; receiving, from the user application, a user request for detecting an object of an object type in the video file; detecting the object of the object type in a first one of the plurality of video frames by one of a plurality of machine-learning models, the one of the plurality of machine-learning models being configured to detect a plurality of objects of the object type, wherein the plurality of machine-learning models are trained to detect a plurality of objects of a plurality of object types, and the plurality of object types include the object type; and tracking the object of the object type on a second one of the plurality of video frames of the video file by an object tracking algorithm, the object tracking algorithm including a target representation and localization algorithm or a filtering and data association algorithm.
2. The one or more non-transitory computer-readable media of claim 1, wherein the operations further comprise performing a data processing operation with respect to the object of the object type as captured in the video file.
3. The one or more non-transitory computer-readable media of claim 2, wherein the data processing operation includes redacting the object from one of the plurality of video frames of the video file.
4. The one or more non-transitory computer-readable media of claim 1, wherein the object is a first object, the object type is a first object type, and the operations further comprise: receiving an object type correction for a second object in the first video frame, the second object being detected by the one of the plurality of machine-learning models, the object type correction indicating that the second object is of a second object type; and tracking the second object of the second object type on a third one of the plurality of video frames of the video file.
5. The one or more non-transitory computer-readable media of claim 4, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: storing information indicating that the second object is of the second object type; and incorporating the information into at least one of a first set of training data for training the first machine-learning model or a second set of training data for training a second one of the plurality of machine-learning models, the second machine-learning model being configured to detect a plurality of objects of the second object type.
6. The one or more non-transitory computer-readable media of claim 1, wherein the video file is a first video file, and the operations further comprise: receiving an indication that an object of interest is undetectable by the plurality of machine-learning models; storing an image of the object of interest and the indication in a data store for a review; receiving training data including the image of the object of interest labeled as an object type of interest; and training a machine-learning model based on the training data to detect a plurality of objects of the object type of interest in at least one of the first video file or a second video file.
7. The one or more non-transitory computer-readable media of claim 6, wherein the operations further comprise: determining, before receiving the training data, a number of a plurality of images of the object of interest received during a predetermined time period exceeds a numerical threshold.
8. The one or more non-transitory computer-readable media of claim 1, wherein the user request is a first user request, the object is a first object, the object type is a first object type, the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: receiving, from the user application, a second user request for detecting a second object of a second one of the plurality of object types in the video file; detecting the second object of the second object type in a third one of the plurality of video frames by a second one of the plurality of machine-learning models, the second machine-learning model being configured to detect a plurality of objects of the second object type; and tracking the second object of the second object type on a fourth of the plurality of video frames of the video file by the object tracking algorithm.
9. The one or more non-transitory computer-readable media of claim 1, wherein the plurality of machine-learning models are trained to detect the plurality of objects of the plurality of object types, respectively.
10. The one or more non-transitory computer-readable media of claim 1, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the plurality of machine-learning models include a second machine-learning model and a third machine-learning model trained to detect a plurality of objects of one of the plurality of object types with different object sizes.
11. A system for tracking objects in videos, the system comprising: one or more processors; and one or more memories including a plurality of computer-executable instructions that are executable by the one or more processors to perform a plurality of operations for tracking objects in videos, the operations comprising: presenting a video file via a user application, the video file including a plurality of video frames; receiving, from the user application, a user request for detecting an object of an object type in the video file; detecting the object of the object type in a first one of the plurality of video frames by one of a plurality of machine-learning models, the one of the plurality of machine-learning models being configured to detect a plurality of objects of the object type, wherein the plurality of machine-learning models are trained to detect a plurality of objects of a plurality of object types, and the plurality of object types include the object type; and tracking the object of the object type on a second one of the plurality of video frames of the video file by an object tracking algorithm, the object tracking algorithm including a target representation and localization algorithm or a filtering and data association algorithm.
12. The system of claim 11, wherein the operations further comprise redacting the object from one of the plurality of video frames of the video file.
13. The system of claim 11, wherein the object is a first object, the object type is a first object type, and the operations further comprise: receiving an object type correction for a second object in the first video frame, the second object detected by the one of the plurality of machine-learning models, the object type correction indicating that the second object is of a second object type; and tracking the second object of the second object type on a third one of the plurality of video frames of the video file.
14. The system of claim 13, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: storing information indicating that the second object is of the second object type; and incorporating the information into at least one of a first set of training data for training the first machine-learning model or a second set of training data for training a second o
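Claims 1 and 11 allow the tracking step to use "a filtering and data association algorithm" without naming a specific one. As a rough illustration of the data association half only, the sketch below greedily matches existing tracks to new detections by intersection-over-union (IoU). The choice of IoU, the greedy matching, the `(x1, y1, x2, y2)` box format, the `min_iou` default of 0.3, and the function names are all assumptions made for this example; a fuller tracker would typically add a motion filter to predict each track's box before matching.

```python
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,
) -> Dict[int, int]:
    """Greedily map each track id to at most one detection index.

    Pairs are visited from highest to lowest IoU; a pair is accepted only
    if neither its track nor its detection has been matched already.
    """
    pairs = sorted(
        (
            (iou(box, det), track_id, det_index)
            for track_id, box in tracks.items()
            for det_index, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_detections = set()
    for score, track_id, det_index in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_id in matches or det_index in used_detections:
            continue
        matches[track_id] = det_index
        used_detections.add(det_index)
    return matches
```

Tracks left out of the returned mapping have no sufficiently overlapping detection in the new frame, and unmatched detections could seed new tracks; how to handle either case is left open here.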
Video; Image sequence · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns · CPC title