Who is the assignee on this patent?

Getac Technology Corp, Whp Workflow Solutions Inc

What technology area does this patent fall under?

Primary CPC classification G06V10/62. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 8, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for tracking objects in videos using machine-learning models

US12354306B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12354306-B2
Application number	US-202217881423-A
Country	US
Kind code	B2
Filing date	Aug 4, 2022
Priority date	Aug 4, 2022
Publication date	Jul 8, 2025
Grant date	Jul 8, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video file may be presented via a user application that displays one or more video frames of the video file. A user request to perform an object detection for objects of a specific object type in a video frame of the video file may be received from the user application. A machine-learning model of a plurality of machine-learning models that is configured to detect objects of the specific object type may be applied to the video frame to detect an object of the specific object type in the video frame. Each of the plurality of machine-learning models may be trained to detect objects of a corresponding object type. Subsequently, an object tracking algorithm may be applied to one or more additional video frames of the video file to track the object of the specific object type across the one or more additional video frames.
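The flow in the abstract can be sketched roughly as follows: keep one detector per object type, apply the requested detector to a single frame, then follow the detected box across later frames. This is only an illustrative sketch, not the patented implementation. The names `Box`, `Frame`, `DETECTORS`, `detect`, `iou`, and `track` are hypothetical, the detectors are stubs that read boxes from a toy frame structure, and greedy best-overlap matching stands in for whatever object tracking algorithm a real system would use.

```python
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Frame = Dict[str, List[Box]]              # toy frame: object type -> boxes present


def make_stub_detector(object_type: str) -> Callable[[Frame], List[Box]]:
    """Stand-in for a trained model that detects only one object type."""
    def detector(frame: Frame) -> List[Box]:
        return list(frame.get(object_type, []))
    return detector


# Hypothetical registry: one detector per object type.
DETECTORS: Dict[str, Callable[[Frame], List[Box]]] = {
    t: make_stub_detector(t) for t in ("face", "license_plate")
}


def detect(frame: Frame, object_type: str) -> List[Box]:
    """Apply the detector registered for the requested object type."""
    if object_type not in DETECTORS:
        raise KeyError(f"no model registered for {object_type!r}")
    return DETECTORS[object_type](frame)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(start: Box, frames: List[Frame], min_iou: float = 0.3) -> List[Optional[Box]]:
    """Follow `start` through later frames by greedy best-overlap matching."""
    path: List[Optional[Box]] = []
    current = start
    for frame in frames:
        candidates = [b for boxes in frame.values() for b in boxes]
        best = max(candidates, key=lambda b: iou(current, b), default=None)
        if best is not None and iou(current, best) >= min_iou:
            current = best
            path.append(best)
        else:
            path.append(None)  # no sufficiently overlapping box in this frame
    return path


frames = [
    {"face": [(10, 10, 50, 50)], "license_plate": [(200, 200, 260, 230)]},
    {"face": [(14, 12, 54, 52)], "license_plate": [(205, 200, 265, 230)]},
    {"face": [(18, 14, 58, 54)]},
]
first_face = detect(frames[0], "face")[0]   # detection on one frame
face_path = track(first_face, frames[1:])   # tracking on the additional frames
```

The detector runs only on the first frame; the later frames are handled by the tracker alone, which mirrors the detect-then-track ordering in the abstract.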

First claim

Opening claim text (preview).

What is claimed is:

1. One or more non-transitory computer-readable media storing computer-executable instructions that, upon execution, cause one or more processors to perform operations for tracking objects in videos, the operations comprising: presenting a video file via a user application, the video file including a plurality of video frames; receiving, from the user application, a user request for detecting an object of an object type in the video file; detecting the object of the object type in a first one of the plurality of video frames by one of a plurality of machine-learning models, the one of the plurality of machine-learning models being configured to detect a plurality of objects of the object type, wherein the plurality of machine-learning models are trained to detect a plurality of objects of a plurality of object types, and the plurality of object types include the object type; and tracking the object of the object type on a second one of the plurality of video frames of the video file by an object tracking algorithm, the object tracking algorithm including a target representation and localization algorithm or a filtering and data association algorithm.

2. The one or more non-transitory computer-readable media of claim 1, wherein the operations further comprise performing a data processing operation with respect to the object of the object type as captured in the video file.

3. The one or more non-transitory computer-readable media of claim 2, wherein the data processing operation includes redacting the object from one of the plurality of video frames of the video file.

4. The one or more non-transitory computer-readable media of claim 1, wherein the object is a first object, the object type is a first object type, and the operations further comprise: receiving an object type correction for a second object in the first video frame, the second object being detected by the one of the plurality of machine-learning models, the object type correction indicating that the second object is of a second object type; and tracking the second object of the second object type on a third one of the plurality of video frames of the video file.

5. The one or more non-transitory computer-readable media of claim 4, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: storing information indicating that the second object is of the second object type; and incorporating the information into at least one of a first set of training data for training the first machine-learning model or a second set of training data for training a second one of the plurality of machine-learning models, the second machine-learning model being configured to detect a plurality of objects of the second object type.

6. The one or more non-transitory computer-readable media of claim 1, wherein the video file is a first video file, and the operations further comprise: receiving an indication that an object of interest is undetectable by the plurality of machine-learning models; storing an image of the object of interest and the indication in a data store for a review; receiving training data including the image of the object of interest labeled as an object type of interest; and training a machine-learning model based on the training data to detect a plurality of objects of the object type of interest in at least one of the first video file or a second video file.

7. The one or more non-transitory computer-readable media of claim 6, wherein the operations further comprise: determining, before receiving the training data, a number of a plurality of images of the object of interest received during a predetermined time period exceeds a numerical threshold.

8. The one or more non-transitory computer-readable media of claim 1, wherein the user request is a first user request, the object is a first object, the object type is a first object type, the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: receiving, from the user application, a second user request for detecting a second object of a second one of the plurality of object types in the video file; detecting the second object of the second object type in a third one of the plurality of video frames by a second one of the plurality of machine-learning models, the second machine-learning model being configured to detect a plurality of objects of the second object type; and tracking the second object of the second object type on a fourth of the plurality of video frames of the video file by the object tracking algorithm.

9. The one or more non-transitory computer-readable media of claim 1, wherein the plurality of machine-learning models are trained to detect the plurality of objects of the plurality of object types, respectively.

10. The one or more non-transitory computer-readable media of claim 1, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the plurality of machine-learning models include a second machine-learning model and a third machine-learning model trained to detect a plurality of objects of one of the plurality of object types with different object sizes.

11. A system for tracking objects in videos, the system comprising: one or more processors; and one or more memories including a plurality of computer-executable instructions that are executable by the one or more processors to perform a plurality of operations for tracking objects in videos, the operations comprising: presenting a video file via a user application, the video file including a plurality of video frames; receiving, from the user application, a user request for detecting an object of an object type in the video file; detecting the object of the object type in a first one of the plurality of video frames by one of a plurality of machine-learning models, the one of the plurality of machine-learning models being configured to detect a plurality of objects of the object type, wherein the plurality of machine-learning models are trained to detect a plurality of objects of a plurality of object types, and the plurality of object types include the object type; and tracking the object of the object type on a second one of the plurality of video frames of the video file by an object tracking algorithm, the object tracking algorithm including a target representation and localization algorithm or a filtering and data association algorithm.

12. The system of claim 11, wherein the operations further comprise redacting the object from one of the plurality of video frames of the video file.

13. The system of claim 11, wherein the object is a first object, the object type is a first object type, and the operations further comprise: receiving an object type correction for a second object in the first video frame, the second object detected by the one of the plurality of machine-learning models, the object type correction indicating that the second object is of a second object type; and tracking the second object of the second object type on a third one of the plurality of video frames of the video file.

14. The system of claim 13, wherein the one of the plurality of machine-learning models is a first machine-learning model, and the operations further comprise: storing information indicating that the second object is of the second object type; and incorporating the information into at least one of a first set of training data for training the first machine-learning model or a second set of training data for training a second o
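Claim 7 turns on a count of images received during a predetermined time period exceeding a numerical threshold. One way to picture that check is a trailing-window counter, sketched below. The claim text shown here does not say how the count is kept, so the sliding-window approach, the class name `ReportWindow`, its parameters, and the sample timestamps are all assumptions made for illustration. The sketch also assumes reports arrive in non-decreasing time order.

```python
from collections import deque
from typing import Deque


class ReportWindow:
    """Hypothetical helper: counts reports received within a trailing time window."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps: Deque[float] = deque()

    def add_report(self, timestamp: float) -> bool:
        """Record one report; return True if the in-window count exceeds the threshold."""
        self._timestamps.append(timestamp)
        # Drop reports that fall outside the trailing window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# Three reports inside one hour exceed a threshold of 2; a late report starts over.
window = ReportWindow(window_seconds=3600, threshold=2)
results = [window.add_report(t) for t in (0, 600, 1200, 5000)]
print(results)  # [False, False, True, False]
```

In a flow like the one in claims 6 and 7, a `True` result would be the point at which collecting labeled training data for the new object type becomes worthwhile.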

Assignees

Getac Technology Corp, Whp Workflow Solutions Inc

Inventors

Classifications

G06T2207/10016	Video; Image sequence
G06T2207/20084	Artificial neural networks [ANN]
G06T2207/20081	Training; Learning
G06V10/774	Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V10/98	Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns

Patent family

Related publications grouped by family.

View patent family 89769291

External sources

Related patents

Video content processing using selected machine-learning models

Selective redaction of images

Collaborative object detection

Selective redaction of images

Machine learning framework applied in a semi-supervised setting to perform instance tracking in a sequence of image frames

System and method for evaluating images to support multiple risk applications

Video Triggered Analyses
