What technology area does this patent fall under?

Primary CPC classification G06F16/73. Mapped technology areas include Physics.

When was this patent published?

Publication date: April 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Natural language search over security videos

US12596747B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12596747-B2
Application number	US-202418814322-A
Country	US
Kind code	B2
Filing date	Aug 23, 2024
Priority date	Aug 25, 2023
Publication date	Apr 7, 2026
Grant date	Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may be configured to provide natural language search over security videos. In some aspects, the system may generate a first representation of sampled video information in a multidimensional format via a first machine learning model, and receive a request including a natural language input. Further, the system may generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model, and determine that the first representation has a predefined relationship with the second representation. In addition, the system may present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
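The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the two models have already produced vectors in a shared space. The vectors below are hand-written toy values, not model output. Cosine similarity with a fixed threshold stands in for the "predefined relationship"; the patent itself does not name a specific metric here. The names `search_frames`, `threshold`, and the frame IDs are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search_frames(frame_embeddings, query_embedding, threshold=0.8):
    """Return (frame_id, score) pairs that meet the threshold, best match first.

    The threshold test is an assumed stand-in for the "predefined
    relationship" between the video and text representations.
    """
    hits = []
    for frame_id, embedding in frame_embeddings.items():
        score = cosine_similarity(embedding, query_embedding)
        if score >= threshold:
            hits.append((frame_id, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Toy 3-dimensional embeddings (hypothetical values, not produced by any model).
frames = {
    "cam1_t0010": [0.9, 0.1, 0.0],
    "cam1_t0020": [0.1, 0.9, 0.1],
    "cam2_t0005": [0.8, 0.2, 0.1],
}
# Stand-in for the embedding of a natural language query.
query = [1.0, 0.0, 0.0]

results = search_frames(frames, query)
for frame_id, score in results:
    print(f"{frame_id}: {score:.3f}")
```

In this toy data, two of the three frames point in nearly the same direction as the query and are returned; the third falls below the threshold and is dropped.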

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
2. The method of claim 1, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.
3. The method of claim 1, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.
4. The method of claim 1, further comprising: receiving video capture information from a video capture device; and sampling the video capture information to generate the sampled video information.
5. The method of claim 1, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.
6. The method of claim 1, wherein the first machine learning model is a convolutional neural network.
7. The method of claim 1, further comprising jointly training the first machine learning model and the second machine learning model using a common process.
8. A system comprising: at least one memory storing instructions thereon; and at least one processor coupled to the at least one memory and configured by the instructions to: generate a first representation of sampled video information in a multidimensional format via a first machine learning model; receive a request including a natural language input; generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determine that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
9. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive a search query for a plurality of video frames corresponding an event defined by the natural language input.
10. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive the request for an alert identifying an occurrence of an event corresponding to the natural language input.
11. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive video capture information from a video capture device, and sample the video capture information to generate the sampled video information.
12. The system of claim 8, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.
13. The system of claim 8, wherein the first machine learning model is a convolutional neural network.
14. A non-transitory computer-readable device having instructions thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
15. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.
16. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.
17. The method of claim 7, wherein the common process comprises coordinated training to align the first representation and the second representation.
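The sampling step (claims 4 and 11) and the alert request (claims 3, 10, and 16) might be combined roughly as in the sketch below. This is an illustration under stated assumptions, not the claimed implementation. The every-Nth-frame sampler, the identity `embed_frame` stand-in for the first model, the cosine threshold in place of the learned proximity metric, and all names (`sample_every`, `matching_alerts`, `step`, `threshold`) are hypothetical.

```python
import math

def sample_every(frames, step):
    """Yield (index, frame) for every `step`-th frame of a stream."""
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield index, frame

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def matching_alerts(frames, embed_frame, query_embedding, step=2, threshold=0.8):
    """Return indices of sampled frames whose embedding matches a standing query.

    `embed_frame` is a placeholder for the first machine learning model;
    the threshold test is an assumed stand-in for the learned proximity metric.
    """
    return [
        index
        for index, frame in sample_every(frames, step)
        if cosine_similarity(embed_frame(frame), query_embedding) >= threshold
    ]

# Toy stream: each "frame" is already a 2-dimensional vector (hypothetical values),
# so the embedding function is the identity.
stream = [
    [0.0, 1.0],
    [1.0, 0.0],
    [0.9, 0.1],
    [1.0, 0.0],
    [0.1, 0.9],
    [1.0, 0.0],
]
standing_query = [1.0, 0.0]  # stand-in for the embedded alert description

triggered = matching_alerts(stream, lambda frame: frame, standing_query)
print(triggered)
```

With `step=2`, only frames 0, 2, and 4 are examined, so matching frames at odd indices are never seen. The sampling rate therefore trades compute cost against the chance of missing a short event.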

Assignees

Tyco Fire & Security GmbH

Inventors

Classifications

G06V30/19093
Proximity measures, i.e. similarity or distance measures · CPC title
G06V10/82
using neural networks · CPC title
G06V20/44
Event detection · CPC title
G06F16/73 · Primary
Querying · CPC title
G06F16/7837 · Primary
using objects detected or recognised in the video content · CPC title

Patent family

Related publications grouped by family.

View patent family 92712426

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12596747B2 cover?: A system may be configured to provide natural language search over security videos. In some aspects, the system may generate a first representation of sampled video information in a multidimensional format via a first machine learning model, and receive a request including a natural language input. Further, the system may generate a second representation of the natural language input in the mul…
Who is the assignee on this patent?: Tyco Fire & Security GmbH

Related patents

Training data synthesis for machine learning

Systems and methods for retrieving videos using natural language description

Spatio-temporal action and actor localization

Automatic event detection, text generation, and use thereof

Spatio-temporal action and actor localization
