Natural language search over security videos

US12596747B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12596747-B2
Application number: US-202418814322-A
Country: US
Kind code: B2
Filing date: Aug 23, 2024
Priority date: Aug 25, 2023
Publication date: Apr 7, 2026
Grant date: Apr 7, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may be configured to provide natural language search over security videos. In some aspects, the system may generate a first representation of sampled video information in a multidimensional format via a first machine learning model, and receive a request including a natural language input. Further, the system may generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model, and determine that the first representation has a predefined relationship with the second representation. In addition, the system may present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
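The flow described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the embeddings are hand-written stand-ins for the outputs of the two machine learning models, cosine similarity stands in for the "learned proximity metric" named in the claims, and the names `FRAME_EMBEDDINGS`, `TEXT_EMBEDDINGS`, `encode_text`, `search`, and the `threshold` value are hypothetical. Returning matching frame identifiers is an illustrative choice and not a reading of the claim language.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumption: toy 3-dimensional embeddings standing in for the output of the
# first (video) model. A real system would compute these from sampled frames.
FRAME_EMBEDDINGS = {
    "cam1_t0": [0.9, 0.1, 0.0],
    "cam1_t5": [0.1, 0.9, 0.1],
    "cam2_t0": [0.0, 0.2, 0.9],
}

# Assumption: toy embeddings standing in for the output of the second (text)
# model, expressed in the same 3-dimensional space as the frame embeddings.
TEXT_EMBEDDINGS = {
    "person climbing a fence": [0.8, 0.2, 0.1],
    "car entering the garage": [0.1, 0.1, 0.95],
}


def encode_text(query):
    """Hypothetical text encoder: looks up a precomputed embedding."""
    return TEXT_EMBEDDINGS[query]


def search(query, frame_index, threshold=0.8):
    """Return (frame_id, score) pairs whose similarity meets the threshold.

    The threshold plays the role of the 'predefined relationship' between
    the two representations; its value here is arbitrary.
    """
    query_vec = encode_text(query)
    scored = [
        (frame_id, cosine_similarity(frame_vec, query_vec))
        for frame_id, frame_vec in frame_index.items()
    ]
    hits = [(frame_id, score) for frame_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Calling `search("person climbing a fence", FRAME_EMBEDDINGS)` keeps only the frames whose embedding lies close to the query embedding. The same comparison could drive an alert instead of a search by running it on each newly sampled frame.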

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.

2. The method of claim 1, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.

3. The method of claim 1, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.

4. The method of claim 1, further comprising: receiving video capture information from a video capture device; and sampling the video capture information to generate the sampled video information.

5. The method of claim 1, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.

6. The method of claim 1, wherein the first machine learning model is a convolutional neural network.

7. The method of claim 1, further comprising jointly training the first machine learning model and the second machine learning model using a common process.

8. A system comprising: at least one memory storing instructions thereon; and at least one processor coupled to the at least one memory and configured by the instructions to: generate a first representation of sampled video information in a multidimensional format via a first machine learning model; receive a request including a natural language input; generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determine that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.

9. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive a search query for a plurality of video frames corresponding an event defined by the natural language input.

10. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive the request for an alert identifying an occurrence of an event corresponding to the natural language input.

11. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive video capture information from a video capture device, and sample the video capture information to generate the sampled video information.

12. The system of claim 8, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.

13. The system of claim 8, wherein the first machine learning model is a convolutional neural network.

14. A non-transitory computer-readable device having instructions thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.

15. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.

16. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.

17. The method of claim 7, wherein the common process comprises coordinated training to align the first representation and the second representation.
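Claims 7 and 17 mention jointly training the two models through a common process that aligns the two representations, without naming a training objective on this page. One widely used way to align two encoders is a symmetric contrastive loss over matched video and text pairs; the sketch below assumes that objective purely for illustration. The function names, the `temperature` value, and the toy vectors are hypothetical and are not taken from the patent.

```python
import math


def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(scores, index):
    """Log of the softmax probability assigned to scores[index]."""
    peak = max(scores)  # subtract the maximum for numerical stability
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return scores[index] - log_total


def contrastive_loss(video_vecs, text_vecs, temperature=0.1):
    """Symmetric contrastive loss for a batch of matched pairs.

    video_vecs[i] and text_vecs[i] are assumed to describe the same event.
    The loss is low when each video embedding scores highest against its own
    text embedding, and vice versa.
    """
    n = len(video_vecs)
    sims = [[dot(v, t) / temperature for t in text_vecs] for v in video_vecs]
    # Video-to-text direction: each row should peak on its diagonal entry.
    video_to_text = -sum(log_softmax_at(sims[i], i) for i in range(n)) / n
    # Text-to-video direction: each column should peak on its diagonal entry.
    text_to_video = -sum(
        log_softmax_at([sims[j][i] for j in range(n)], i) for i in range(n)
    ) / n
    return (video_to_text + text_to_video) / 2


# Toy batch: embeddings that already agree versus embeddings that are swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch `aligned` is close to zero while `misaligned` is large, so minimizing the loss with respect to both encoders' parameters would pull matched video and text embeddings together in the shared space.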

Assignees

Tyco Fire & Security GmbH

Inventors

Classifications

  • Proximity measures, i.e. similarity or distance measures

  • using neural networks

  • Event detection

  • G06F16/73 (Primary): Querying

  • using objects detected or recognised in the video content

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12596747B2 cover?
A system may be configured to provide natural language search over security videos. In some aspects, the system may generate a first representation of sampled video information in a multidimensional format via a first machine learning model, and receive a request including a natural language input. Further, the system may generate a second representation of the natural language input in the mul…
Who is the assignee on this patent?
Tyco Fire & Security GmbH
What technology area does this patent fall under?
Primary CPC classification G06F16/73. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).