Spatial-temporal anomaly and event detection using night vision sensors

US12475705B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12475705-B2
Application number	US-202318331007-A
Country	US
Kind code	B2
Filing date	Jun 7, 2023
Priority date	Jun 7, 2022
Publication date	Nov 18, 2025
Grant date	Nov 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In general, the disclosure describes techniques for joint spatiotemporal Artificial Intelligence (AI) models that can encompass multiple space and time resolutions through self-supervised learning. In an example, a method includes for each of a plurality of multimodal data, generating, by a computing system, using a first machine learning model, a respective modality feature vector representative of content of the multimodal data, wherein each of the generated modality feature vectors has a different modality; processing, by the computing system, each of generated modality feature vectors with a second machine learning model comprising an encoder model to generate event data comprising a plurality of events and/or activities of interest; and analyzing, by the computing system, the event data to generate anomaly data indicative of detected anomalies in the multimodal data.
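The data flow in the abstract (per-modality feature vectors, then event data, then anomaly data) can be sketched roughly as below. This is only an illustration under loud assumptions: the patent describes learned machine learning models, whereas this sketch substitutes simple hand-written statistics for both models, and every name in it (`modality_features`, `event_scores`, `anomalies`, `z_threshold`, the sample sensor readings) is hypothetical and not taken from the patent.

```python
from statistics import mean, pstdev
from typing import Dict, List


def modality_features(frames: List[List[float]]) -> List[List[float]]:
    """Stand-in for the 'first machine learning model' (assumption).

    Returns one feature vector [mean intensity, peak intensity] per time step.
    """
    return [[mean(frame), max(frame)] for frame in frames]


def event_scores(features: Dict[str, List[List[float]]]) -> List[float]:
    """Stand-in for the 'second machine learning model' (assumption).

    Fuses the modalities by averaging their mean-intensity feature per time step.
    """
    per_step = zip(*features.values())  # group feature vectors by time step
    return [mean([vec[0] for vec in step]) for step in per_step]


def anomalies(scores: List[float], z_threshold: float = 2.0) -> List[int]:
    """Flags time steps whose score lies more than z_threshold
    population standard deviations from the mean score."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return []  # all scores identical: nothing stands out
    return [i for i, s in enumerate(scores) if abs(s - mu) / sigma > z_threshold]


if __name__ == "__main__":
    # Hypothetical readings: 8 time steps, 2 pixels per frame, spike at step 5.
    data = {
        "SWIR": [[1.0, 1.0]] * 5 + [[9.0, 9.0]] + [[1.0, 1.0]] * 2,
        "LWIR": [[2.0, 2.0]] * 5 + [[10.0, 10.0]] + [[2.0, 2.0]] * 2,
    }
    feats = {name: modality_features(frames) for name, frames in data.items()}
    print(anomalies(event_scores(feats)))
```

The z-score rule is a swapped-in technique chosen for brevity. The abstract does not state how anomaly data is derived from event data beyond "analyzing" it.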

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: processing, by a computing system, using a first machine learning model, first content of multimodal data to generate a first modality feature vector representative of the first content, wherein the multimodal data comprises multidimensional spectral data comprising multi-resolution data in both space and time, and wherein the first content has a first modality and the first modality feature vector has the first modality; processing, by the computing system, using the first machine learning model, second content of the multimodal data to generate a second modality feature vector representative of the second content, wherein the second content has a second modality and the second modality feature vector has the second modality, wherein the first modality is different than the second modality; processing, by the computing system, using a second machine learning model, the first modality feature vector and the second modality feature vector to generate event data comprising at least one of an event or an activity of interest; and processing, by the computing system, the event data to generate anomaly data indicative of detected anomalies in the multimodal data.
2. The method of claim 1, further comprising: training, by the computing system, the second machine learning model using the generated anomaly data.
3. The method of claim 1, wherein the multimodal data comprises sensor data generated by one or more night vision sensors.
4. The method of claim 3, wherein the one or more night vision sensors comprise at least one of a Short-Wave InfraRed (SWIR) sensor, Medium-Wave InfraRed (MWIR) sensor, Long-Wave InfraRed (LWIR) sensor, and a Near Infrared (NIR) sensor.
5. The method of claim 1, wherein the second machine learning model comprises a transformer model.
6. The method of claim 5, wherein an intermediate layer of the transformer model comprises N transformer layers, and wherein each of the N transformer layers comprises an attention mechanism module.
7. The method of claim 5, wherein the transformer model comprises a joint spatiotemporal model encompassing a plurality of spatial resolutions and a plurality of temporal resolutions.
8. The method of claim 1, wherein analyzing the event data to generate anomaly data further comprises analyzing the event data using a domain knowledge model.
9. The method of claim 1, wherein the first modality feature vector comprises an embedding for the first modality and the second modality feature vector comprises an embedding for the second modality.
10. A computing system comprising: an input device configured to receive multimodal data; processing circuitry and memory for executing a machine learning system, wherein the machine learning system is configured to: process, using a first machine learning model, first content of the multimodal data to generate a first modality feature vector representative of the first content wherein the multimodal data comprises multidimensional spectral data comprising multi-resolution data in both space and time, and wherein the first content has a first modality and the first modality feature vector has the first modality; process, using the first machine learning model, second content of the multimodal data to generate a second modality feature vector representative of the second content, wherein the second content has a second modality and the second modality feature vector has the second modality, wherein the first modality is different than the second modality; process using a second machine learning model, the first modality feature vector and the second modality feature vector to generate event data comprising at least one of an event or an activity of interest and process the event data to generate anomaly data indicative of detected anomalies in the multimodal data; and an output device configured to output the generated anomaly data.
11. The computing system of claim 10, wherein the machine learning system is further configured to train the second machine learning model using the generated anomaly data.
12. The computing system of claim 10, wherein the multimodal data comprises sensor data generated by one or more night vision sensors.
13. The computing system of claim 12, wherein the one or more night vision sensors comprise at least one of a Short-Wave InfraRed (SWIR) sensor, Medium-Wave InfraRed (MWIR) sensor, Long-Wave InfraRed (LWIR) sensor, and a Near Infrared (NIR) sensor.
14. The computing system of claim 10, wherein the second machine learning model comprises a transformer model.
15. The computing system of claim 14, wherein an intermediate layer of the transformer model comprises N transformer layers, and wherein each of the N transformer layers comprises an attention mechanism module.
16. Non-transitory computer-readable media comprising machine readable instructions for configuring processing circuitry to: process, using a first machine learning model, first content of multimodal data to generate a first modality feature vector representative of the first content, wherein the multimodal data comprises multidimensional spectral data comprising multi-resolution data in both space and time, and wherein the first content has a first modality and the first modality feature vector has the first modality; process, using the first machine learning model, second content of the multimodal data to generate a second modality feature vector representative of the second content, wherein the second content has a second modality and the second modality feature vector has the second modality, wherein the first modality is different than the second modality; process, using a second machine learning model comprising an encoder model, the first modality feature vector and the second modality feature vector to generate event data comprising at least one of an event or an activity of interest; and process the event data to generate anomaly data indicative of detected anomalies in the multimodal data.
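The claims that recite a transformer with N layers, each containing an attention mechanism module, can be illustrated with a bare-bones sketch of stacked self-attention over modality embeddings. This is an assumption-laden toy and not the patented model: it uses standard scaled dot-product attention with no learned projection weights, no feed-forward sublayers, and no residual connections or normalization, and the names `self_attention`, `encoder`, and `n_layers` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries, keys, and values
    all equal to the input tokens (no learned weights: an assumption)."""
    dim = len(tokens[0])
    output = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        output.append(
            [sum(w * tok[j] for w, tok in zip(weights, tokens)) for j in range(dim)]
        )
    return output


def encoder(tokens: List[Vector], n_layers: int) -> List[Vector]:
    """Applies n_layers attention layers in sequence."""
    for _ in range(n_layers):
        tokens = self_attention(tokens)
    return tokens


if __name__ == "__main__":
    # Two hypothetical modality embeddings (e.g. one per sensor type).
    embeddings = [[1.0, 0.0], [0.0, 1.0]]
    print(encoder(embeddings, n_layers=2))
```

Each layer mixes information across the modality embeddings, which is the role attention plays when several feature vectors are processed jointly. A practical transformer layer would add learned query, key, and value projections and the other sublayers omitted here.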

Assignees

Stanford Res Inst Int

Inventors

Classifications

G06V10/44
Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title
H04N23/21 · Primary
from near infrared [NIR] radiation · CPC title
G06V20/44 · Primary
Event detection · CPC title

Patent family

Related publications grouped by family.

View patent family 91583690

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12475705B2 cover?: In general, the disclosure describes techniques for joint spatiotemporal Artificial Intelligence (AI) models that can encompass multiple space and time resolutions through self-supervised learning. In an example, a method includes for each of a plurality of multimodal data, generating, by a computing system, using a first machine learning model, a respective modality feature vector representati…
Who is the assignee on this patent?: Stanford Res Inst Int
What technology area does this patent fall under?: Primary CPC classification H04N23/21. Mapped technology areas include Electricity.
When was this patent published?: Publication date Nov 18, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).