Training data synthesis for machine learning
US-2024355100-A1 · Oct 24, 2024 · US
US12596747B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12596747-B2 |
| Application number | US-202418814322-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 23, 2024 |
| Priority date | Aug 25, 2023 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
A system may be configured to provide natural language search over security videos. In some aspects, the system may generate a first representation of sampled video information in a multidimensional format via a first machine learning model, and receive a request including a natural language input. Further, the system may generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is different from the first machine learning model, and determine that the first representation has a predefined relationship with the second representation. In addition, the system may present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
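The matching step described in the abstract can be pictured with a minimal sketch. It assumes the two models have already produced embeddings in a shared vector space, and it uses cosine similarity with a fixed threshold as a stand-in for the "predefined relationship"; the publication does not name a specific metric or threshold. The function names, clip identifiers, and toy vectors below are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must share the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def matching_clips(
    video_embeddings: Dict[str, Sequence[float]],
    query_embedding: Sequence[float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the publication
) -> List[Tuple[str, float]]:
    """Return (clip_id, score) pairs scoring at or above threshold, best first."""
    scored = [
        (clip_id, cosine_similarity(vec, query_embedding))
        for clip_id, vec in video_embeddings.items()
    ]
    hits = [(clip_id, score) for clip_id, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for the outputs of the video model and the text model.
video_embeddings = {
    "cam1_0001": [0.9, 0.1, 0.0],
    "cam1_0002": [0.0, 1.0, 0.0],
    "cam2_0001": [0.7, 0.7, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]
results = matching_clips(video_embeddings, query_embedding)
```

With these toy values only `cam1_0001` clears the 0.8 threshold; lowering the threshold to 0.7 also admits `cam2_0001`. A deployed system would more likely use an approximate nearest-neighbor index than a linear scan, which is outside the scope of this sketch.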
Opening claim text (preview).
What is claimed is:
1. A method, comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
2. The method of claim 1, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.
3. The method of claim 1, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.
4. The method of claim 1, further comprising: receiving video capture information from a video capture device; and sampling the video capture information to generate the sampled video information.
5. The method of claim 1, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.
6. The method of claim 1, wherein the first machine learning model is a convolutional neural network.
7. The method of claim 1, further comprising jointly training the first machine learning model and the second machine learning model using a common process.
8. A system comprising: at least one memory storing instructions thereon; and at least one processor coupled to the at least one memory and configured by the instructions to: generate a first representation of sampled video information in a multidimensional format via a first machine learning model; receive a request including a natural language input; generate a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determine that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and present the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
9. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive a search query for a plurality of video frames corresponding an event defined by the natural language input.
10. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive the request for an alert identifying an occurrence of an event corresponding to the natural language input.
11. The system of claim 8, wherein the at least one processor is further configured by the instructions to receive video capture information from a video capture device, and sample the video capture information to generate the sampled video information.
12. The system of claim 8, wherein at least one of the first machine learning model and the second machine learning model is a transformer model.
13. The system of claim 8, wherein the first machine learning model is a convolutional neural network.
14. A non-transitory computer-readable device having instructions thereon that, when executed by at least one computing device, causes the at least one computing device to perform operations comprising: generating a first representation of sampled video information in a multidimensional format via a first machine learning model; receiving a request including a natural language input; generating a second representation of the natural language input in the multidimensional format via a second machine learning model that is a different from the first machine learning model; determining that the first representation has a predefined relationship with the second representation based on a learned proximity metric between the first representation and the second representation in the multidimensional format; and presenting the second representation as a response to the request based on the first representation having the predefined relationship with the second representation.
15. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving a search query for a plurality of video frames corresponding to an event defined by the natural language input.
16. The non-transitory computer-readable device of claim 14, wherein receiving the request includes receiving the request for an alert identifying an occurrence of an event corresponding to the natural language input.
17. The method of claim 7, wherein the common process comprises coordinated training to align the first representation and the second representation.
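Claims 7 and 17 refer to jointly training the two models so that their representations align, without naming a training objective. One common way to do this in cross-modal retrieval is a symmetric contrastive loss over paired video and text embeddings. The sketch below illustrates that idea only; it is an assumption, not the method recited in the claims. The function names, the temperature value, and the toy vectors are hypothetical, and the vectors are assumed to be unit-normalized.

```python
import math
from typing import List, Sequence


def _mean_diagonal_nll(logits: List[List[float]]) -> float:
    """Mean negative log-softmax probability of the diagonal entry of each row."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the row maximum for numerical stability
        log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(
    video_vecs: Sequence[Sequence[float]],
    text_vecs: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value for illustration
) -> float:
    """Symmetric contrastive loss; pair i of video_vecs matches pair i of text_vecs."""
    if not video_vecs or len(video_vecs) != len(text_vecs):
        raise ValueError("need the same non-zero number of video and text vectors")
    # Similarity of every video vector to every text vector, scaled by temperature.
    logits = [
        [sum(v * t for v, t in zip(vid, txt)) / temperature for txt in text_vecs]
        for vid in video_vecs
    ]
    transposed = [list(column) for column in zip(*logits)]
    # Average the video-to-text and text-to-video directions.
    return 0.5 * (_mean_diagonal_nll(logits) + _mean_diagonal_nll(transposed))


# Toy unit vectors: aligned pairs point the same way, misaligned pairs are swapped.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each video embedding is nearest to its own caption and large when the pairs are swapped, so minimizing it over both models pulls matching video and text representations together in the shared space.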
Proximity measures, i.e. similarity or distance measures · CPC title
using neural networks · CPC title
Event detection · CPC title
Querying · CPC title
using objects detected or recognised in the video content · CPC title