System and method associated with expedient detection and reconstruction of cyber events in a compact scenario representation using provenance tags and customizable policy
US-11601442-B2 · Mar 7, 2023 · US
US12081827B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12081827-B2 |
| Application number | US-202217822573-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 26, 2022 |
| Priority date | Aug 26, 2022 |
| Publication date | Sep 3, 2024 |
| Grant date | Sep 3, 2024 |
Official abstract text for this publication.
The present disclosure relates to systems, methods, and non-transitory computer readable media that utilize deep learning to map query videos to known videos so as to identify a provenance of the query video or identify editorial manipulations of the query video relative to a known video. For example, the video comparison system includes a deep video comparator model that generates and compares visual and audio descriptors utilizing codewords and an inverse index. The deep video comparator model is robust and ignores discrepancies due to benign transformations that commonly occur during electronic video distribution.
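The codeword and inverse-index lookup described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the descriptors are toy 2-D tuples rather than outputs of neural network encoders, the `CODEBOOK` centroids are hypothetical, and nearest-centroid assignment stands in for whatever quantization the disclosure actually uses.

```python
from collections import defaultdict
from math import dist

# Hypothetical toy codebook: each codeword is a centroid in descriptor space.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def to_codeword(descriptor):
    """Return the index of the nearest codebook centroid (assumed quantizer)."""
    return min(range(len(CODEBOOK)), key=lambda i: dist(descriptor, CODEBOOK[i]))


def build_inverse_index(known_videos):
    """Map each codeword to the (video_id, segment_index) pairs that contain it."""
    index = defaultdict(set)
    for video_id, descriptors in known_videos.items():
        for seg_idx, descriptor in enumerate(descriptors):
            index[to_codeword(descriptor)].add((video_id, seg_idx))
    return index


def candidate_segments(index, query_descriptors):
    """Collect known-video segments that share a codeword with the query."""
    hits = set()
    for descriptor in query_descriptors:
        hits |= index.get(to_codeword(descriptor), set())
    return hits


# Toy data: one descriptor per segment of each known video.
known_videos = {
    "video_a": [(0.1, 0.0), (0.9, 0.1), (1.0, 0.9)],
    "video_b": [(0.0, 0.9), (0.1, 1.1)],
}
index = build_inverse_index(known_videos)

# A slightly perturbed query still lands on the same codewords as video_a.
query = [(0.95, 0.05), (0.9, 1.0)]
hits = candidate_segments(index, query)
print(sorted(hits))
```

Because small perturbations of a descriptor usually map to the same codeword, the lookup tolerates minor changes to the query, which is the role the abstract assigns to robust descriptors.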
Opening claim text (preview).
What is claimed is:
1. A non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: sub-dividing a query video into visual segments and audio segments; generating visual descriptors for the visual segments of the query video utilizing a visual neural network encoder; generating audio descriptors for the audio segments of the query video utilizing an audio neural network encoder; determining video segments from a plurality of known videos that are similar to the query video based on the visual descriptors and audio descriptors utilizing an inverse index by: mapping the visual descriptors and the audio descriptors to codewords; and identifying the video segments from the plurality of known videos based on the mapped codewords; and identifying a known video of the plurality of known videos that corresponds to the query video from the determined video segments.
2. The non-transitory computer readable medium of claim 1, wherein the operations further comprise generating one or more visual indicators identifying locations of editorial modifications in the query video relative to the known video.
3. The non-transitory computer readable medium of claim 1, wherein sub-dividing the query video into visual segments and audio segments comprises subdividing the query video into equal-length segments.
4. The non-transitory computer readable medium of claim 1, wherein determining video segments from the plurality of known videos that are similar to the query video based on the visual descriptors and the audio descriptors utilizing an inverse index comprises: identifying one or more known videos that include the codewords, and ranking the one or more known videos.
5. The non-transitory computer readable medium of claim 1, wherein the operations further comprise fusing the visual descriptors and audio descriptors prior to mapping the visual descriptors and audio descriptors to the codewords.
6. The non-transitory computer readable medium of claim 1, wherein mapping the visual descriptors and the audio descriptors to the codewords comprises: mapping the visual descriptors to visual codewords; and mapping the audio descriptors to audio codewords.
7. The non-transitory computer readable medium of claim 1, wherein: the operations further comprise generating unified audio-visual embeddings from corresponding visual and audio descriptors utilizing a fully connected neural network layer; and mapping the visual descriptors and audio descriptors to the codewords comprises mapping unified audio-visual embeddings to a codebook.
8. The non-transitory computer readable medium of claim 1, wherein determining video segments from a plurality of known videos that are similar to the query video based on the visual descriptors and audio descriptors comprises determining a segment relevance score between a video segment of the known video and a codeword mapped to a segment of the query video by: determining a codeword frequency indicating a number of times the codeword appears in the video segment of the known video; and determining an inverse video frequency that measures how common the codeword is across all video segments in the inverse index.
9. The non-transitory computer readable medium of claim 8, wherein the operations further comprise: determining a video relevance score by summing segment relevance scores between the video segments of the known video and the mapped codewords; and ranking a subset of known videos from the plurality of known videos corresponding to the determined video segments based on video relevance scores.
10. The non-transitory computer readable medium of claim 9, wherein identifying the known video of the plurality of known videos that corresponds to the query video from the determined video segments comprises performing edit distance re-ranking of the subset of known videos.
11. The non-transitory computer readable medium of claim 1, wherein generating visual descriptors for the visual segments of the query video utilizing the visual neural network encoder comprises generating a visual segment embedding for a combination of frames of a visual segment of the query video utilizing the visual neural network encoder.
12. The non-transitory computer readable medium of claim 1, wherein generating visual descriptors for the visual segments of the query video utilizing the visual neural network encoder comprises: generating frame embeddings for each frame of a visual segment of the query video utilizing the visual neural network encoder; and averaging the frame embeddings for the visual segment to generate a visual descriptor for the visual segment.
13. A system comprising: one or more memory devices comprising a set of known digital videos; and one or more processors that are configured to cause the system to: sub-divide known videos into visual segments and audio segments; generate visual descriptors for the visual segments utilizing a visual neural network encoder; generate audio descriptors for the audio segments utilizing an audio neural network encoder; generate codewords from the audio descriptors and the visual descriptors; generate an inverse index for identifying known videos corresponding to query videos by mapping video segments from the known videos to the codewords; map query video visual descriptors and query video audio descriptors from a query video to the codewords; determine one or more video segments from the known videos that correspond to the query video based on the codewords; and identify a known video of the set of known digital videos that corresponds to the query video from the determined one or more video segments.
14. The system of claim 13, wherein the one or more processors are further configured to cause the system to generate visual descriptors and audio descriptors that are robust to benign visual and audio perturbations.
15. The system of claim 14, wherein the one or more processors are further configured to cause the system to learn parameters of the visual neural network encoder utilizing video frames with frame-level augmentations including one or more of random noise, blur, horizontal flip, pixelation, rotation, text overlay, emoji overlay, padding, or color jitter.
16. The system of claim 14, wherein the one or more processors are further configured to cause the system to learn parameters of the audio neural network encoder utilizing audio segments with augmentations including one or more of audio lengthening, audio shortening, addition of audio components, removal of audio components, or alteration of audio components.
17. The system of claim 13, wherein the one or more processors are further configured to cause the system to learn parameters of the visual neural network encoder and the audio neural network encoder utilizing a contrastive loss.
18. A computer-implemented method comprising: sub-dividing a query video into visual segments and audio segments; generating visual descriptors for the visual segments of the query video utilizing a visual neural network encoder that is robust to benign visual perturbations; generating audio descriptors for the audio segments of the query video utilizing an audio neural network encoder that is robust to benign audio perturbations; determining video segments from a plurality of known videos that are similar to the query video based on the visual descriptors and audio descriptors utilizing an inverse index by: mapping
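The scoring steps recited in claims 8 to 10 (codeword frequency, inverse video frequency, summed video relevance, and edit distance re-ranking) can be sketched as follows. The claims do not give formulas, so the sketch assumes a TF-IDF-style product with `log(total segments / segments containing the codeword)` and a plain Levenshtein distance over flattened codeword sequences. The data in `known` and all function names are hypothetical.

```python
from collections import Counter
from math import log

# Hypothetical index contents: video -> list of segments, each a list of codewords.
known = {
    "video_a": [[1, 1, 3], [3, 4]],
    "video_b": [[2, 2], [2, 5]],
    "video_c": [[1, 6], [7, 8]],
}
all_segments = [seg for segs in known.values() for seg in segs]


def inverse_video_frequency(codeword):
    """Assumed form: log(total segments / segments containing the codeword)."""
    containing = sum(1 for seg in all_segments if codeword in seg)
    return log(len(all_segments) / containing) if containing else 0.0


def segment_relevance(segment, codeword):
    """Codeword frequency within the segment times inverse video frequency."""
    return Counter(segment)[codeword] * inverse_video_frequency(codeword)


def video_relevance(segments, query_codewords):
    """Sum of segment relevance scores over all segments and query codewords."""
    return sum(segment_relevance(seg, cw) for seg in segments for cw in query_codewords)


def rank(query_codewords):
    """Order known videos by descending video relevance score."""
    scores = {vid: video_relevance(segs, query_codewords) for vid, segs in known.items()}
    return sorted(scores, key=scores.get, reverse=True)


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def rerank(candidates, query_sequence):
    """Re-order candidates by edit distance to the query's codeword sequence."""
    def flattened(vid):
        return [cw for seg in known[vid] for cw in seg]
    return sorted(candidates, key=lambda vid: edit_distance(query_sequence, flattened(vid)))


ranking = rank([1, 3])
shortlist = rerank(ranking[:2], [1, 1, 3, 3])
print(ranking, shortlist)
```

In this toy setup the relevance score narrows the corpus to a shortlist cheaply, and the edit distance step then compares codeword order, which the bag-of-codewords score ignores.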
using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings · CPC title
by decomposing the content in the time domain, e.g. in time segments · CPC title
Generation or processing of descriptive data, e.g. content descriptors {(systems specially adapted for using meta-information in broadcast systems H04H60/73)} · CPC title
Query formulation · CPC title
using metadata automatically derived from the content · CPC title