What technology area does this patent fall under?

Primary CPC classification G06F16/685. Mapped technology areas include Physics.

When was this patent published?

Publication date Feb 27, 2025 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Audio analysis system with query processing

US2025068673A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2025068673-A1
Application number	US-202418813647-A
Country	US
Kind code	A1
Filing date	Aug 23, 2024
Priority date	Aug 25, 2023
Publication date	Feb 27, 2025
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computing system is configured to obtain a plurality of media files that each includes speech of one or more speakers. The computing system is further configured to process the plurality of media files to generate indexed data, wherein the indexed data includes a corresponding embedding for each speaker of the one or more speakers identified in the media file and a corresponding one or more keywords identified in the speech in the media file. The computing system is further configured to receive an indication of at least one of a selection of a particular speaker from the one or more speakers or a selection of a particular keyword from a plurality of keywords. The computing system is further configured to generate one or more correlations based on the indexed data. The computing system is further configured to output an alert regarding the one or more correlations.
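The abstract describes indexed data per media file (speakers and keywords) and correlations derived from it. The following is a minimal sketch of that idea, not the patent's implementation: the `IndexedFile` record, the function names, the file names, and the use of plain speaker IDs in place of real speaker embeddings are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical index record for one media file."""
    file_id: str
    # Speaker IDs stand in for speaker embeddings (assumption for brevity).
    speakers: set = field(default_factory=set)
    # Keywords identified in the speech of this file.
    keywords: set = field(default_factory=set)


def speaker_associations(index, selected):
    """Count how often each other speaker shares a file with `selected`."""
    counts = defaultdict(int)
    for rec in index:
        if selected in rec.speakers:
            for other in rec.speakers - {selected}:
                counts[other] += 1
    return dict(counts)


def keyword_hits(index, keyword):
    """Return the IDs of files whose indexed keywords include `keyword`."""
    return sorted(rec.file_id for rec in index if keyword in rec.keywords)


# Made-up example data.
index = [
    IndexedFile("a.wav", {"spk1", "spk2"}, {"shipment", "harbor"}),
    IndexedFile("b.wav", {"spk1", "spk3"}, {"harbor"}),
    IndexedFile("c.wav", {"spk1", "spk2"}, {"invoice"}),
]
associations = speaker_associations(index, "spk1")
hits = keyword_hits(index, "harbor")
```

Here co-occurrence in the same file is used as the association signal, which matches the wording of dependent claims 7 and 10; a real system would likely weight or filter these counts before raising an alert.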

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: obtaining, by a computing system and from one or more sources that provide media files over one or more networks, a plurality of media files that each includes speech of one or more speakers; processing, by the computing system, the plurality of media files to generate indexed data, wherein the indexed data includes, for each media file of the plurality of media files, a corresponding embedding for each speaker of the one or more speakers identified in the media file and a corresponding transcript of speech for each language identified in the speech in the media file; receiving, by the computing system, an indication of at least one of a selection of a particular speaker from the one or more speakers or a selection of a particular keyword from a plurality of keywords; generating, by the computing system, one or more correlations based on the indexed data, wherein the one or more correlations include at least one of an association among the one or more speakers or an association among keywords detected in the transcripts as spoken by the one or more speakers; and outputting, by the computing system, based on the one or more correlations, an indication regarding the one or more correlations.
2. The method of claim 1, wherein processing the plurality of media files into indexed data includes clustering excerpts from the plurality of media files.
3. The method of claim 1, wherein processing the plurality of media files into indexed data includes: extracting embeddings from overlapping windows from each of the plurality of media files; and applying clustering to the embeddings.
4. The method of claim 1, further comprising: generating master data by matching keywords included in a watchlist to the transcripts.
5. The method of claim 1, further comprising: receiving, by the computing system, an indication of a selection of a particular keyword, and wherein outputting the indication includes generating the indication based on determining that at least one media file of the plurality of media files includes a speaker speaking the particular keyword.
6. The method of claim 1, wherein the indexed data includes, for each speaker, at least one of: a gender identifier, or an identifier of the language spoken by the speaker.
7. The method of claim 6, further comprising: determining, by the computing system, a subset of the one or more speakers, where each speaker of the subset of the one or more speakers is associated with the particular speaker, and wherein determining the subset includes: determining that the speakers of the subset of one or more speakers and the particular speaker speak in a same media file.
8. The method of claim 1, further comprising: receiving, by the computing system, an indication of a selection of a particular speaker of the one or more speakers; and identifying, by the computing system, one or more media files that include speech by the particular speaker.
9. The method of claim 1, wherein the one or more sources comprise a Clearnet site and a darknet site.
10. The method of claim 1, wherein the indexed data includes, for a media file of the plurality of media files, respective identifiers for multiple speakers of the one or more speakers that speak in the media file, and wherein generating the one or more correlations based on the indexed data comprises identifying an association among the multiple speakers based on the identifiers for the multiple speakers that speak in the media file.
11. The method of claim 1, further comprising: outputting, by the computing system, a graphical user interface (GUI), wherein the GUI includes one or more visual representations of the one or more correlations.
12. The method of claim 1, wherein processing the plurality of media files to generate the indexed data includes: processing the plurality of media files to generate, for each media file, respective embeddings for one or more speakers having speech in the media file; and matching speaker embeddings included in a watchlist to the embeddings for the one or more speakers having speech in the media file.
13. The method of claim 1, wherein, for each media file of the plurality of media files, the corresponding one or more keywords identified in the speech in the media file are present in a transcript of the media file.
14. The method of claim 1, wherein the media file is a first media file, wherein the indication is first indication, and further comprising: receiving, by the computing system, a second media file of a speaker for enrollment, wherein the second media file includes speech of at least one speaker; processing, by the computing system, the second media file, wherein processing the second media file includes: extracting an embedding of the at least one speaker from the second media file, and matching the embedding to a cluster of one or more clusters of a plurality of speakers, wherein each cluster of the plurality of clusters corresponds to a respective speaker of a plurality of speakers; and outputting, by the computing system, based on matching the embedding to the cluster, a second indication that includes an indication of a match between the at least one speaker and a speaker of the plurality of speakers.
15. The method of claim 1, wherein the plurality of media files include media files with audio events, and wherein generating the correlations includes generating correlations that include associations among the audio events.
16. A computing system, comprising: memory; and one or more programmable processors in communication with the memory and configured to: obtain, from one or more sources that provide media files over one or more networks, a plurality of media files that each includes speech of one or more speakers; process the plurality of media files to generate indexed data, wherein the indexed data includes, for each media file of the plurality of media files, a corresponding embedding for each speaker of the one or more speakers identified in the media file and a corresponding transcript of speech for each language identified in the speech in the media file; receive an indication of at least one of a selection of a particular speaker from the one or more speakers or a selection of a particular keyword from a plurality of keywords; generate one or more correlations based on the indexed data, wherein the one or more correlations include at least one of an association among the one or more speakers or an association among keywords detected in the transcripts as spoken by the one or more speakers; and output, based on the one or more correlations, an indication regarding the one or more correlations.
17. The computing system of claim 16, wherein to process the plurality of media files into indexed data, the one or more programmable processors are further configured to: cluster excerpts from the plurality of media files.
18. The computing system of claim 16, wherein to process the plurality of media files into indexed data, the one or more programmable processors are further configured to: extract embeddings from overlapping windows from each of the plurality of media files; and apply clustering to the embeddings.
19. The computing system of claim 16, wherein the one or more programmable processors are further configured to generate master data by matching keywords included in a watchlist to the transcripts.
20. Non-transit
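Claims 3 and 18 recite extracting embeddings from overlapping windows and then clustering them. The sketch below illustrates those two steps under stated assumptions: the claims do not name a clustering algorithm, so a simple greedy cosine-threshold clustering is swapped in here, and the function names, the window parameters, the threshold of 0.9, and the toy 2-D vectors are all hypothetical. A real system would compute speaker embeddings from the audio in each window with a trained model.

```python
import math


def overlapping_windows(samples, size, hop):
    """Return windows of `size` samples starting every `hop` samples.

    Choosing hop < size makes consecutive windows overlap.
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def greedy_cluster(embeddings, threshold=0.9):
    """Greedy threshold clustering (an assumed stand-in, not the patent's method).

    Each embedding joins the first cluster whose centroid has cosine
    similarity >= threshold; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of member embeddings
    labels = []
    for emb in embeddings:
        for idx, members in enumerate(clusters):
            centroid = [sum(col) / len(members) for col in zip(*members)]
            if cosine(emb, centroid) >= threshold:
                members.append(emb)
                labels.append(idx)
                break
        else:
            clusters.append([emb])
            labels.append(len(clusters) - 1)
    return labels


# Ten dummy samples, windows of 4 samples with a hop of 2 (50% overlap).
windows = overlapping_windows(list(range(10)), size=4, hop=2)

# Toy 2-D "embeddings": two near [1, 0] and two near [0, 1].
labels = greedy_cluster([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
```

Each resulting cluster label can be read as a candidate speaker identity; the enrollment step in claim 14 would then amount to comparing a new embedding against the existing cluster centroids.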

Assignees

Stanford Res Inst Int

Inventors

Classifications

G10L15/005
Language recognition · CPC title
G10L2015/088
Word spotting · CPC title
G10L17/00
Speaker identification or verification techniques · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06F16/61
Indexing; Data structures therefor; Storage structures · CPC title

Patent family

Related publications grouped by family.

View patent family 94688837

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025068673A1 cover?: A computing system is configured to obtain a plurality of media files that each includes speech of one or more speakers. The computing system is further configured to process the plurality of media files to generate indexed data, wherein the indexed data includes a corresponding embedding for each speaker of the one or more speakers identified in the media file and a corresponding one or more k…
Who is the assignee on this patent?: Stanford Res Inst Int

Related patents

System and method of video capture and search optimization for creating an acoustic voiceprint

Systems and methods for identifying conversation roles

Fully Supervised Speaker Diarization

Transcription generation from multiple speech recognition systems
