What technology area does this patent fall under?

Primary CPC classification G06F3/165. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 20, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (cited publications that are in our corpus, or other publications sharing the same primary CPC).

Semi-supervised speaker diarization

US10133538B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10133538-B2
Application number	US-201514671918-A
Country	US
Kind code	B2
Filing date	Mar 27, 2015
Priority date	Mar 27, 2015
Publication date	Nov 20, 2018
Grant date	Nov 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An audio file analyzer computing system includes technologies to, among other things, localize audio events of interest (such as speakers of interest) within an audio file that includes multiple different classes (e.g., different speakers) of audio. The illustrative audio file analyzer computing system uses a seed segment to perform a semi-supervised diarization of the audio file. The seed segment is pre-selected, such as by a human person using an interactive graphical user interface.
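The first claim on this page spells out the core of the semi-supervised approach: train a seed model only on features from the seed segment, train a non-seed model only on features from the rest of the file, and score each remaining feature against both models. The sketch below illustrates that idea under stated assumptions: features are already extracted as one fixed-length vector per frame, each model is a single diagonal-covariance Gaussian (a deliberate simplification; the model type is not specified here), and the offset of claim 5 is simply a number added to the seed score. The names `DiagGaussian` and `diarize_with_seed` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


class DiagGaussian:
    """Single diagonal-covariance Gaussian (hypothetical stand-in for a real speaker model)."""

    def __init__(self, frames: List[Frame], var_floor: float = 1e-3) -> None:
        if not frames:
            raise ValueError("cannot train a model on zero frames")
        n, dim = len(frames), len(frames[0])
        self.mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Floor the variance so a near-constant dimension cannot blow up the likelihood.
        self.var = [
            max(sum((f[d] - self.mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]

    def log_likelihood(self, x: Frame) -> float:
        """Log density of x under the Gaussian, summed over independent dimensions."""
        return sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, self.mean, self.var)
        )


def diarize_with_seed(
    features: List[Frame], seed_mask: List[bool], offset: float = 0.0
) -> List[Tuple[int, float, float, bool]]:
    """Return (frame_index, seed_score, non_seed_score, matches_seed) for every non-seed frame."""
    seed = [f for f, is_seed in zip(features, seed_mask) if is_seed]
    rest = [f for f, is_seed in zip(features, seed_mask) if not is_seed]
    seed_model = DiagGaussian(seed)      # trained only on the seed subset
    non_seed_model = DiagGaussian(rest)  # trained only on the remaining subset

    results = []
    for i, (f, is_seed) in enumerate(zip(features, seed_mask)):
        if is_seed:
            continue
        seed_score = seed_model.log_likelihood(f)
        non_seed_score = non_seed_model.log_likelihood(f)
        # Compare the offset-adjusted seed score with the non-seed score,
        # which is the same as testing the log-likelihood ratio against -offset.
        matches = seed_score + offset > non_seed_score
        results.append((i, seed_score, non_seed_score, matches))
    return results


if __name__ == "__main__":
    # Toy 1-D features: frames 0-3 form the seed; frames 7-8 resemble the seed speaker.
    features = [[0.0], [0.1], [-0.1], [0.05], [5.0], [5.1], [4.9], [0.02], [-0.05]]
    seed_mask = [True] * 4 + [False] * 5
    for idx, s, ns, hit in diarize_with_seed(features, seed_mask):
        print(idx, round(s, 2), round(ns, 2), hit)
```

Raising `offset` admits more frames as matches and lowering it makes matching stricter, which is one simple way to trade recall for precision in a sketch like this.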

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for a high precision segmentation of an audio file having an undetermined number of speakers, the method comprising: receiving a selection indicative of an audio event of interest in an electronic file that includes an undetermined number of different audio events; in response to the selection, creating a seed segment that is representative of less than or equal to ten seconds of the audio event of interest; by comparing features of the seed segment to a set of features extracted from the electronic file, separating the features extracted from the electronic file into a first subset that includes the features extracted from the seed segment and a second subset that includes features extracted from a remaining portion of the electronic file that does not include the seed segment; creating a seed model using, as training data, only the first subset and not the second subset; creating a non-seed model using, as training data, only the second subset and not the first subset; for a feature in the second subset, computing a score based on a comparison of the feature to the seed model and a comparison of the feature to the non-seed model; outputting a segment of the electronic file, wherein the segment includes the feature and at least one label indicative of the seed score and the non-seed score.

2. The method of claim 1, comprising displaying, in a window of a graphical user interface, a time-based graphical depiction of the audio event of interest with a time-based graphical depiction of the at least one segment of the remaining portion of the electronic file that is related to the audio event of interest.

3. The method of claim 1, comprising accessing the electronic file through a video player application, and in a graphical user interface, aligning a playing of a video portion of the electronic file with a time-based graphical depiction of the at least one segment of the remaining portion of the electronic file that is related to the audio event of interest.

4. The method of claim 1, comprising displaying, in a graphical user interface, a list of interactive elements including an interactive element representative of the electronic file, and in response to a selection of the interactive element, playing the at least one segment of the remaining portion of the electronic file that is related to the audio event of interest.

5. The method of claim 1, comprising determining an offset value based on a characteristic of the seed segment; adjusting the seed score based on the offset value; comparing the adjusted seed score to the non-seed score.

6. The method of claim 1, comprising computing both the seed score and the non-seed score using a likelihood log ratio.

7. The method of claim 1, wherein the offset value is determined in response to an interaction with a graphical user interface element.

8. The method of claim 1, comprising ranking a plurality of audio events in the electronic file based on comparing the adjusted seed score to the non-seed score.

9. The method of claim 1, wherein the audio event of interest comprises (i) speech or (ii) non-speech or (iii) a combination of (i) and (ii).

10. The method of claim 1, comprising receiving a plurality of user interface-based selections each corresponding to a different segment of the electronic file, and creating the seed segment based on the plurality of user interface-based selections.

11. The method of claim 1, comprising selecting a filter based on at least one of (i) a type associated with the seed segment or (ii) a characteristic of the seed segment and prior to the separating, using the selected filter to determine the set of features of the electronic file.

12. The method of claim 1, comprising creating a new model based on the audio event of interest and at least one segment of the remaining portion of the electronic file that matches the audio event of interest.

13. The method of claim 12, comprising using the new model, performing audio event recognition on a new electronic file.

14. The method of claim 12, comprising using the new model, searching a audio files for audio events of a same type as the audio event of interest, and outputting a list of audio files arranged according to a likelihood that the audio files comprise an audio event of the same type as the audio event of interest.

15. The method of claim 1, wherein the selection of the audio event of interest is received in response to an interaction with a graphical user interface element.

16. The method of claim 1, wherein the audio event of interest comprises a speech segment produced by a person of interest and the method comprises outputting a list of multi-speaker audio files that comprise speech produced by the person of interest.

17. The method of claim 16, comprising ranking each audio file in the list based on a likelihood of the audio file comprising speech produced by the person of interest.

18. The method of claim 1, comprising displaying a graphical representation of the electronic file, displaying a plurality of interactive graphical user interface elements to facilitate user selection of the seed segment and visualization of at least one segment of the remaining portion of the electronic file that matches the audio event of interest.

19. The method of claim 1, comprising displaying, in a graphical user interface, an interactive graphical element representative of the seed segment.

20. The method of claim 1, comprising displaying, in a graphical user interface, an interactive graphical element representative of a segment of the remaining portion of the electronic file that matches the audio event of interest.
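Claims 16 and 17 extend the method to a collection: output multi-speaker audio files that contain speech from the person of interest, ranked by likelihood. The sketch below shows one simple way to turn per-frame scores into such a ranking. It assumes each file already has a list of per-frame log-likelihood ratios (for example, from a seed-versus-non-seed comparison like the one in claim 1) and scores each file by the mean of its top-k ratios. The top-k aggregation and the names `file_score` and `rank_files` are hypothetical illustration choices, not taken from the patent.

```python
from typing import Dict, List, Sequence, Tuple


def file_score(frame_llrs: Sequence[float], top_k: int = 3) -> float:
    """Mean of the top_k highest per-frame log-likelihood ratios (hypothetical scoring rule).

    Averaging only the best-matching frames keeps a long recording in which the
    person of interest speaks briefly from being swamped by other speakers.
    """
    if not frame_llrs:
        return float("-inf")  # no evidence at all: rank last
    best = sorted(frame_llrs, reverse=True)[:top_k]
    return sum(best) / len(best)


def rank_files(
    llrs_by_file: Dict[str, Sequence[float]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Return (file_name, score) pairs, most likely to contain the person of interest first."""
    scored = [(name, file_score(llrs, top_k)) for name, llrs in llrs_by_file.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Hypothetical per-frame LLRs for three recordings.
    llrs = {
        "meeting_a.wav": [-2.0, 3.1, 2.8, 2.9, -1.0],
        "call_b.wav": [-3.0, -2.5, -2.7],
        "podcast_c.wav": [0.5, 1.0, -0.2, 0.8],
    }
    for name, score in rank_files(llrs):
        print(f"{name}: {score:.3f}")
```

A mean over all frames would be an equally simple alternative, but it tends to penalize long recordings where the target speaker occupies only a small fraction of the audio.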

Assignees

Stanford Res Inst Int


Classifications

G06F16/65
Clustering; Classification · CPC title
G10L25/54
for retrieval · CPC title
G10L25/27
characterised by the analysis technique · CPC title
G06F3/0484
for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range · CPC title
G10L17/06
Decision making techniques; Pattern matching strategies · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 56975519

