Prefix methods for diarization in streaming mode

US10614797B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10614797-B2
Application number	US-201715827934-A
Country	US
Kind code	B2
Filing date	Nov 30, 2017
Priority date	Dec 1, 2016
Publication date	Apr 7, 2020
Grant date	Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined based on a minimum edit or hamming distance. The distance may alternatively be a measure other than the minimum edit or hamming distance. The clustering may have a finite time window over which the analysis is performed.
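The consolidation step described in the abstract can be illustrated with a minimal sketch. The Python below is not taken from the patent: the function names (hamming, edit_distance, consolidate) and the brute-force search over label permutations are assumptions chosen for clarity. It relabels a fresh clustering so that its prefix disagrees as little as possible with the labels already emitted, using either a Hamming distance or a minimum edit distance.

```python
from itertools import permutations


def hamming(a, b):
    """Count positions where two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y))) # substitute
        prev = cur
    return prev[-1]


def consolidate(past_labels, new_labels, distance=hamming):
    """Permute the cluster IDs in new_labels to best match past_labels.

    Assumes new_labels covers every past segment plus the newest ones,
    and that labels are mutually comparable (for example, all ints).
    Returns (relabelled sequence, distance to the past decisions).
    """
    ids = sorted(set(new_labels))
    names = sorted(set(past_labels) | set(ids))  # candidate speaker labels
    n = len(past_labels)
    best, best_dist = None, None
    for perm in permutations(names, len(ids)):
        mapping = dict(zip(ids, perm))
        candidate = [mapping[c] for c in new_labels]
        d = distance(past_labels, candidate[:n])
        if best_dist is None or d < best_dist:
            best, best_dist = candidate, d
    return best, best_dist


# The new clustering found the same partition but swapped the two IDs.
print(consolidate([0, 0, 1, 1], [1, 1, 0, 0, 0]))  # ([0, 0, 1, 1, 1], 0)
```

Trying every permutation grows factorially with the number of speakers, so this form is only practical for a handful of clusters; it is meant to show the idea, not an efficient search.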

First claim

Opening claim text (preview).

The invention claimed is:

1. An apparatus comprising: a memory storing a plurality of past audio segments and past diarization labels associated with the plurality of past audio segments; and a processor in communication with the memory, the processor configured to: receive an audio segment at a current point in time; accumulate the audio segment with the plurality of past audio segments to obtain accumulated audio segments data; cluster the accumulated audio segments data up to the current point in time, wherein the clustering includes: based on centroid labels of clusters associated with the accumulated audio segments, locating and assigning a plurality of different possible diarization labels associated with a plurality of speakers; and consolidating the accumulated audio segments data with the past diarization labels, wherein the consolidation includes performing a plurality of permutations of the centroid labels of clusters and finding an optimal diarization result that minimizes a difference between the past diarization labels and the permuted centroid labels up to the current point in time; and output the optimal diarization result comprising the permuted centroid labels associated with the plurality of speakers.

2. The apparatus of claim 1, wherein the consolidation includes comparing a distance.

3. The apparatus of claim 2, wherein the distance is determined based on a Hamming distance.

4. The apparatus of claim 2, wherein the distance is determined based on a measure of a minimum edit.

5. The apparatus of claim 2, wherein the processor is further configured to select a permutation based on the distance.

6. The apparatus of claim 1, wherein the accumulated audio segments is clustered during a finite period of time.

7. The apparatus of claim 1, wherein the processor is further configured to store a plurality of audio segments up to the current point in time.

8. The apparatus of claim 1, wherein clustering the accumulated audio segments data includes finding a cluster identifier (ID).

9. The apparatus of claim 1, wherein the processor is further configured to associate a cluster with an audio segment.

10. The apparatus of claim 1, wherein the processor further configured to output a label for a next occurring audio segment.

11. A method of managing an audio stream, the method comprising: obtaining a plurality of past audio segments and past diarization labels associated with the plurality of past audio segments; receiving an audio segment at a current point in time; accumulating the audio segment with the plurality of past audio segments to obtain accumulated audio segments; clustering the accumulated audio segments data up to the current point in time, wherein the clustering includes: based on centroid labels of clusters associated with the accumulated audio segments data, locating and assigning a plurality of different possible labels associated with a plurality of speakers; and consolidating the accumulated audio segments data with the past diarization labels, wherein the consolidation includes performing a plurality of permutations of the centroid labels of clusters and finding an optimal diarization result that minimizes a difference between the past diarization labels and the permuted centroid labels up to the current point in time; and outputting the optimal diarization result comprising the permuted centroid labels associated with the plurality of speakers.

12. The method of claim 11, further comprising comparing a distance to determine the consolidation.

13. The method of claim 12, further comprising comparing the distance using a Hamming distance.

14. The method of claim 13, further comprising selecting a permutation based on the distance.

15. The method of claim 11, wherein the accumulated audio segments is clustered during a finite period of time.

16. A program product comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code being executable by a processor to: obtain a plurality of past audio segments and past diarization labels associated with the plurality of past audio segments; receive an audio segment at a current point in time; accumulate the audio segment with the plurality of past audio segments to obtain accumulated audio segments data; cluster the accumulated audio segments data up to the current point in time, wherein the clustering includes: based on centroid labels of clusters associated with the accumulated audio segments data, locating and assigning a plurality of different possible labels associated with a plurality of speakers; and consolidating the accumulated audio segments data with the past diarization labels, wherein the consolidation includes performing a plurality of permutations of the centroid labels of clusters and finding an optimal diarization result that minimizes a difference between the past diarization labels and the permuted centroid labels up to the current point in time; and output the optimal diarization result comprising the permuted centroid labels associated with the plurality of speakers.
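The receive, accumulate, cluster, consolidate, and output sequence recited in the independent claims can be pictured as a loop over incoming segments. The sketch below is only an illustration under stated assumptions: StreamingDiarizer, best_relabeling, and toy_cluster are hypothetical names, segments are reduced to single numbers, the finite window is a simple "keep the last N segments" rule, and the stand-in clusterer deliberately hands out unstable cluster IDs so that the effect of consolidation is visible. None of this is drawn from the patent's description.

```python
from itertools import permutations


def best_relabeling(past, new):
    """Permute cluster IDs in `new` to minimise Hamming distance to `past`.

    `past` aligns with the first len(past) entries of `new`.
    """
    ids = sorted(set(new))
    names = sorted(set(past) | set(ids))
    best, best_cost = None, None
    for perm in permutations(names, len(ids)):
        mapping = dict(zip(ids, perm))
        cand = [mapping[c] for c in new]
        cost = sum(p != c for p, c in zip(past, cand))
        if best_cost is None or cost < best_cost:
            best, best_cost = cand, cost
    return best


def toy_cluster(segments):
    """Hypothetical stand-in clusterer for 1-D features (two clusters).

    ID 0 always goes to the cluster holding the newest segment, so raw
    IDs flip from call to call; a real system would cluster embeddings.
    """
    mid = (min(segments) + max(segments)) / 2
    side = [x > mid for x in segments]
    return [0 if s == side[-1] else 1 for s in side]


class StreamingDiarizer:
    def __init__(self, cluster, window=50):
        self.cluster = cluster  # callable: list of segments -> list of IDs
        self.window = window    # finite analysis window (assumed rule)
        self.segments = []      # accumulated segments inside the window
        self.labels = []        # past decisions, aligned with self.segments

    def push(self, segment):
        """Accumulate one segment and return its consolidated label."""
        self.segments.append(segment)
        self.segments = self.segments[-self.window:]
        keep = len(self.segments) - 1  # past segments still in the window
        past = self.labels[-keep:] if keep > 0 else []
        raw = self.cluster(self.segments)
        self.labels = best_relabeling(past, raw)
        return self.labels[-1]


diarizer = StreamingDiarizer(toy_cluster, window=3)
stream = [0.1, 0.9, 0.2, 0.8, 0.15]  # two alternating "speakers"
print([diarizer.push(x) for x in stream])  # [0, 1, 0, 1, 0]
```

Without the consolidation step, the raw ID of the newest segment from toy_cluster would be 0 on every call; relabeling against past decisions is what keeps the two speakers distinguishable in the output.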

Assignees

IBM

Inventors

Classifications

Code	CPC title
G10L15/10 (Primary)	using distance or distortion measures between unknown speech and reference templates
G10L15/04	Segmentation; Word boundary detection
G10L15/083	Recognition networks (G10L15/142, G10L15/16 take precedence)
G10L2015/0631	Creating reference templates; Clustering
G10L15/063	Training

Patent family

Related publications grouped by family.

View patent family 62244066

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614797B2 cover?: A diarization embodiment may include a system that clusters data up to a current point in time and consolidates it with the past decisions, and then returns the result that minimizes the difference with past decisions. The consolidation may be achieved by performing a permutation of the different possible labels and comparing the distance. For speaker diarization, a distance may be determined b…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G10L15/10. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 7, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Blind diarization of recorded calls with arbitrary number of speakers

Semi-supervised speaker diarization

Speaker indexing device and speaker indexing method

Conversation quality analysis

Providing a confidence measure for speaker diarization
