Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof

US9251808B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9251808-B2
Application number: US-201213412694-A
Country: US
Kind code: B2
Filing date: Mar 6, 2012
Priority date: Jul 28, 2011
Publication date: Feb 2, 2016
Grant date: Feb 2, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire character strings representing contents of the utterances, and to extract linguistic features of the speakers by using the character strings. The error detection unit is configured to decide that, when one of the character strings does not fit with a linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering unit.
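The error detection step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the romanized Japanese rule table, the names `RULES`, `fits`, and `detect_errors`, and the interpretation of "does not fit" (the utterance matches a rule of some category, but none of the matching rules belong to the assigned speaker's feature) are all assumptions made for the example.

```python
import re

# Hypothetical rule table: rule name -> (category, pattern).
# The patent's actual rules are not shown on this page.
RULES = {
    "watashi": ("first_person", re.compile(r"\bwatashi\b")),
    "boku": ("first_person", re.compile(r"\bboku\b")),
    "desu": ("sentence_end", re.compile(r"\bdesu[.?!]?$")),
    "da": ("sentence_end", re.compile(r"\bda[.?!]?$")),
}


def matched_rules(text):
    """Return the names of all rules whose pattern occurs in the text."""
    return {name for name, (_, pattern) in RULES.items() if pattern.search(text)}


def fits(text, feature):
    """Assumed fit test: within each category covered by the speaker's
    feature, any rule the text matches must include one of the speaker's
    own rules. Text that matches no rule in a category is treated as neutral."""
    hits = matched_rules(text)
    for category in {RULES[name][0] for name in feature}:
        category_hits = {n for n in hits if RULES[n][0] == category}
        if category_hits and not (category_hits & feature):
            return False
    return True


def detect_errors(utterances, assignment, features):
    """Return indices of utterances that do not fit their assigned speaker."""
    return [
        i
        for i, text in enumerate(utterances)
        if not fits(text, features.get(assignment[i], set()))
    ]


# Toy input: transcribed utterances, acoustic cluster labels, and
# per-speaker linguistic features (all hypothetical).
utterances = ["watashi wa gakusei desu.", "boku wa iku da.", "boku mo sou da."]
assignment = ["A", "A", "B"]
features = {"A": {"watashi", "desu"}, "B": {"boku", "da"}}

flagged = detect_errors(utterances, assignment, features)
print(flagged)
```

Treating rule-free utterances as neutral is a design choice for this sketch; a stricter reading of the abstract would flag every utterance that fails to match the speaker's own rules.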

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for clustering speakers, the apparatus comprising: a memory device that stores an acoustic signal already recorded; a clustering unit executed by a computer using a stored program, configured to extract acoustic features for speakers from the acoustic signal, and to cluster utterances included in the acoustic signal into each of the speakers by using the acoustic features; an acquisition unit executed by the computer, configured to acquire character strings by recognizing the acoustic signal of each utterance; an extraction unit executed by the computer, configured to acquire the character strings representing contents of the utterances, and to extract a linguistic feature of each of the speakers by comparing the character strings with a first person rule and an end of sentence rule; an error detection unit executed by the computer, configured to decide that, when one of the character strings does not fit with the linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering unit; and a display that displays at least one of a clustering result of the clustering unit and a decision result of the error detection unit.

2. The apparatus according to claim 1, further comprising: a re-clustering unit executed by the computer, configured to decide that, when the one fits with the linguistic feature of another speaker, the utterance is to be clustered into the another speaker.

3. The apparatus according to claim 1, wherein the extraction unit decides whether the first person rule and the end of sentence rule previously-stored fit with the character strings of the utterances clustered into each of the speakers, and sets the rule of which the number of utterances of fitted character strings is larger than a predetermined threshold, as the linguistic feature of the speaker into which the utterances of fitted character strings are clustered.

4. The apparatus according to claim 2, wherein the display displays a decision result of the re-clustering unit.

5. A method for clustering speakers, the method being realized by executing a stored program by a computer, the method comprising: storing, by the computer into a memory device, acoustic signal already recorded; extracting, by the computer, acoustic features for speakers from the acoustic signal; clustering, by the computer, utterances included in the acoustic signal into each of the speakers by using the acoustic features; acquiring, by the computer, character strings by recognizing the acoustic signal of each utterance; acquiring, by the computer, the character strings representing contents of the utterances; extracting, by the computer, a linguistic feature of each of the speakers by comparing the character strings with a first person rule and an end of sentence rule; deciding, by the computer, that, when one of the character strings does not fit with the linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering; and displaying by the computer via a display, at least one of a clustering result of the clustering and a decision result of the deciding.

6. A non-transitory computer readable medium for causing a computer to perform a method for clustering speakers, the method comprising: storing, by the computer into a memory device, acoustic signal already recorded; extracting acoustic features for speakers from an acoustic signal; clustering utterances included in the acoustic signal into each of the speakers by using the acoustic features; acquiring character strings by recognizing the acoustic signal of each utterance; acquiring the character strings representing contents of the utterances; extracting a linguistic feature of each of the speakers by comparing the character strings with a first person rule and an end of sentence rule; deciding that, when one of the character strings does not fit with the linguistic feature of a speaker into which an utterance of the one is clustered, the utterance is erroneously clustered by the clustering; and displaying by the computer via a display, at least one of a clustering result of the clustering and a decision result of the deciding.
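Claims 2 and 3 describe two further steps: selecting, per speaker, the rules matched by more than a threshold number of that speaker's utterances, and proposing another speaker for an utterance that fits that speaker's feature. The sketch below illustrates one way these could look. The rule table, the function names `extract_features` and `recluster`, the threshold value, and the definition of "fits" (every rule the utterance matches belongs to the other speaker's feature) are hypothetical choices for the example and are not taken from the patent text.

```python
import re
from collections import Counter

# Hypothetical first person and end of sentence rules (romanized Japanese).
RULES = {
    "watashi": re.compile(r"\bwatashi\b"),  # first person rule
    "boku": re.compile(r"\bboku\b"),        # first person rule
    "desu": re.compile(r"\bdesu[.?!]?$"),   # end of sentence rule
    "da": re.compile(r"\bda[.?!]?$"),       # end of sentence rule
}


def matched_rules(text):
    """Return the names of all rules whose pattern occurs in the text."""
    return {name for name, pattern in RULES.items() if pattern.search(text)}


def extract_features(utterances, assignment, threshold):
    """Per speaker, keep each rule matched by more than `threshold`
    of the utterances clustered into that speaker (cf. claim 3)."""
    counts = {}
    for text, speaker in zip(utterances, assignment):
        counts.setdefault(speaker, Counter()).update(matched_rules(text))
    return {
        speaker: {rule for rule, n in counter.items() if n > threshold}
        for speaker, counter in counts.items()
    }


def recluster(text, current, features):
    """Return another speaker whose feature the text fits, or None
    (cf. claim 2). Assumed fit: the text matches at least one rule and
    every matched rule is in the candidate speaker's feature."""
    hits = matched_rules(text)
    for speaker, feature in features.items():
        if speaker != current and hits and hits <= feature:
            return speaker
    return None


# Toy data: the third utterance is deliberately assigned to the wrong cluster.
utterances = [
    "watashi wa A desu.",
    "watashi mo iku desu.",
    "boku wa iku da.",
    "boku wa B da.",
    "boku mo sou da.",
]
assignment = ["A", "A", "A", "B", "B"]

features = extract_features(utterances, assignment, threshold=1)
suggestion = recluster("boku wa iku da.", "A", features)
print(features, suggestion)
```

With a threshold of 1, the single stray utterance in cluster A does not contribute its rules to speaker A's feature, which is what allows it to be recognised later as a better fit for speaker B.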

Assignees

Inventors

Classifications

  • G10L21/028 · Primary

    using properties of sound source · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9251808B2 cover?
According to one embodiment, a speaker clustering apparatus includes a clustering unit, an extraction unit, and an error detection unit. The clustering unit is configured to extract acoustic features for speakers from an acoustic signal, and to cluster utterances included in the acoustic signal into the speakers by using the acoustic features. The extraction unit is configured to acquire charac…
Who is the assignee on this patent?
Ikeda Tomoo, Nagao Manabu, Nishiyama Osamu, and 4 more
What technology area does this patent fall under?
Primary CPC classification G10L21/028. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 2, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).