Speaker change detection device and speaker change detection method

US2016111112A1 · US · A1

Patent metadata

Publication number: US-2016111112-A1
Application number: US-201514875092-A
Country: US
Kind code: A1
Filing date: Oct 5, 2015
Priority date: Oct 17, 2014
Publication date: Apr 21, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speaker change detection device sets first and second analysis periods before and after each of time points in a voice signal, generates, for each of the time points, a first speaker model from a distribution of features in frames in the first analysis period, and a second speaker model from a distribution of features in frames in the second analysis period, calculates, for each of the time points, a matching score representing the likelihood of similarity of features between a group of speakers in the first analysis period and a group of speakers in the second analysis period by applying the features extracted from the second analysis period to the first speaker model and applying the features extracted from the first analysis period to the second speaker model, and detects a speaker change point on the basis of the matching scores at the plurality of time points.
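
The cross-application of features to the opposite period's model, as described in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the speaker model here is a single diagonal-covariance Gaussian (the abstract only says the model is generated "from a distribution of features"), the score is normalized by subtracting each period's likelihood under its own model (a normalization chosen for this sketch), and the names `fit_diag_gaussian`, `avg_log_likelihood`, `matching_score`, and `window` are hypothetical.

```python
import math
import random

def fit_diag_gaussian(frames, var_floor=1e-3):
    """Fit a per-dimension mean and variance to a list of feature vectors.
    Assumption: a single diagonal Gaussian stands in for the speaker model."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, var_floor)
           for d in range(dim)]
    return mean, var

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under a diagonal Gaussian."""
    mean, var = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, mean, var):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def matching_score(frames, t, window):
    """Score time point t (a frame index) using `window` frames on each side."""
    first = frames[t - window:t]    # first analysis period (before t)
    second = frames[t:t + window]   # second analysis period (after t)
    model1 = fit_diag_gaussian(first)
    model2 = fit_diag_gaussian(second)
    # Apply each period's features to the other period's model.
    cross = avg_log_likelihood(second, model1) + avg_log_likelihood(first, model2)
    # Assumed normalization: subtract each period's fit to its own model.
    own = avg_log_likelihood(first, model1) + avg_log_likelihood(second, model2)
    return cross - own

# Synthetic 2-D "features": the distribution shifts at frame 100.
random.seed(0)
frames = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
frames += [[random.gauss(5, 1), random.gauss(5, 1)] for _ in range(100)]
same = matching_score(frames, 50, 50)     # both periods from one distribution
change = matching_score(frames, 100, 50)  # periods straddle the shift
```

With this normalization the score stays near zero when both periods share a distribution and drops sharply when they differ, so `change` comes out well below `same` on the synthetic data.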

First claim

Opening claim text (preview).

What is claimed is:

1. A speaker change detection device comprising: a processor configured to: extract features representing features of a human voice in each frame having a predetermined time length from a voice signal including a conversation between a plurality of speakers; set, for each of a plurality of different time points in the voice signal, a first analysis period before the time point and a second analysis period after the time point; generate, for each of the plurality of time points, a first speaker model representing features of voices of a group of at least two speakers speaking in the first analysis period on the basis of a distribution of the features of a plurality of frames included in the first analysis period and a second speaker model representing features of voices of a group of at least two speakers speaking in the second analysis period on the basis of a distribution of the features in a plurality of frames included in the second analysis period; calculate, for each of the plurality of time points, a matching score representing the likelihood of similarity of features between the group of speakers in the first analysis period and the group of speakers in the second analysis period by applying the features in a plurality of frames included in the second analysis period to the first speaker model and applying the features of a plurality of frames included in the first analysis period to the second speaker model; and detect a speaker change point at which a change from a group of speakers speaking before the speaker change point to another group of speakers speaking after the speaker change point occurs in the voice signal on the basis of the matching score for each of the plurality of time points.

2. The speaker change detection device according to claim 1, wherein, the detecting the speaker change point, when a local minimum matching score in a time sequence among the matching scores for the plurality of time points is lower than or equal to a predetermined threshold, detects a time point corresponding to the local minimum matching score as the speaker change point.

3. The speaker change detection device according to claim 1, wherein, the setting of the first analysis period, when a local minimum matching score in a time sequence among the matching scores for the plurality of time points is lower than or equal to a predetermined threshold, extends at least one of the first analysis period and the second analysis period for a first time point corresponding to the local minimum matching score in a direction away from the first time point; the generating the first speaker model and the second speaker model updates one of the first speaker model and the second speaker model that corresponds to the extended analysis period on the basis of a distribution of the features in a plurality of frames included in the extended analysis period of the first analysis period and the second analysis period for the first time point; the calculating the matching score updates the matching score by applying the features in a plurality of frames included in the extended analysis period one of the first analysis period and the second analysis period for the first time point to the speaker model of the other of the first analysis period and the second analysis period and applying the features in a plurality of frames included in the other analysis period to the updated speaker model; and the detecting the speaker change point detects the first time point as the speaker change point when the updated matching score is lower than or equal to the predetermined detection threshold.

4. A speaker change detection method comprising: extracting, by a processor, features representing features of human voice in each frame having a predetermined time length from a voice signal including a conversation between a plurality of speakers; setting, by the processor, for each of a plurality of different time points in the voice signal, a first analysis period before the time point and a second analysis period after the time point; generating, by the processor, for each of the plurality of time points, a first speaker model representing features of voices of a group of at least two speakers speaking in the first analysis period on the basis of a distribution of the features of a plurality of frames included in the first analysis period and a second speaker model representing features of voices of a group of at least two speakers speaking in the second analysis period on the basis of a distribution of the features in a plurality of frames included in the second analysis period; calculating, by the processor, for each of the plurality of time points, a matching score representing the likelihood of similarity of features between the group of speakers in the first analysis period and the group of speakers in the second analysis period by applying the features in a plurality of frames included in the second analysis period to the first speaker model and applying the features of a plurality of frames included in the first analysis period to the second speaker model; and detecting, by the processor, a speaker change point at which a change from a group of speakers speaking before the speaker change point to another group of speakers speaking after the speaker change point occurs in the voice signal on the basis of the matching score for each of the plurality of time points.

5. The speaker change detection method according to claim 4, wherein, the detecting the speaker change point, when a local minimum matching score in a time sequence among the matching scores for the plurality of time points is lower than or equal to a predetermined threshold, detects a time point corresponding to the local minimum matching score as the speaker change point.

6. The speaker change detection method according to claim 4, wherein, the setting of the first analysis period, when a local minimum matching score in a time sequence among the matching scores for the plurality of time points is lower than or equal to a predetermined threshold, extends at least one of the first analysis period and the second analysis period for a first time point corresponding to the local minimum matching score in a direction away from the first time point; the generating the first speaker model and the second speaker model updates one of the first speaker model and the second speaker model that corresponds to the extended analysis period on the basis of a distribution of the features in a plurality of frames included in the extended analysis period of the first analysis period and the second analysis period for the first time point; the calculating the matching score updates the matching score by applying the features in a plurality of frames included in the extended analysis period one of the first analysis period and the second analysis period for the first time point to the speaker model of the other of the first analysis period and the second analysis period and applying the features in a plurality of frames included in the other analysis period to the updated speaker model; and the detecting the speaker change point detects the first time point as the speaker change point when the updated matching score is lower than or equal to the predetermined detection threshold.

7. A non-transitory computer-readable recording medium having recorded thereon a speaker change detection computer program that causes a computer to execute a process comprising: extracting features representing features of human voice in each frame having a predetermined time length from a voice signal including a conversation between a plurality of speakers; setting, for each of a plurality of different time points in the voice signal, a first an
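
The detection rule in claims 2 and 5 above (a time point is reported when its matching score is a local minimum in the time sequence and is lower than or equal to a predetermined threshold) can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name `detect_change_points`, the treatment of the first and last time points (skipped, since they lack a neighbor on one side), and the tie-breaking on flat runs are assumptions of this sketch.

```python
def detect_change_points(scores, threshold):
    """Return indices i where scores[i] is a local minimum in the sequence
    and scores[i] <= threshold.

    Assumptions of this sketch: the endpoints are never reported, and on a
    flat run of equal scores only the last index of the run can qualify.
    """
    points = []
    for i in range(1, len(scores) - 1):
        is_local_min = scores[i] <= scores[i - 1] and scores[i] < scores[i + 1]
        if is_local_min and scores[i] <= threshold:
            points.append(i)
    return points

# Hypothetical matching scores, one per time point.
scores = [-1, -2, -30, -3, -1, -4, -2]
detected = detect_change_points(scores, threshold=-10)
```

In this example index 2 is both a local minimum and below the threshold, while index 5 is a local minimum that stays above the threshold and is therefore not reported.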

Assignees

Fujitsu Ltd

Inventors

Classifications

  • Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

  • Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices

  • G10L25/51 (Primary): for comparison or discrimination

  • G10L17/06 (Primary): Decision making techniques; Pattern matching strategies

  • characterised by the type of analysis window

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016111112A1 cover?
A speaker change detection device sets first and second analysis periods before and after each of time points in a voice signal, generates, for each of the time points, a first speaker model from a distribution of features in frames in the first analysis period, and a second speaker model from a distribution of features in frames in the second analysis period, calculates, for each of the time p…
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification G10L25/51. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 21, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).