Voice activity segmentation device, voice activity segmentation method, and voice activity segmentation program

US9293131B2 · US · B2

Patent metadata
Publication number: US-9293131-B2
Application number: US-201113814141-A
Country: US
Kind code: B2
Filing date: Aug 2, 2011
Priority date: Aug 10, 2010
Publication date: Mar 22, 2016
Grant date: Mar 22, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program. The voice activity segmentation device comprises: a first voice activity segmentation means for determining a voice-active segment (first voice-active segment) and a voice-inactive segment (first voice-inactive segment) in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; a second voice activity segmentation means for determining, after a reference speech acquired from a reference speech storage means has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and a threshold value update means for updating the threshold value in such a way that a discrepancy rate between the determination result of the second voice activity segmentation means and a correct segmentation calculated from the reference speech is decreased.
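The adaptation loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the frame log-energy feature, the split of the discrepancy rate into false rejection and false acceptance rates, the additive update rule, and all function names (`segment`, `superimpose`, `discrepancy`, `adapt`) are assumptions made for the example. The abstract only specifies that a feature value is compared with a threshold and that the threshold is updated so the discrepancy rate decreases.

```python
import math


def frame_energy_db(frame):
    """Feature value (assumed here): log energy of one frame, in dB."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


def segment(frames, threshold):
    """Label a frame voice-active (True) when its feature exceeds the threshold."""
    return [frame_energy_db(f) > threshold for f in frames]


def superimpose(noise_frames, reference_frames):
    """Add reference speech sample-by-sample onto voice-inactive frames."""
    return [[n + r for n, r in zip(nf, rf)]
            for nf, rf in zip(noise_frames, reference_frames)]


def discrepancy(decided, correct):
    """Return (false rejection rate, false acceptance rate) against known labels."""
    n_speech = sum(correct) or 1
    n_silence = (len(correct) - sum(correct)) or 1
    frr = sum(c and not d for d, c in zip(decided, correct)) / n_speech
    far = sum(d and not c for d, c in zip(decided, correct)) / n_silence
    return frr, far


def adapt(input_frames, reference_frames, reference_labels, threshold, step=1.0):
    """One update: first segmentation, superimposition, second segmentation, update."""
    first = segment(input_frames, threshold)
    inactive = [f for f, active in zip(input_frames, first) if not active]
    n = min(len(inactive), len(reference_frames))
    if n == 0:
        return threshold  # nothing judged voice-inactive, so nothing to learn from
    mixed = superimpose(inactive[:n], reference_frames[:n])
    second = segment(mixed, threshold)
    frr, far = discrepancy(second, reference_labels[:n])
    # Assumed rule: lower the threshold when speech is missed, raise it on false alarms.
    return threshold - step * (frr - far)
```

Because the correct segmentation of the reference speech is known in advance, the update needs no labelling by the user, which is the point the abstract makes about not burdening the user.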

First claim

Opening claim text (preview).

What is claimed is:

1. A voice activity segmentation device comprising: a processor; and a memory, wherein the memory is configured to store and the processor is configured to implement: a first voice activity segmentation unit which determines a voice-active segment, which is a first voice-active segment, and a voice-inactive segment, which is a first voice-inactive segment, in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; a second voice activity segmentation unit which determines, after a reference speech acquired from a reference speech storage unit has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and a threshold value update unit which updates the threshold value in such a way that a discrepancy rate between the determination result of the second voice activity segmentation unit and a correct segmentation calculated from the reference speech is decreased.

2. The voice activity segmentation device according to claim 1 further comprising: a gain and frequency characteristic correction unit which corrects a gain or a frequency characteristic of the reference speech, which is superimposed in the first voice-inactive segment, by use of at least either a gain or a frequency characteristic, which is acquired from the time-series of the input sound in the first voice-active segment, so that the gain or the frequency characteristic of the reference speech is equal to the gain or the frequency characteristic respectively, which is acquired from the time-series of the input sound in the first voice-active segment.

3. The voice activity segmentation device according to claim 1 further comprising: a reference speech selection unit which selects a reference speech which has a feature value similar to the feature value of the time-series of the input sound in the first voice-active segment as the reference speech which is superimposed in the first voice-inactive segment, out of a plural reference speeches each of which has a different feature value and which are stored in the reference speech storage unit.

4. The voice activity segmentation device according to claim 1 further comprising: a speech recognition unit which finds out a segment of a sequence of words which is corresponding to the time-series of the input sound in the first voice-active segment; and a determination result comparison unit which determines a discrepancy rate between the first voice-active segment and the segment of the sequence of words which the speech recognition unit finds out, wherein the threshold update unit updates the threshold value on the basis of the discrepancy rate determined by the determination result comparison unit, and the discrepancy rate between the determination of the second voice activity segmentation unit and the correct segmentation calculated from the reference speech.

5. A non-transitory computer readable medium storing a voice activity segmentation program which makes a computer execute: a first voice activity segmentation step for determining a voice-active segment (first voice-active segment) and a voice-inactive segment (first voice-inactive segment) in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; a second voice activity segmentation step for determining, after a reference speech acquired from a reference speech storage unit has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment are determined by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and a threshold value update step for updating the threshold value in such a way that a discrepancy rate between the determination result obtained in the second voice activity segmentation step and a correct segmentation calculated from the reference speech is decreased.

6. The non-transitory computer readable medium according to claim 5 storing the voice activity segmentation program which makes the computer execute furthermore: a step for correcting a gain or a frequency characteristic of the reference speech which is superimposed in the first voice-inactive segment, by use of at least either a gain or a frequency characteristic which is acquired from the time-series of the input sound in the first voice-active segment, so that the gain or the frequency characteristic of the reference speech is equal to the gain or the frequency characteristic respectively, which is acquired from the time-series of the input sound in the first voice-active segment.

7. The non-transitory computer readable medium according to claim 5 storing the voice activity segmentation program which makes the computer execute furthermore: a step for selecting a reference speech which has a feature value similar to the feature value of the time-series of the input sound in the first voice-active segment, as the reference speech which is superimposed in the first voice-inactive segment, out of a plural reference speeches each of which has a different feature value and which are stored in the reference speech storage unit.

8. The non-transitory computer readable medium according to claim 5 storing the voice activity segmentation program which makes the computer execute: a speech recognition step for finding out a segment of a sequence of words which is corresponding to the time-series of the input sound in the first voice-active segment; a determination result comparison step for determining a discrepancy rate between the first voice-active segment and the segment of the sequence of words; and the threshold value update step for updating the threshold value on the basis of the discrepancy rate determined in the determination result comparison step, and a discrepancy rate between the determination obtained in the second voice activity segmentation step and the correct segmentation calculated from the reference speech.

9. A voice activity segmentation method comprising: determining a voice-active segment (first voice-active segment) and a voice-inactive segment (first voice-inactive segment) in a time-series of input sound by comparing a threshold value and a feature value of the time-series of the input sound; determining, after a reference speech acquired from a reference speech storage unit has been superimposed on a time-series of the first voice-inactive segment, a voice-active segment and a voice-inactive segment in the time-series of the superimposed first voice-inactive segment by comparing the threshold value and a feature value of the time-series of the superimposed first voice-inactive segment; and updating the threshold value in such a way that a discrepancy rate between the determination result on the voice-active segment and the voice-inactive segment in the time-series of the superimposed first voice-inactive segment, and a correct segmentation calculated from the reference speech is decreased.

10. The voice activity segmentation method according to claim 9 comprising: correcting a gain or a frequency characteristic of the reference speech which is superimposed in the first voice-inactive segment, by use of at least either a gain or a frequency characteristic which is acquired from the time-series of the input sound in the first voice-active segment, so that the g
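
The gain correction of claim 2 and the reference speech selection of claim 3 can be illustrated with a short sketch. It is a hedged example only: it equalises RMS level as the "gain" (the frequency-characteristic variant is not shown), it treats the feature value as a single number, and the names `rms`, `match_gain`, and `select_reference` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_gain(reference, active_input):
    """Scale the reference speech so its RMS level equals that of the
    input sound in the first voice-active segment (assumed gain measure)."""
    ref_level = rms(reference)
    if ref_level == 0.0:
        return list(reference)  # silent reference: nothing to scale
    scale = rms(active_input) / ref_level
    return [s * scale for s in reference]


def select_reference(candidates, active_feature):
    """Pick the stored reference whose feature value is closest to the
    feature value of the input voice-active segment.

    `candidates` maps a reference name to its (scalar, assumed) feature value.
    """
    return min(candidates, key=lambda name: abs(candidates[name] - active_feature))
```

For example, `select_reference({"low": -30.0, "mid": -20.0, "high": -10.0}, -18.0)` picks `"mid"`, the candidate whose feature value lies nearest to the input's.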

Assignees

Inventors

Classifications

  • G10L25/78 · Primary

    Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Detection of discrete points within a voice signal · CPC title

  • G10L15/04 · Primary

    Segmentation; Word boundary detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9293131B2 cover?
Provided is a noise-robust voice activity segmentation device which updates parameters used in the determination of voice-active segments without burdening the user, and also provided are a voice activity segmentation method and a voice activity segmentation program. The voice activity segmentation device comprises: a first voice activity segmentation means for determining a voice-active …
Who is the assignee on this patent?
Arakawa Takayuki, Tanaka Daisuke, NEC Corp
What technology area does this patent fall under?
Primary CPC classification G10L25/78. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 22, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).