Speech feature extraction apparatus and speech feature extraction method

US9754603B2 · US · B2

Patent metadata
Publication number: US-9754603-B2
Application number: US-201213728287-A
Country: US
Kind code: B2
Filing date: Dec 27, 2012
Priority date: Jan 10, 2012
Publication date: Sep 5, 2017
Grant date: Sep 5, 2017

Abstract

Official abstract text for this publication.

According to one embodiment, a speech feature extraction apparatus includes an extraction unit and a calculation unit. The extraction unit extracts speech segments over a predetermined period at intervals of a unit time from either an input speech signal or a plurality of subband input speech signals obtained by extracting signal components of a plurality of frequency bands from the input speech signal, to generate either a unit speech signal or a plurality of subband unit speech signals. The calculation unit calculates either each average time of the unit speech signal in each of the plurality of frequency bands or each average time of each of the plurality of subband unit speech signals to obtain a speech feature.
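
The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `frame_signal`, `average_time`, `frame_len`, and `hop` are hypothetical, time is measured in samples from the start of each frame, and a silent frame is mapped to 0.0 (the abstract specifies none of these). It treats the "average time" as the energy-weighted mean sample index of a frame; the per-band variant in the abstract would apply the same calculation to each band-passed signal.

```python
def frame_signal(samples, frame_len, hop):
    """Cut `samples` into frames of `frame_len` samples, advancing by `hop`."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def average_time(frame):
    """Energy-weighted mean sample index of one frame (its energy centroid)."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # assumption: silent frames map to 0.0; the source does not say
    return sum(n * x * x for n, x in enumerate(frame)) / energy


signal = [0.0] * 12
signal[3] = 1.0  # click at sample 3
signal[5] = 1.0  # equal-energy click at sample 5

# "Predetermined period" = 8 samples, "unit time" = 4 samples (illustrative values).
frames = frame_signal(signal, frame_len=8, hop=4)
centroids = [average_time(f) for f in frames]
```

The first frame holds both clicks, so its centroid falls midway between them; the second frame starts at sample 4 and holds only the later click, which sits one sample into that frame.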

First claim

Opening claim text (preview).

What is claimed is:

1. A speech feature extraction apparatus, comprising: a computer programmed to comprise: an extraction unit configured to extract speech segments over a predetermined period at intervals of a unit time from an input speech signal to generate a unit speech signal; a first calculation unit configured to calculate each subband average time corresponding to time required to reach a center of energy gravity of the unit speech signal in each of a plurality of frequency bands obtained by dividing an overall frequency band into a number smaller than a bin number of frequency; a generation unit configured to generate a speech feature in each of the frequency bands based on the subband average time; and a decoder performing speech recognition processing to transform the input speech signal into words using the speech feature in the each of the frequency bands, wherein the speech feature is expressed in terms of time.

2. The apparatus according to claim 1, wherein the computer is further programmed to comprise: a second calculation unit configured to calculate a power spectrum of the unit speech signal, and wherein the extraction unit extracts speech segments over the predetermined period from the input speech signal at intervals of the unit time to generate the unit speech signal, and wherein the first calculation unit calculates the subband average time based on the power spectrum.

3. The apparatus according to claim 2, wherein the computer is further programmed to comprise: a third calculation unit configured to calculate a first product of a real part of a first spectrum of the unit speech signal and a real part of a second spectrum of a product of the unit speech signal and a time, to calculate a second product of an imaginary part of the first spectrum and an imaginary part of the second spectrum, and to add the first product and the second product together to obtain a third spectrum; and wherein the first calculation unit calculates the subband average time based on the power spectrum and the third spectrum.

4. The apparatus according to claim 3, wherein the computer is further programmed to comprise: a first application unit configured to apply a first filter bank to the power spectrum to obtain a filtered power spectrum; and a second application unit configured to apply a second filter bank to the third spectrum to obtain a filtered third spectrum, and wherein the first calculation unit calculates the subband average time based on the filtered power spectrum and the filtered third spectrum.

5. The apparatus according to claim 3, wherein the first calculation unit calculates the subband average time in a given frequency band of the frequency bands by dividing a summation of the third spectrum in the given frequency band by a summation of the power spectrum in the given frequency band.

6. The apparatus according to claim 2, wherein the computer is further programmed to comprise: a third calculation unit configured to calculate a group delay spectrum of the unit speech signal; and a multiplication unit configured to multiply the power spectrum by the group delay spectrum to obtain a multiplication spectrum, and wherein the first calculation unit calculates the subband average time based on the power spectrum and the multiplication spectrum.

7. The apparatus according to claim 6, wherein the computer is further programmed to comprise: a first application unit configured to apply a first filter bank to the power spectrum to obtain a filtered power spectrum; and a second application unit configured to apply a second filter bank to the multiplication spectrum to obtain a filtered multiplication spectrum, and wherein the first calculation unit calculates the subband average time based on the filtered power spectrum and the filtered multiplication spectrum.

8. The apparatus according to claim 2, wherein the computer is further programmed to comprise: an application unit configured to apply a filter bank to the power spectrum to obtain a filtered power spectrum, and wherein the first calculation unit calculates the subband average time based on the filtered power spectrum.

9. The apparatus according to claim 1, wherein the generation unit generates the speech feature by applying an axis transformation process on the subband average time.

10. A non-transitory computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising: extracting speech segments over a predetermined period at intervals of a unit time from an input speech signal to generate a unit speech signal; calculating each subband average time corresponding to time required to reach a center of energy gravity of the unit speech signal in each of a plurality of frequency bands obtained by dividing an overall frequency band into a number smaller than a bin number of frequency; generating a speech feature in each of the frequency bands based on the subband average time; and transforming, by a decoder that performs speech recognition processing, the input speech signal into words using the speech feature in the each of the frequency bands, wherein the speech feature is expressed in terms of time.

11. A speech feature extraction method, comprising: controlling a computer to: extract speech segments over a predetermined period at intervals of a unit time from an input speech signal to generate a unit speech signal; calculate each subband average time corresponding to time required to reach a center of energy gravity of the unit speech signal in each of a plurality of frequency bands obtained by dividing an overall frequency band into a number smaller than a bin number of frequency; generate a speech feature in each of the frequency bands based on the subband average time; and transform, by a decoder that performs speech recognition processing, the input speech signal into words using the speech feature in the each of the frequency bands, wherein the speech feature is expressed in terms of time.

12. A speech feature extraction method, comprising: controlling a computer to: extract speech segments over a predetermined period at intervals of a unit time from the plurality of subband input speech signals obtained by extracting signal components of a plurality of frequency bands from the input speech signal to generate a plurality of subband unit speech signals; calculate each subband average time corresponding to a center of energy gravity of power of each of the plurality of subband unit speech signals within a predetermined interval; generate a speech feature in each of the frequency bands based on the subband average time; and transform, by a decoder that performs speech recognition processing, the input speech signal into words using the speech feature in the each of the frequency bands, wherein the speech feature is expressed in terms of time.

13. A non-transitory computer readable storage medium storing instructions of a computer program which when executed by a computer results in performance of steps comprising: extracting speech segments over a predetermined period at intervals of a unit time from a plurality of subband input speech signals obtained by extracting signal components of a plurality of frequency bands from an input speech signal to generate a plurality of subband unit speech signals, wherein the plurality of subband input speech signals is obtained from the input speech signal by a band-pass filter; calculating each subband average time corresponding to a center of energy gravity of power of each of the plurality of subband unit speech s
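
The spectral calculation recited in claims 2, 3, and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names, the `band_edges` layout, and the rectangular (unweighted) bands are hypothetical, time is counted in samples from the start of the frame, and a band with zero power is reported as 0.0. A naive DFT from the standard library stands in for an FFT.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform; adequate for a short frame."""
    size = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * n / size)
                for n, v in enumerate(values))
            for k in range(size)]


def subband_average_times(frame, band_edges):
    """Per-band average time of one frame, in samples.

    `band_edges` is a hypothetical list of (first_bin, end_bin_exclusive) pairs.
    """
    first = dft(frame)                                  # spectrum of x[n]
    second = dft([n * x for n, x in enumerate(frame)])  # spectrum of n * x[n]
    power = [abs(c) ** 2 for c in first]                # claim 2: power spectrum
    # Claim 3: real-part product plus imaginary-part product, per bin.
    third = [a.real * b.real + a.imag * b.imag for a, b in zip(first, second)]
    times = []
    for lo, hi in band_edges:
        band_power = sum(power[lo:hi])
        # Claim 5: summed third spectrum divided by summed power spectrum.
        # Assumption: a band with no power reports 0.0.
        times.append(sum(third[lo:hi]) / band_power if band_power > 0.0 else 0.0)
    return times


frame = [0.0] * 16
frame[5] = 1.0                    # impulse at sample 5
bands = [(0, 3), (3, 6), (6, 9)]  # 3 bands over the 9 non-negative-frequency bins
print(subband_average_times(frame, bands))
```

For an impulse, every band should report a time close to the impulse position, up to floating-point rounding. Under the DFT sign convention used here, the per-bin third spectrum equals the power spectrum multiplied by the group delay, which is the quantity claim 6 calls the multiplication spectrum; summing over all bins recovers the time-domain energy centroid.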

Assignees

Inventors

Classifications

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • G10L21/00 (Primary)

    Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence) · CPC title

  • G10L15/02 (Primary)

    Feature extraction for speech recognition; Selection of recognition unit · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Nakamura Masanobu, Masuko Takashi, Toshiba Kk
What technology area does this patent fall under?
Primary CPC classification G10L21/00. Mapped technology areas include Physics.
When was this patent published?
Published Sep 5, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).