Method and system for selectively biased linear discriminant analysis in automatic speech recognition systems

US9679556B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9679556-B2
Application number: US-201313974123-A
Country: US
Kind code: B2
Filing date: Aug 23, 2013
Priority date: Aug 24, 2012
Publication date: Jun 13, 2017
Grant date: Jun 13, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method are presented for selectively biased linear discriminant analysis in automatic speech recognition systems. Linear Discriminant Analysis (LDA) may be used to improve the discrimination between the hidden Markov model (HMM) tied-states in the acoustic feature space. The between-class and within-class covariance matrices may be biased based on the observed recognition errors of the tied-states, such as shared HMM states of the context dependent tri-phone acoustic model. The recognition errors may be obtained from a trained maximum-likelihood acoustic model utilizing the tied-states which may then be used as classes in the analysis.
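
The per-class recognition errors that drive the biasing can be illustrated with a short sketch. It follows the frame-level definition given in claim 5 (the fraction of frames labeled with class k by the forced alignment that the recognizer got wrong). The function name `tied_state_error_rates`, the assumption that the recognizer output is available as one tied-state label per frame, and the choice to report 0.0 for classes with no frames are illustrative assumptions, not details taken from the patent.

```python
from collections import Counter


def tied_state_error_rates(aligned, recognized, num_classes):
    """Per-class error rate e_k for tied-state classes 0..num_classes-1.

    aligned    -- tied-state label per frame from the forced alignment
    recognized -- tied-state label per frame from the recognizer
                  (ASSUMPTION: frame-synchronous with `aligned`)
    """
    if len(aligned) != len(recognized):
        raise ValueError("label sequences must have the same number of frames")

    # Frames per class according to the forced alignment.
    totals = Counter(aligned)
    # Frames of each class whose recognized label disagrees with the alignment.
    errors = Counter(a for a, r in zip(aligned, recognized) if a != r)

    # ASSUMPTION: a class with no aligned frames gets an error rate of 0.0.
    return [errors[k] / totals[k] if totals[k] else 0.0
            for k in range(num_classes)]


if __name__ == "__main__":
    aligned = [0, 0, 1, 1, 1, 2]
    recognized = [0, 1, 1, 1, 0, 2]
    print(tied_state_error_rates(aligned, recognized, 3))
```

Classes that the baseline model confuses more often receive a larger e_k, which is what lets the later scatter-matrix step emphasize them.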

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for training an acoustic model using the maximum likelihood criteria, comprising the steps of:
  a) performing a forced alignment of speech training data;
  b) processing the training data and obtaining estimated scatter matrices, wherein said scatter matrices may comprise one or more of a between class scatter matrix and a within-class scatter matrix, from which mean vectors may be estimated;
  c) biasing the between class scatter matrix and the within-class scatter matrix;
  d) diagonalizing the between class scatter matrix and the within class scatter matrix and estimating eigen-vectors to produce transformed scatter matrices;
  e) obtaining new discriminative features using the estimated vectors, wherein said vectors correspond to the highest discrimination in the new space;
  f) training a new acoustic model based on said new discriminative features, wherein the training further comprises the steps of: estimating parameters with new features obtained through a transformed matrix, and using a maximum likelihood formula with new features to perform the training; and
  g) saving said acoustic model.

2. The method of claim 1, wherein step (a) further comprises the step of using the current maximum likelihood acoustic model on the entire speech training data with a Hidden Markov Model-Gaussian Mixture Model.

3. The method of claim 2, wherein said training data may consist of phonemes and triphones wherein:
  a) a triphone's Hidden Markov Model states may be mapped to tied states;
  b) each feature frame may have a tied state class label; and
  c) said tied states may comprise unique classes between which the discrimination in an acoustic feature space is increased through selectively biased linear discriminant analysis.

4. The method of claim 1, wherein step (b) further comprises the steps of:
  a) performing tied triphone recognition on the training data using a trained model;
  b) recording a recognition error rate of each triphone tied state using a transcription of the training data;
  c) representing a segment of audio corresponding to a triphone with a 39 dimensional Mel-frequency cepstral coefficient feature vector and a first order derivative and a second order derivative;
  d) mapping training data internally to a tied-triphone state;
  e) forming a super vector with said Mel-frequency cepstral coefficient features;
  f) performing a forced Viterbi alignment to assign a tied state label to each frame in the training data; and
  g) estimating at least one of the between class and with-in class scatter matrices.

5. The method of claim 4, wherein the error rate of step (b) comprises $i \in (1, 2, \ldots, K)$ wherein the fraction of the frames which have a class label ‘k’ as per the forced alignment but were misrecognized by the recognizer.

6. The method of claim 4, wherein step (g) further comprises the steps of:
  a) estimating a mean of the super vector using the tied state labels of the training data by averaging over each tied state class; and
  b) estimating a global mean vector.

7. The method of claim 6, wherein step (a) is determined using the mathematical equation:

  $\mu_k = \sum_{t=1}^{N_k} y_k(t) / N_k$

wherein $\mu$ represents a global mean vector over a tied-state class $k$, $y_k(t)$ represents a super vector belonging to a tied-state, and $N_k$ represents a number of frames belonging to a class.

8. The method of claim 6, wherein step (b) is determined using the mathematical equation:

  $\mu = \sum_{t=1}^{T} y(t) / T$

wherein $\mu$ represents a global mean vector, $T$ represents a total number of frames of a training data set, and $y(t)$ represents a super vector.

9. The method of claim 1, wherein step (c) is performed based on an error rate of tied state classes per an acoustic model.

10. The method of claim 9, wherein the error rate for the between class scatter matrix is determined using the mathematical equation:

  $S_b = \sum_{k=1}^{K} e_k \times (\mu_k - \mu)(\mu_k - \mu)^t / K$

wherein $S_b$ represents a between-class scatter matrix, $e_k$ represents an error rate of each tied-state, $\mu$ represents a global mean vector, $(\mu_k - \mu)$ represents a column vector, $(\mu_k - \mu)^t$ represents a transpose of the column vector, and $K$ represents a tied-state.

11. The method of claim 9, wherein the error rate for the within class scatter matrix is determined using the mathematical equation: $S_w = \sum_{t=1}^{T} ( y$ …
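
Steps (b) through (e) of claim 1, together with the formulas in claims 7, 8 and 10, can be sketched in NumPy roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. All function names are hypothetical. The within-class scatter shown here is the plain textbook form with no error weighting, because the biased formula of claim 11 is cut off in this preview. The diagonalization of step (d) is done here by Cholesky whitening followed by a symmetric eigen-decomposition, which is one common way to solve the generalized eigenproblem and is an assumption about how that step could be carried out.

```python
import numpy as np


def class_and_global_means(Y, labels, K):
    """Y: (T, D) super vectors, one row per frame.
    labels: (T,) integer tied-state ids in 0..K-1 from forced alignment.
    Assumes every class has at least one frame."""
    mu = Y.mean(axis=0)  # global mean (claim 8)
    mu_k = np.stack([Y[labels == k].mean(axis=0) for k in range(K)])  # claim 7
    return mu_k, mu


def biased_between_scatter(mu_k, mu, e):
    """S_b = sum_k e_k (mu_k - mu)(mu_k - mu)^t / K  (claim 10)."""
    d = mu_k - mu  # (K, D), one row per class
    return (d * e[:, None]).T @ d / len(e)


def plain_within_scatter(Y, labels, mu_k):
    """ASSUMPTION: unbiased textbook within-class scatter.
    The error-weighted version of claim 11 is truncated in the source."""
    r = Y - mu_k[labels]  # each frame minus its own class mean
    return r.T @ r / len(Y)


def lda_projection(S_b, S_w, n_dims):
    """Return a (D, n_dims) matrix of generalized eigenvectors of (S_b, S_w)
    with the largest eigenvalues. Requires S_w to be positive definite."""
    L = np.linalg.cholesky(S_w)  # S_w = L L^t
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_b @ L_inv.T  # symmetric whitened between-class scatter
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_dims]  # keep the most discriminative
    return L_inv.T @ vecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D, T = 3, 4, 300
    labels = np.arange(T) % K
    Y = (3.0 * rng.normal(size=(K, D)))[labels] + rng.normal(size=(T, D))
    e = np.array([0.1, 0.5, 0.2])  # hypothetical per-class error rates

    mu_k, mu = class_and_global_means(Y, labels, K)
    S_b = biased_between_scatter(mu_k, mu, e)
    S_w = plain_within_scatter(Y, labels, mu_k)
    W = lda_projection(S_b, S_w, n_dims=2)
    new_features = Y @ W  # step (e): project frames into the new space
    print(new_features.shape)
```

Weighting each class term by its error rate makes frequently confused tied states contribute more to S_b, so the leading eigenvectors favor directions that separate those states. The projected features would then be used to retrain the maximum likelihood acoustic model as in step (f).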

Assignees

Interactive Intelligence Inc, Interactive Intelligence Group Inc

Classifications

Primary CPC classification: G10L15/063

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9679556B2 cover?
A system and method are presented for selectively biased linear discriminant analysis in automatic speech recognition systems. Linear Discriminant Analysis (LDA) may be used to improve the discrimination between the hidden Markov model (HMM) tied-states in the acoustic feature space. The between-class and within-class covariance matrices may be biased based on the observed recognition errors of…
Who is the assignee on this patent?
Interactive Intelligence Inc, Interactive Intelligence Group Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 13, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).