Fast speaker recognition scoring using I-vector posteriors and probabilistic linear discriminant analysis

US9373330B2 · US · B2

Patent metadata
Publication number: US-9373330-B2
Application number: US-201414454169-A
Country: US
Kind code: B2
Filing date: Aug 7, 2014
Priority date: Aug 7, 2014
Publication date: Jun 21, 2016
Grant date: Jun 21, 2016

Abstract

Official abstract text for this publication.

A method for performing speaker recognition comprises: estimating respective uncertainties of acoustic coverage of respective speech utterance(s) by first and second speakers, the acoustic coverage representing respective sounds used by the speakers when speaking; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the speakers; representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the speakers; and computing a score between the speech utterance(s) by the speakers in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the speakers are the same speaker.
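The abstract distinguishes two approximations. The first drops the off-diagonal terms of the uncertainty (covariance) itself, which saves memory. The second inverts the uncertainty first and then drops the off-diagonal terms of the resulting precision, which keeps scoring cheap. The following plain-Python sketch shows, on a made-up 2×2 covariance, why the two are not interchangeable. The helper names (`invert`, `diagonal_covariance`, `diagonal_precision`) are hypothetical, and this is an illustration of the idea only, not the patent's implementation.

```python
from typing import List

Matrix = List[List[float]]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def diag(m: Matrix) -> List[float]:
    return [m[i][i] for i in range(len(m))]


def diagonal_covariance(cov: Matrix) -> List[float]:
    # Memory-efficient form: keep D variances, discard the D*(D-1) cross terms.
    return diag(cov)


def diagonal_precision(cov: Matrix) -> List[float]:
    # Computation-efficient form: invert first, THEN discard the cross terms.
    return diag(invert(cov))


if __name__ == "__main__":
    cov = [[2.0, 1.0], [1.0, 2.0]]  # toy values, not from the patent
    print("diag(cov):         ", diagonal_covariance(cov))
    print("1 / diag(cov):     ", [1.0 / v for v in diagonal_covariance(cov)])
    print("diag(inverse(cov)):", diagonal_precision(cov))
```

For this toy matrix the diagonal covariance is `[2.0, 2.0]`, so taking elementwise reciprocals gives precisions of 0.5, whereas the diagonal of the true inverse is 2/3 in each dimension. Discarding dependencies before or after inversion therefore yields different approximations.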

Claims

Opening claim text (preview; the text is truncated partway through claim 10).

What is claimed is:

1. A method of speaker recognition in a speaker recognition system, the method comprising: by a processor, estimating respective uncertainties of acoustic coverage of at least one speech utterance by a first speaker and at least one speech utterance by a second speaker, the acoustic coverage representing respective sounds used by the first speaker and by the second speaker when speaking; by the processor, representing the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the first speaker and for the second speaker to improve speaker recognition performance of the processor; by the processor, representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the first speaker and for the second speaker to further improve the speaker recognition performance of the processor; and by the processor, computing a score between the at least one speech utterance by the first speaker and the at least one speech utterance by the second speaker in a manner that leverages the respective uncertainties of the acoustic coverage during the comparison, the score being indicative of a likelihood that the first speaker and the second speaker are the same speaker.

2. A method of claim 1 wherein: representing the respective uncertainties of acoustic coverage in a manner that allows for efficient computation includes: accumulating an inverse of independent uncertainties of acoustic coverage for multiple speech utterances by the first speaker and for multiple speech utterances by the second speaker; transforming accumulated inverses of the independent uncertainties of acoustic coverage; and discarding dependencies between the uncertainties of different sounds represented in the transformed accumulated inverses to produce respective diagonalized, transformed accumulated inverses; and computing the score includes using the respective diagonalized, transformed accumulated inverses.

3. The method of claim 1 further comprising receiving, by a computer system, a set of signals corresponding to the speech utterances; and wherein representing the respective uncertainties of acoustic coverage in memory and computationally efficient manners include: computing, for each speech utterance of the set of speech utterances, a corresponding identity vector (i-vector), a diagonalized approximation of a covariance matrix of the corresponding i-vector, and a diagonalized approximation of an equivalent precision matrix associated with the corresponding i-vector; and further wherein: computing the score is based on the i-vectors, the diagonalized approximations of covariance matrices, and the diagonalized approximations of equivalent precision matrices computed, the score being indicative of a likelihood of a correspondence between the set of utterances received and a speaker.

4. The method of claim 3, wherein computing the score includes computing the score for each speaker of a number of speakers known to the computer system, and the method further comprises: determining an identifier corresponding to the speaker having the highest score.

5. The method of claim 3 further comprising: computing a set of projected first order statistics corresponding to the set of speech utterances, based on the i-vectors and the diagonalized approximations of the equivalent precision matrices computed; computing a diagonalized approximation of a cumulative equivalent precision matrix for the set of speech utterances based on the diagonalized approximations of precision matrices computed for each i-vector; and diagonalizing a transformation of the diagonalized approximation of the cumulative equivalent precision matrix computed; and wherein, computing the score includes computing the score based on the set of projected first order statistics, the diagonalized approximation of the cumulative equivalent precision matrix, and the diagonalized transformation of the diagonalized approximation of the cumulative equivalent precision matrix.

6. The method of claim 5 further comprising: computing, for each i-vector of a set of i-vectors corresponding to a speaker known to the computer system by way of parameters that had been previously stored in an associated database, the diagonalized approximation of a covariance matrix of the i-vector corresponding to the speaker known to the computer system and a diagonalized approximation of an equivalent precision matrix associated with the i-vector corresponding to the speaker known to the computer system; and storing the i-vector and the diagonalized approximation of a covariance matrix of the i-vector in a database.

7. The method of claim 6 further comprising: computing a set of projected first order statistics, for each speaker known to the computer system, based on the set of i-vectors associated with the speaker known to the computer system and the diagonalized approximations of the equivalent precision matrices computed; computing a diagonalized approximation of a cumulative equivalent precision matrix for each speaker known to the computer system based on the diagonalized approximations of precision matrices computed for each i-vector in the set of i-vectors corresponding to the speaker known to the computer system; and diagonalizing a transformation of the diagonalized approximation of the cumulative equivalent precision matrix computed for each speaker known to computer system.

8. The method of claim 7, wherein computing the score includes computing the score based on the set of projected first order statistics, the diagonalized approximation of the cumulative equivalent precision matrix, and the diagonalized transformation of the diagonalized approximation of the cumulative equivalent precision matrix associated with the speaker known to the computer system.

9. The method of claim 3, wherein determining if the set of speech utterances corresponds to one or any of the number of speakers known to the computer system includes comparing the score computed to a threshold.

10. An apparatus for speaker recognition, the apparatus comprising: an estimation module configured to estimate respective uncertainties of acoustic coverage of at least one speech utterance by a first speaker and at least one speech utterance by a second speaker, the acoustic coverage representing respective sounds used by the first speaker and by the second speaker when speaking; a processor; and a memory, with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions stored thereon, are configured to cause the apparatus to: represent the respective uncertainties of acoustic coverage in a manner that allows for efficient memory usage by discarding dependencies between uncertainties of different sounds for the first speaker and for the second speaker to improve speaker recognition performance of the processor; and represent the respective uncertainties of acoustic coverage in a manner that allows for efficient computation by representing an inverse of the respective uncertainties of acoustic coverage and then discarding the dependencies between the uncertainties of different sounds for the first speaker and for the second speaker to further improve the speaker recognition performance of the processor; and a scoring module configured to compute a score between the at least one speech utterance by the first speaker and the at least one spe…
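Claims 2 through 9 outline a pipeline. Each utterance yields an i-vector with diagonalized covariance and equivalent precision. These are accumulated per speaker into projected first-order statistics and a cumulative precision, a score is computed against each enrolled speaker, and the system either picks the highest score (claim 4) or compares it to a threshold (claim 9). The following pure-Python sketch walks through that flow under a deliberately simplified assumption: a diagonal two-covariance Gaussian model. In this model the speaker variable has diagonal between-speaker variance `between_var`, and each utterance's equivalent precision is taken to be `1 / (within_var + ivector_var)`. Because every matrix in this toy model is already diagonal, the "transformation and diagonalization" steps of claims 2 and 5 reduce to elementwise operations. All function and parameter names are hypothetical, and the patent's actual PLDA formulation is not reproduced here.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Stats = Tuple[Vector, Vector]  # (projected first-order stats f, cumulative precision P)


def equivalent_precision(within_var: Vector, ivector_var: Vector) -> Vector:
    # ASSUMPTION: the diagonal i-vector uncertainty simply adds to the
    # diagonal within-speaker variance; the precision is its reciprocal.
    return [1.0 / (w + c) for w, c in zip(within_var, ivector_var)]


def accumulate(ivectors: Sequence[Vector], precisions: Sequence[Vector]) -> Stats:
    """f = sum_i p_i * x_i and P = sum_i p_i, all elementwise (diagonal)."""
    dim = len(ivectors[0])
    f = [0.0] * dim
    p_sum = [0.0] * dim
    for x, p in zip(ivectors, precisions):
        for d in range(dim):
            f[d] += p[d] * x[d]
            p_sum[d] += p[d]
    return f, p_sum


def log_evidence(f: Vector, p_sum: Vector, between_var: Vector) -> float:
    """Speaker-dependent part of log p(utterances) with y ~ N(0, diag(between_var)).

    Per dimension: 0.5 * (f^2 / (1/b + P) - log(1 + b * P)). Terms that depend
    only on the data cancel in the log-likelihood ratio and are omitted.
    """
    total = 0.0
    for fd, pd, bd in zip(f, p_sum, between_var):
        posterior_precision = 1.0 / bd + pd
        total += 0.5 * (fd * fd / posterior_precision - math.log(1.0 + bd * pd))
    return total


def llr(stats_a: Stats, stats_b: Stats, between_var: Vector) -> float:
    """Same-speaker vs. different-speaker log-likelihood ratio, O(D) per trial."""
    f_a, p_a = stats_a
    f_b, p_b = stats_b
    f_ab = [x + y for x, y in zip(f_a, f_b)]
    p_ab = [x + y for x, y in zip(p_a, p_b)]
    return (log_evidence(f_ab, p_ab, between_var)
            - log_evidence(f_a, p_a, between_var)
            - log_evidence(f_b, p_b, between_var))


def identify(test: Stats, enrolled: Dict[str, Stats], between_var: Vector,
             threshold: float) -> Tuple[Optional[str], Dict[str, float]]:
    # Score every known speaker, take the best one (cf. claim 4), and accept
    # it only if its score exceeds the threshold (cf. claim 9).
    scores = {name: llr(test, stats, between_var) for name, stats in enrolled.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > threshold else None), scores


if __name__ == "__main__":
    # Toy 2-dimensional values, invented for illustration only.
    p = equivalent_precision([0.5, 0.5], [0.1, 0.1])
    enrolled = {
        "alice": accumulate([[1.0, -1.0], [1.2, -0.8]], [p, p]),
        "bob": accumulate([[-1.0, 1.0], [-0.8, 1.2]], [p, p]),
    }
    test = accumulate([[0.9, -1.1]], [p])
    print(identify(test, enrolled, [1.0, 1.0], threshold=0.0))
```

Because the speaker statistics `(f, P)` are sums, enrollment data can be accumulated once and stored. Each trial then costs a handful of elementwise operations per dimension, which is the kind of saving the diagonal representations are meant to buy.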

Assignees

Nuance Communications Inc

Classifications

  • G10L17/06 (primary)

    Decision making techniques; Pattern matching strategies (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9373330B2 cover?
It covers a speaker recognition method that estimates the uncertainty of the acoustic coverage (the sounds used) in each speaker's utterances. To save memory, it stores that uncertainty with the dependencies between different sounds discarded. To save computation, it inverts the uncertainty and then discards the same dependencies. It then uses these uncertainties when computing a score indicating how likely it is that the two speakers are the same person.
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L17/06. Mapped technology areas include Physics.
When was this patent published?
Publication date: June 21, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).