Speaker verification

US9343067B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9343067-B2
Application number  US-200913126859-A
Country             US
Kind code           B2
Filing date         Oct 29, 2009
Priority date       Oct 29, 2008
Publication date    May 17, 2016
Grant date          May 17, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background noise is used to bring the condition of the training data closer to that of the test voice sample by modifying the training data and a reduced set of the general data, before creating adapted training and general models. Match scores are generated based on the comparison between the adapted models and the test voice sample, with a final match score calculated based on the difference between the match scores. This final match score gives a measure of the degree of matching between the test voice sample and the training utterance and is based on the degree of matching between the speech characteristics from extracted feature vectors that make up the respective speech signals, and is not a direct comparison of the raw signals themselves. Thus, the method can be used to verify a speaker without necessarily requiring the speaker to provide an identical test phrase to the phrase provided in the training sample.
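
As a rough illustration of the noise-matching step described in the abstract, the sketch below mixes a noise sample into a stored utterance so that its condition moves closer to that of the test sample. The abstract does not say how the modification is performed: additive mixing, looping the noise to the utterance length, the `gain` parameter, and the function name `add_noise` are all assumptions made for this example, not the patented procedure.

```python
from typing import List, Sequence


def add_noise(utterance: Sequence[float],
              noise: Sequence[float],
              gain: float = 1.0) -> List[float]:
    """Additively mix a noise sample into an utterance (hypothetical helper).

    Assumption: the noise is simply added sample by sample and is looped
    when it is shorter than the utterance.
    """
    if not noise:
        raise ValueError("noise sample must not be empty")
    # Index the noise modulo its length so it repeats as needed.
    return [s + gain * noise[i % len(noise)] for i, s in enumerate(utterance)]


# The same noise sample would be applied to the speaker's training
# utterances and to the reduced set of background utterances.
noisy_training = add_noise([1.0, 2.0, 3.0], [0.5, -0.5])
```

In this reading, the adapted target and background models would then be built from features of the modified signals rather than the clean ones.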

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of verifying the identity of a speaker in a speaker verification system, said method comprising:
  i) building, using a computer system including at least one computer processor, a general speaker model using feature vectors extracted from a first set of speaker utterances taken from a large population of speakers;
  ii) receiving training speaker utterances provided by the speaker in a training phase, modifying the received training speaker utterances provided by the speaker in the training phase using a noise sample to obtain modified training speaker utterances, and modifying a second set of sample speaker utterances using the noise sample to obtain a modified set of background speaker utterances, wherein said second set comprises speaker utterances taken from a population of speakers that is less that in the first set;
  iii) generating an adapted target speaker model by using feature vectors extracted from the modified training speaker utterances to adapt the general speaker model, and generating a set of adapted background speaker models by using feature vectors extracted from the modified set of background speaker utterances to adapt the general speaker model;
  iv) receiving a test voice sample and calculating a target match score based on a comparison between the adapted target speaker model and the received test voice sample, and calculating a set of background match scores based on a comparison between the set of adapted background speaker models and the test voice sample;
  v) determining a final match score representing the degree of matching between the characteristics of the training speaker utterance and the test voice sample, wherein the final match score is dependent on the difference between the target match score and the mean background match scores; and
  vi) verifying the identity of the speaker in the speaker verification system based on the final match score.

2. A method according to claim 1, wherein the final match score is dependent on the difference between the target match score and the mean background match scores divided by the standard deviation of the background match scores.

3. A method according to claim 1, wherein calculating the target match score comprises calculating the probability of a match between the feature vectors associated with the test voice sample and the adapted target speaker model, and calculating the set of background match scores comprises calculating the probability of a match between the feature vectors associated with the test voice sample and each of the adapted background speaker models in the set of adapted background speaker models.

4. A method according to claim 1 further comprising calculating a general match score based on a comparison between the general speaker model and a test voice sample.

5. A method according to claim 4, wherein calculating the general match score comprises calculating the probability of a match between the feature vectors associated with the test voice sample and the general speaker model.

6. A method according to claim 4, wherein the target match score is normalised with respect to the general match score, and each of the background match scores is normalised with respect to the general match score before determining the final match score.

7. A method according to claim 1, wherein the noise and test speaker sample are extracted from a test speaker utterance provided by the speaker in a test phase subsequent to an initial training phase.

8. A method according to claim 1, wherein the final match score is compared to a predetermined threshold, and verification of the identity of the speaker in the speaker verification system is based on the comparison of the final match score to the predetermined threshold.

9. A method according to claim 1 wherein the feature vectors are short term spectral representations of speech.

10. A speaker verification system for verifying the identity of a speaker comprising: a computer system, including at least one computer processor, the computer system being configured to provide at least:
  i) a model building module adapted to build a general speaker model using feature vectors extracted from a first set of speaker utterances taken from a large population of speakers, to receive training speaker utterances provided by the speaker in a training phase, to modify the training speaker utterances provided by the speaker in the training phase using a noise sample to obtain modified training speaker utterances, to modify a second set of sample speaker utterances using the noise sample to obtain a modified set of background speaker utterances wherein said second set comprises speaker utterances taken from a population of speakers that is less that in the first set, to generate an adapted target speaker model by using feature vectors extracted from the modified training speaker utterances to adapt the general speaker model, and to generate a set of adapted background speaker models by using feature vectors extracted from the modified set of background speaker utterances to adapt the general speaker model;
  ii) a matching module adapted to receive a test voice sample, to calculate a target match score based on a comparison between the adapted target speaker model and the received test voice sample, and to calculate a set of background match scores based on a comparison between the set of adapted background speaker models and the test voice sample; and
  iii) a verification module adapted to determine a final match score representing the degree of matching between the characteristics of the training speaker utterance and the test voice sample and to verify the identity of the speaker based on the final match score, wherein the final match score is dependent on the difference between the target match score and the mean background match scores.

11. A system according to claim 10, wherein the final match score is dependent on the difference between the target match score and the mean background match scores divided by the standard deviation of the background match scores.

12. A system according to claim 10, wherein calculation of the target match score comprises calculation of the probability of a match between the feature vectors associated with the test voice sample and the adapted target speaker model, and calculation of the set of background match scores comprises calculation of the probability of a match between the feature vectors associated with the test voice sample and each of the adapted background speaker models in the set of adapted background speaker models.

13. A system according to claim 10 wherein the computer system is configured to provide further operation comprising calculation of a general match score based on a comparison between the general speaker model and a test voice sample.

14. A system according to claim 13, wherein calculation of the general match score comprises calculation of the probability of a match between the feature vectors associated with the test voice sample and the general speaker model.

15. A system according to claim 13, wherein the target match score is normalized with respect to the general match score, and each of the background match scores is normalized with respect to the general match score before determining the final match score.

16. A system according to claim 10, wherein the noise sample is extracted from a part of a test speaker utterance provided by the speaker.

17. A system according to claim 10, wherein generating the adapted target speaker model and the set of adapted background speaker
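
The score arithmetic recited in claims 1, 2 and 8 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the scores are taken to be plain floats (for example log-likelihoods), the sample standard deviation is used because the claims do not say which estimator applies, and "accept when the score is at or above the threshold" is an assumed direction for the comparison.

```python
from statistics import mean, stdev
from typing import Sequence


def final_match_score(target: float,
                      background: Sequence[float],
                      divide_by_std: bool = True) -> float:
    """Target score minus the mean background score (claim 1), optionally
    divided by the background standard deviation (claim 2)."""
    if len(background) < 2:
        raise ValueError("need at least two background scores")
    diff = target - mean(background)
    if not divide_by_std:
        return diff
    spread = stdev(background)  # assumption: sample standard deviation
    if spread == 0:
        raise ValueError("background scores have zero spread")
    return diff / spread


def verify(score: float, threshold: float) -> bool:
    """Compare the final score to a predetermined threshold (claim 8).
    Assumption: a score at or above the threshold means 'accept'."""
    return score >= threshold


# Example with made-up scores: one target model, three background models.
score = final_match_score(4.0, [1.0, 2.0, 3.0])
accepted = verify(score, threshold=1.5)
```

Dividing by the spread of the background scores makes the result less sensitive to the absolute scale of the scores produced for a given test sample, which is why a single fixed threshold can be more practical with the claim 2 variant.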

Assignees

Inventors

Classifications

  • Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions · CPC title

  • G10L17/12 · Primary

    Score normalisation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9343067B2 cover?
A speaker verification method is proposed that first builds a general model of user utterances using a set of general training speech data. The user also trains the system by providing a training utterance, such as a passphrase or other spoken utterance. Then in a test phase, the user provides a test utterance which includes some background noise as well as a test voice sample. The background n…
Who is the assignee on this patent?
Ariyaeeinia Aladdin M, Pillay Surosh G, Pawlewski Mark, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L17/12. Mapped technology areas include Physics.
When was this patent published?
Publication date May 17, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).